OpenLineage with Airflow, Marquez, and CrateDB

hernanc · November 22, 2022, 11:00pm

About

OpenLineage is an open-source framework and industry standard for data lineage. It standardizes the definition of data lineage, the metadata that makes up lineage data, and the approach for collecting lineage data from external systems. Marquez is the reference implementation of an OpenLineage lineage repository.
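To make "the metadata that makes up lineage data" more concrete, here is a minimal sketch of an OpenLineage run event built as a plain Python dictionary. The field names follow the OpenLineage run event layout as I understand it. The helper `build_run_event`, the namespaces, the job and table names, the producer URL, and the spec version in `schemaURL` are all illustrative assumptions, not values taken from this tutorial.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, job_namespace, job_name, run_id,
                    dataset_namespace, inputs=(), outputs=()):
    """Build a minimal OpenLineage-style run event (hypothetical helper)."""

    def dataset(name):
        # A dataset is identified by a namespace (the data source) and a name.
        return {"namespace": dataset_namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        # Assumption: placeholder producer URI identifying the emitting code.
        "producer": "https://example.com/hypothetical-producer",
        # Assumption: spec version; use the one matching your client.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json"
                     "#/definitions/RunEvent",
    }


if __name__ == "__main__":
    event = build_run_event(
        "COMPLETE",
        job_namespace="example_namespace",       # hypothetical
        job_name="example_dag.example_task",     # hypothetical
        run_id=str(uuid.uuid4()),
        dataset_namespace="example-datasource",  # hypothetical
        inputs=["doc.source_table"],
        outputs=["doc.target_table"],
    )
    print(json.dumps(event, indent=2))
```

In practice the Airflow integration emits events like this for you. Building one by hand is mainly useful for seeing which pieces (run, job, input and output datasets) end up in the lineage graph.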

OpenLineage integrates with Apache Airflow to collect lineage metadata from DAG runs. This makes inter-DAG dependencies easy to maintain and to view in a lineage graph, and it also keeps a catalog of historical DAG runs.

Tutorial

This tutorial demonstrates how to run Airflow DAGs against a CrateDB database and use Marquez to collect and view the resulting lineage data.
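Besides the web UI, lineage data can also be read from the Marquez HTTP API. The following is a small sketch using only the Python standard library. It assumes a local Marquez API at `http://localhost:5000` and the `/api/v1/namespaces/{namespace}/jobs` and `/api/v1/lineage` endpoints. The helper names and the namespace and job names are hypothetical, so adjust them to your setup.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: Marquez API listening on its usual local port.
MARQUEZ_URL = "http://localhost:5000"


def jobs_url(namespace, base_url=MARQUEZ_URL):
    """URL that lists the jobs recorded in one namespace."""
    return f"{base_url}/api/v1/namespaces/{quote(namespace, safe='')}/jobs"


def lineage_url(namespace, job_name, base_url=MARQUEZ_URL):
    """URL that returns the lineage graph around one job node."""
    node_id = f"job:{namespace}:{job_name}"
    return f"{base_url}/api/v1/lineage?{urlencode({'nodeId': node_id})}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical namespace and job name; requires a running Marquez.
    print(fetch_json(jobs_url("example_namespace")))
    print(fetch_json(lineage_url("example_namespace", "example_dag.example_task")))
```

Keeping URL construction separate from the network call makes it easy to check the request shape before pointing the script at a running instance.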

For general information about data lineage, see Data Lineage - CrateDB: Guide.
