About
OpenLineage is an open-source framework and industry standard for data lineage. It standardizes the definition of data lineage, the metadata that makes up lineage data, and the approach for collecting lineage data from external systems. Marquez is the reference implementation of an OpenLineage lineage repository.
OpenLineage integrates with Apache Airflow to collect lineage metadata from DAGs. This makes inter-DAG dependencies easier to maintain and to view in a lineage graph, and it keeps a catalog of historical DAG runs.
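To make the idea of standardized lineage metadata more concrete, here is a minimal sketch of what a lineage run event could look like when built by hand in Python. The field names follow the general shape of an OpenLineage run event (event type, time, run, job, input and output datasets), but the helper `build_run_event`, the namespaces, the dataset names, and the producer URL are hypothetical and chosen only for illustration. Real events carry additional fields, such as a schema URL and facets, and in practice the Airflow integration emits them for you.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, run_id, job_namespace, job_name, inputs, outputs):
    """Build a simplified, OpenLineage-style run event as a plain dict.

    `inputs` and `outputs` are lists of (namespace, name) tuples that
    identify the datasets a job run reads from and writes to.
    """
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Hypothetical placeholder identifying the code that emitted the event.
        "producer": "https://example.com/hypothetical-producer",
    }


if __name__ == "__main__":
    # All names below are assumptions made up for this sketch.
    event = build_run_event(
        event_type="COMPLETE",
        run_id=str(uuid.uuid4()),
        job_namespace="example-airflow",
        job_name="example_dag.load_table",
        inputs=[("example-db", "doc.raw_data")],
        outputs=[("example-db", "doc.clean_data")],
    )
    print(json.dumps(event, indent=2))
```

A lineage repository that receives a series of such events can connect jobs through the datasets they share: one job's output that appears as another job's input becomes an edge in the lineage graph.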
Tutorial
This tutorial demonstrates how to run Airflow DAGs against a CrateDB database and use Marquez to view the resulting lineage data.
For general information about data lineage, see Data Lineage - CrateDB: Guide.