We are currently unlocking data loading into CrateDB using the excellent ingestr toolkit, which is based on dlt [1]. This topic tracks the progress of that work and gives everyone the chance to participate early in its development.
Prerequisites
To execute the commands in this walkthrough, you need a working installation of Docker or Podman, plus Python, on your machine. For installing Python packages, we recommend using the uv package manager [2].
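As a sketch of the setup, assuming the standard `crate` OCI image and that uv is already installed (the image name and port numbers below are the usual CrateDB defaults, not taken from this post):

```shell
# Start a single-node CrateDB instance.
# HTTP interface on port 4200, PostgreSQL wire protocol on port 5432.
docker run --rm -it --name cratedb \
  -p 4200:4200 -p 5432:5432 \
  crate:latest -Cdiscovery.type=single-node

# Install the ingestr CLI into an isolated environment using uv.
uv tool install ingestr
```

With Podman, the `docker` command can typically be replaced by `podman` verbatim.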
Call for support
Because the relevant data adapters are still in their infancy, we would very much appreciate feedback in the form of bug reports, suggestions for improvement, or success stories.
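To illustrate the direction of this work, an end-to-end invocation might look like the sketch below. The `--source-uri`/`--dest-uri` flags follow ingestr's general CLI shape, but the `cratedb://` destination URI scheme and the connection parameters shown here are assumptions; please consult the ingestr documentation for the adapter's actual spelling:

```shell
# Copy a table from PostgreSQL into CrateDB.
# NB: the cratedb:// destination URI scheme is an assumption, not confirmed syntax.
ingestr ingest \
  --source-uri 'postgresql://user:password@localhost:5432/mydb' \
  --source-table 'public.orders' \
  --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \
  --dest-table 'doc.orders'
```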
Both are open table formats that build upon Apache Parquet, a free and open-source column-oriented data storage format, effectively superseding Apache Hive for use cases from the Hadoop era.
CrateDB Toolkit now provides adapters to import data into, and export data from, those open table formats. Please let us know if you discover any flaws, and don't hesitate to share ideas for improvement. Thank you in advance.
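A minimal sketch of how such an import might be invoked through CrateDB Toolkit; the `iceberg://` source URL scheme and the warehouse path used here are hypothetical placeholders, and the cluster URL option follows CrateDB Toolkit conventions, so check the Toolkit documentation for the adapters' actual URL formats:

```shell
# Import an Iceberg table into CrateDB via CrateDB Toolkit.
# NB: the iceberg:// scheme and the path are hypothetical placeholders.
export CRATEDB_CLUSTER_URL='crate://crate:@localhost:4200/'
ctk load table 'iceberg://path/to/warehouse/mytable'
```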
Iceberg is a specification and high-performance format for huge analytic tables, making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables at the same time. Apache Iceberg is its reference implementation. ↩︎