»Loading data into CrateDB« weekly edition

Introduction

We are currently unlocking data loading into CrateDB using the excellent ingestr toolkit, which is based on dlt [1]. This topic reports on our progress and gives everyone the chance to participate early in the development.

Prerequisites

To execute the commands in this walkthrough, you need a working installation of Docker or Podman and a Python installation on your machine. For installing Python packages, we recommend using the uv package manager [2].

Call for support

Because the relevant data adapters are still in their infancy, we very much appreciate feedback in the form of bug reports, suggestions for improvement, or success stories.


  1. The CrateDB destination adapter for ingestr uses dlt via dlt-cratedb. ↩︎

  2. The uv package manager can easily be installed using pip or pipx, e.g. pipx install uv. It also offers other installation methods. ↩︎

Loading data from Amazon Kinesis

Prerequisites

The tutorial uses LocalStack to spin up a local instance of Amazon Kinesis, so you don’t need an AWS account to follow along.

Install

Install the most recent Python packages awscli, crash, and ingestr.

uv tool install --upgrade awscli crash ingestr

Tutorial

Services

Run Kinesis from LocalStack and CrateDB using Docker or Podman.

docker run --rm --name=localstack \
  --publish=4566:4566 \
  docker.io/localstack/localstack:latest
docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:latest '-Cdiscovery.type=single-node'

Configure AWS clients

LocalStack’s default region is us-east-1. Let’s use it.

export AWS_DEFAULT_REGION=us-east-1
export AWS_ENDPOINT_URL="http://localhost:4566"

Load data

Create the Kinesis stream.

aws kinesis create-stream --stream-name=demo

Publish two data payloads to the Kinesis stream.

aws kinesis put-record \
  --stream-name=demo --partition-key default \
  --data '{"sensor_id":1,"ts":"2025-06-01 10:00","reading":42.42}'

aws kinesis put-record \
  --stream-name=demo --partition-key default \
  --data '{"sensor_id":2,"ts":"2025-06-01 11:00","reading":451.00}'
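If you want to publish more than a handful of records, the payloads above can also be generated programmatically. This is a minimal sketch: the field names (sensor_id, ts, reading) follow the example records, while the helper name make_payload is ours and not part of any toolkit.

```python
# Generate sensor payloads in the same JSON shape as the examples above.
import json
from datetime import datetime

def make_payload(sensor_id: int, ts: datetime, reading: float) -> str:
    """Serialize one sensor reading to the JSON shape used above."""
    return json.dumps({
        "sensor_id": sensor_id,
        "ts": ts.strftime("%Y-%m-%d %H:%M"),
        "reading": reading,
    })

payload = make_payload(1, datetime(2025, 6, 1, 10, 0), 42.42)
print(payload)  # {"sensor_id": 1, "ts": "2025-06-01 10:00", "reading": 42.42}
```

Pass the resulting string to `aws kinesis put-record --data` as shown above.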

Transfer data

Use ingestr to load data from the Kinesis stream into a CrateDB table.

ingestr ingest --yes \
  --source-uri "kinesis://?aws_access_key_id=test&aws_secret_access_key=test&region_name=us-east-1" \
  --source-table "demo" \
  --dest-uri "cratedb://crate:crate@localhost:5432/?sslmode=disable" \
  --dest-table "kinesis.sensor_demo"
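When credentials contain characters that need URL escaping, assembling the source URI by hand gets error-prone. A small sketch, assuming only the Python standard library; the parameter names mirror the command above.

```python
# Assemble the ingestr source URI from its parts instead of hand-editing
# the query string. urlencode() takes care of escaping values.
from urllib.parse import urlencode

params = {
    "aws_access_key_id": "test",
    "aws_secret_access_key": "test",
    "region_name": "us-east-1",
}
source_uri = "kinesis://?" + urlencode(params)
print(source_uri)
```

The printed value is exactly the `--source-uri` argument used above.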

Query data

Submit queries to CrateDB using crash.

crash -c "SELECT count(*) FROM kinesis.sensor_demo"
crash -c "SELECT * FROM kinesis.sensor_demo"
crash -c "SHOW CREATE TABLE kinesis.sensor_demo"

Documentation

Other options

You can use ingestr to import data from a variety of other data sources. CrateDB also provides integrations for many other ETL applications and frameworks.

Loading data from Apache Kafka

Install

Install the most recent Python packages crash and ingestr.

uv tool install --upgrade crash ingestr

Tutorial

Services

Run Apache Kafka and CrateDB using Docker or Podman.

docker run --rm --name=kafka \
  --publish=9092:9092 docker.io/apache/kafka:4.0.0
docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:latest '-Cdiscovery.type=single-node'

Load data

Publish two events to a Kafka topic using kcat [1].

echo '{"sensor_id":1,"ts":"2025-06-01 10:00","reading":42.42}' | \
  kcat -P -b localhost -t demo

echo '{"sensor_id":2,"ts":"2025-06-01 11:00","reading":451.00}' | \
  kcat -P -b localhost -t demo

Verify events are present by subscribing to the Kafka topic.

kcat -C -e -b localhost -t demo

Transfer data

Use ingestr to load data from the Kafka topic into a CrateDB table.

ingestr ingest --yes \
  --source-uri "kafka://?bootstrap_servers=localhost:9092&group_id=test" \
  --source-table "demo" \
  --dest-uri "cratedb://crate:crate@localhost:5432/?sslmode=disable" \
  --dest-table "kafka.sensor_demo"

Query data

Submit queries to CrateDB using crash.

crash -c "SELECT count(*) FROM kafka.sensor_demo"
crash -c "SELECT * FROM kafka.sensor_demo"
crash -c "SHOW CREATE TABLE kafka.sensor_demo"

Documentation

See also

Use kafka-compose.yml and kafka-demo.xsh for an end-to-end Kafka+CrateDB-in-a-box example rig using {Docker,Podman} Compose.

Other options

You can use ingestr to import data from a variety of other data sources. CrateDB also provides integrations for many other ETL applications and frameworks.


  1. You can install kcat, the command-line Apache Kafka producer and consumer tool, using {apt,brew} install kcat. ↩︎

Loading data from Databricks SQL warehouses

Prerequisites

You will need an account on https://databricks.com/.

Install

Install the most recent Python packages crash and ingestr.

uv tool install --upgrade crash ingestr

Tutorial

Databricks authentication information

For accessing Databricks SQL warehouses using ingestr, you need an access token, your server hostname, and the HTTP path (the endpoint of the warehouse).

On the Databricks platform, open your settings dialog (top right corner), navigate to “User » Developer » Access tokens » Manage”, and select “Generate new token”. Then copy the generated token from the result dialog (example: dapi367e0b1....) and use it in your ingestr command below.

https://<instance>.cloud.databricks.com/settings/user/developer

Within the primary navigation, navigate to “SQL » SQL Warehouses”, select the “Serverless Starter Warehouse” or any other warehouse, open “Connection Details”, and use the values of “Server hostname” and “HTTP path” in your ingestr command below.

https://<instance>.cloud.databricks.com/sql/warehouses/<warehouse>/connectionDetails
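The three values above combine into the ingestr source URI. A minimal sketch with the standard library; the token, hostname, and warehouse ID below are hypothetical placeholders, not real credentials.

```python
# Build the databricks:// source URI from the token, server hostname,
# and HTTP path collected above. All example values are hypothetical.
from urllib.parse import urlencode

access_token = "dapi-example"                              # hypothetical
server_hostname = "dbc-a1b2c3d4-e5f6.cloud.databricks.com"  # hypothetical
http_path = "/sql/1.0/warehouses/abc123def456"              # hypothetical

source_uri = (
    f"databricks://token:{access_token}@{server_hostname}:443/?"
    + urlencode({"http_path": http_path, "catalog": "samples"}, safe="/")
)
print(source_uri)
```

Note `safe="/"`, which keeps the slashes in the HTTP path readable instead of percent-encoding them.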

Services

Run CrateDB using Docker or Podman.

docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:latest '-Cdiscovery.type=single-node'

Load data

The Databricks starter warehouse includes a few example tables. Let’s tap into and transfer two of them, then inspect their schema and data in CrateDB.

Transfer data

samples.nyctaxi.trips

Transfer data:

ingestr ingest --yes \
    --source-uri 'databricks://token:<access_token>@<instance>.cloud.databricks.com:443/?http_path=/sql/1.0/warehouses/<warehouse>&catalog=samples' \
    --source-table 'nyctaxi.trips' \
    --dest-uri 'cratedb://crate:crate@localhost:5432?sslmode=disable' \
    --dest-table 'databricks.nyctaxi_trips'

Query data:

crash -c 'SHOW CREATE TABLE databricks.nyctaxi_trips'
crash -c 'SELECT * FROM databricks.nyctaxi_trips LIMIT 5'

samples.accuweather.forecast_hourly_metric

Transfer data:

ingestr ingest --yes \
    --source-uri 'databricks://token:<access_token>@<instance>.cloud.databricks.com:443/?http_path=/sql/1.0/warehouses/<warehouse>&catalog=samples' \
    --source-table 'accuweather.forecast_hourly_metric' \
    --dest-uri 'cratedb://crate:crate@localhost:5432?sslmode=disable' \
    --dest-table 'databricks.accuweather_forecast_hourly_metric'

Query data:

crash -c 'SHOW CREATE TABLE databricks.accuweather_forecast_hourly_metric'
crash -c 'SELECT * FROM databricks.accuweather_forecast_hourly_metric LIMIT 5'

Documentation

Other options

You can use ingestr to import data from a variety of other data sources. CrateDB also provides integrations for many other ETL applications and frameworks.

Loading data from SAP HANA

Prerequisites

The tutorial uses SAP HANA express to spin up a local instance of HANA for evaluation purposes.

Install

Install the most recent Python packages crash and ingestr.

uv tool install --upgrade crash ingestr

Tutorial

Services

Run SAP HANA express and CrateDB using Docker or Podman.

docker run --rm --name=hana \
  -p 39013:39013 -p 39017:39017 -p 39041-39045:39041-39045 \
  -p 1128-1129:1128-1129 -p 59013-59014:59013-59014 \
  docker.io/saplabs/hanaexpress:latest \
  --master-password HXEHana1 \
  --agree-to-sap-license
docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:latest '-Cdiscovery.type=single-node'

Note: Starting HANA takes a while. It will only respond on port 39017 (system database) once the log output says "HANA is up", and on port 39041 (tenant database) once it says "Startup finished!".
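If you script this tutorial, you can poll the relevant port instead of watching the log output. A generic helper sketch, not part of ingestr or the HANA image:

```python
# Poll a TCP port until it accepts connections, e.g. to wait for HANA's
# system database (39017) or tenant database (39041) to come up.
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 600.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(2.0)
    return False

# e.g. wait_for_port("localhost", 39017) before running the ingestr command
```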

Pre-flight checks

Run a basic connectivity check against the system database.

docker exec -it hana bash -ic "hdbsql -i 90 -n localhost:39017 -u SYSTEM -p HXEHana1 'SELECT * FROM sys.dummy'"

Load data

We will select from the built-in table sys.adapters, so the tutorial can skip the extra step of importing data into HANA first.

Transfer data

Let’s connect to the system database because it becomes available earlier than the tenant database. Alternatively, address the tenant database at localhost:39041/HXE.

ingestr ingest --yes \
  --source-uri "hana://SYSTEM:HXEHana1@localhost:39017/SYSTEMDB" \
  --source-table "sys.adapters" \
  --dest-uri "cratedb://crate:crate@localhost:5432/?sslmode=disable" \
  --dest-table "hana.sys_adapters"

Query data

crash -c "SHOW CREATE TABLE hana.sys_adapters"
crash -c "SELECT * FROM hana.sys_adapters LIMIT 2"

Documentation

Other options

You can use ingestr to import data from a variety of other data sources. CrateDB also provides integrations for many other ETL applications and frameworks.

Apache Iceberg and Delta Lake (load and save)

Hi again. We recently added I/O adapters for Apache Iceberg tables [1] and Delta Lake tables [2], in line with our aim to enhance interoperability with open table formats.

Both are open table formats that build upon Apache Parquet data files, a free and open-source column-oriented data storage format, effectively superseding Apache Hive use cases from the Hadoop era.

CrateDB Toolkit now provides adapters to import and export data into and from those open table formats. Please let us know if you discover any flaws, and don’t hesitate to share ideas for improvement. Thank you in advance. 🙏

Synopsis

uv tool install --upgrade 'cratedb-toolkit[iceberg,deltalake]'
ctk load table \
    "s3+iceberg://bucket1/demo/taxi-tiny/metadata/00003-dd9223cb-6d11-474b-8d09-3182d45862f4.metadata.json?s3.access-key-id=<your_access_key_id>&s3.secret-access-key=<your_secret_access_key>&s3.endpoint=<endpoint_url>&s3.region=<s3-region>" \
    --cluster-url="crate://crate:crate@localhost:4200/demo/taxi-tiny"
ctk load table \
    "s3+deltalake://bucket1/demo/taxi-tiny?AWS_ACCESS_KEY_ID=<your_access_key_id>&AWS_SECRET_ACCESS_KEY=<your_secret_access_key>&AWS_ENDPOINT=<endpoint_url>&AWS_REGION=<s3-region>" \
    --cluster-url="crate://crate:crate@localhost:4200/demo/taxi-tiny"

Documentation


  1. Iceberg is a specification and high-performance format for huge analytic tables, making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Apache Iceberg is its reference implementation. ↩︎

  2. Delta Lake (paper) is the optimized storage layer that provides the foundation and default format for all table operations on Databricks. It was developed for tight integration with Structured Streaming, allowing you to easily use a single copy of data for both batch and streaming operations and providing incremental processing at scale. ↩︎

Loading data from Elasticsearch

Install

Install the Python packages crash, httpie, and ingestr.

uv tool install crash httpie ingestr

Tutorial

Services

Run Elasticsearch and CrateDB using Docker or Podman.

docker run --rm --name=elasticsearch \
  --publish=9200:9200 --env=discovery.type=single-node \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.29
docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:latest '-Cdiscovery.type=single-node'

Load data

Create the Elasticsearch index.

http PUT http://localhost:9200/example

Acquire example data.

wget https://cdn.crate.io/downloads/datasets/cratedb-datasets/academy/chicago-data/taxi_details.csv

Import data into Elasticsearch.

ingestr ingest --yes \
  --source-uri "csv://taxi_details.csv" \
  --source-table "data" \
  --dest-uri "elasticsearch://localhost:9200?secure=false" \
  --dest-table "taxi_details"
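If you prefer to experiment offline first, a tiny stand-in CSV works with the same csv:// source. A sketch using only the standard library; the column names here are illustrative only and are not the real schema of taxi_details.csv.

```python
# Write a small stand-in CSV for offline experiments with the csv:// source.
# Columns are hypothetical, not the real taxi_details.csv schema.
import csv
import tempfile
from pathlib import Path

path = Path(tempfile.gettempdir()) / "taxi_details_standin.csv"
rows = [
    {"trip_id": "t-1", "fare": "12.50"},
    {"trip_id": "t-2", "fare": "8.75"},
]
with path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["trip_id", "fare"])
    writer.writeheader()
    writer.writerows(rows)
print(f"wrote {path}")
```

Point `--source-uri` at the generated file (csv://&lt;path&gt;) to rehearse the pipeline before using the real dataset.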

Transfer data

Use ingestr to load data from the Elasticsearch index into a CrateDB table.

ingestr ingest --yes \
  --source-uri "elasticsearch://localhost:9200?secure=false" \
  --source-table "taxi_details" \
  --dest-uri "cratedb://crate:crate@localhost:5432" \
  --dest-table "elasticsearch.taxi_details"

Query data

Submit queries to CrateDB using crash.

crash -c "SHOW CREATE TABLE elasticsearch.taxi_details"
crash -c "SELECT count(*) FROM elasticsearch.taxi_details"
crash -c "SELECT * FROM elasticsearch.taxi_details"

Documentation

See also

Use elasticsearch-compose.yml and elasticsearch-demo.sh for an end-to-end Elasticsearch+CrateDB-in-a-box example ETL rig using {Docker,Podman} Compose.

Other options

You can use ingestr to import data from a variety of other data sources. CrateDB also provides integrations for many other ETL applications and frameworks.