Loading data from Apache Kafka

Introduction

We are currently working on loading data from Apache Kafka into CrateDB using the excellent ingestr toolkit. This topic reports on progress across the different workbenches [1] and gives everyone a chance to participate early in the development.

To run the commands in this walkthrough, you need a working installation of Docker or Podman, and of Python, on your machine. For installing Python packages, we recommend using the uv package manager. [2]

Install

Install crash and a preview version of ingestr.

uv tool install --upgrade crash 'ingestr @ git+https://github.com/crate-workbench/ingestr.git@kafka-decoder'

Tutorial

Services

Run Apache Kafka and CrateDB using Docker or Podman. Both commands stay in the foreground, so start each one in its own terminal.

docker run --rm --name=kafka \
  --publish=9092:9092 docker.io/apache/kafka:4.0.0
docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  docker.io/crate:5.10 -Cdiscovery.type=single-node
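Both services need a few seconds before they accept connections. The following is a minimal sketch of a retry helper you could use before continuing. The function name wait_for is hypothetical and not part of any tool used in this walkthrough.

```shell
# wait_for ATTEMPTS COMMAND...: hypothetical helper, not part of any tool above.
# Runs COMMAND up to ATTEMPTS times, one second apart, until it succeeds.
wait_for() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    # Pause between attempts, but not after the final failed one.
    if [ "$attempts" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}

# Example usage, assuming curl and kcat are installed:
#   wait_for 30 curl -fsS http://localhost:4200/   # CrateDB HTTP port
#   wait_for 30 kcat -L -b localhost               # Kafka broker metadata
```

The helper only reports whether the probe command succeeded within the given number of attempts. Which probe is appropriate depends on your setup.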

Load data

Publish two events to a Kafka topic using kcat. [3]

echo '{"sensor_id":1,"ts":"2025-06-01 10:00","reading":42.42}' | \
  kcat -P -b localhost -t demo

echo '{"sensor_id":2,"ts":"2025-06-01 11:00","reading":451.00}' | \
  kcat -P -b localhost -t demo
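To publish more than a handful of events, the commands above can be wrapped in a small loop. This is only a sketch: the helper names make_event and publish_events are hypothetical, and the generated timestamps and readings are made-up sample values in the same shape as the two events above.

```shell
# make_event SENSOR_ID TIMESTAMP READING: hypothetical helper that prints one
# event as a single-line JSON document, shaped like the events above.
make_event() {
  printf '{"sensor_id":%d,"ts":"%s","reading":%s}\n' "$1" "$2" "$3"
}

# publish_events TOPIC: hypothetical helper that sends three sample events.
# kcat in producer mode (-P) reads stdin and publishes one message per line.
publish_events() {
  for id in 3 4 5; do
    make_event "$id" "2025-06-01 1${id}:00" "$((id * 10)).50"
  done | kcat -P -b localhost -t "$1"
}

# Example usage:
#   publish_events demo
```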

Verify that the events are present by subscribing to the Kafka topic. kcat keeps waiting for new messages, so press Ctrl+C to stop it.

kcat -C -b localhost -t demo

Transfer data

Use ingestr to load data from the Kafka topic into a CrateDB table.

ingestr ingest --yes \
  --source-uri "kafka://?bootstrap_servers=localhost:9092&group_id=test&value_type=json&select=value" \
  --source-table "demo" \
  --dest-uri "cratedb://crate:crate@localhost:5432/?sslmode=disable" \
  --dest-table "doc.kafka_demo"
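The source URI packs several settings into one query string, which is easy to mistype. The sketch below assembles the same URI from separate variables. The variable names and the load_topic wrapper are hypothetical. The parameter names and values are copied from the command above, and their exact behavior in the preview branch is not documented here.

```shell
# Values copied from the command above; adjust them to your environment.
BOOTSTRAP_SERVERS="localhost:9092"
GROUP_ID="test"
VALUE_TYPE="json"
SELECT_FIELD="value"

# Assemble the Kafka source URI and the CrateDB destination URI.
SOURCE_URI="kafka://?bootstrap_servers=${BOOTSTRAP_SERVERS}&group_id=${GROUP_ID}&value_type=${VALUE_TYPE}&select=${SELECT_FIELD}"
DEST_URI="cratedb://crate:crate@localhost:5432/?sslmode=disable"

# load_topic TOPIC TABLE: hypothetical wrapper around the ingestr call above.
load_topic() {
  ingestr ingest --yes \
    --source-uri "$SOURCE_URI" \
    --source-table "$1" \
    --dest-uri "$DEST_URI" \
    --dest-table "$2"
}

# Example usage:
#   load_topic demo doc.kafka_demo
```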

Query data

Submit queries to CrateDB using crash.

crash -c "SELECT count(*) FROM kafka_demo;"
crash -c "SELECT * FROM kafka_demo WHERE sensor_id>1;"
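If crash is not at hand, the same queries can be sent to CrateDB's HTTP endpoint on port 4200, which the container above publishes. This is a minimal sketch: the helper names sql_payload and run_sql are hypothetical, and the payload builder assumes the statement contains no double quotes or backslashes.

```shell
# sql_payload STATEMENT: hypothetical helper that wraps a statement in the
# JSON body for the /_sql endpoint. Assumes the statement contains no double
# quotes or backslashes, because no escaping is performed.
sql_payload() {
  printf '{"stmt":"%s"}' "$1"
}

# run_sql STATEMENT: hypothetical helper that posts the statement to CrateDB.
run_sql() {
  curl -sS -H 'Content-Type: application/json' \
    -X POST 'http://localhost:4200/_sql' \
    -d "$(sql_payload "$1")"
}

# Example usage:
#   run_sql 'SELECT count(*) FROM kafka_demo'
```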

Appendix

Caveats

  • The ecosystem and the CrateDB adapters are still in their infancy. They need more exposure and feedback from people willing to take them for a test drive and report back on their experience and wishes. Thank you!

Other options

  • For loading data from Apache Kafka, you can also use Apache Flink, Debezium, Kafka Connect, RisingWave, or any other solution that fits your technology stack. Backlog: Link to the corresponding tutorials.

  1. The CrateDB destination adapter for ingestr uses dlt via the dlt-cratedb package.

  2. The uv package manager can easily be installed using pip or pipx, e.g. pipx install uv. It also offers other installation methods.

  3. You can install kcat, a command-line producer and consumer for Apache Kafka, using apt install kcat or brew install kcat.