Introduction
We are currently unlocking data loading from Apache Kafka into CrateDB using the excellent ingestr toolkit. This topic reports on the progress across the different workbenches [1] and gives everyone the chance to participate early in development.
To execute the commands in this walkthrough, you need a working installation of Docker or Podman, and of Python, on your machine. For installing Python packages, we recommend using the uv package manager. [2]
Install
Install crash and a preview version of ingestr.
uv tool install --upgrade crash 'ingestr @ git+https://github.com/crate-workbench/ingestr.git@kafka-decoder'
Tutorial
Services
Run Apache Kafka and CrateDB using Docker or Podman. Both commands run in the foreground, so start each one in its own terminal.
docker run --rm --name=kafka \
--publish=9092:9092 docker.io/apache/kafka:4.0.0
docker run --rm --name=cratedb \
--publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
docker.io/crate:5.10 -Cdiscovery.type=single-node
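Both services need a moment to start before they accept connections. The following is a minimal sketch of a wait loop, not part of the original walkthrough. The helpers `retry` and `cratedb_ready` are hypothetical names, and the sketch assumes that CrateDB answers HTTP requests on the published port 4200 once it is up.

```shell
# retry: run a command up to ATTEMPTS times, sleeping DELAY seconds
# after each failed attempt. Hypothetical helper for illustration.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Assumption: CrateDB responds on http://localhost:4200/ when ready.
cratedb_ready() {
  curl --silent --fail --output /dev/null http://localhost:4200/
}

# Usage: wait up to roughly 30 seconds for CrateDB.
# retry 30 1 cratedb_ready && echo "CrateDB is ready"
```

The same `retry` helper can wrap any other readiness check, for example a kcat metadata query against the broker.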
Load data
Publish two events to a Kafka topic using kcat. [3]
echo '{"sensor_id":1,"ts":"2025-06-01 10:00","reading":42.42}' | \
kcat -P -b localhost -t demo
echo '{"sensor_id":2,"ts":"2025-06-01 11:00","reading":451.00}' | \
kcat -P -b localhost -t demo
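To publish more than a couple of events, the JSON lines can be generated by a small function and piped into kcat in one go, because kcat in producer mode reads newline-delimited messages from standard input. This is an optional sketch: `make_event` and `generate_events` are hypothetical helpers, and the sample values are made up for illustration.

```shell
# Hypothetical helper: print one JSON event on a single line.
# Arguments: sensor id, timestamp, reading.
make_event() {
  printf '{"sensor_id":%d,"ts":"%s","reading":%s}\n' "$1" "$2" "$3"
}

# Hypothetical helper: print a few illustrative events, one per line.
generate_events() {
  make_event 3 "2025-06-01 12:00" 17.5
  make_event 4 "2025-06-01 13:00" 23.0
}

# Usage: publish all generated events to the "demo" topic.
# generate_events | kcat -P -b localhost -t demo
```

Note that any extra events published this way will change the row counts you see in the later steps.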
Verify that the events are present by subscribing to the Kafka topic.
kcat -C -b localhost -t demo
Transfer data
Use ingestr to load data from the Kafka topic into a CrateDB table.
ingestr ingest --yes \
--source-uri "kafka://?bootstrap_servers=localhost:9092&group_id=test&value_type=json&select=value" \
--source-table "demo" \
--dest-uri "cratedb://crate:crate@localhost:5432/?sslmode=disable" \
--dest-table "doc.kafka_demo"
Query data
Submit queries to CrateDB using crash.
crash -c "SELECT count(*) FROM kafka_demo;"
crash -c "SELECT * FROM kafka_demo WHERE sensor_id>1;"
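If crash is not at hand, the same statements can be submitted to CrateDB's HTTP endpoint at `/_sql`. The sketch below also issues `REFRESH TABLE` first, because CrateDB makes newly written rows visible to queries only after a table refresh, which otherwise happens periodically. `sql_payload` and `run_sql` are hypothetical helpers, and the sketch assumes that port 4200 accepts unauthenticated requests from localhost, as the crash calls above do.

```shell
# Hypothetical helper: wrap a SQL statement into the JSON body expected
# by the /_sql endpoint. Naive: no JSON escaping is performed, so the
# statement must not contain double quotes or backslashes.
sql_payload() {
  printf '{"stmt":"%s"}' "$1"
}

# Hypothetical helper: submit one statement to CrateDB over HTTP.
# Assumption: CrateDB's HTTP port 4200 is reachable on localhost.
run_sql() {
  curl --silent --show-error \
    --header 'Content-Type: application/json' \
    --request POST --data "$(sql_payload "$1")" \
    http://localhost:4200/_sql
}

# Usage:
# run_sql "REFRESH TABLE kafka_demo"
# run_sql "SELECT count(*) FROM kafka_demo"
```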
Appendix
Caveats
- The ecosystem and the CrateDB adapters are still in their infancy, so they need more exposure and feedback from people willing to take them for a test drive and report back on their experience and wishes. Thank you!
Other options
- For loading data from Apache Kafka, you can also use Apache Flink, Debezium, Kafka Connect, RisingWave, or any other suitable solution that fits your technology stack. Backlog: Link to the corresponding tutorials.
[1] The CrateDB destination adapter for ingestr uses dlt via the dlt-cratedb package.
[2] The uv package manager can easily be installed using pip or pipx, e.g. pipx install uv. It also offers other installation methods.
[3] You can install kcat, the Apache Kafka producer and consumer command-line tool, using {apt,brew} install kcat.