How we scaled ingestion to one million rows per second

hammerhead · March 29, 2022, 9:50am

Scaling ingestion throughput is a common requirement when using CrateDB. To gain first-hand experience of the challenges that arise along the way, we set ourselves a goal: find a realistic use case, start with a single node, and scale it until it reaches a throughput of one million rows per second.

To find out what we learned, visit our latest blog post:

hammerhead · August 2, 2023, 12:07pm

We have updated the blog post with more recent numbers, reflecting the latest developments in all components used as part of the benchmarks.

What has changed in the setup?

CrateDB version: Updated from 4.7 to 5.4. CrateDB 5.3 brought substantial improvements in ingest performance.

EC2 instance type: Switched from m6g.4xlarge (ARM) to m6in.4xlarge (x64). The switch was made mainly because of architecture restrictions in third-party monitoring tools, but m6in instances also provide higher network throughput, which benefits node-to-node communication. The number of CPUs and the amount of RAM remained identical.

Operating system: Updated from Amazon Linux 2 to Amazon Linux 2023.

What changed in the results?

A throughput of 1 million records per second is now achieved with five nodes instead of ten
On scaling, the observed overhead factor was reduced from 25% to 20%
Measurements including the usage of replicas have been added
