Scaling ingestion throughput is a common requirement when using CrateDB. To gain first-hand experience of the challenges that come up along the way, we set ourselves a goal: find a realistic use case, start with a single node, and scale the cluster until it reaches a throughput of one million rows per second.
To find out what we learned, visit our latest blog post:
We have updated the blog post with more recent numbers, reflecting the latest developments in all components used as part of the benchmarks.
What has changed in the setup?
- CrateDB version: Updated from 4.7 to 5.4
  - There have been substantial improvements in ingest performance in CrateDB 5.3.
- EC2 instance type: Switched from m6g.4xlarge (ARM) to m6in.4xlarge (x64)
  - The switch was made mainly because of architecture restrictions in third-party monitoring tools, but m6in instances also provide higher network throughput, which benefits node-to-node communication.
  - The number of CPUs and the amount of RAM remained identical.
- Operating system: Updated from Amazon Linux 2 to Amazon Linux 2023
What changed in the results?
- A throughput of 1 million rows per second is now achieved with five nodes instead of ten
- When scaling out, the observed overhead factor dropped from 25% to 20%
- Measurements that include the use of replicas have been added