Hi
Does anyone know, when using the HTTP endpoint, whether it is more efficient for a 3-node CrateDB cluster to accept larger bulk inserts infrequently, or smaller bulk inserts more often?
For instance, I might insert 242 rows totalling 326 KB, compared to 30 rows at 3.5 KB.
I just want to see if I can fine-tune our ingestion to work better with the CrateDB cluster.
In general, we would recommend testing with different batch sizes to check the performance for your specific workload. Sometimes you can even make this adaptive by using your cluster's monitoring data to increase or decrease the batch size according to the resources available.
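As a rough illustration of that adaptive idea, here is a minimal sketch; the thresholds, bounds, and timings are invented for illustration and would need tuning against your actual monitoring data:

```python
# Hypothetical adaptive batching: shrink the batch when inserts take
# too long, grow it while the cluster keeps up comfortably.
MIN_SIZE, MAX_SIZE = 50, 5_000   # invented bounds, not recommendations

def next_batch_size(current_size: int, elapsed_seconds: float) -> int:
    if elapsed_seconds > 1.0:    # last insert was slow: back off
        return max(MIN_SIZE, current_size // 2)
    if elapsed_seconds < 0.2:    # plenty of headroom: push more rows
        return min(MAX_SIZE, current_size * 2)
    return current_size          # within the comfort zone: keep as-is
```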
For your specific case, we don’t see a meaningful difference between 30 rows and 242 rows, since both are still pretty much in the same order of magnitude. In general, we would advise avoiding many small batches to prevent the repeated cost of query parsing, planning, and so on.
Depending on your use case, it is understandable to have such small batches, for instance to approximate a continuous data stream by batching data that arrives within a one-second interval, or simply to keep the computational cost per insert low (which would be the case with both 30 and 242 rows).
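For reference, a bulk insert against the HTTP endpoint sends one statement plus many parameter sets, so parsing and planning happen once per batch rather than once per row. A minimal sketch (the table and columns are made up for illustration):

```python
import requests

CRATE_URL = "http://localhost:4200/_sql"  # CrateDB HTTP endpoint, default port
STMT = "INSERT INTO readings (ts, device, value) VALUES (?, ?, ?)"  # example table

def insert_batch(batch):
    # One HTTP round trip; bulk_args carries one parameter list per row.
    resp = requests.post(CRATE_URL, json={"stmt": STMT, "bulk_args": batch})
    resp.raise_for_status()
    return resp.json()["results"]  # one rowcount entry per parameter set

# e.g. insert_batch([[1700000000000, "device-1", 0.42],
#                    [1700000000100, "device-2", 0.37]])
```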
Have you observed a big performance difference between the batch sizes? Feel free to share further details so we can work on a more tailored answer.
Thank you for writing in. I’d like to second what Karyn said.
Looking at those particular numbers, I think you should experiment with much larger batch sizes, of course also depending on the “width” of the records in your dataset.
Quoting a particular passage from the document Karyn referenced conveys the gist in this regard:
I think either 30 or 242 records per batch is too small a number to make any significant difference.
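If it helps, something along these lines could compare sizes empirically, reusing the `insert_batch` helper sketched above (the batch sizes and the generated rows are arbitrary placeholders):

```python
import time

# Arbitrary synthetic rows matching the example statement above.
rows = [[1_700_000_000_000 + i, f"device-{i % 10}", i * 0.1]
        for i in range(100_000)]

for size in (30, 242, 1_000, 5_000, 20_000):
    start = time.monotonic()
    for i in range(0, len(rows), size):
        insert_batch(rows[i:i + size])
    rate = len(rows) / (time.monotonic() - start)
    print(f"batch size {size:>6}: {rate:,.0f} rows/s")
```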