I’m testing an out-of-the-box CrateDB installation locally, and when running a query I get this message:
CircuitBreakingException[[query] Data too large, data for [collect: 0] would be [1288497342/1.2gb], which is larger than the limit of [1288490188/1.1gb]]
Is this a configuration issue, i.e. have I not set a parameter correctly?
The table is a test table with 1.1M rows taking about 1.1GB of space.
The query circuit breaker terminates queries that use more than 60% of the heap size.
Seeing that the error happens during the collect phase, it looks like the query is reading more than 1.1 GB of data (after decompression) from the table.
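As a quick workaround, the breaker threshold itself can be adjusted at runtime. This is a sketch, assuming the default `indices.breaker.query.limit` setting of 60%; raising it only buys headroom and does not fix an undersized heap:

```sql
-- raise the query circuit breaker from the default 60% of heap to 70%
-- (TRANSIENT: the change is lost when the cluster restarts)
SET GLOBAL TRANSIENT "indices.breaker.query.limit" = '70%';
```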
Besides increasing the heap size, we documented possible approaches in this article:
As a distributed database system with support for the standard SQL query language, CrateDB is great to run aggregations server-side, working on huge datasets, and getting summarized results back; there are however cases where we may still want to retrieve lots of data from CrateDB, to train a machine learning model for instance.
CrateDB collects results in memory before sending them back to the clients, so trying to run a SELECT statement that returns a very large result set in one go can tri…
With CrateDB 5.2 (to be released shortly), the SQL commands for using cursors will be further extended, also allowing you to scroll backwards.
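A minimal sketch of the cursor workflow described above, assuming a hypothetical table `sensor_readings` — fetching the result set in chunks keeps each response small enough to stay under the circuit breaker:

```sql
-- cursors run inside a transaction context
BEGIN;
DECLARE big_result CURSOR FOR
    SELECT * FROM sensor_readings ORDER BY ts;
-- retrieve the rows in manageable batches instead of one huge result set
FETCH 1000 FROM big_result;
FETCH 1000 FROM big_result;
CLOSE big_result;
END;
```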
I had the exact same problem yesterday while loading a file with 2.3 million rows.
I’m running a single instance of CrateDB in a Docker container. I solved it by changing the environment variable CRATE_HEAP_SIZE, increasing the value to 4 GB.
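For anyone else running the single-container setup, this is roughly what that looks like as a docker-compose fragment (service name and port mapping are assumptions, not from my actual setup):

```yaml
# docker-compose.yml sketch for a single-node CrateDB instance
services:
  cratedb:
    image: crate
    environment:
      - CRATE_HEAP_SIZE=4g   # heap raised from the default to 4 GB
    ports:
      - "4200:4200"          # Admin UI / HTTP endpoint
```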
Great, thanks for the answer. I must admit I missed the 5.1.0 release announcement regarding CURSORS, which is great news.
I’ve upped my heap as I want all the data back in my test.