Thanks for your report. Is the behavior reproducible, i.e. does reissuing both statements without LIMIT clauses once more also yield a long query duration?
The reason for the initial (longer) duration might be some initialization overhead. May I ask what hardware and environment you are running CrateDB on, and what your configuration looks like? Having enough memory available for CrateDB is crucial for reasonable operation.
That the first query took 30 seconds to answer might indicate that the environment CrateDB is running on needs some time to get ready. It might also indicate that the number of configured shards is too high for the relatively small amount of (~100,000?) records, as that would definitely introduce significant overhead.
Hi Andreas, it seems that after re-importing my 100,000 rows into my two different schemas, performance is now the same for both.
However, even after re-importing, SELECT * FROM users (containing 100,001 rows) and SELECT * FROM users LIMIT 1000000 (which returns the same result, because I only have 100,001 rows) still behave differently (tested multiple times):
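To be explicit, the two statements being compared are:

```sql
-- Both return all 100,001 rows, but only the second one runs fast:
SELECT * FROM users;
SELECT * FROM users LIMIT 1000000;
```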
In the CrateDB Web Admin UI console, I can't reproduce this because of the implicit LIMIT it applies (with a LIMIT, I get the same durations as in my CrateDBAdmin .NET client).
So, in conclusion … as a workaround we would systematically run a count() first, and re-issue the query with the count() result in the LIMIT clause …
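Sketched out, the workaround would mean two round trips per query, feeding the first result into the second (the literal 100001 here is just whatever count() returned):

```sql
-- First round trip: get the exact row count
SELECT count(*) FROM users;
-- Second round trip: re-issue the query with that count as the limit,
-- e.g. if count(*) returned 100001:
SELECT * FROM users LIMIT 100001;
```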
Something is speeding up requests with a LIMIT clause, even when the limit exceeds the number of returned rows.
This behaviour is completely hidden in the Web Admin UI console due to its implicit LIMIT clause (but in custom applications … that may not be the case …)
I'm using CrateDB 4.5 on a Windows test machine (Core i7, 16 GB RAM) with nothing else running on it …
But here's a useful piece of info: on the same machine, I switched to CrateDB 4.4.2 and 4.1.4, and I can't reproduce the problem there (with the same tables, data, and config … only one node) …
Can you reproduce that behaviour on version 4.5 on your side?
This is the thing I would like to follow up on and dig deeper.
So, according to your observations, you believe it happened somewhere between 4.4.2 and 4.5.0? There is one other release before 4.5.0, namely 4.4.3. Maybe you can also check that one in order to narrow things down further?
All right, so the observed performance regression on the "with or without LIMIT clause" issue might have slipped in between 4.4.2 and 4.4.3, right?
All right, so the observed behaviour where, on a WHERE clause returning only one record, adding a LIMIT 1 still gains performance, only happened between 4.4.3 and 4.5.0, right?
I have installed version 4.5 on a dedicated server running Debian Stretch. I see the same behaviour with the LIMIT clauses, but not in the WHERE + LIMIT 1 use case (I think we can forget that one, as it behaves differently on Linux (which is OK) and Windows (not OK)).
Another piece of info: my table "users" has a column "id" which is the PRIMARY KEY. The rows were initially imported (from JSON files) in a disordered manner (thanks to multi-threading! :p)
So if I do a SELECT * FROM users ORDER BY id, performance is now equal to the same request with a LIMIT 1000000 (but without the ORDER BY) …
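In other words, these two statements now take the same time, while the plain unordered, unlimited SELECT is the slow outlier:

```sql
-- Both run equally fast; SELECT * FROM users (alone) is the slow one:
SELECT * FROM users ORDER BY id;
SELECT * FROM users LIMIT 1000000;
```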
I can't see the link between the ORDER BY [primary key] and the LIMIT clause (which should be applied after the query produces its results), but the optimizer certainly takes multiple paths (and passes :p) …
Finally … I don't know what to think about all of that.