I have a table with billions of records, that looks something like:
CREATE TABLE IF NOT EXISTS "public"."locations" (
"last_response" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
"id" BIGINT DEFAULT CAST(current_timestamp AS bigint) NOT NULL,
--details elided
"part_hour" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS date_trunc('hour', "last_response"),
"part_day" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS date_trunc('day', "last_response"),
"part_week" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS date_trunc('week', "last_response") NOT NULL,
"part_month" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS date_trunc('month', "last_response"),
PRIMARY KEY ("id", "part_week")
)
CLUSTERED INTO 6 SHARDS
PARTITIONED BY ("part_week")
This query
SELECT MAX(last_response) from public.locations WHERE last_response >= '2021-01-23 00:00:00.000000' AND last_response < '2022-01-01 00:00:00.000000';
and takes a long time (more than a minute) to complete.
As the table is partitioned on part_week, which is in turn based on last_response, I would have expected the logic to be “identify shard with highest part_week, return max(last_response) from that”, and it should return nearly instantly.
AND last_response>= (
SELECT MAX(values['part_week'])
FROM information_schema.table_partitions
WHERE values['part_week'] <= date_trunc('week','2022-01-01 00:00:00.000000')
AND table_name='locations'
AND table_schema='public'
)
Appreciate that the ANALYZE output is super verbose. But I suspect the bottom line (with my inexperienced eye) is that every partition is indeed being considered.
I wonder if you could advise if my underlying assumption (that the partition key being one that is calculated from a given column, means that queries on that original column should in theory eliminate partitions) seems correct on the face of it?
Or would you always expect additional clauses to be needed if its a calculated column?
(I’m trying to more or less lift and shift existing postgres app, so changing queries at all is undesirable)
Hi,
The 3 execution plans you shared show indeed multiple PartitionIdent being considered.
However on my repro environment on CrateDB 5.9.4 , the query goes straight to the right partition, even without introducing the part_week = clause .
would you always expect additional clauses to be needed if its a calculated column?
My config is pretty much default. So it was the default stats values, including enabled. I set it to collect stats every 10 mins just in case, but no change. Is there a more definitive way to be very sure the stats is indeed collected?
I have never deleted any rows from this table.
Experimenting with reducing the range in the query - so that it’s looking at a narrow range of last_response - the query time come down seemingly linearly, and the ANALYZE shows that it is correctly only considering partitions in that range.
But each of the MAX queries on each of those partitions is taking a very long time - about a second. The partitions are largish - 50 gig - with 212m records say. Is that duration expected? Or should the indexing of this TIMESTAMP WITHOUT TIME ZONE field be able to return the max immediately? If so, then that is perhaps the root problem here?
Appreciate if anyone is able to comment on my last query. It feels like something is broken regarding indexing - I would presume that queries like MAX(X) where X is a timestamp should be immediately satisfied for a shard in almost no time. Is that in fact not the case?
Are there any tips as to how I can proceed with diagnosing the root cause? I’m very reluctant to give up on Crate.
This is not currently the case, what you could maybe do here is try a different number of shards to see if a different level of parallelism may help.
Also you will probably see that if you narrow the search space further within the partition identifying the max is much faster, so with some additional logic, if you know the general area where the last_response may be, with some trial and error/binary search kind of strategy you may get results quicker.