CrateDB partitioned table vs. TimescaleDB Hypertable

Tass · February 1, 2024, 3:11pm

Hello everyone,

I’m starting to discover Crate DB and considering using it in a project where I initially planned to use timescaleDB.

This is a very timeserie intensive project, so I went with TimescaleDB which I’m familiar with. I also need versionning and metadata associated with the timeserie so InfluxDB was out of the picture.

Now I saw CrateDB had partitioned tables where you can use a splitting dimension to it. The example in documentation is extracting days or month out of a date and using it as the partitioning dimension.

In timescaleDB, I use hypertables, which is a form of partitioned table for timeseries, where the partitioning dimension is always time (that can be combined with a secondary one). The difference is I can directly use my timestamp with timezone field as the partitioning dimension and specify the partitioning period (2 days, 1 week, 3 months…).

This is convenient because my partitioning column is just my main time dimension, and is natively optimized for querying, aggregation etc.

I’m concerned that I won’t be able to achieve the same performances on crateDB, or at best I will at the cost of having much more complex query structure, which is not a trivial thing in this R&D project.

So my question are :

Anyone with experience on timescaleDB as well can tell me if my concerns are valid or not?
If yes, is there a roadmap for CrateDB to maybe have partitioned table supporting timestamp + period based partitioning dimension?
I focused primarily on the way tables are partitioned, but timescaleDB is really performant at optimizing complez queries on timeseries. Things like aggregating, grouping, creating cases, or casting data is very easy to do. I’m worried I will loose a lot of that on CrateDB

hernanc · February 1, 2024, 3:21pm

Hi,

is there a roadmap for CrateDB to maybe have partitioned table supporting timestamp + period based partitioning dimension?

This is already possible with CrateDB:

CREATE TABLE partitionedtable (
   ts timestamp
   ,fortnight GENERATED ALWAYS AS date_bin(INTERVAL '14' DAY,ts,0)
   ,othercolums TEXT
) PARTITIONED BY (fortnight);

hernanc · February 1, 2024, 3:24pm

Just to clarify, you do not need to reference this fortnight columns in your queries, when you filter by ts CrateDB automatically knows which partitions it needs to check.

From Version 5.4.0 — CrateDB: Reference

Improved the partition filtering logic to also narrow partitions if the partition is based on a generated column using the date_bin scalar.

Tass · February 1, 2024, 5:48pm

Thank you for clarifying.

That was my suspision, and I’m afraid this is a redibitory blocker. This forces me to :
-Have a column dedicated specifically for partitioning
-Use this column in all queries with this table

This will increase the workload to construct queries significantly, as the logic is very different, especially when joining with other time-related tables.

What I was asking is if you plan to allow me to have as partitioning dimension my date_time column directly. In timescale I would use parameters like time_column => ‘dt’, chunk_time_interval =>INTERVAL ‘14 days’ to create an equivalent partitioning to your example and can then use my dt field as the partitioning column in the queries.

proddata · February 1, 2024, 5:59pm

No, you don’t. CrateDBs query engine is smart enough to optimize for generated columns used for partitioning.

Topic		Replies	Views
PostgreSQL time-related queries faster than the same queries in CrateDB SQL	1	900	December 1, 2021
Partition requires significantly a lot more space than the others CrateDB	10	1060	October 26, 2021
Optimizing storage for historic time-series data Tutorials performance , data-storage	10	3370	June 22, 2022
Starting out with cratedb CrateDB	5	1025	May 31, 2022
First results using CrateDB with Tableau Integrations integration	1	57	December 25, 2024

CrateDB partitioned table vs. TimescaleDB Hypertable

Related topics