CrateDB partitioned table vs. TimescaleDB Hypertable

Hello everyone,

I’m starting to discover Crate DB and considering using it in a project where I initially planned to use timescaleDB.

This is a very timeserie intensive project, so I went with TimescaleDB which I’m familiar with. I also need versionning and metadata associated with the timeserie so InfluxDB was out of the picture.

Now I saw CrateDB had partitioned tables where you can use a splitting dimension to it. The example in documentation is extracting days or month out of a date and using it as the partitioning dimension.

In timescaleDB, I use hypertables, which is a form of partitioned table for timeseries, where the partitioning dimension is always time (that can be combined with a secondary one). The difference is I can directly use my timestamp with timezone field as the partitioning dimension and specify the partitioning period (2 days, 1 week, 3 months…).

This is convenient because my partitioning column is just my main time dimension, and is natively optimized for querying, aggregation etc.

I’m concerned that I won’t be able to achieve the same performances on crateDB, or at best I will at the cost of having much more complex query structure, which is not a trivial thing in this R&D project.

So my question are :

  • Anyone with experience on timescaleDB as well can tell me if my concerns are valid or not?
  • If yes, is there a roadmap for CrateDB to maybe have partitioned table supporting timestamp + period based partitioning dimension?
  • I focused primarily on the way tables are partitioned, but timescaleDB is really performant at optimizing complez queries on timeseries. Things like aggregating, grouping, creating cases, or casting data is very easy to do. I’m worried I will loose a lot of that on CrateDB
1 Like


is there a roadmap for CrateDB to maybe have partitioned table supporting timestamp + period based partitioning dimension?

This is already possible with CrateDB:

CREATE TABLE partitionedtable (
   ts timestamp
   ,fortnight GENERATED ALWAYS AS date_bin(INTERVAL '14' DAY,ts,0)
   ,othercolums TEXT
) PARTITIONED BY (fortnight);
1 Like

Just to clarify, you do not need to reference this fortnight columns in your queries, when you filter by ts CrateDB automatically knows which partitions it needs to check.

From Version 5.4.0 — CrateDB: Reference

  • Improved the partition filtering logic to also narrow partitions if the partition is based on a generated column using the date_bin scalar.
1 Like

Thank you for clarifying.

That was my suspision, and I’m afraid this is a redibitory blocker. This forces me to :
-Have a column dedicated specifically for partitioning
-Use this column in all queries with this table

This will increase the workload to construct queries significantly, as the logic is very different, especially when joining with other time-related tables.

What I was asking is if you plan to allow me to have as partitioning dimension my date_time column directly. In timescale I would use parameters like time_column => ‘dt’, chunk_time_interval =>INTERVAL ‘14 days’ to create an equivalent partitioning to your example and can then use my dt field as the partitioning column in the queries.

1 Like

No, you don’t. CrateDBs query engine is smart enough to optimize for generated columns used for partitioning.

1 Like