Hello everyone,
I’m starting to discover CrateDB and am considering it for a project where I initially planned to use TimescaleDB.
This is a very time-series-intensive project, so I went with TimescaleDB, which I’m familiar with. I also need versioning and metadata associated with the time series, so InfluxDB was out of the picture.
I have now seen that CrateDB has partitioned tables, where you pick a column to partition by. The example in the documentation extracts the day or month from a date and uses that as the partitioning dimension.
In TimescaleDB, I use hypertables, which are a form of partitioned table for time series where the partitioning dimension is always time (optionally combined with a secondary one). The difference is that I can use my timestamp with time zone field directly as the partitioning dimension and specify the partitioning period (2 days, 1 week, 3 months…).
This is convenient because my partitioning column is simply my main time dimension, and it is natively optimized for querying, aggregation, etc.
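To make clear what I mean by "timestamp + period" partitioning, here is a minimal Python sketch of the bucketing idea. It is only an illustration: `partition_start` is a name I made up, and this is not how TimescaleDB or CrateDB actually implement partitioning. It assumes fixed-width periods counted from the Unix epoch in UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_start(ts: datetime, period: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its fixed-width period.

    Hypothetical helper for illustration only.
    """
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    elapsed = ts.astimezone(timezone.utc) - EPOCH
    # timedelta // timedelta gives the integer number of whole periods elapsed.
    return EPOCH + (elapsed // period) * period


# Two rows less than two days apart can share a 2-day partition...
a = partition_start(datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc), timedelta(days=2))
b = partition_start(datetime(2024, 1, 3, 15, 30, tzinfo=timezone.utc), timedelta(days=2))
# ...while a row just past the boundary falls into the next one.
c = partition_start(datetime(2024, 1, 4, 0, 0, tzinfo=timezone.utc), timedelta(days=2))
```

Here `a` and `b` map to the same partition start and `c` to the following one. Calendar periods such as "3 months" do not have a fixed width, so they would need calendar-aware logic instead of a `timedelta`.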
I’m concerned that I won’t be able to achieve the same performance on CrateDB, or that at best I will, but at the cost of a much more complex query structure, which is not trivial in this R&D project.
So my questions are:
- Can anyone who also has experience with TimescaleDB tell me whether my concerns are valid?
- If they are, is there a roadmap for CrateDB to support partitioned tables with a timestamp + period based partitioning dimension?
- I focused primarily on the way tables are partitioned, but TimescaleDB is also very good at optimizing complex queries on time series. Things like aggregating, grouping, creating cases, or casting data are very easy to do. I’m worried I will lose a lot of that on CrateDB.