Advice for table/partitioning/sharding strategies

Monthly was just an initial suggestion, not knowing exact data volumes and expected growth. But yes, the assumption was also that yearly data is much less than the per minute data.

You could also have different columns :wink:
CrateDB doesn’t really care for sparse data structures.

CREATE TABLE IF NOT EXISTS device_data (
    device_id TEXT,
    ts TIMESTAMP,
    ts_g GENERATED ALWAYS AS DATE_TRUNC('month', ts),
    val_minute DOUBLE,
    val_hour DOUBLE,
    ....
) 
CLUSTERED BY (device_id)
PARTITIONED BY (ts_g);

… or even keep them in an OBJECT …

CREATE TABLE IF NOT EXISTS device_data (
    device_id TEXT,
    ts TIMESTAMP,
    ts_g GENERATED ALWAYS AS DATE_TRUNC('month', ts),
    val OBJECT AS (
         "minute" DOUBLE,
         "hour" DOUBLE,
       ...  
   )
) 
CLUSTERED BY (device_id)
PARTITIONED BY (ts_g);

that would potentially even make filtering faster.

I would always first go with the simpler solution and don’t preemptively optimize. Be aware we have users with multiple 10s to 100s of TiB in single tables.


you might also want to look into this:

1 Like