Partition requires significantly more space than the others

proddata · October 20, 2021, 1:14pm

Just as a general reference: oversharding / overpartitioning is considered bad practice. Typically, a single shard can easily hold between 10 and 50 GiB of data without any significant performance impact (also see the Sharding and Partitioning Guide for Time Series Data).

→ A partition with 6 shards is fine for holding ~300 GiB of data.
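
As a rough back-of-the-envelope check of that figure, here is a small Python sketch. The function name is made up for illustration, and the 10 and 50 GiB defaults are just the rule of thumb from above, not CrateDB settings.

```python
def partition_capacity_gib(shards, min_per_shard=10, max_per_shard=50):
    """Return the (low, high) comfortable data range in GiB for one partition.

    The per-shard bounds are the 10 to 50 GiB rule of thumb, not hard limits.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return shards * min_per_shard, shards * max_per_shard


low, high = partition_capacity_gib(6)
print(f"6 shards: roughly {low} to {high} GiB per partition")
```

So ~300 GiB sits at the upper end of what 6 shards handle comfortably under that rule of thumb.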

Also, specifying CLUSTERED BY (hod) probably prevents the data from being properly distributed across shards. If fewer shards end up being used, performance is probably worse as well.

I would suggest to do the following:

CLUSTERED INTO 10 SHARDS PARTITIONED BY (year)

That is 10 shards, so that each node ideally gets assigned 2, and partitioning only by year.
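
To illustrate why 10 shards spread more evenly than 6, here is a minimal Python sketch. It assumes a 5-node cluster (implied by "2 per node", but not stated explicitly), counts primary shards only, and uses a hypothetical helper name; the actual allocation is decided by CrateDB and may differ.

```python
def shards_per_node(total_shards, nodes):
    """Spread shards as evenly as possible and return the count per node."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    base, extra = divmod(total_shards, nodes)
    # The first `extra` nodes each take one shard more than the rest.
    return [base + 1 if i < extra else base for i in range(nodes)]


# Assumption: 5 nodes, primaries only (replicas ignored).
print(shards_per_node(10, 5))  # [2, 2, 2, 2, 2]
print(shards_per_node(6, 5))   # [2, 1, 1, 1, 1]
```

With 6 shards on 5 nodes, one node ends up holding twice as much of the partition as the others; with 10 shards the load is even.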
