Hello,
I have ~30 million records that could be naturally partitioned by month; that would create around 36 partitions. I am running CrateDB on a 2-node cluster; each node has 32 GB RAM and 2 CPUs with 8 cores per CPU.
Is it a bad idea to do 2 shards per partition on that table?
The ~30 million records are my fact table, but the dimension tables are much smaller; do I give those 1 shard and no partitioning?
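For context, the schema I have in mind looks roughly like this (table and column names are just placeholders, not my real ones):

```sql
-- Fact table: partitioned by month via a generated column,
-- 2 shards per partition (~36 partitions x 2 shards).
CREATE TABLE fact (
    id BIGINT,
    ts TIMESTAMP WITH TIME ZONE,
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts),
    dim_id BIGINT
) CLUSTERED INTO 2 SHARDS
  PARTITIONED BY (month);

-- Dimension table: small, so a single shard and no partitioning.
CREATE TABLE dim (
    id BIGINT PRIMARY KEY,
    name TEXT
) CLUSTERED INTO 1 SHARDS;
```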
Hi,
If shards are too small, performance will not be optimal, so this depends a bit on how large your typical records are, but it sounds like you would end up with too many shards for the size of the table.
Also, please note that a 2-node cluster will not give you HA; for HA you need a minimum of 3 nodes.
Take a look at Sharding and Partitioning (cratedb.com) for further details.
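To see how large your shards actually end up, you can query the built-in `sys.shards` system table. Something like this (a sketch; adjust the filter to your table names) shows the shard count and average primary shard size per table:

```sql
-- Average primary shard size per table, in MB
SELECT schema_name,
       table_name,
       count(*) AS num_shards,
       avg(size) / (1024.0 * 1024.0) AS avg_size_mb
FROM sys.shards
WHERE "primary" = true
GROUP BY schema_name, table_name
ORDER BY table_name;
```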
The entire dataset is 10 GB of JSON files. When loaded into CrateDB, the table shows 30 million records, ~6 GB. Perhaps I only want 1 shard?
Regarding HA, thanks for the remark. I am a bit new to this.
I noticed that queries on a single table are really fast, but joins are really slow, so I’m trying to see if there is an option to speed them up.
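One thing I plan to try (a sketch with the placeholder names from my post above, assuming the partition column filter lets CrateDB prune partitions before the join) is restricting the fact table to specific months:

```sql
-- Filtering on the partition column (month) so only the relevant
-- partitions of the fact table are scanned during the join.
SELECT d.name, count(*) AS cnt
FROM fact f
JOIN dim d ON f.dim_id = d.id
WHERE f.month = '2023-06-01'
GROUP BY d.name;
```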