Hello,
I have ~30 million records that could be naturally partitioned by month; that would create around 36 partitions. I am running CrateDB on a 2-node cluster; each node has 32 GB RAM and 2 CPUs with 8 cores per CPU.
Is it a bad idea to do 2 shards per partition on that table?
The ~30 million records are my fact table, but the dimension tables are much smaller; do I give those 1 shard and no partitioning?
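For context, the schema I have in mind looks roughly like this (table and column names are just placeholders, not my real ones):

```sql
-- Fact table: partitioned by month via a generated column,
-- 2 shards per partition (~36 partitions x 2 shards).
CREATE TABLE fact (
    id BIGINT,
    ts TIMESTAMP WITH TIME ZONE,
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts),
    dim_id BIGINT
) CLUSTERED INTO 2 SHARDS
  PARTITIONED BY (month);

-- Dimension table: small, so a single shard and no partitioning.
CREATE TABLE dim (
    id BIGINT PRIMARY KEY,
    name TEXT
) CLUSTERED INTO 1 SHARDS;
```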
Hi,
If shards are too small, performance will not be optimal, so this depends a bit on how large your typical records are, but it sounds like you would end up with too many shards for the size of the table.
Also, please note that a 2-node cluster will not give you HA; for HA you need a minimum of 3 nodes.
Take a look at Sharding and Partitioning (cratedb.com) for further details.
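To see how large your shards actually end up, you can query the built-in `sys.shards` system table. Something like this (a sketch; adjust the filter to your table names) shows the shard count and average primary shard size per table:

```sql
-- Average primary shard size per table, in MB
SELECT schema_name,
       table_name,
       count(*) AS num_shards,
       avg(size) / (1024.0 * 1024.0) AS avg_size_mb
FROM sys.shards
WHERE "primary" = true
GROUP BY schema_name, table_name
ORDER BY table_name;
```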
The entire dataset is 10 GB of JSON files. When loaded into CrateDB, the table shows 30 million records, ~6 GB. Perhaps I only want 1 shard?
Regarding HA, thanks for the remark. I am a bit new to this.
I noticed that queries on a single table are really fast, but joins are really slow, so I’m trying to see if there is an option to speed them up.
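One thing I plan to try (a sketch with the placeholder names from my post above, assuming the partition column filter lets CrateDB prune partitions before the join) is restricting the fact table to specific months:

```sql
-- Filtering on the partition column (month) so only the relevant
-- partitions of the fact table are scanned during the join.
SELECT d.name, count(*) AS cnt
FROM fact f
JOIN dim d ON f.dim_id = d.id
WHERE f.month = '2023-06-01'
GROUP BY d.name;
```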