I have a use case where the customer requires data to be stored in separate tables. Each table is dynamically created and eventually deleted after a week or so. As a result, there are typically ~20000 tables in the system at any given time. I know that this is not good database design, but, for various reasons, it cannot be changed. My question is: If CrateDB is clustered and the data is not sharded or partitioned, will it spread the tables tables across the cluster rather than putting all the tables on the a single node. Each table is not that big ~100000 rows
Hi,
Different tables will be on different shards, so CrateDB will spread the tables across the cluster,
but please review Sharding and partitioning guide for time-series data - Tutorials - CrateDB Community
In particular we recommend a maximum of 1000 shards per node so for 20 thousand tables you would be looking at 20 nodes minimum.
2 other approaches you may want to consider are:
- if the tables can be grouped in some way you may want to have multiple CrateDB clusters instead of a single one with so many tables
- perhaps you could combine data in less tables but present it as if it were on separate “tables” using views?