I would like to bring the number of replicas to 4 and partition it on a monthly basis.
Is it possible to do this on an existing table?
What number of shards and replicas should be set so that each node has the same number of shards?
The purpose is to have every node in the cluster hold the exact same replicated data, so that each node has a full copy of the table and is totally independent: in the event of a crash, if only one of the five nodes were to survive, the surviving node would still have the entire table.
For a partitioned table, the number of shards and replicas is applied to each partition. So with `CLUSTERED INTO 3 SHARDS WITH (number_of_replicas = 3)`, a simple (non-partitioned) table would have a total of 12 shards: 3 primaries and 9 replicas. A partitioned table with, let’s say, 10 partitions would end up with 120 shards in total (12 shards per partition).
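To illustrate, here is a minimal sketch of such a table definition, assuming a hypothetical table `metrics` that is partitioned monthly via a generated column:

```sql
-- Hypothetical example: 3 primary shards and 3 replicas per partition.
CREATE TABLE metrics (
    ts TIMESTAMP WITH TIME ZONE,
    payload OBJECT,
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) CLUSTERED INTO 3 SHARDS
  PARTITIONED BY (month)
  WITH (number_of_replicas = 3);
-- Each monthly partition gets 3 primaries + 9 replicas = 12 shards in total.
```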
If you have 5 nodes and you want to sustain 4 of them going offline, and all your data is available, you’d need 4 replicas to be configured (for each shard).
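Regarding the first part of the question: the replica count can be changed on an existing table at any time. A minimal sketch, again assuming the hypothetical `metrics` table:

```sql
-- Raise the replica count on an existing table; CrateDB creates the
-- additional shard copies in the background.
ALTER TABLE metrics SET (number_of_replicas = 4);
```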
It’s essential to differentiate between cluster availability and data availability. A cluster always needs a quorum in the voting configuration to avoid a split-brain scenario. Without a quorum, data cannot be accessed or modified. In a 5-node cluster, a maximum of 3 nodes can leave, but only if they do so sequentially, not simultaneously (at most 2 may leave at the same time). The voting configuration can adjust automatically but will never shrink below 3 nodes. This means that if there are 3 nodes in the voting configuration and one becomes unavailable, all 3 remain in the voting configuration, requiring a quorum of 2 nodes.
So, if you don’t use any availability-zone setup, you will need at least n-1 copies of a shard (i.e., 1 primary + n-2 replicas) in an n-node cluster. If you expect your cluster to only lose nodes zone-wise, e.g., a whole availability zone at a time, this can be reduced further.
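To make the formula concrete for the 5-node example: since the cluster only stays accessible while a quorum holds, n-1 = 4 copies of each shard, i.e. 1 primary plus 3 replicas, are sufficient. A minimal sketch, again using the hypothetical `metrics` table:

```sql
-- 5-node cluster without zone awareness: 4 copies per shard
-- (1 primary + 3 replicas) keep all data available while a quorum holds.
ALTER TABLE metrics SET (number_of_replicas = 3);
```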
Regardless, if you want your data to be replicated across all nodes, you can use the shortcut
`number_of_replicas = '0-all'`.
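Applied to the hypothetical table from above, that could look as follows; `'0-all'` lets the replica count expand and shrink automatically with the number of nodes in the cluster:

```sql
-- '0-all' auto-expands replicas to all other nodes, so every node
-- ends up holding a full copy of each shard.
ALTER TABLE metrics SET (number_of_replicas = '0-all');
```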
If something goes wrong during the ‘swap’ phase (the table in question has over 30 million records) does CrateDB keep a copy or is it able to roll back to the situation before the swap command?
It is always advisable to have a snapshot of your data ready, but this operation should either succeed or fail completely.
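For the snapshot part, this could look roughly as follows, assuming a hypothetical filesystem repository path that is accessible from all nodes and the hypothetical `metrics` table (the ‘swap’ itself presumably refers to CrateDB’s `ALTER CLUSTER SWAP TABLE source TO target` statement):

```sql
-- Register a snapshot repository (the location must be readable and
-- writable by every node in the cluster).
CREATE REPOSITORY backups TYPE fs WITH (location = '/mnt/backups');

-- Take a snapshot of the table before running the swap.
CREATE SNAPSHOT backups.before_swap TABLE metrics
    WITH (wait_for_completion = true);
```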
Thank you, your answer about availability really hit the mark: the difference between cluster availability and data availability. CrateDB cluster behavior is understandable now! (at least for me)
Thanks for this excellent conversation. We are always trying to improve our documentation: do you think it makes sense to add corresponding content about this topic to our new page in the CrateDB Guide?
The page aims to explain the relevant details of CrateDB’s clustering features, accompanied by educational and guiding material on how to operate it well. In this spirit, both of your questions and the corresponding responses could yield valuable improvements.