Can CrateDB function normally if one node in a 3-node cluster immediately goes down?

Hello,

I have a 3-node CrateDB cluster (version 4.6.5) running on Debian GNU/Linux servers, with the table structure shown at the end of this message. The table holds about 170 million records of time series data (about 27 GB).

My questions are:

  • If I immediately power down one of the CrateDB nodes, would I still be able to read and write via one of the remaining two CrateDB nodes?
  • If I power up that one node again, will it be able to join the cluster and continue as before?
CREATE TABLE IF NOT EXISTS "data"."table1" (
   "field_1" VARCHAR(512),
   "datetime" TIMESTAMP WITH TIME ZONE,
   "field_3" DOUBLE PRECISION,
   ...
   ...
   PRIMARY KEY ("field_1", "datetime")
)
CLUSTERED INTO 6 SHARDS
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'default',
   column_policy = 'strict',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)

Yes, as long as there is a replica in the cluster, all data remains available. In your schema you have set number_of_replicas to '0-1' (the default setting), which on a 3-node (or larger) cluster ensures that there is one replica per shard. If you are doing upgrades or maintenance work, try to shut the node down gracefully, though; see Rolling upgrade — CrateDB: How-Tos.
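
Before powering a node down, you can verify that the replicas are actually in place. A minimal sketch against CrateDB's sys.health table (column names from the 4.x sys schema):

-- Tables that can tolerate a node loss report GREEN,
-- with no missing or under-replicated shards
SELECT table_schema, table_name, health,
       missing_shards, underreplicated_shards
FROM sys.health
ORDER BY severity DESC;

For planned maintenance, ALTER CLUSTER DECOMMISSION '<node-id-or-name>' can be used to move shards off the node before shutdown; the node identifier here is a placeholder you would fill in.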

  • If I power up that one node again, will it be able to join the cluster and continue as before?

Yes. Depending on how long the node was out of the cluster, it will take more or less time to catch up again (i.e. for its shards to be re-synced). With 27 GiB of data I would expect this to happen within seconds.
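
If you want to watch the node catching up, the recovery column in sys.shards exposes per-shard recovery progress; a sketch (column names again from the 4.x sys schema):

-- Shards currently being re-synced, with recovery progress in percent
SELECT table_name, id, state,
       recovery['stage'] AS stage,
       recovery['size']['percent'] AS recovered_percent
FROM sys.shards
WHERE state = 'RECOVERING'
ORDER BY table_name, id;

Once all shards are back in the STARTED state, the node has fully caught up.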
