To finalize this thread: I stopped and started CrateDB on the first node, waited for it to be part of the cluster again, and then repeated the whole process, this time with a successful graceful shutdown. Then I upgraded the CrateDB Debian packages and repeated the same steps on the other nodes, finishing the cluster upgrade in about 20 minutes.
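The "wait for it to be part of the cluster" step can be scripted instead of checked by hand. Here is a minimal sketch, not something I ran during the upgrade. `wait_for_node` and its `list_node_names` argument are hypothetical names; the callable is assumed to return the current node names, for example by running `SELECT name FROM sys.nodes` against another node.

```python
import time


def wait_for_node(node_name, list_node_names, timeout_s=600, interval_s=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until node_name shows up in the cluster or the timeout expires.

    list_node_names is a caller-supplied callable (hypothetical) that returns
    the names of the nodes currently in the cluster.
    Returns True if the node was seen, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if node_name in list_node_names():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The timeout and interval defaults are arbitrary; pick values that match how long a node normally takes to start in your cluster.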
I think the reason this process (graceful shutdown on the first node) was stuck waiting is:
- After issuing the DECOMMISSION command via `crash` on the first node, I waited for about 1 minute and then exited the `crash` command line utility with Ctrl-C.
- The reason I exited (Ctrl-C) is that I thought I could, and I was also expecting the decommission process to have finished by then. Apparently, it takes more than a few minutes!
- In the successful case, I saw that it takes about 7-8 minutes between `ALTER CLUSTER DECOMMISSION 'whatever-node-name-... ;` and receiving the `ALTER OK, 1 row affected` message.
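Since the statement blocks for several minutes, one way to avoid interrupting it by accident is to send it from a script with a generous timeout rather than from an interactive `crash` session. Below is a rough sketch that posts the statement to CrateDB's HTTP `/_sql` endpoint. The URL, the 30-minute timeout, and the function names are my assumptions, and I have not verified that the HTTP response always arrives before the decommissioned node shuts down.

```python
import json
import urllib.request

# Assumption: CrateDB's HTTP endpoint is reachable here; adjust host and port.
CRATE_SQL_URL = "http://localhost:4200/_sql"


def decommission_stmt(node_name):
    """Build the statement text, doubling single quotes inside the name."""
    escaped = node_name.replace("'", "''")
    return f"ALTER CLUSTER DECOMMISSION '{escaped}'"


def build_request(stmt, url=CRATE_SQL_URL):
    """Wrap one SQL statement in a POST request for the /_sql endpoint."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decommission(node_name, timeout_s=1800, url=CRATE_SQL_URL):
    """Send the statement and block until the server answers or timeout_s passes."""
    request = build_request(decommission_stmt(node_name), url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Calling `decommission("my-node-name")` (a placeholder name) would then wait up to 30 minutes, which leaves plenty of room for the 7-8 minutes I observed.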
During that process, `min_availability` was `PRIMARIES` (I never changed that):
select settings['cluster']['graceful_stop']['min_availability'] from sys.cluster limit 100;
settings['cluster']['graceful_stop']['min_availability']
----------------------------------------------------------
PRIMARIES
(1 row)
And the largest time series table had `'0-1'` as the number of replicas:
number_of_replicas = '0-1',