We have a problem with CrateDB in our production environment.
We have 3 nodes in the cluster, on 3 different servers. All 3 nodes are master-eligible.
Each server runs Linux (Red Hat 8.3) and has 60 GB of memory, but the service is started with a 30 GB heap using the following command:
```shell
ulimit -u 4096 && CRATE_HEAP_SIZE='30g' && CRATE_JAVA_OPTS='-Xms30g -Xmx30g' /apps/crate-4.6.4/bin/crate
```
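One detail of the command above that may be worth verifying: it joins the variable assignments to the launcher with `&&`. In POSIX shells (sh, bash), `NAME=value && command` only sets a shell variable in the current shell; unless `NAME` was already exported, the launched process does not receive it in its environment. The sketch below demonstrates the difference, using `sh -c` as a stand-in for `bin/crate`. It assumes, as the original command implies, that `CRATE_HEAP_SIZE` is the variable the launcher reads; it is not a claim that this causes the HTTP problem.

```shell
#!/bin/sh
# Sketch: how an environment variable does (or does not) reach a child process.
# `sh -c` stands in for /apps/crate-4.6.4/bin/crate so this runs anywhere.
unset CRATE_HEAP_SIZE

# 1) "VAR=value && cmd": VAR becomes a shell variable only (not exported),
#    so the child does not see it.
CRATE_HEAP_SIZE=30g && sh -c 'echo "joined with &&: ${CRATE_HEAP_SIZE:-unset}"'

# 2) "VAR=value cmd": the assignment is placed in cmd's environment.
CRATE_HEAP_SIZE=30g sh -c 'echo "prefix assignment: ${CRATE_HEAP_SIZE:-unset}"'

# 3) "export VAR=value": every later child process inherits it.
export CRATE_HEAP_SIZE=30g
sh -c 'echo "exported: ${CRATE_HEAP_SIZE:-unset}"'

# Applied to the command from the post (not executed here):
#   ulimit -u 4096 && CRATE_HEAP_SIZE=30g /apps/crate-4.6.4/bin/crate
```

Run in a fresh shell, the three lines report `unset`, `30g` and `30g` respectively.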
We are currently running CrateDB 4.6.4.
The problem has been occurring since we moved to version 4.3.
When we start the cluster, SQL over HTTP works fine.
As soon as synchronization reaches 100%, SQL is no longer accessible.
We have enabled DEBUG-level logging.
The logs show some activity, but the HTTP service does not respond.
No error appears.
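To tell a hung HTTP endpoint apart from a refused connection, a node can be probed with a hard timeout. The sketch below assumes CrateDB's HTTP interface on its default port 4200 with the `/_sql` endpoint, and that `curl` is installed; `probe_sql` is a hypothetical helper name, and the host must be replaced with a real node address (the post only gives the anonymized `10.135.x.y`).

```shell
#!/bin/sh
# Hypothetical helper: send "SELECT 1" to a node's HTTP SQL endpoint and
# classify the outcome by curl's exit code. Port defaults to 4200.
probe_sql() {
  host="$1"
  port="${2:-4200}"
  body=$(curl -s --max-time 10 \
    -H 'Content-Type: application/json' \
    -X POST "http://${host}:${port}/_sql" \
    -d '{"stmt": "SELECT 1"}')
  rc=$?
  case "$rc" in
    0)  echo "ok: ${body}" ;;                       # some HTTP response came back
    28) echo "timeout: no response within 10s" ;;   # curl: operation timed out
    7)  echo "refused: could not connect" ;;        # curl: failed to connect
    *)  echo "error: curl exit code ${rc}" ;;
  esac
  return "$rc"
}

# Usage (replace with a real node address):
#   probe_sql 10.135.x.y
```

A `timeout` result while the port still accepts connections would match the "not responding, no error" symptom more closely than a `refused` result.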
The configuration is quite simple:
```yaml
cluster.name: clustertername
node.name: "node1"
path.data: /data/crate/clustername/
gateway.expected_nodes: 3
gateway.recover_after_nodes: 2
network.host: 10.135.x.y
network.bind_host: "dns_alias_for_10.135.x.y"
node.master: true
discovery.seed_hosts:
  - "10.135.x.x"
  - "10.135.x.y"
  - "10.135.x.z"
cluster.initial_master_nodes:
  - "10.135.x.x"
  - "10.135.x.y"
  - "10.135.x.z"
```
Do you have any idea that could help us solve this problem?
In the meantime, to keep the nodes up, we delete the /data/crate/clustername/ directory to force a synchronization…
This workaround only keeps SQL over HTTP available.
Thanks for your help!