We had a 3-node Crate cluster running Crate v3.
However, since upgrading to Crate v4, the cluster can no longer be formed.
Each node insists on becoming the master of its own cluster, and none of them loads the data we previously stored in /var/lib/crate, even though the “nodes/” folder is still there. The data directory sizes on the three nodes are:
ach@smartvalve02:/var/log/crate$ du -sh /var/lib/crate/
4.3G /var/lib/crate/
ach@smartvalve04:/var/log/crate$ du -sh /var/lib/crate/
64G /var/lib/crate/
ach@smartvalve05:/var/log/crate$ du -sh /var/lib/crate/
90G /var/lib/crate/
CrateDB was installed on Ubuntu from the Crate APT repository:
Package: crate
Version: 4.1.1-1~bionic
Priority: extra
Section: net
Maintainer: CRATE Technology GmbH <team@crate.io>
Installed-Size: 65.9 MB
Depends: default-jre-headless (>= 11), adduser
Homepage: https://crate.io/
Download-Size: 53.8 MB
APT-Manual-Installed: yes
APT-Sources: https://cdn.crate.io/downloads/deb/stable bionic/main amd64 Packages
Description: The fast, scalable, easy to use SQL database
Crate.io has built a new breed of database to serve today’s mammoth data needs.
Based on the familiar SQL syntax, CrateDB combines high availability, resiliency,
and scalability in a distributed design that allows you to query mountains of
data in realtime, not batches. We solve your data scaling problems and make
administration a breeze. Easy to scale, simple to use.
The 3 nodes in the cluster have the following IPs:
172.18.252.26 (smartvalve02)
172.18.252.28 (smartvalve04)
172.18.252.29 (smartvalve05)
ach@smartvalve02:/var/log/crate$ ifconfig
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.18.252.26 netmask 255.255.255.0 broadcast 172.18.252.255
inet6 fe80::20c:29ff:fe71:7308 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:71:73:08 txqueuelen 1000 (Ethernet)
RX packets 368624 bytes 40810982 (40.8 MB)
RX errors 0 dropped 131519 overruns 0 frame 0
TX packets 93100 bytes 96878675 (96.8 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 2546 bytes 204048 (204.0 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2546 bytes 204048 (204.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ach@smartvalve04:/var/log/crate$ ifconfig
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.18.252.28 netmask 255.255.255.0 broadcast 172.18.252.255
inet6 fe80::20c:29ff:fea9:ff54 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:a9:ff:54 txqueuelen 1000 (Ethernet)
RX packets 656318 bytes 565698242 (565.6 MB)
RX errors 0 dropped 131552 overruns 0 frame 0
TX packets 246863 bytes 36173729 (36.1 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 1213 bytes 97955 (97.9 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1213 bytes 97955 (97.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ach@smartvalve05:/var/log/crate$ ifconfig
ens160: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.18.252.29 netmask 255.255.255.0 broadcast 172.18.252.255
inet6 fe80::20c:29ff:fecf:50f1 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:cf:50:f1 txqueuelen 1000 (Ethernet)
RX packets 467113 bytes 292046714 (292.0 MB)
RX errors 0 dropped 131546 overruns 0 frame 0
TX packets 153541 bytes 21338417 (21.3 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 941 bytes 76172 (76.1 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 941 bytes 76172 (76.1 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The configuration file /etc/crate/crate.yml on node1 (smartvalve02) is as follows:
network.host: 172.18.252.26
network.publish_host: 172.18.252.26
transport.publish_port: 4300
discovery.seed_hosts:
- 172.18.252.26:4300
- 172.18.252.28:4300
- 172.18.252.29:4300
cluster.initial_master_nodes:
- 172.18.252.26
- 172.18.252.28
- 172.18.252.29
gateway:
  recover_after_nodes: 3
  recover_after_time: 1m
  expected_nodes: 3
auth.host_based.enabled: false
cluster.name: smartvalve
node.name: node1
On each node, network.host, network.publish_host, and node.name are changed to that node’s IP address and name (node1, node2, node3). cluster.name, discovery.seed_hosts, and all other settings are identical on every node.
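To make the "identical except for three keys" setup explicit and reproducible, the per-node files can be generated from one template. This is a minimal sketch, assuming the node names and IPs used in this post and a template that mirrors the crate.yml above; `render_configs` and `NODES` are hypothetical names, not part of any Crate tooling.

```python
# Minimal sketch: render one crate.yml per node from a shared template.
# Assumption: node names/IPs are the ones from this post; only network.host,
# network.publish_host and node.name vary between nodes.
from string import Template

NODES = {
    "node1": "172.18.252.26",  # smartvalve02
    "node2": "172.18.252.28",  # smartvalve04
    "node3": "172.18.252.29",  # smartvalve05
}

TEMPLATE = Template("""\
network.host: $ip
network.publish_host: $ip
transport.publish_port: 4300
discovery.seed_hosts:
$seed_hosts
cluster.initial_master_nodes:
$master_nodes
gateway:
  recover_after_nodes: 3
  recover_after_time: 1m
  expected_nodes: 3
auth.host_based.enabled: false
cluster.name: smartvalve
node.name: $name
""")


def render_configs(nodes):
    """Return {node_name: crate.yml text}; shared settings are the same for every node."""
    seed_hosts = "\n".join(f"- {ip}:4300" for ip in nodes.values())
    master_nodes = "\n".join(f"- {ip}" for ip in nodes.values())
    return {
        name: TEMPLATE.substitute(ip=ip, name=name,
                                  seed_hosts=seed_hosts, master_nodes=master_nodes)
        for name, ip in nodes.items()
    }


if __name__ == "__main__":
    for name, text in render_configs(NODES).items():
        print(f"# --- {name} ---")
        print(text)
```

Generating the files this way rules out accidental drift in the shared keys (for example a typo in one node's seed host list) as a cause of the split.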
With this configuration running, the network connections on smartvalve02 are as follows:
ach@smartvalve02:/var/log/crate$ sudo lsof -i -P -n
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
systemd-r 876 systemd-resolve 12u IPv4 17299 0t0 UDP 127.0.0.53:53
systemd-r 876 systemd-resolve 13u IPv4 17300 0t0 TCP 127.0.0.53:53 (LISTEN)
sshd 1131 root 3u IPv4 18107 0t0 TCP *:22 (LISTEN)
sshd 1131 root 4u IPv6 18109 0t0 TCP *:22 (LISTEN)
sshd 2463 root 3u IPv4 27181 0t0 TCP 172.18.252.26:22->172.18.252.111:46560 (ESTABLISHED)
sshd 2545 ach 3u IPv4 27181 0t0 TCP 172.18.252.26:22->172.18.252.111:46560 (ESTABLISHED)
sshd 4837 root 3u IPv4 47086 0t0 TCP 172.18.252.26:22->172.18.252.111:47430 (ESTABLISHED)
sshd 4952 ach 3u IPv4 47086 0t0 TCP 172.18.252.26:22->172.18.252.111:47430 (ESTABLISHED)
sshd 5296 root 3u IPv4 52014 0t0 TCP 172.18.252.26:22->172.18.252.111:47440 (ESTABLISHED)
sshd 5378 ach 3u IPv4 52014 0t0 TCP 172.18.252.26:22->172.18.252.111:47440 (ESTABLISHED)
master 16084 root 13u IPv4 307955 0t0 TCP *:25 (LISTEN)
master 16084 root 14u IPv6 307956 0t0 TCP *:25 (LISTEN)
java 19497 crate 130u IPv6 332534 0t0 TCP 172.18.252.26:5432 (LISTEN)
java 19497 crate 159u IPv6 332552 0t0 TCP 172.18.252.26:4200 (LISTEN)
java 19497 crate 208u IPv6 332569 0t0 TCP 172.18.252.26:4300 (LISTEN)
java 19497 crate 210u IPv6 332620 0t0 TCP 172.18.252.26:4300->172.18.252.28:54318 (ESTABLISHED)
java 19497 crate 211u IPv6 332621 0t0 TCP 172.18.252.26:4300->172.18.252.28:54324 (ESTABLISHED)
java 19497 crate 212u IPv6 332622 0t0 TCP 172.18.252.26:4300->172.18.252.28:54326 (ESTABLISHED)
java 19497 crate 213u IPv6 332623 0t0 TCP 172.18.252.26:4300->172.18.252.28:54330 (ESTABLISHED)
java 19497 crate 214u IPv6 332587 0t0 TCP 172.18.252.26:4300->172.18.252.29:45216 (ESTABLISHED)
java 19497 crate 215u IPv6 332624 0t0 TCP 172.18.252.26:4300->172.18.252.28:54334 (ESTABLISHED)
java 19497 crate 216u IPv6 332625 0t0 TCP 172.18.252.26:4300->172.18.252.28:54338 (ESTABLISHED)
java 19497 crate 217u IPv6 338026 0t0 TCP 172.18.252.26:4300->172.18.252.28:54344 (ESTABLISHED)
java 19497 crate 218u IPv6 338027 0t0 TCP 172.18.252.26:4300->172.18.252.28:54342 (ESTABLISHED)
java 19497 crate 219u IPv6 338028 0t0 TCP 172.18.252.26:4300->172.18.252.28:54346 (ESTABLISHED)
java 19497 crate 220u IPv6 338029 0t0 TCP 172.18.252.26:4300->172.18.252.28:54350 (ESTABLISHED)
java 19497 crate 221u IPv6 338030 0t0 TCP 172.18.252.26:4300->172.18.252.28:54354 (ESTABLISHED)
java 19497 crate 222u IPv6 338031 0t0 TCP 172.18.252.26:4300->172.18.252.28:54358 (ESTABLISHED)
java 19497 crate 223u IPv6 337837 0t0 TCP 172.18.252.26:4200->172.18.252.111:47964 (ESTABLISHED)
java 19497 crate 225u IPv6 337467 0t0 TCP 172.18.252.26:4300->172.18.252.29:45174 (ESTABLISHED)
java 19497 crate 226u IPv6 337468 0t0 TCP 172.18.252.26:4300->172.18.252.29:45178 (ESTABLISHED)
java 19497 crate 227u IPv6 337469 0t0 TCP 172.18.252.26:4300->172.18.252.29:45186 (ESTABLISHED)
java 19497 crate 228u IPv6 337470 0t0 TCP 172.18.252.26:4300->172.18.252.29:45190 (ESTABLISHED)
java 19497 crate 229u IPv6 332580 0t0 TCP 172.18.252.26:4300->172.18.252.29:45194 (ESTABLISHED)
java 19497 crate 230u IPv6 332581 0t0 TCP 172.18.252.26:4300->172.18.252.29:45196 (ESTABLISHED)
java 19497 crate 231u IPv6 332582 0t0 TCP 172.18.252.26:4300->172.18.252.29:45202 (ESTABLISHED)
java 19497 crate 232u IPv6 332583 0t0 TCP 172.18.252.26:4300->172.18.252.29:45200 (ESTABLISHED)
java 19497 crate 233u IPv6 332584 0t0 TCP 172.18.252.26:4300->172.18.252.29:45206 (ESTABLISHED)
java 19497 crate 234u IPv6 332585 0t0 TCP 172.18.252.26:4300->172.18.252.29:45210 (ESTABLISHED)
java 19497 crate 235u IPv6 332586 0t0 TCP 172.18.252.26:4300->172.18.252.29:45208 (ESTABLISHED)
java 19497 crate 236u IPv6 332588 0t0 TCP 172.18.252.26:4300->172.18.252.29:45220 (ESTABLISHED)
java 19497 crate 237u IPv6 338032 0t0 TCP 172.18.252.26:4300->172.18.252.28:54364 (ESTABLISHED)
java 19497 crate 240u IPv6 337474 0t0 TCP 172.18.252.26:4200->172.18.252.111:47798 (ESTABLISHED)
java 19497 crate 241u IPv6 337477 0t0 TCP 172.18.252.26:4200->172.18.252.111:47808 (ESTABLISHED)
ach@smartvalve02:/var/log/crate$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 876/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1131/sshd
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN 16084/master
tcp6 0 0 172.18.252.26:4200 :::* LISTEN 19497/java
tcp6 0 0 172.18.252.26:4300 :::* LISTEN 19497/java
tcp6 0 0 :::22 :::* LISTEN 1131/sshd
tcp6 0 0 172.18.252.26:5432 :::* LISTEN 19497/java
tcp6 0 0 :::25 :::* LISTEN 16084/master
udp 0 0 127.0.0.53:53 0.0.0.0:* 876/systemd-resolve
As you can see, Crate on 172.18.252.26 is connected to the other 2 nodes (172.18.252.28 and 172.18.252.29) on transport port 4300, over sockets that lsof reports as IPv6 (the addresses themselves are IPv4). However, when I open the Admin UI on each node, every node shows itself as belonging to the smartvalve cluster but alone in it, and none of them starts loading the data in /var/lib/crate.
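The connection claim above can be checked mechanically by counting ESTABLISHED connections to local port 4300 per remote host, and collecting the socket family lsof printed. This is a sketch only: `transport_peers` is a hypothetical helper, the sample holds a handful of lines copied from the lsof output above (feed it the full output in practice), and the IPv6 label with IPv4 addresses is assumed to come from the Java runtime using dual-stack sockets.

```python
# Sketch: summarise `lsof -i -P -n` lines for the Crate JVM, counting ESTABLISHED
# connections to local transport port 4300 per remote host.
import re
from collections import Counter

# A few lines copied from the lsof output above.
SAMPLE = """\
java 19497 crate 208u IPv6 332569 0t0 TCP 172.18.252.26:4300 (LISTEN)
java 19497 crate 210u IPv6 332620 0t0 TCP 172.18.252.26:4300->172.18.252.28:54318 (ESTABLISHED)
java 19497 crate 214u IPv6 332587 0t0 TCP 172.18.252.26:4300->172.18.252.29:45216 (ESTABLISHED)
java 19497 crate 225u IPv6 337467 0t0 TCP 172.18.252.26:4300->172.18.252.29:45174 (ESTABLISHED)
java 19497 crate 223u IPv6 337837 0t0 TCP 172.18.252.26:4200->172.18.252.111:47964 (ESTABLISHED)
"""

# Captures the socket family and the remote IP of "local:4300->remote:port (ESTABLISHED)".
CONN = re.compile(r"\s(IPv[46])\s.*TCP\s+[\d.]+:4300->([\d.]+):\d+\s+\(ESTABLISHED\)")


def transport_peers(lsof_output):
    """Return (Counter of remote IPs, set of socket families lsof reported)."""
    peers, families = Counter(), set()
    for line in lsof_output.splitlines():
        m = CONN.search(line)
        if m:
            families.add(m.group(1))
            peers[m.group(2)] += 1
    return peers, families


peers, families = transport_peers(SAMPLE)
print(dict(peers), families)
```

Run against the complete output above, both peers show up with the same number of open transport connections, so the nodes can reach each other at the TCP level even though they do not join one cluster.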
I thought it was because of IPv6, so I opened /etc/default/crate
and added:
CRATE_USE_IPV4=true
This did force Crate to use IPv4 instead of IPv6 (as observed when re-running the previous command). However, the nodes are still unable to see each other.
The log written when the crate service starts on node1 (smartvalve02) is as follows:
[2020-02-17T14:17:25,244][INFO ][o.e.e.NodeEnvironment ] [node1] using [1] data paths, mounts [[/ (/dev/sda2)]], net usable_space [360.6gb], net total_space [393.6gb], types [ext4]
[2020-02-17T14:17:25,253][INFO ][o.e.e.NodeEnvironment ] [node1] heap size [20gb], compressed ordinary object pointers [true]
[2020-02-17T14:17:25,273][INFO ][o.e.n.Node ] [node1] node name [node1], node ID [ISvG54peS42ip7QUHiO0Mg]
[2020-02-17T14:17:25,274][INFO ][o.e.n.Node ] [node1] version[4.1.1], pid[19497], build[95e20da/2020-01-30T16:22:05Z], OS[Linux/4.15.0-76-generic/amd64], JVM[Ubuntu/OpenJDK 64-Bit Server VM/11.0.6/11.0.6+10-post-Ubuntu-1ubuntu118.04.1]
[2020-02-17T14:17:25,467][INFO ][i.c.plugin ] [node1] plugins loaded: [enterpriseFunctions, lang-js, jmx-monitoring]
[2020-02-17T14:17:26,236][INFO ][o.e.p.PluginsService ] [node1] no modules loaded
[2020-02-17T14:17:26,240][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [crate-azure-discovery]
[2020-02-17T14:17:26,241][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [es-repository-hdfs]
[2020-02-17T14:17:26,241][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [io.crate.plugin.BlobPlugin]
[2020-02-17T14:17:26,241][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [io.crate.plugin.CrateCommonPlugin]
[2020-02-17T14:17:26,241][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [io.crate.plugin.HttpTransportPlugin]
[2020-02-17T14:17:26,241][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [io.crate.plugin.PluginLoaderPlugin]
[2020-02-17T14:17:26,242][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [io.crate.plugin.SrvPlugin]
[2020-02-17T14:17:26,242][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [io.crate.udc.plugin.UDCPlugin]
[2020-02-17T14:17:26,242][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [org.elasticsearch.analysis.common.CommonAnalysisPlugin]
[2020-02-17T14:17:26,242][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [org.elasticsearch.discovery.ec2.Ec2DiscoveryPlugin]
[2020-02-17T14:17:26,242][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [org.elasticsearch.plugin.analysis.AnalysisPhoneticPlugin]
[2020-02-17T14:17:26,243][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [org.elasticsearch.plugin.repository.url.URLRepositoryPlugin]
[2020-02-17T14:17:26,243][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [org.elasticsearch.repositories.azure.AzureRepositoryPlugin]
[2020-02-17T14:17:26,243][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [org.elasticsearch.repositories.s3.S3RepositoryPlugin]
[2020-02-17T14:17:26,243][INFO ][o.e.p.PluginsService ] [node1] loaded plugin [org.elasticsearch.transport.Netty4Plugin]
[2020-02-17T14:17:26,846][INFO ][o.e.d.DiscoveryModule ] [node1] using discovery type [zen] and seed hosts providers [settings]
[2020-02-17T14:17:27,404][INFO ][psql ] [node1] PSQL SSL support is disabled.
[2020-02-17T14:17:27,506][INFO ][i.c.p.PipelineRegistry ] [node1] HTTP SSL support is disabled.
[2020-02-17T14:17:27,548][INFO ][o.e.n.Node ] [node1] initialized
[2020-02-17T14:17:27,549][INFO ][o.e.n.Node ] [node1] starting ...
[2020-02-17T14:17:27,690][INFO ][psql ] [node1] publish_address {172.18.252.26:5432}, bound_addresses {172.18.252.26:5432}
[2020-02-17T14:17:27,703][INFO ][i.c.p.h.CrateNettyHttpServerTransport] [node1] publish_address {172.18.252.26:4200}, bound_addresses {172.18.252.26:4200}
[2020-02-17T14:17:27,715][INFO ][o.e.t.TransportService ] [node1] publish_address {172.18.252.26:4300}, bound_addresses {172.18.252.26:4300}
[2020-02-17T14:17:27,719][INFO ][o.e.b.BootstrapChecks ] [node1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2020-02-17T14:17:27,844][INFO ][o.e.c.s.MasterService ] [node1] elected-as-master ([1] nodes joined)[{node1}{ISvG54peS42ip7QUHiO0Mg}{s6QTJjnoQQ-LQQtM1XDSrA}{172.18.252.26}{172.18.252.26:4300}{http_address=172.18.252.26:4200} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 31, version: 1038418, reason: master node changed {previous [], current [{node1}{ISvG54peS42ip7QUHiO0Mg}{s6QTJjnoQQ-LQQtM1XDSrA}{172.18.252.26}{172.18.252.26:4300}{http_address=172.18.252.26:4200}]}
[2020-02-17T14:17:28,017][INFO ][o.e.c.s.ClusterApplierService] [node1] master node changed {previous [], current [{node1}{ISvG54peS42ip7QUHiO0Mg}{s6QTJjnoQQ-LQQtM1XDSrA}{172.18.252.26}{172.18.252.26:4300}{http_address=172.18.252.26:4200}]}, term: 31, version: 1038418, reason: Publication{term=31, version=1038418}
[2020-02-17T14:17:28,026][INFO ][o.e.n.Node ] [node1] started
node2 (smartvalve04) behaves the same way and becomes the master of its own cluster:
[2020-02-17T14:19:19,362][INFO ][o.e.c.s.MasterService ] [node2] elected-as-master ([1] nodes joined)[{node2}{na73zHayR2K5b8RCIl3_VQ}{ldyMyUovS4qqUmhDl12yeQ}{172.18.252.28}{172.18.252.28:4300}{http_address=172.18.252.28:4200} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 8, version: 27427850, reason: master node changed {previous [], current [{node2}{na73zHayR2K5b8RCIl3_VQ}{ldyMyUovS4qqUmhDl12yeQ}{172.18.252.28}{172.18.252.28:4300}{http_address=172.18.252.28:4200}]}
[2020-02-17T14:19:19,549][INFO ][o.e.c.s.ClusterApplierService] [node2] master node changed {previous [], current [{node2}{na73zHayR2K5b8RCIl3_VQ}{ldyMyUovS4qqUmhDl12yeQ}{172.18.252.28}{172.18.252.28:4300}{http_address=172.18.252.28:4200}]}, term: 8, version: 27427850, reason: Publication{term=8, version=27427850}
[2020-02-17T14:19:19,559][INFO ][o.e.n.Node ] [node2] started
node3 (smartvalve05) likewise becomes the master of its own cluster:
[2020-02-17T14:17:35,903][INFO ][o.e.c.s.MasterService ] [node3] elected-as-master ([1] nodes joined)[{node3}{rv8fGDeCSz6fju3-5wO85A}{fAN0wzbJQSiF4H4koBlfog}{172.18.252.29}{172.18.252.29:4300}{http_address=172.18.252.29:4200} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 5, version: 27249552, reason: master node changed {previous [], current [{node3}{rv8fGDeCSz6fju3-5wO85A}{fAN0wzbJQSiF4H4koBlfog}{172.18.252.29}{172.18.252.29:4300}{http_address=172.18.252.29:4200}]}
[2020-02-17T14:17:36,082][INFO ][o.e.c.s.ClusterApplierService] [node3] master node changed {previous [], current [{node3}{rv8fGDeCSz6fju3-5wO85A}{fAN0wzbJQSiF4H4koBlfog}{172.18.252.29}{172.18.252.29:4300}{http_address=172.18.252.29:4200}]}, term: 5, version: 27249552, reason: Publication{term=5, version=27249552}
[2020-02-17T14:17:36,091][INFO ][o.e.n.Node ] [node3] started
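The pattern across the three logs is easier to compare side by side. The sketch below pulls the node name, the number of joined nodes, and the term out of each `elected-as-master` line. It assumes the log line layout shown above; the sample lines are abbreviated copies (the node details are replaced by `[...]`), and `elections` is a hypothetical helper.

```python
import re

# Abbreviated copies of the election lines above ("[...]" replaces the node details).
LOG_LINES = [
    "[2020-02-17T14:17:27,844][INFO ][o.e.c.s.MasterService ] [node1] elected-as-master ([1] nodes joined)[...], term: 31, version: 1038418, reason: ...",
    "[2020-02-17T14:19:19,362][INFO ][o.e.c.s.MasterService ] [node2] elected-as-master ([1] nodes joined)[...], term: 8, version: 27427850, reason: ...",
    "[2020-02-17T14:17:35,903][INFO ][o.e.c.s.MasterService ] [node3] elected-as-master ([1] nodes joined)[...], term: 5, version: 27249552, reason: ...",
]

# "[nodeX] elected-as-master ([N] nodes joined)..., term: T"
ELECTED = re.compile(
    r"\[(?P<node>[^\]]+)\] elected-as-master \(\[(?P<joined>\d+)\] nodes joined\)"
    r".*?, term: (?P<term>\d+)"
)


def elections(lines):
    """Return a list of (node name, nodes joined, term) for each election line."""
    result = []
    for line in lines:
        m = ELECTED.search(line)
        if m:
            result.append((m["node"], int(m["joined"]), int(m["term"])))
    return result


results = elections(LOG_LINES)
for node, joined, term in results:
    flag = "  <- elected with only itself" if joined == 1 else ""
    print(f"{node}: {joined} node(s) joined, term {term}{flag}")
```

Every election reports a single joined node, and each node reports a different term (31, 8 and 5) and a different cluster state version, which is worth keeping in mind when comparing what each node has persisted.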
I also tried changing crate.yml on all nodes so that they would accept only node1 as master, by setting:
cluster.initial_master_nodes:
- 172.18.252.26
However, this didn’t change anything.
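Before experimenting further with `cluster.initial_master_nodes`, it can help to confirm that every node carries the same discovery settings and that each entry actually refers to a node. This is a sketch under two assumptions: the settings are represented as flat Python dicts (not parsed from YAML, to stay stdlib-only), and, as in the Elasticsearch 7 discovery code CrateDB 4 builds on, an entry is taken to match a node name, a publish IP, or a publish IP:port. In that discovery model the setting is also only consulted when a node without existing cluster state bootstraps, which may be one reason this change had no visible effect. All names below are hypothetical.

```python
SHARED_KEYS = ("cluster.name", "discovery.seed_hosts", "cluster.initial_master_nodes")
SEEDS = ["172.18.252.26:4300", "172.18.252.28:4300", "172.18.252.29:4300"]


def node_settings(name, ip):
    """Settings from one crate.yml as a flat dict (hypothetical representation)."""
    return {
        "cluster.name": "smartvalve",
        "node.name": name,
        "network.publish_host": ip,
        "transport.publish_port": 4300,
        "discovery.seed_hosts": list(SEEDS),
        # The node1-only variant tried above.
        "cluster.initial_master_nodes": ["172.18.252.26"],
    }


NODE_SETTINGS = {
    "smartvalve02": node_settings("node1", "172.18.252.26"),
    "smartvalve04": node_settings("node2", "172.18.252.28"),
    "smartvalve05": node_settings("node3", "172.18.252.29"),
}


def check_cluster_settings(settings):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    first_host = next(iter(settings))
    reference = settings[first_host]
    # Values an initial_master_nodes entry may refer to (assumption, see lead-in).
    known = set()
    for s in settings.values():
        known.add(s["node.name"])
        known.add(s["network.publish_host"])
        known.add(f'{s["network.publish_host"]}:{s["transport.publish_port"]}')
    for host, s in settings.items():
        for key in SHARED_KEYS:
            if s.get(key) != reference.get(key):
                problems.append(f"{host}: {key} differs from {first_host}")
        for entry in s.get("cluster.initial_master_nodes", []):
            if entry not in known:
                problems.append(f"{host}: initial master entry {entry!r} matches no node")
    return problems


print(check_cluster_settings(NODE_SETTINGS))
```

For the node1-only variant the check reports no problems, which suggests the settings themselves are consistent and the split lies elsewhere.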
This is one of the references that I followed:
https://crate.io/docs/crate/guide/en/latest/scaling/multi-node-setup.html
Please let me know if I have misconfigured something. The cluster was working perfectly right up until the upgrade from Crate v3 to Crate v4.