Hello,
We have a development cluster for CrateDB: a 3-node cluster running CrateDB 4.6.6. The underlying servers run Debian GNU/Linux 11.1, and we install the CrateDB packages from the official CrateDB APT repository.
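For reference, the installation followed roughly these steps (the repository line is from my memory of the CrateDB docs, so the exact URL/suite on our nodes may differ slightly):
$ curl -sS https://cdn.crate.io/downloads/debian/DEB-GPG-KEY-crate | sudo tee /etc/apt/trusted.gpg.d/crate.asc
$ echo "deb https://cdn.crate.io/downloads/debian/stable/ default main" | sudo tee /etc/apt/sources.list.d/crate-stable.list
$ sudo apt update
$ sudo apt install crate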
When I tried to view the web-based admin UI, I received the following error:
/usr/share/crate/lib/site/index.html: Too many open files
This is a development cluster with almost no data and minimal load.
When I ran lsof on the first node of that development CrateDB cluster to look at the open files, the command took a few minutes to complete, and the number of open files was more than 800,000 (at least 2.5 times more than on the first node of our production cluster):
$ sudo lsof | wc -l
8133395
$ sudo lsof | wc -l
8133304
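Note that lsof prints one row per task/thread per file descriptor, so the raw line count above overstates the real number of descriptors; counting the entries under /proc gives a per-process figure (6204 is the crate PID from the service status below):
$ sudo ls /proc/6204/fd | wc -l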
Checking the status of crate.service, I get the following:
$ sudo systemctl status crate.service
● crate.service - CrateDB Server
Loaded: loaded (/lib/systemd/system/crate.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-12-16 09:55:15 UTC; 4 weeks 0 days ago
Docs: https://crate.io/docs/
Main PID: 6204 (java)
Tasks: 124 (limit: 77076)
Memory: 13.7G
CPU: 2h 15min 53.803s
CGroup: /system.slice/crate.service
└─6204 /usr/share/crate/jdk/bin/java -Xms16G -Xmx16G -Djava.awt.headless=true -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Xlog:gc*,gc+age=trace>
Jan 13 10:15:04 dev-crate-dn-001 crate[6204]: io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
Jan 13 10:15:05 dev-crate-dn-001 crate[6204]: [2022-01-13T10:15:05,422][WARN ][i.n.c.DefaultChannelPipeline] [dev-crate-dn-001] An exceptionCaught() event was fired, and it reached at the >
Jan 13 10:15:05 dev-crate-dn-001 crate[6204]: io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
Jan 13 10:15:06 dev-crate-dn-001 crate[6204]: [2022-01-13T10:15:06,150][WARN ][o.e.c.c.ClusterFormationFailureHelper] [dev-crate-dn-001] master not discovered or elected yet, an election r>
Jan 13 10:15:06 dev-crate-dn-001 crate[6204]: [2022-01-13T10:15:06,422][WARN ][i.n.c.DefaultChannelPipeline] [dev-crate-dn-001] An exceptionCaught() event was fired, and it reached at the >
Jan 13 10:15:06 dev-crate-dn-001 crate[6204]: io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
Jan 13 10:15:07 dev-crate-dn-001 crate[6204]: [2022-01-13T10:15:07,422][WARN ][i.n.c.DefaultChannelPipeline] [dev-crate-dn-001] An exceptionCaught() event was fired, and it reached at the >
Jan 13 10:15:07 dev-crate-dn-001 crate[6204]: io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
Jan 13 10:15:08 dev-crate-dn-001 crate[6204]: [2022-01-13T10:15:08,422][WARN ][i.n.c.DefaultChannelPipeline] [dev-crate-dn-001] An exceptionCaught() event was fired, and it reached at the >
Jan 13 10:15:08 dev-crate-dn-001 crate[6204]: io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
Another check, a few minutes later:
~$ sudo systemctl status crate.service
● crate.service - CrateDB Server
Loaded: loaded (/lib/systemd/system/crate.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-12-16 09:55:15 UTC; 4 weeks 0 days ago
Docs: https://crate.io/docs/
Main PID: 6204 (java)
Tasks: 124 (limit: 77076)
Memory: 13.7G
CPU: 2h 15min 56.001s
CGroup: /system.slice/crate.service
└─6204 /usr/share/crate/jdk/bin/java -Xms16G -Xmx16G -Djava.awt.headless=true -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Xlog:gc*,gc+age=trace>
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[crate-server.jar:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at org.elasticsearch.cluster.coordination.Coordinator.handleJoinRequest(Coordinator.java:452) ~[crate-server.jar:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$0(JoinHelper.java:131) ~[crate-server.jar:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[crate-server.jar:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:698) ~[crate-server.jar:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-server.jar:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: at java.lang.Thread.run(Thread.java:831) [?:?]
Jan 13 10:22:36 dev-crate-dn-001 crate[6204]: [2022-01-13T10:22:36,177][WARN ][o.e.c.c.ClusterFormationFailureHelper] [dev-crate-dn-001] master not discovered or elected yet, an election r>
Some more details from the logs:
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[crate-server.jar:?]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:266) ~[crate-server.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [crate-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:831) [?:?]
Caused by: java.nio.file.FileSystemException: /data/nodes/0/_state/write.lock: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:292) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:345) ~[?:?]
at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:125) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:923) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.elasticsearch.gateway.PersistedClusterStateService.createIndexWriter(PersistedClusterStateService.java:208) ~[crate-server.jar:?]
at org.elasticsearch.gateway.PersistedClusterStateService.createWriter(PersistedClusterStateService.java:184) ~[crate-server.jar:?]
at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.getWriterSafe(GatewayMetaState.java:491) ~[crate-server.jar:?]
at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.setCurrentTerm(GatewayMetaState.java:447) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.CoordinationState.handleStartJoin(CoordinationState.java:188) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.joinLeaderInTerm(Coordinator.java:427) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.ensureTermAtLeast(Coordinator.java:419) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoin(Coordinator.java:931) ~[crate-server.jar:?]
at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
at org.elasticsearch.cluster.coordination.Coordinator.processJoinRequest(Coordinator.java:499) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.lambda$handleJoinRequest$7(Coordinator.java:465) ~[crate-server.jar:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[crate-server.jar:?]
at org.elasticsearch.transport.ConnectionManager.connectToNode(ConnectionManager.java:121) ~[crate-server.jar:?]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:334) ~[crate-server.jar:?]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoinRequest(Coordinator.java:452) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$0(JoinHelper.java:131) ~[crate-server.jar:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[crate-server.jar:?]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:266) ~[crate-server.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[crate-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
at java.lang.Thread.run(Thread.java:831) ~[?:?]
[2022-01-13T10:26:26,194][WARN ][o.e.c.c.ClusterFormationFailureHelper] [dev-crate-dn-001] master not discovered or elected yet, an election requires at least 2 nodes with ids from [oCYP8FLRTcWH20fAPboXmA, NLSbhbptSNaHuVwXlv-T8g, O7KFL7W3StGLLN6mTmQB7Q], have discovered [{dev-crate-dn-001}{NLSbhbptSNaHuVwXlv-T8g}{P0zOcABzRYywaX8ivj6DNA}{192.168.239.50}{192.168.239.50:4300}{http_address=192.168.239.50:4200}, {dev-crate-dn-002}{oCYP8FLRTcWH20fAPboXmA}{FXd6ml8vQr2vLSgSHJ3Pbw}{192.168.239.51}{192.168.239.51:4300}{http_address=192.168.239.51:4200}, {dev-crate-dn-003}{O7KFL7W3StGLLN6mTmQB7Q}{nc2jAKtZRpayr75Ck-fobw}{192.168.239.52}{192.168.239.52:4300}{http_address=192.168.239.52:4200}] which is a quorum; discovery will continue using [192.168.239.51:4300, 192.168.239.52:4300] from hosts providers and [{dev-crate-dn-003}{O7KFL7W3StGLLN6mTmQB7Q}{nc2jAKtZRpayr75Ck-fobw}{192.168.239.52}{192.168.239.52:4300}{http_address=192.168.239.52:4200}, {dev-crate-dn-001}{NLSbhbptSNaHuVwXlv-T8g}{P0zOcABzRYywaX8ivj6DNA}{192.168.239.50}{192.168.239.50:4300}{http_address=192.168.239.50:4200}, {dev-crate-dn-002}{oCYP8FLRTcWH20fAPboXmA}{FXd6ml8vQr2vLSgSHJ3Pbw}{192.168.239.51}{192.168.239.51:4300}{http_address=192.168.239.51:4200}] from last-known cluster state; node term 262158, last-accepted version 582 in term 4
[2022-01-13T10:26:34,558][INFO ][o.e.c.c.JoinHelper ] [dev-crate-dn-001] failed to join {dev-crate-dn-002}{oCYP8FLRTcWH20fAPboXmA}{FXd6ml8vQr2vLSgSHJ3Pbw}{192.168.239.51}{192.168.239.51:4300}{http_address=192.168.239.51:4200} with JoinRequest{sourceNode={dev-crate-dn-001}{NLSbhbptSNaHuVwXlv-T8g}{P0zOcABzRYywaX8ivj6DNA}{192.168.239.50}{192.168.239.50:4300}{http_address=192.168.239.50:4200}, optionalJoin=Optional[Join{term=262160, lastAcceptedTerm=4, lastAcceptedVersion=582, sourceNode={dev-crate-dn-001}{NLSbhbptSNaHuVwXlv-T8g}{P0zOcABzRYywaX8ivj6DNA}{192.168.239.50}{192.168.239.50:4300}{http_address=192.168.239.50:4200}, targetNode={dev-crate-dn-002}{oCYP8FLRTcWH20fAPboXmA}{FXd6ml8vQr2vLSgSHJ3Pbw}{192.168.239.51}{192.168.239.51:4300}{http_address=192.168.239.51:4200}}]}
org.elasticsearch.transport.RemoteTransportException: [dev-crate-dn-002][192.168.239.51:4300][internal:cluster/coordination/join]
Caused by: org.elasticsearch.ElasticsearchException: java.nio.file.FileSystemException: /data/nodes/0/_state/write.lock: Too many open files
at org.elasticsearch.ExceptionsHelper.convertToRuntime(ExceptionsHelper.java:53) ~[crate-server.jar:?]
at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.getWriterSafe(GatewayMetaState.java:500) ~[crate-server.jar:?]
at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.setCurrentTerm(GatewayMetaState.java:447) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.CoordinationState.handleStartJoin(CoordinationState.java:188) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.joinLeaderInTerm(Coordinator.java:427) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.ensureTermAtLeast(Coordinator.java:419) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoin(Coordinator.java:931) ~[crate-server.jar:?]
at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
at org.elasticsearch.cluster.coordination.Coordinator.processJoinRequest(Coordinator.java:499) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.lambda$handleJoinRequest$7(Coordinator.java:465) ~[crate-server.jar:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[crate-server.jar:?]
at org.elasticsearch.transport.ConnectionManager.connectToNode(ConnectionManager.java:121) ~[crate-server.jar:?]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:334) ~[crate-server.jar:?]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoinRequest(Coordinator.java:452) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$0(JoinHelper.java:131) ~[crate-server.jar:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[crate-server.jar:?]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:266) ~[crate-server.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[crate-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
at java.lang.Thread.run(Thread.java:831) [?:?]
Caused by: java.nio.file.FileSystemException: /data/nodes/0/_state/write.lock: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:292) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:345) ~[?:?]
at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:125) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:923) ~[lucene-core-8.8.2.jar:8.8.2 a92a05e195b775b30ca410bc0a26e8e79e7b3bfb - mdrob - 2021-04-06 16:33:27]
at org.elasticsearch.gateway.PersistedClusterStateService.createIndexWriter(PersistedClusterStateService.java:208) ~[crate-server.jar:?]
at org.elasticsearch.gateway.PersistedClusterStateService.createWriter(PersistedClusterStateService.java:184) ~[crate-server.jar:?]
at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.getWriterSafe(GatewayMetaState.java:491) ~[crate-server.jar:?]
at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.setCurrentTerm(GatewayMetaState.java:447) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.CoordinationState.handleStartJoin(CoordinationState.java:188) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.joinLeaderInTerm(Coordinator.java:427) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.ensureTermAtLeast(Coordinator.java:419) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoin(Coordinator.java:931) ~[crate-server.jar:?]
at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
at org.elasticsearch.cluster.coordination.Coordinator.processJoinRequest(Coordinator.java:499) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.lambda$handleJoinRequest$7(Coordinator.java:465) ~[crate-server.jar:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[crate-server.jar:?]
at org.elasticsearch.transport.ConnectionManager.connectToNode(ConnectionManager.java:121) ~[crate-server.jar:?]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:334) ~[crate-server.jar:?]
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:318) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.Coordinator.handleJoinRequest(Coordinator.java:452) ~[crate-server.jar:?]
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$0(JoinHelper.java:131) ~[crate-server.jar:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[crate-server.jar:?]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:266) ~[crate-server.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[crate-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
at java.lang.Thread.run(Thread.java:831) ~[?:?]
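For what it's worth, this is how I would check the file descriptor limit that actually applies to the crate process (PID 6204 from the status output above); I have not pasted the output here:
$ systemctl show crate.service --property=LimitNOFILE
$ sudo grep 'open files' /proc/6204/limits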
Any ideas why I’m getting this “Too many open files” error, and how I can fix it?
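In case it’s relevant: I’m aware the limit could simply be raised with a systemd drop-in like the sketch below (the path and value are hypothetical, not something we have deployed), but with 800,000+ descriptors on an idle cluster that feels like papering over a descriptor leak rather than fixing it:

# /etc/systemd/system/crate.service.d/override.conf (hypothetical drop-in)
[Service]
LimitNOFILE=1048576

$ sudo systemctl daemon-reload
$ sudo systemctl restart crate.service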