Getting error "Too many open files in system"

Hi,

  1. We are getting the error “Too many open files in system” in our CrateDB 5.8.2
    cluster setup.

  2. We are not seeing a large number of open files on any of the cluster nodes. Here is the output of the query:
    SELECT process['open_file_descriptors'], process['max_open_file_descriptors'] FROM sys.nodes

process['open_file_descriptors'] | process['max_open_file_descriptors']
2729                             | 262144
  3. The output of the following command (on all nodes) is no more than a few thousand
    (see the shell sketch below the stack trace for a fuller check):
    ls /proc/<pid>/fd/ | wc -l

  4. The output of the following query is 30536:
    SELECT count(*) FROM sys.segments

  5. We have already increased the file limit on all nodes to 200000.

  6. Here is the complete exception stack trace:
    crate[4877]: Caused by: java.nio.file.FileSystemException: /data/crate/nodes/0/indices/sJjr0XxrTpCTalr1QKtYhg/0/index/_cco_CrateDBLucene90_163.dvd: Too many open files in system
    crate[4877]: #011at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
    crate[4877]: #011at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
    crate[4877]: #011at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
    crate[4877]: #011at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:261) ~[?:?]
    crate[4877]: #011at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:482) ~[?:?]
    crate[4877]: #011at java.base/java.nio.file.Files.newOutputStream(Files.java:227) ~[?:?]
    crate[4877]: #011at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:394) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:387) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:220) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:75) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.elasticsearch.index.store.ByteSizeCachingDirectory.createOutput(ByteSizeCachingDirectory.java:129) ~[crate-server-5.8.2.jar:?]
    crate[4877]: #011at org.apache.lucene.store.FilterDirectory.createOutput(FilterDirectory.java:75) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.store.LockValidatingDirectoryWrapper.createOutput(LockValidatingDirectoryWrapper.java:43) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.ConcurrentMergeScheduler$1.createOutput(ConcurrentMergeScheduler.java:301) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:41) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at io.crate.lucene.codec.CustomLucene90DocValuesConsumer.<init>(CustomLucene90DocValuesConsumer.java:92) ~[crate-server-5.8.2.jar:?]
    crate[4877]: #011at io.crate.lucene.codec.CustomLucene90DocValuesFormat.fieldsConsumer(CustomLucene90DocValuesFormat.java:53) ~[crate-server-5.8.2.jar:?]
    crate[4877]: #011at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:225) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:140) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:180) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:300) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:142) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5293) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4761) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6582) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:660) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
    crate[4877]: #011at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:88) ~[crate-server-5.8.2.jar:?]
    crate[4877]: #011at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:721) ~[lucene-core-9.11.1.jar:9.11.1 0c087dfdd10e0f6f3f6faecc6af4415e671a9e69 - 2024-06-23 12:31:02]
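For reference, “Too many open files in system” (ENFILE) points at the kernel-wide file handle table (fs.file-max) rather than the per-process limit (which gives the plain “Too many open files”, EMFILE), so the system-wide counters are also worth checking. A rough shell sketch (run as root so every /proc/<pid>/fd is readable):

    # System-wide handle usage vs. the fs.file-max ceiling: prints "allocated  unused  max"
    cat /proc/sys/fs/file-nr
    # Rough per-process fd counts, biggest consumers first
    for p in /proc/[0-9]*; do
      echo "$(ls "$p/fd" 2>/dev/null | wc -l) ${p##*/}"
    done | sort -rn | head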

Any help is appreciated.

Regards,
Amod

Hi,

We have already increased file limit on all nodes to 200000

How did you do this?
Please review ‘ulimit -v’, ‘ulimit -m’ (both should return ‘unlimited’), and ‘sysctl vm.max_map_count’.
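A quick way to check these on each node is sketched below. Note that ‘ulimit’ in an interactive shell may not match what the CrateDB service actually runs with, so also read the limits of the running process (the ‘crate’ systemd unit name below is an assumption, adjust it to your setup):

    ulimit -v                      # should print: unlimited
    ulimit -m                      # should print: unlimited
    sysctl vm.max_map_count        # typically needs to be at least 262144
    CRATE_PID=$(systemctl show -p MainPID --value crate)   # assumes a unit named "crate"
    cat /proc/${CRATE_PID}/limits  # effective limits of the running CrateDB process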

Thanks for the reply.

Here are the steps that we followed:

  1. Update /etc/sysctl.conf
    fs.file-max = 500000
    vm.max_map_count=262144

  2. Update /etc/security/limits.conf
    soft nofile 200000
    hard nofile 200000

  3. Reboot the system

  4. I am not sure what exactly is causing the “Too many open files in system” error.
    The output of the following query is only 2729:
    SELECT process['open_file_descriptors'] FROM sys.nodes
    So how is the file limit being reached even after we increased it? (See the sketch after this list.)
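For completeness, a small sketch of how the limits actually in effect could be verified (assuming the service runs under a systemd unit named ‘crate’; limits.conf is not applied to systemd services, so the values there may not be what the process gets):

    ulimit -n                                   # limit of this shell only, not of the service
    systemctl show crate -p LimitNOFILE         # nofile limit systemd configures for the unit
    CRATE_PID=$(systemctl show -p MainPID --value crate)
    prlimit --pid "${CRATE_PID}" --nofile       # soft/hard nofile of the running process
    sysctl fs.file-max                          # system-wide ceiling behind the "in system" error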

Thanks for any help.

These two lines seem to be missing the first column (the domain), for instance the crate user or *.
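A hedged example of what those entries could look like, plus a related note: if CrateDB runs as a systemd service, /etc/security/limits.conf is not applied to it at all, and the limit has to be set on the unit instead (the ‘crate’ unit name is an assumption):

    # /etc/security/limits.conf — the first column is the domain (a user, a group, or '*'):
    crate  soft  nofile  200000
    crate  hard  nofile  200000

    # For a systemd-managed service, raise the limit via a unit override instead:
    sudo systemctl edit crate          # add the two commented lines below to the override
    #   [Service]
    #   LimitNOFILE=200000
    sudo systemctl daemon-reload
    sudo systemctl restart crate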