I am running a production CrateDB 5.2.8 cluster of 3 nodes (on K8s), and I run into trouble every time “automatic analyze” launches. I have set the parameter ‘stats.service.max_bytes_per_sec’ to 10mb, but ANALYZE still consumes all the bandwidth and the cluster becomes completely unresponsive: even the HTTP GUI is inaccessible, because the cluster does not respond to any query. The only way to recover without waiting for hours is to restart the pods.
The parameter is set using the following SQL query:
SET GLOBAL PERSISTENT 'stats.service.max_bytes_per_sec' = '10mb'
I can see the following log line on all 3 nodes:
[INFO ][o.e.c.s.ClusterSettings ] [crate-0.crate-service.crate.svc.cluster.local] updating [stats.service.max_bytes_per_sec] from [40mb] to [10mb]
I also tried setting the value in SQL to 10M, 10, 10MB and several other combinations, without success.
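Why the different spellings are unlikely to matter: CrateDB inherited Elasticsearch's settings infrastructure (the log line above comes from o.e.c.s.ClusterSettings). Assuming it keeps the Elasticsearch-style byte-size rules, units are case-insensitive and binary (1kb = 1024 bytes), so 10mb, 10MB and 10M all denote the same value, while a bare 10 has no unit and is rejected in this model. The helper parse_byte_size below is hypothetical, a minimal Python model of those assumed rules and not CrateDB's actual code.

```python
# Hypothetical model of Elasticsearch-style byte-size parsing (assumption:
# CrateDB follows the same rules). Units are binary and case-insensitive.
# Multi-letter suffixes are listed before single letters so "10kb" is not
# mistaken for a value ending in "b".
_UNITS = [
    ("kb", 1024), ("k", 1024),
    ("mb", 1024 ** 2), ("m", 1024 ** 2),
    ("gb", 1024 ** 3), ("g", 1024 ** 3),
    ("tb", 1024 ** 4), ("t", 1024 ** 4),
    ("pb", 1024 ** 5), ("p", 1024 ** 5),
    ("b", 1),
]


def parse_byte_size(value: str) -> int:
    """Convert a size string such as '10mb' into a number of bytes."""
    text = value.strip().lower()
    # "0" and "-1" are the only unitless values accepted in this model.
    if text in ("0", "-1"):
        return int(text)
    for suffix, factor in _UNITS:
        if text.endswith(suffix):
            number = text[: -len(suffix)].strip()
            return int(float(number) * factor)
    raise ValueError(f"unit is missing or unrecognized in {value!r}")


if __name__ == "__main__":
    for raw in ("10mb", "10MB", "10M", "10485760b"):
        print(raw, parse_byte_size(raw))
```

Under these assumed rules every variant that parses resolves to 10485760 bytes, so changing the spelling would not change how hard ANALYZE is throttled.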
I am monitoring the cluster with Prometheus, and the disk read rate climbs to 60MB/s, which is the maximum configured for the PVC.
I have also done some testing on a single-node development cluster running CrateDB 5.4.0 and then 5.4.2 (just upgraded), and although the setting is 5mb, ANALYZE consumes all the PVC bandwidth (38.4 MB/s in this case).
How can I enforce a maximum bandwidth? For the moment, I have disabled automatic analyze by setting ‘stats.service.interval’ = 0.
After further testing, I have concluded that I had misunderstood the documentation. The documentation mentions ‘max_bytes_per_sec’, but it only applies to network bandwidth, not to hard disk bandwidth.
The problem is that during ANALYZE, other queries are delayed by up to several seconds and some processes start to fail.
Is there anything I can do to reduce the IO usage during ANALYZE?
It would be nice to have a configuration option similar to the network option but for hard disk IO.
The setting basically tracks I/O, calculating the bytes read per table row: “Add a setting allowing to throttle ANALYZE” by BaurzhanSakhariev, Pull Request #13046, crate/crate on GitHub.
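To illustrate the idea of throttling on bytes read per row, here is a minimal Python sketch of a byte-rate limiter. It is a simplified model, not the code from PR #13046: the class ByteRateThrottle and the function sample_rows are hypothetical, and each row's size is assumed to be simply len() of its raw bytes. The limiter sleeps whenever the bytes consumed so far run ahead of what max_bytes_per_sec allows for the elapsed time.

```python
import time
from typing import Callable, Iterable


class ByteRateThrottle:
    """Hypothetical limiter: pause once more bytes have been consumed than
    max_bytes_per_sec allows for the time elapsed since start."""

    def __init__(self, max_bytes_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_bytes_per_sec <= 0:
            raise ValueError("max_bytes_per_sec must be positive")
        self.rate = max_bytes_per_sec
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.consumed = 0

    def consume(self, nbytes: int) -> float:
        """Record nbytes as read; sleep if ahead of budget. Returns seconds slept."""
        self.consumed += nbytes
        # Earliest moment at which this many bytes are allowed at the target rate.
        allowed_at = self.start + self.consumed / self.rate
        delay = allowed_at - self.clock()
        if delay > 0:
            self.sleep(delay)
            return delay
        return 0.0


def sample_rows(rows: Iterable[bytes], throttle: ByteRateThrottle) -> int:
    """Hypothetical ANALYZE-like scan that throttles on each row's byte size."""
    count = 0
    for row in rows:
        throttle.consume(len(row))
        count += 1
    return count
```

The clock and sleep functions are injectable so the pacing can be checked without real waiting. Note that in this model the limiter only knows about the bytes it is told about: any reads it does not account for are not throttled.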
So I’m wondering why the setting is not applied. Have you tried setting it on every node in the YAML config file and restarting the cluster?
The first tests I did were on a 3-node cluster, and I can see this in the logs of all three nodes:
updating [stats.service.max_bytes_per_sec] from [40mb] to [10mb]
The latest tests I have done are on a one-node cluster. I have also tried using the value in bytes instead of ‘mb’ units and setting it in the YAML config file. Nothing seems to work.
Then I’d like to ask you to file an issue in the CrateDB GitHub repo, describing your setup, your tables and their sizes, the shard sizes, the shard layout, etc.
I think there is already an issue on GitHub (#15072) describing the same problem.