CrateDB node issue

Hi Team,

[2024-05-16T01:18:02,119][INFO ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2296923] overhead, spent [399ms] collecting in the last [1s]
[2024-05-16T04:14:46,652][ERROR][i.c.p.p.PostgresWireProtocol] [Node1] Uncaught exception:
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
[2024-05-16T05:30:20,835][WARN ][i.c.e.e.c.s.RamAccountingQueue] [Node1] Memory limit for breaker [jobs_log] was exceeded. Queue [RamAccountingQueue[420d3c03-0751-5f1e-d8b3-67d67fee7f24]] is cleared.
[2024-05-16T05:56:28,424][INFO ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2313624] overhead, spent [506ms] collecting in the last [1.4s]
[2024-05-16T05:56:29,426][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2313625] overhead, spent [532ms] collecting in the last [1s]
[2024-05-16T05:56:31,426][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2313627] overhead, spent [513ms] collecting in the last [1s]
[2024-05-16T05:56:32,747][INFO ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2313628] overhead, spent [356ms] collecting in the last [1.3s]
[2024-05-16T05:57:21,756][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2313677] overhead, spent [577ms] collecting in the last [1s]
[2024-05-16T05:57:23,728][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][young][2313678][97793] duration [1s], collections [1]/[1.9s], total [1s]/[1.9h], memory [6.8gb]->[6.5gb]/[24gb], all_pools {[young] [352mb]->[0b]/[0b]}{[old] [6.2gb]->[6.5gb]/[24gb]}{[survivor] [304mb]->[64mb]/[0b]}
[2024-05-16T05:57:23,728][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2313678] overhead, spent [1s] collecting in the last [1.9s]
[2024-05-16T06:18:55,637][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][young][2314969][97868] duration [1.2s], collections [1]/[1.4s], total [1.2s]/[1.9h], memory [17.4gb]->[5.2gb]/[24gb], all_pools {[young] [13.8gb]->[0b]/[0b]}{[old] [3.3gb]->[3.4gb]/[24gb]}{[survivor] [280mb]->[1.8gb]/[0b]}
[2024-05-16T06:18:55,637][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2314969] overhead, spent [1.2s] collecting in the last [1.4s]
[2024-05-16T06:18:56,669][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][young][2314970][97869] duration [1s], collections [1]/[1s], total [1s]/[1.9h], memory [5.2gb]->[5.3gb]/[24gb], all_pools {[young] [0b]->[32mb]/[0b]}{[old] [3.4gb]->[5.2gb]/[24gb]}{[survivor] [1.8gb]->[32mb]/[0b]}
[2024-05-16T06:18:56,669][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2314970] overhead, spent [1s] collecting in the last [1s]
[2024-05-16T06:19:00,769][INFO ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2314974] overhead, spent [313ms] collecting in the last [1s]
[2024-05-16T06:19:59,021][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2315032] overhead, spent [695ms] collecting in the last [1s]
[2024-05-16T06:20:00,021][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2315033] overhead, spent [678ms] collecting in the last [1s]
[2024-05-16T06:24:50,959][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][young][2315323][97898] duration [1.1s], collections [1]/[1.8s], total [1.1s]/[1.9h], memory [17.4gb]->[5.4gb]/[24gb], all_pools {[young] [13.8gb]->[0b]/[0b]}{[old] [3.6gb]->[3.6gb]/[24gb]}{[survivor] [36.3mb]->[1.7gb]/[0b]}
[2024-05-16T06:24:50,960][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2315323] overhead, spent [1.1s] collecting in the last [1.8s]
[2024-05-16T06:24:52,092][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][young][2315324][97899] duration [1.1s], collections [1]/[1.1s], total [1.1s]/[1.9h], memory [5.4gb]->[5.6gb]/[24gb], all_pools {[young] [0b]->[160mb]/[0b]}{[old] [3.6gb]->[5.4gb]/[24gb]}{[survivor] [1.7gb]->[32mb]/[0b]}
[2024-05-16T06:24:52,092][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2315324] overhead, spent [1.1s] collecting in the last [1.1s]
[2024-05-16T06:25:25,102][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2315357] overhead, spent [536ms] collecting in the last [1s]
[2024-05-16T06:25:26,102][WARN ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2315358] overhead, spent [533ms] collecting in the last [1s]
[2024-05-16T06:25:32,112][INFO ][o.e.m.j.JvmGcMonitorService] [Node1] [gc][2315364] overhead, spent [385ms] collecting in the last [1s]
[2024-05-16T10:32:12,978][WARN ][i.c.e.e.c.s.RamAccountingQueue] [Node1] Memory limit for breaker [jobs_log] was exceeded. Queue [RamAccountingQueue[6e025a1f-97d6-4421-aee6-6cd908c51dd3]] is cleared.

We got these errors while one of the nodes was down.
Does this indicate a memory issue or a query issue?
Please advise.

Thanks
Vinayak Katkar

Hi,
If you are referring to the Connection reset by peer message, this could have happened when the node went down.
If you are instead referring to the Memory limit for breaker [jobs_log] was exceeded message, this is normal and related to the stats.breaker.log.jobs.limit setting; see Cluster-wide settings — CrateDB: Reference.
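
If you want to inspect or adjust that limit, here is a minimal sketch. It assumes a recent CrateDB version; the '10%' value is purely illustrative, and the settings subscript path is assumed to mirror the setting name:

```sql
-- Check the current jobs_log breaker limit from the cluster settings
-- (object subscript syntax; the exact path may vary across CrateDB versions)
SELECT settings['stats']['breaker']['log']['jobs']['limit'] AS jobs_log_limit
FROM sys.cluster;

-- See how many entries the jobs log currently holds
SELECT count(*) FROM sys.jobs_log;

-- Raise the breaker limit persistently; '10%' is an illustrative value,
-- expressed as a percentage of the heap
SET GLOBAL PERSISTENT stats.breaker.log.jobs.limit = '10%';
```

The warning itself is harmless; as the log line says, the queue is simply cleared when the breaker trips, so raising the limit only matters if you rely on keeping a longer sys.jobs_log history.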