CrateDB logs showing "shard is now inactive" and threads getting blocked

After upgrading the kernel on our Ubuntu 20.04 machines running CrateDB 5.4.2 we are seeing the logs below. We are running a 3-node cluster on AWS EC2 with 32 cores and 64 GB RAM per node.
Many search CPU threads get stuck within a span of a few minutes when we route traffic to the new Crate machines. A thread dump is attached as well. With around 200 connections the cluster goes to 100% CPU and stops responding to requests.

CrateDB logs

crate-2023-09-10.log:[2023-09-10T05:11:51,708][DEBUG][o.e.i.s.IndexShard       ] [crate03]  [.partitioned.txn.042j4cpg70][2]shard is now inactive
crate-2023-09-10.log:[2023-09-10T05:11:51,708][DEBUG][o.e.i.s.IndexShard       ] [crate03]  [.partitioned.txn.042j4cpg70][3]shard is now inactive
crate-2023-09-10.log:[2023-09-10T05:11:51,708][DEBUG][o.e.i.s.IndexShard       ] [crate03]  [.partitioned.txn.042j4cpg70][4]shard is now inactive
crate-2023-09-10.log:[2023-09-10T05:11:51,708][DEBUG][o.e.i.s.IndexShard       ] [crate03]  [.partitioned.txn.042j4cpg70][5]shard is now inactive
crate-2023-09-10.log:[2023-09-10T05:11:51,708][DEBUG][o.e.i.s.IndexShard       ] [crate03]  [.partitioned.txn.042j4cpg70][7]shard is now inactive
crate-2023-09-10.log:[2023-09-10T05:11:51,708][DEBUG][o.e.i.s.IndexShard       ] [crate03]  [.partitioned.txn.042j4cpg70][8]shard is now inactive
crate-2023-09-10.log:[2023-09-10T05:11:51,708][DEBUG][o.e.i.s.IndexShard       ] [crate03]  [.partitioned.txn.042j4cpg70][9]shard is now inactive

Thread dump logs

gc1.log.txt (2.7 MB)

Database queries

select schema_name as schema,
       table_name as t,
       id,
       partition_ident as p_i,
       num_docs as docs,
       primary,
       relocating_node as r_n,
       routing_state as r_state,
       state,
       orphan_partition as o_p
from sys.shards
where table_name = 'txn' and id = 1;
+--------+-----+----+------------+---------+---------+------+---------+---------+-------+
| schema | t   | id | p_i        |    docs | primary | r_n  | r_state | state   | o_p   |
+--------+-----+----+------------+---------+---------+------+---------+---------+-------+
| org1 | txn |  1 | 042j2cpg6o |       7 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g64 | 1112462 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g68 | 1078103 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6c | 1259447 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6g | 1120962 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6k |  746601 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6o | 1262062 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6s | 1679234 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g70 | 1740483 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g74 | 1757949 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h60 | 2401460 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h64 | 2103244 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h68 | 2445278 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg64 | 2452899 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg68 | 2390582 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6c | 2768953 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6g | 2707314 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6k | 2811312 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6o | 2817679 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6s | 3116998 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg70 | 3177159 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg74 | 3239813 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh60 | 3781695 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh64 | 3411021 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh68 | 3733551 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg64 | 3688798 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg68 | 3490641 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6c | 3973389 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6g | 3960924 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6k | 3989855 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6o | 3940076 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6s | 4177311 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg70 | 4353220 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg74 | 4037873 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cph60 |  595602 | TRUE    | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j2cpg6o |       7 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g64 | 1112462 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g68 | 1078103 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6c | 1259447 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6g | 1120962 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6k |  746601 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6o | 1262062 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6s | 1679234 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g70 | 1740483 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g74 | 1757949 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h60 | 2401460 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h64 | 2103244 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h68 | 2445278 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg64 | 2452899 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg68 | 2390582 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6c | 2768953 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6g | 2707314 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6k | 2811312 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6o | 2817679 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6s | 3116998 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg70 | 3177159 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg74 | 3239813 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh60 | 3781695 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh64 | 3411021 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh68 | 3733551 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg64 | 3688798 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg68 | 3490641 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6c | 3973389 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6g | 3960924 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6k | 3989855 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6o | 3940076 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6s | 4177311 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg70 | 4353220 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg74 | 4037873 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cph60 |  595607 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j2cpg6o |       7 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g64 | 1112462 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g68 | 1078103 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6c | 1259447 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6g | 1120962 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6k |  746601 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6o | 1262062 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g6s | 1679234 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g70 | 1740483 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9g74 | 1757949 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h60 | 2401460 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h64 | 2103244 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4c9h68 | 2445278 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg64 | 2452899 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg68 | 2390582 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6c | 2768953 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6g | 2707314 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6k | 2811312 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6o | 2817679 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg6s | 3116998 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg70 | 3177159 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chg74 | 3239813 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh60 | 3781695 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh64 | 3411021 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4chh68 | 3733551 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg64 | 3688798 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg68 | 3490641 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6c | 3973389 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6g | 3960924 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6k | 3989855 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6o | 3940076 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg6s | 4177311 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg70 | 4353220 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cpg74 | 4037873 | FALSE   | NULL | STARTED | STARTED | FALSE |
| org1 | txn |  1 | 042j4cph60 |  595607 | FALSE   | NULL | STARTED | STARTED | FALSE |
+--------+-----+----+------------+---------+---------+------+---------+---------+-------+
SELECT 105 rows in set (0.007 sec)

Hi @Satishlk,

the “shard is now inactive” message is printed when there haven’t been any newly written records for specific shards for a certain time. The message itself is not problematic and can occur during regular operations.
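If you would rather not see these DEBUG lines at all, the log level can also be raised at runtime. This is only a sketch, assuming the logger name matches the o.e.i.s.IndexShard source shown in your log and that it is currently set to DEBUG:

-- Hypothetical example: raise the IndexShard logger to INFO so the
-- DEBUG "shard is now inactive" messages are no longer emitted.
SET GLOBAL TRANSIENT "logger.org.elasticsearch.index.shard" = 'INFO';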

Whenever there are performance issues, it is advisable to check the metrics accessible through the Crate JMX Exporter.

From our experience, 64 GB RAM sounds relatively low in relation to the 32 vCPU cores. The JMX Exporter gives access to metrics that help identify bottlenecks. For example, GC rates and heap usage (circuit breaker utilization) can indicate if the cluster is running low on the available heap.

Therefore, if you haven’t done so already, I recommend getting the JMX Exporter set up to gain access to application-level performance metrics.
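Heap pressure can also be spot-checked directly via SQL. A minimal sketch, assuming the standard sys.nodes columns:

-- Per-node heap usage, 1-minute load average, and process CPU.
SELECT name,
       heap['used']              AS heap_used_bytes,
       heap['max']               AS heap_max_bytes,
       load['1']                 AS load_1m,
       process['cpu']['percent'] AS cpu_pct
FROM sys.nodes
ORDER BY name;

If heap_used_bytes stays close to heap_max_bytes while the issue occurs, that would point towards GC/heap as the bottleneck.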


Hi @hammerhead

We have already enabled monitoring on the Crate cluster and are not seeing any issues in terms of logs, GC, heap, or disk reads/writes. Attaching the same for your reference. All threads related to search, get, and write are getting blocked; attaching a thread dump for your reference.

Thread dump

All search threads are blocking on the same monitor:

stackTrace:
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@20.0.2/Native Method)
- parking to wait for <0x00007f842e772e80> (a java.util.concurrent.LinkedTransferQueue)
at java.util.concurrent.locks.LockSupport.park(java.base@20.0.2/LockSupport.java:371)
at java.util.concurrent.LinkedTransferQueue$Node.block(java.base@20.0.2/LinkedTransferQueue.java:470)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@20.0.2/ForkJoinPool.java:3780)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@20.0.2/ForkJoinPool.java:3725)
at java.util.concurrent.LinkedTransferQueue.awaitMatch(java.base@20.0.2/LinkedTransferQueue.java:669)
at java.util.concurrent.LinkedTransferQueue.xfer(java.base@20.0.2/LinkedTransferQueue.java:616)
at java.util.concurrent.LinkedTransferQueue.take(java.base@20.0.2/LinkedTransferQueue.java:1286)
at org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:165)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@20.0.2/ThreadPoolExecutor.java:1070)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@20.0.2/ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@20.0.2/ThreadPoolExecutor.java:642)
at java.lang.Thread.runWith(java.base@20.0.2/Thread.java:1636)
at java.lang.Thread.run(java.base@20.0.2/Thread.java:1623)
  • Could you share the table schema and potentially typical queries?
  • Is there anything else running on the VMs?
  • Are there any stuck queries (sys.jobs table)? For example, something like the query sketched below.
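A query along these lines (a sketch, assuming stats collection is enabled, which is the default) lists the statements currently registered in sys.jobs, oldest first:

-- Currently running statements, oldest first; long-lived entries here
-- during an incident would indicate stuck queries.
SELECT started,
       username,
       node['name'] AS node,
       stmt
FROM sys.jobs
ORDER BY started ASC
LIMIT 50;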

The CPU / load metrics don’t really make a lot of sense together with the rest of the metrics.
The logs you shared are completely normal.
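To correlate that, the per-node thread pool counters can be useful as well; a sketch, assuming the standard sys.nodes thread_pools columns:

-- Queue lengths and rejected task counts per thread pool and node.
SELECT name AS node,
       thread_pools['name']     AS pools,
       thread_pools['queue']    AS queued,
       thread_pools['rejected'] AS rejected
FROM sys.nodes
ORDER BY name;

Growing queue values or non-zero rejected counts on the search or write pools during the incident would show where requests pile up.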

Below is the table structure

CREATE TABLE IF NOT EXISTS "transat" (           
    "partition_date" TEXT NOT NULL,  
     -- 99 further columns omitted
      PRIMARY KEY ("id", "partition_date")               
 )                                                     
 CLUSTERED BY ("id") INTO 10 SHARDS                    
 PARTITIONED BY ("partition_date")                     
 WITH (                                                
    "allocation.max_retries" = 5,                      
    "blocks.metadata" = false,                         
    "blocks.read" = false,                             
    "blocks.read_only" = false,                        
    "blocks.read_only_allow_delete" = false,           
    "blocks.write" = false,                            
    codec = 'default',                                 
    column_policy = 'dynamic',                         
    "mapping.total_fields.limit" = 1000,               
    max_ngram_diff = 1,                                
    max_shingle_diff = 3,                              
    number_of_replicas = '2',                          
    refresh_interval = 1000,                           
    "routing.allocation.enable" = 'all',               
    "routing.allocation.total_shards_per_node" = -1,   
    "store.type" = 'fs',                               
    "translog.durability" = 'REQUEST',                 
    "translog.flush_threshold_size" = 536870912,       
    "translog.sync_interval" = 5000,                   
    "unassigned.node_left.delayed_timeout" = 60000,    
    "write.wait_for_active_shards" = 'ALL'             
 );

Queries

We are getting around 1-1.5k requests per minute on queries like the one below.

SELECT
  COUNT(*) AS totalCount,
  SUM("amount") AS totalAmount,
  column1,
  column2,
  DATE_FORMAT('%Y-%m-%d','Asia/Calcutta', colum_date) AS date
FROM
  Txn
WHERE
  column1 = 'ABC' AND
  column2 = '1823674000' AND  
  column3 IN ( list of values ) AND
  id > '230801183000000' AND partition_date >= '2308' AND
  id < '230803183000000' AND partition_date <= '2308'
GROUP BY
  date, column1, column2;

There is nothing running on the machines apart from Crate.

When I looked into the thread dump, all search, get, refresh, and write threads were blocking on the same monitor. All the CPU-intensive threads were search threads.

Hi,
Thank you for sharing those details.

Is the Txn table in the query the same as the transat table in the CREATE TABLE statement, or is there a VIEW in between?

I see partition_date is defined as a string, but the query does a range comparison; is that on purpose, or did you intend to have numeric semantics for that comparison?

Could you try if this version of the query performs better?

WITH thedata
AS (
	SELECT COUNT(*) AS totalCount
		,SUM("amount") AS totalAmount
		,column1
		,column2
		,colum_date
	FROM "transat"
	WHERE column1 = 'ABC'
		AND column2 = '1823674000'
		AND column3 IN (  list of values )
		AND id > '230801183000000'
		AND id < '230803183000000'
		AND partition_date BETWEEN '2308' AND '2308'				
	GROUP BY colum_date
		,column1
		,column2
	)
SELECT totalCount
	,totalAmount
	,column1
	,column2
	,DATE_FORMAT('%Y-%m-%d', 'Asia/Calcutta', colum_date) AS DATE
FROM thedata;

Yes, both tables are the same; that was a typo.

We are storing the partition date as month and year, like 2310, in that column.

I am not seeing much difference between the two queries; performance is the same.

The same Crate instance started to work normally, without any issues, after we stopped sending our API traffic for a couple of days. Not sure what the issue is. This has happened twice in the last month, and we are not able to find anything in the logs, except that the CPU gets choked within a few minutes when the issue occurs and all get, search, and insert threads get stuck on the same monitor as shared earlier.

Can you suggest something we can check to troubleshoot and understand what exactly the issue is? We upgraded from 5.3.2 to 5.4.2 last month, after which we started seeing the above issues. We can run the same query load on our performance environment, on the same version, with 3 times the production load without any reduction in requests per second and with minimal response-time impact. The only difference between the two environments is that on prod we have 1 TB of storage.

We are storing the partition date as month and year, like 2310, in that column

In that case range comparisons will work fine, but please note that ‘2310’ as a string will take around double the space of a SMALLINT column, and ‘230801183000000’ as a string can also take around double the space of a timestamp column. Over millions of rows this adds up and can result in a lot of additional load on disks, CPU, and memory.
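To illustrate the point, here is a minimal sketch of the same partitioning with a numeric partition column ("transat_sketch" is just a placeholder name, and whether SMALLINT fits depends on the real value range):

-- Sketch only: 2310 stored as a SMALLINT takes 2 bytes per value,
-- while '2310' as TEXT needs more, multiplied across every row and index.
CREATE TABLE IF NOT EXISTS "transat_sketch" (
    "partition_date" SMALLINT NOT NULL,
    "id" TEXT NOT NULL,
    PRIMARY KEY ("id", "partition_date")
)
CLUSTERED BY ("id") INTO 10 SHARDS
PARTITIONED BY ("partition_date");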

Dear Satish,

May I ask if it started to happen after the kernel upgrade (which versions?), or after upgrading from CrateDB 5.3.2 to 5.4.2, or did both things happen at the same time?

@hernanc is correct with his assessment of the data model, but still, if this is a regression, the database team may want to know about it. On the other hand, there may already be a fix for it on the development branch?

With kind regards,
Andreas.

/cc @matriv, @Michael_Kleen

@amotl They are two separate events: in the first week of September we upgraded from 5.3.2 to 5.4.2, and the kernel upgrade happened on 29th September.

Thank you for clarifying. So, the issue you are observing may have been introduced by the upgrade to CrateDB 5.4.2, and the Kernel upgrade is completely unrelated?

@amotl It looks unrelated, but I am sharing the logs I got from syslog after the kernel upgrade, after checking with the infra team. When CPU was going to 100%, we were not seeing the same logs.

Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: [2023-09-29T07:09:47,597][WARN ][o.e.t.TransportService   ] [crate03] Transport response handler not found of id [3639787]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: [2023-09-29T07:09:47,825][INFO ][o.e.i.c.IndicesClusterStateService] [crate03] [abc..partitioned.txn.042j4c9g74][0] global checkpoint sync failed
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: org.elasticsearch.node.NodeClosedException: node closed {crate03}{mi9WL0vkS3K02Fcqew4SAw}{5mDMTT8DTfCA5AfHBmdDCg}{10.207.97.170}{10.207.97.170:4300}{dm}{http_address=10.207.97.170:4200}
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:815) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:235) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.service.ClusterApplierService.addTimeoutListener(ClusterApplierService.java:269) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:177) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:123) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:115) ~[crate-server.jar:?]
--
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:829) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:143) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.lang.Thread.run(Thread.java:1623) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: [2023-09-29T07:09:47,826][INFO ][o.e.i.c.IndicesClusterStateService] [crate03] [abc..partitioned.txn.042j4c9g74][2] global checkpoint sync failed
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: org.elasticsearch.node.NodeClosedException: node closed {crate03}{mi9WL0vkS3K02Fcqew4SAw}{5mDMTT8DTfCA5AfHBmdDCg}{10.207.97.170}{10.207.97.170:4300}{dm}{http_address=10.207.97.170:4200}
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:815) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:235) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.service.ClusterApplierService.addTimeoutListener(ClusterApplierService.java:269) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:177) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:123) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:115) ~[crate-server.jar:?]
--
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:829) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:143) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.lang.Thread.run(Thread.java:1623) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: [2023-09-29T07:09:47,826][INFO ][o.e.i.c.IndicesClusterStateService] [crate03] [abc..partitioned.txn.042j4c9g74][5] global checkpoint sync failed
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: org.elasticsearch.node.NodeClosedException: node closed {crate03}{mi9WL0vkS3K02Fcqew4SAw}{5mDMTT8DTfCA5AfHBmdDCg}{10.207.97.170}{10.207.97.170:4300}{dm}{http_address=10.207.97.170:4200}
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:815) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:235) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.service.ClusterApplierService.addTimeoutListener(ClusterApplierService.java:269) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:177) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:123) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:115) ~[crate-server.jar:?]
--
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:829) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:143) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.lang.Thread.run(Thread.java:1623) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: [2023-09-29T07:09:47,826][INFO ][o.e.i.c.IndicesClusterStateService] [crate03] [abc..partitioned.txn.042j4c9g74][6] global checkpoint sync failed
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: org.elasticsearch.node.NodeClosedException: node closed {crate03}{mi9WL0vkS3K02Fcqew4SAw}{5mDMTT8DTfCA5AfHBmdDCg}{10.207.97.170}{10.207.97.170:4300}{dm}{http_address=10.207.97.170:4200}
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:815) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:235) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.service.ClusterApplierService.addTimeoutListener(ClusterApplierService.java:269) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:177) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:123) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:115) ~[crate-server.jar:?]
--
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:829) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:143) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at java.lang.Thread.run(Thread.java:1623) ~[?:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: [2023-09-29T07:09:47,826][INFO ][o.e.i.c.IndicesClusterStateService] [crate03] [abc..partitioned.txn.042j4c9g74][7] global checkpoint sync failed
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: org.elasticsearch.node.NodeClosedException: node closed {crate03}{mi9WL0vkS3K02Fcqew4SAw}{5mDMTT8DTfCA5AfHBmdDCg}{10.207.97.170}{10.207.97.170:4300}{dm}{http_address=10.207.97.170:4200}
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:815) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:235) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.service.ClusterApplierService.addTimeoutListener(ClusterApplierService.java:269) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:177) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:123) ~[crate-server.jar:?]
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:115) ~[crate-server.jar:?]
--
Sep 29 07:09:47 ip-10-207-97-170 crate[2259]: #011at org.elasticsearch.index.IndexService$AsyncGlobalCheckpointTask.runInternal(IndexService.java:829) ~[crate-server.jar:?]

@amotl Any update on the above issue?

Since you’re using a custom column (id) to cluster the table into shards, do you maybe have an imbalance in your cluster? Can you post the result of this statement?

select * from sys.shards where table_name='transat';

How many shards are there in total in your system?

select count(*), node from sys.shards group by node order by 1;
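A summary like the sketch below (assuming the usual sys.shards columns) also makes an imbalance easier to spot than scanning the full sys.shards output:

-- Primary document totals per clustered shard id, across all partitions.
SELECT id,
       sum(num_docs) AS docs,
       count(*)      AS partitions
FROM sys.shards
WHERE table_name = 'transat'
  AND "primary" = true
GROUP BY id
ORDER BY docs DESC;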

shards.txt (1.2 MB)

+----------+-----------------------------------------------------+
| count(*) | node                                                |
+----------+-----------------------------------------------------+
|      356 | {"id": "mi9WL0vkS3K02Fcqew4SAw", "name": "crate03"} |
|      356 | {"id": "orw-9VzTSSe9w5l2oLFddg", "name": "crate02"} |
|      356 | {"id": "MmiDa45CRCGKWMGfWLX2OQ", "name": "crate01"} |
+----------+-----------------------------------------------------+

From what I see, you have an imbalance in your shards:
there are shards with 5, 6, 7, or 8 documents, others with ~1M, others with ~2M to ~3.7M, and others with >4M documents. I think you shouldn’t have used a custom clustering scheme based on id, but rather left the clause out and relied on the internal _id to get a nice balance across the shards.

Then, from your schema I see that you have configured 10 shards with 2 replicas, so that’s 30 shards in total, but from the query on sys.shards I see over 1000 shards for the table. How come? Did you change the sharding scheme with ALTER TABLE after the initial creation?

Regarding the id column, I see you’re doing range comparisons on it, and it seems to hold numeric values, but you have defined it as text (string). This also won’t help, performance-wise, with your queries that filter on the id using what are basically numeric comparisons.

Do you see excessive GC runs on your nodes?


We partition based on months; we have not made any alterations after creation. By default it ended up with 1000 shards per node.

Regarding the id column: it is a string, not numeric. A sample value for reference: 231025094322694E180110056

GC looks fine on our nodes; I have attached a screenshot above.