CrateDB 6.1.3 Upgrade Causing CPU Spikes on GCP

Upgrade Procedure:
5.10.14 → 6.0.5 → 6.1.3

Context:
I am aware of the bug in 6.1.x releases that can lead to table corruption, but the issue described here does not appear to be related to table corruption.

Issue Observed:
I have CrateDB deployed across multiple environments: on-premises, AWS, GCP, etc. I upgraded all environments to 6.1.3.

  • Symptom: Only the GCP deployment shows high CPU usage, even under normal load.
  • Impact: When concurrent queries increase, the CrateDB UI becomes unresponsive, and queries take significantly longer to execute.
  • Other Environments: On-prem and AWS clusters with similar data volumes and query patterns do not show these issues.

Troubleshooting Steps Taken:
According to the CrateDB 6.0.0 release notes, Lucene 10.2 opens files with MADV_RANDOM by default on Linux and macOS. For cases of increased IOPS or degraded performance, the release notes suggest setting:

CRATE_JAVA_OPTS=-Dorg.apache.lucene.store.defaultReadAdvice=NORMAL

However, the JVM options already include this setting:

CRATE_JAVA_OPTS=-Dorg.apache.lucene.store.defaultReadAdvice=NORMAL
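One way to confirm that the flag actually reaches the running JVM, rather than only being present in the configuration, is to inspect the process command line. The following is a minimal sketch, assuming a Linux host with /proc available. The helper names and the pgrep pattern in the commented example are hypothetical and may need adjusting for a given installation.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given JVM command line
# contains the NORMAL read advice flag as a separate argument.
has_normal_read_advice() {
  case " $1 " in
    *" -Dorg.apache.lucene.store.defaultReadAdvice=NORMAL "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the command line of a running process.
# /proc/<pid>/cmdline is NUL-separated, so NULs are turned into spaces.
cmdline_of_pid() {
  tr '\0' ' ' < "/proc/$1/cmdline"
}

# Example usage (assumption: this pgrep pattern matches the CrateDB JVM):
#   pid=$(pgrep -f 'io.crate.bootstrap.CrateDB' | head -n 1)
#   if has_normal_read_advice "$(cmdline_of_pid "$pid")"; then
#     echo "read advice flag is active"
#   else
#     echo "read advice flag is missing"
#   fi
```

If the flag is missing from the running process on one environment but present on the others, that would point to a configuration difference rather than a GCP-specific behavior.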

Environment Details:

  • RPM-based CrateDB package is used.
  • Data volume and query patterns are similar across all environments.

Questions / Observations:

  • Why do the high CPU usage and UI unresponsiveness occur only on GCP after the upgrade to 6.1.3?
  • What could be the underlying cause?
  • Are there any GCP-specific configurations (e.g., VM type, storage performance, networking, JVM, or OS-level differences) that could trigger this behavior?
  • What further steps should I take to diagnose or mitigate this issue?
  • Is upgrading to a newer version (e.g., 6.1.4 or later) likely to resolve this issue?