Unexpected Behavior with Dynamic OBJECT Columns in Large Tables (CrateDB 6.0.3)

I’m running a CrateDB cluster (version 6.0.3) with 4 nodes, hosting several tables, some of which are 25–200 GB in size. The core CREATE TABLE statement for one of these tables is as follows:

CREATE TABLE IF NOT EXISTS "doc"."foobar" (
   "ts" TIMESTAMP WITHOUT TIME ZONE DEFAULT current_timestamp(3) NOT NULL,
   "measurement" TEXT NOT NULL,
   "tags" OBJECT(DYNAMIC),
   "fields" OBJECT(DYNAMIC),
   "partition_field" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS date_trunc('month', "ts")
)
CLUSTERED INTO 4 SHARDS
PARTITIONED BY ("partition_field")

Over time, new fields are added to the tags and fields objects via the HTTP API, primarily using bulk requests. However, I’ve encountered three unusual and non-deterministic issues:


1. Random Key Generation in OBJECT Columns

Occasionally, new keys are spontaneously created in the tags or fields objects. For example, if keys like "host" and "port" exist, a new key like "hort" might appear. The value for these keys is either derived from an existing key or empty. This behavior is inconsistent and cannot be reproduced intentionally.


2. Cross-Contamination Between tags and fields Objects

Sometimes, values intended for one object (e.g., tags) end up in the other (fields), or vice versa. This can also result in the key-generation issue described above.


3. Systematic Value Corruption for Specific Keys

The most puzzling issue involves the tags['host'] key, which typically contains an IP address (e.g., 10.2.251.112). After some operations, the value is altered in a specific pattern:

  • Example: 10.2.251.112 becomes 10.51.11.
  • The digit "2" is consistently removed from the value. If a "2" appears alone between two dots (e.g., the ".2." in 10.2.11), the entire octet is dropped, including one of its dots.

This behavior is always tied to the digit "2" and occurs only in large tables (>20 GB). Smaller tables do not exhibit these issues.
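For what it's worth, the described pattern can be reproduced mechanically. Here is a small Python sketch (purely illustrative, not an actual CrateDB code path) that encodes the rule above; it might be useful for grepping logs for affected values:

```python
# Sketch of the described corruption: every digit "2" is removed, and an
# octet consisting only of "2" disappears along with one of its dots.
def corrupt(ip: str) -> str:
    octets = [o.replace("2", "") for o in ip.split(".")]
    return ".".join(o for o in octets if o)

print(corrupt("10.2.251.112"))  # -> 10.51.11
print(corrupt("10.2.11"))       # -> 10.11
```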


Insert Query

The bulk-insert query is straightforward:

INSERT INTO "doc"."foobar" ("ts", "measurement", "tags", "fields") VALUES (?, ?, ?, ?)

Observations and Questions

  • The issues occur exclusively in large tables (>20 GB).
  • A PHP API client is used for all operations.
  • The insert logic is unlikely to be the root cause, as 99% of operations succeed without issues.

Has anyone encountered similar behavior? Are there known issues with dynamic OBJECT columns in CrateDB 6.0.3, particularly with large tables? Could this be related to partitioning, sharding, or bulk operations?

Any insights or suggestions would be greatly appreciated!


I’m not aware of any known issue like you describe.

Besides the table definition you shared, can you confirm that you didn’t modify any table option default?

The host → hort change is kind of suspicious, since the code points are sequential: r is U+0072 (114) and s is U+0073 (115). Could you verify that your app logic cannot in any way create a hort key? E.g. if this data comes from logs, can you grep for the value hort?
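Worth noting: adjacent code points also means the two letters differ in a single bit, which is the classic signature of a memory bit flip. A quick Python check:

```python
# 'r' = 0x72 = 0b0111_0010 and 's' = 0x73 = 0b0111_0011:
# adjacent code points that differ in exactly one bit.
r, s = ord('r'), ord('s')
print(r, s, bin(r ^ s))  # -> 114 115 0b1
```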

If you can I’d recommend to log your insert statements before sending them to CrateDB, that way we can verify it’s not a client side issue.
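A minimal sketch of such logging, in Python for illustration (the file name and logger setup are assumptions; the PHP client would need the equivalent):

```python
import json
import logging

# Log every bulk request verbatim before it goes on the wire, so the log
# can later be diffed against what CrateDB actually stored.
logging.basicConfig(filename="inserts.log", level=logging.INFO)

def log_bulk_request(stmt: str, bulk_args: list) -> str:
    """Serialize the exact payload and write it to the log; returns the line."""
    payload = json.dumps({"stmt": stmt, "bulk_args": bulk_args}, sort_keys=True)
    logging.info(payload)
    return payload
```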


Thank you for your response. I will now log all raw queries and then we’ll see if the issue is on the client side. The example with “hort” is really just one case among various other variations. I’ll get back to you as soon as I find something. Thanks for working with me to investigate this bug.
Best regards,
Volker


Additionally, I created a small test to try to reproduce this:

  • 3 x 6.0.3 nodes
  • Your table definition
  • This ingestion script:
import random
import time
import uuid

import requests

query = """
        INSERT INTO "doc"."foobar" ("measurement", "tags", "fields")
        VALUES (?, ?, ?)
"""

def generate_data(n: int):
    data = []
    host = {'host': '10.2.251.112'}
    for _ in range(n):
        if random.randint(0, 1):
            host['port'] = str(random.randint(0, 1000))
        # copy the dict so each row keeps its own port value instead of
        # all rows sharing a reference to the same mutated dict
        data.append((str(uuid.uuid4()), dict(host), {'some_field': '234'}))
    return data


def send_bulk_data(query, data):
    response = requests.post('http://192.168.88.251:4200/_sql',
                             json={'bulk_args': data, 'stmt': query})
    return response.text


c = 0
n = 50_000
now = time.time()

while True:
    try:
        send_bulk_data(query, generate_data(n))
        c += n
    except KeyboardInterrupt:
        print(f'After {time.time() - now}s sent {c} rows')
        break

It’s now sitting at 200M records (~24 GiB) and I don’t see any schema drift, nor any change to the host value.

SHOW CREATE TABLE FOOBAR

CREATE TABLE IF NOT EXISTS "doc"."foobar" (
   "ts" TIMESTAMP WITHOUT TIME ZONE DEFAULT current_timestamp(3) NOT NULL,
   "measurement" TEXT NOT NULL,
   "tags" OBJECT(DYNAMIC) AS (
      "host" TEXT,
      "port" TEXT
   ),
   "fields" OBJECT(DYNAMIC) AS (
      "some_field" TEXT
   ),
   "partition_field" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS date_trunc('month', "ts")
)
CLUSTERED INTO 4 SHARDS
PARTITIONED BY ("partition_field")
WITH (
   column_policy = 'strict',
   number_of_replicas = '0-1'
)

Is there anything you’d like me to change about the test I’m running to better match your case?


Additionally, if you find any other schema drift (new columns you were not expecting), please share them.
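One way to enumerate what the dynamic policy has added so far is to query CrateDB's information_schema (schema and table names as in your definition):

```sql
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'doc'
  AND table_name = 'foobar'
ORDER BY ordinal_position;
```

Any column_name you don't recognize (like a stray "hort") is drift worth sharing here.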

On another note, to help determine whether it’s the client or the database, you could also try making the tags/fields objects NOT dynamic, so that any new column raises an error you can log. This might not be feasible if you are in production. Alternatively, you could create a new table with the same schema but with strict objects, point your ingestion code at that new table, and later, if needed, reconcile the data by ingesting from the new table into the old one. But please only do this if you are 100% sure about what you are doing; otherwise, logging on the client side is likely the safest option.
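A sketch of such a parallel table using CrateDB's OBJECT(STRICT) column policy (the table name foobar_strict is hypothetical, and the inner column lists should match your real schema):

```sql
CREATE TABLE IF NOT EXISTS "doc"."foobar_strict" (
   "ts" TIMESTAMP WITHOUT TIME ZONE DEFAULT current_timestamp(3) NOT NULL,
   "measurement" TEXT NOT NULL,
   "tags" OBJECT(STRICT) AS (
      "host" TEXT,
      "port" TEXT
   ),
   "fields" OBJECT(STRICT) AS (
      "some_field" TEXT
   ),
   "partition_field" TIMESTAMP WITHOUT TIME ZONE GENERATED ALWAYS AS date_trunc('month', "ts")
)
CLUSTERED INTO 4 SHARDS
PARTITIONED BY ("partition_field");
```

With STRICT, an insert carrying an unexpected key fails instead of silently widening the schema, which turns the drift into a loggable error.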


Hi,

That’s very interesting. Your observations sound as if hallucinations were involved, or possibly just faulty memory? Are you using ECC memory in your servers to avoid soft errors like single-event upsets?

With kind regards,
Andreas.

The four CrateDB nodes are VMs (Proxmox/KVM) running on a server with ECC RAM.


That’s good to hear, thank you for sharing this information. I hope we can figure out the problem on your end, based on logging data like @surister suggested, and possibly the demo program he shared above.

Wow, this behaviour certainly has the potential to play in the same league as the famous 500 miles email story.

https://web.mit.edu/jemorris/humor/500-miles

Disclaimer: Please don’t take my messages as too snarky; we are certainly taking your observations seriously. However, please understand that your report is truly unique so far, so I am trying to apply my personal share of pattern matching to get a better understanding, which may occasionally also include a share of humor. It is certainly meant in the spirit of showing that you are not alone with such observations on other systems.

Seconding Iván, I’d like to confirm we have never heard about such an issue with CrateDB, not even remotely. While every software system occasionally includes bugs, we never observed any data corruption issues like those you are describing. We are curious to learn what might be the root cause for the problem you are observing.

Do you see any possibility to share more insights into fragments of the client-side software stack you are running, or can you come up with a solid reproducer, possibly based on Iván’s demo program?

This is my CrateDB PHP implementation for IoT stuff: GitHub - skydiablo/ReactCrate

Maybe you can spot an issue?


I’m not very good at PHP but what I understand is that whatever gets into a Measurement or MeasurementBulk is what ends up arriving at CrateDB, meaning that fields and tags just wrap up whatever dynamic object they get from the clients, so a faulty device could be sending bad data.

I’ve worked in the past with IoT devices and we had instances where a faulty device would send us into a downward spiral making us think that the backend/database didn’t work as expected. In another instance it was a development device in the lab that shouldn’t have been connected to the production backend.

In any case, I doubt this is related to CrateDB, we have tons of users with similar and bigger clusters/data sizes and so far we haven’t got a remotely similar report.

I assume this is your situation:

d1  --->
d2  --->    PHP_BACKEND ---> CrateDB cluster 
dn  --->

Assuming it’s a faulty device: what I would do is implement proper validation in your backend. At first it could be something quick and simple like sanity-checking the keys in fields and tags: for example, if keys from a known list that should go to tags appear in fields, reject and log the request. The objective is to find out who’s at fault.
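A minimal sketch of that sanity check, in Python for illustration (the allow-lists are placeholders; fill them with the keys your devices actually send):

```python
# Hypothetical allow-lists; replace with the real key sets for your schema.
KNOWN_TAG_KEYS = {"host", "port"}
KNOWN_FIELD_KEYS = {"some_field"}

def misplaced_keys(tags: dict, fields: dict) -> list:
    """Return keys that landed in the wrong object, or in neither allow-list."""
    problems = []
    problems += [k for k in tags if k in KNOWN_FIELD_KEYS or k not in KNOWN_TAG_KEYS]
    problems += [k for k in fields if k in KNOWN_TAG_KEYS or k not in KNOWN_FIELD_KEYS]
    return problems

# A row with a drifted key gets rejected and logged instead of inserted:
print(misplaced_keys({"hort": "10.51.11"}, {"some_field": "234"}))  # -> ['hort']
```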

Let me know if any of this makes sense in your use case.

One bad payload can change the shape (object → scalar, null, etc.), and bulk inserts won’t complain. You only notice later, when queries act strangely once you have big tables or partitions.

Look for:

  • different types under the same object path
  • one bad IoT device sending junk
  • no validation on the backend

Try re-ingesting the data into a STRICT object column, or run pg_typeof() checks across partitions. Every time I saw this, it was ingestion, not CrateDB.
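For the type checks, something along these lines should work in CrateDB (pg_typeof is the relevant scalar; verify the details against your version's documentation):

```sql
SELECT pg_typeof(tags['host']) AS host_type, count(*) AS rows
FROM "doc"."foobar"
GROUP BY pg_typeof(tags['host'])
ORDER BY rows DESC;
```

More than one host_type in the result means the same object path carries inconsistent types across rows.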

But even if the data types were left as inconsistent as they want, at worst the INSERT would fail. Keys in the object still shouldn’t change or be “made up” in the process, right?

Initial checks for schema drift have already confirmed that the data is being sent to the database correctly. More specifically, the data that frequently shows errors (like tags['host']) comes from a config value that isn’t influenced by the IoT device.

Dear Volker,

even if it’s late, I’d like to wish you a happy new year.

You’ve already shared further outcomes after conducting network dumps in a private conversation. Thank you so much. To make progress also visible to others, I have three more questions / requests:

  • a) Could you also share your recent outcomes within this thread, so we can document a linear history of the debugging process?

  • b) Did you observe the weird behaviors already on CrateDB versions prior to 6.x?

  • c) Could you elaborate a bit how your storage system looks like, and how CrateDB is connected to it? I can remember you talking about a SAN solution based on a Ceph storage cluster, so we would like to learn more details about it to get a full picture. [1]

With kind regards,
Andreas.


  1. Because fault-tolerance details are usually managed by CrateDB itself in typical setups, by creating data redundancy through CrateDB’s replication features, I can’t say whether running CrateDB on top of Ceph is a good idea, or even contradictory. While I can’t imagine how it could be detrimental enough to cause the schema drift, we may certainly take it into consideration. ↩︎

Moin,
sorry for the long delay, things happen :wink: Since the problems kept increasing, and some other strange things were happening with this 4-node cluster as well, I decided to build a completely new cluster (also 4 nodes) and start with a fresh database, with no import of the old data.
I will report back after a while; was it just a broken cluster, or maybe something else?!

regards, volker.

Thank you Volker. Good luck, and please let us know about how it goes with the new cluster.

Some other input: the old cluster was running at roughly 200 MB/s of read pressure per node, but the new one, under the same insert and query load, shows only a little disk load. Here are some pictures:

old stack node: [screenshot]

new stack node: [screenshot]

Granted, the tables are starting from zero, but should the disk load really grow this greedily as the tables grow?

Another strange thing: on the old stack, one of the four nodes was really relaxed in terms of disk load, while RAM and CPU were the same as on the other nodes: [screenshot]

Maybe the stack was just broken?


A new lead points in this direction: Potential Issue with Simultaneous Requests Using a Single HTTP Client Instance · reactphp · Discussion #615 · GitHub


Thanks for the update. When it comes to asynchronous driver support for PHP, please also consider leaving the HTTP protocol behind, and check out a PostgreSQL-based connectivity option like amphp/postgres.

Otherwise, if you think asynchronous support in PHP might still have flaws in its underlying software components, maybe dial back to synchronous execution?

In general, we would also be interested if your use case could possibly be supported by standard software components. The data source of your system is mostly SNMP, if I recall correctly?