Insert and query python tuples with python client

I attempted to insert a deeply nested machine learning model config dictionary in a crate dynamic object column - and the insert worked as expected.
All objects, arrays and sub-objects were created. However, there was one object, which contained python tuples. During insert, this tuple-object was manifested in the db as floating point array (which is at least somewhat correct, as the tuples are floats in an array).

Now, when attempting to query this object, one of the following errors is returned.

ClassCastException[class java.util.ArrayList cannot be cast to class java.lang.Double (java.util.ArrayList and java.lang.Double are in module java.base of loader 'bootstrap')]
SQLParseException[Cannot cast object element `calibrator` with value `{max_score=10.0, abs_score=true, name=AnomScoreCalibrator, anchors=[[0.0, 0.0], [0.6264311475091373, 0.5], [0.7304250954819491, 1.0], [0.8414136870266378, 1.5], [0.998546242561757, 2.0], [3.3182936305345176, 4.038226711413138], [6.636587261069035, 5.038226711413138]]}` to type `object`]

(More on when specifically which error occurs later)

Looking on the above errors, is there a way I can still retrieve the successfully inserted data, by eg. utilizing some casts?

For reproduction purposes, I attached a serialized dictionary, which contains the object in question. The behavior can be reproduced as follows:

  1. Create a table model_config as follows
create table model_config (
        timestamp timestamp default now(),
        digest text, -- this is the link to the model blog
        run_name text,
        config object(dynamic));
  1. Load the attached file and insert it
import json
test = json.loads(open('text.txt').read())

cursor = conn.cursor()
cursor.execute("insert into model_config (digest, run_name, config) values (?, ?, ?)", ("23423423", "random_run_name", test))
  1. Query the inserted data as follows
    (The problematic object is the model->models->calibrator->anchors array)
select config['model']['models']['calibrator']['anchors'] from model_config limit 1;

(This gives the above-mentioned ClassCastException).

  1. Alternatively, if you query a parent object of the problem array, a different error is provided
select config['model']['models'] from model_config limit 1;

(This returns the above-mentioned SQLParseException exception)

And last but not least, any other object - even inside the ‘calibrator’ - object can successfully be parsed. Eg.

select config['model']['models']['calibrator']['abs_score'] from model_config limit 1;

correctly returns the expected values.

As for my use case: Storing ML model configs is essential for ML experiment tracking. As the data for the ML training itself are already present in cratedb, having the model config in crate would get rid of the need of having a separate database just for storing these configs.

Model config: model.txt (3.6 KB)

1 Like

The problem is probably related to the nested array in the calibrator in combination with OBJECT(DYNAMIC). The dynamic OBJECT type tries to dynamically map a data type to every (sub)column. CrateDB 5.4 does not yet support nested arrays for storage (potentially with 5.5 → CrateDB 5.5 · GitHub).

An alternative might be to use OBJECT(IGNORED) or specify a object schema beforehand.

1 Like