Using CrateDB Toolkit to sync MongoDB with CrateDB

Hello, I want to sync DB from MongoDB with CrateDB , but it’s error:

<class ‘dlt.destinations.exceptions.DatabaseTransientException’>
“\_id” conflicts with system column pattern
InvalidColumnNameException["_id" conflicts with system column pattern]

How I can fix this ? I tried to write tikray file but my data is jsonl and it’s not work :frowning:

1 Like

I run with command:

ctk load table  “mongodb://user:pass@db_host/collection?table=collection.table&authSource=admin&readPreference=secondary”

Both source and destination were connected successfully.

1 Like

Hi Võ Thiện,

thank you for writing in.

The dlt-cratedb adapter is still in its infancy, where this problem is originating from, accompanying CrateDB’s obstacle SQL DDL error `InvalidColumnNameException["_foo" conflicts with system column pattern]` · Issue #15161 · crate/crate · GitHub. We created two issues [1] and [2] to track improving this.

So, we see three options going forward, which don’t necessarily exclude each other:

  • Fix the flaw you’ve reported within dlt/ingestr, to also make others happy. :100:

  • Use the existing MongoDB data loader in CrateDB Toolkit, which isn’t using dlt or ingestr. => Edit: This also needs to be fixed, see CTK-499 and below. :see_no_evil_monkey:

  • Use the MongoDB integration for CrateDB Cloud.
    => That is the most comfortable variant which has resolved all the basic issues well. :sparkles:

Will one of the options enumerated above help you? Which would be the preferred one? Can you use the feature in CrateDB Cloud, or are you looking for one of the standalone options?

With kind regards,
Andreas.

[1] MongoDB: `dlt.destinations.exceptions.DatabaseTransientException: "_id" conflicts with system column pattern` · Issue #498 · crate/cratedb-toolkit · GitHub
[2] Problem importing from MongoDB · Issue #19 · crate/dlt-cratedb · GitHub

Guidance

You certainly haven’t been wrong to also look at Tikray, because that package provides all sorts of transformations to the machinery in CrateDB Toolkit. However–this is mostly internal knowledge yet–it currently only works with the more native data nozzle subsystems in CTK, but not with the new ingestr-based subsystem just yet, afair.

However, don’t fret: CTK already includes built-in compensation code for CrateDB’s GH-15161 obstacle within its MongoDB subsystem at this spot:

Another problem

However, based on your report, we discovered that the native MongoDB I/O subsystem of CTK has been disconnected by adding the one based on ingestr.

MongoDB: New ingestr-based I/O subsystem masks native adapters · Issue #499 · crate/cratedb-toolkit · GitHub

In this spirit, it would be advisable to tackle this problem first, in order to unlock functionality that already exists [1] to provide instant improvements on your end.

NB: What’s your user handle on GitHub by the way, so we can also tag you there on any updates that might be relevant for you?


  1. You can observe from this occasion that CTK provides a fast-moving environment to accompany the needs for a polyglot data ingestion subsystem. In this case, you discovered a regression in adapter stacking, so thanks a stack [sic!] for your swift report! Apparently, we also need to check the QA why this might have slipped the automatic validations. ↩︎

1 Like

Downgrade CTK to the rescue

What about downgrading to a previous version of CTK and trying that?

From its releases page, we can discover only the most recent version v0.0.37 includes the ingestr-based subsystem, so you could have a good chance to make ctk load table work with MongoDB by using the previous CTK version v0.0.36.

Good luck, and please report back about any success or other failures. :sunflower:

1 Like

Use uvx or containers for ephemeral downgrading

The fastest way to probe this if you are using uv/uvx [1] already, is to use that command, in this case as a temporary or permanent alias:

alias ctk='uvx --from=cratedb-toolkit[mongodb]==0.0.36 ctk'

If you are using Docker or Podman, use an alias like this:

alias ctk='docker run --rm -it --network=bridge "ghcr.io/crate/cratedb-toolkit:0.0.36" ctk'

Then, invoke ctk like you would have done it before:

ctk load table --help

  1. uv ↩︎

Oh no , I tried to use previous version , and I run command but it’s not have arg sslmode , I tried revome this arg but it can’t connect to cratedb . This is my command :

export CRATEDB_CLUSTER_URL=“crate://user:passwd@db_host/db?sslmode=disable”

ctk load table “mongodb://user:passwd@db_host/db?authSource=admin&readPreference=secondary”

It’s error TypeError: Connection.init() got an unexpected keyword argument ‘sslmode’

I tried to remove ssl argument but it’s can’t connect to my cratedb Retrying (Retry(total=9, connect=None, read=0, redirect=None, status=None)) after connection broken by ‘ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7c8c51ca0c80>, ‘Connection to cratedb-dev.khacthienit.click timed out. (connect timeout=None)’)’: /

Can you give me example command ? Both mongodb replicas and cratedb were hosted in K8s

What happens if you omit the sslmode parameter? If you need to enable SSL here, you may want to try ?ssl=true.

Tks for your support , but issue still happening :

(ctk-env) ubuntu@ip-172-31-19-57:~/ctk-env$ export CRATEDB_CLUSTER_URL="crate://u:p@h:p/cloudstore?sslmode=true"
(ctk-env) ubuntu@ip-172-31-19-57:~/ctk-env$ ctk load table "mongodb://u:p@h:p/CloudStore?authSource=admin&readPreference=secondary"
2025-08-12 04:27:11,646 [cratedb_toolkit.io.mongodb.api      ] INFO    : mongodb_copy. source=mongodb://u:p@h:p/CloudStore?authSource=admin&readPreference=secondary, target=crate://u:p@h:p/cloudstore?sslmode=true
2025-08-12 04:27:11,647 [cratedb_toolkit.io.mongodb.api      ] INFO    : Inquiring collections at mongodb://u:p@h:p/CloudStore?authSource=admin&readPreference=secondary
2025-08-12 04:27:12,059 [cratedb_toolkit.io.mongodb.api      ] INFO    : Discovered collections: 92
Traceback (most recent call last):
  File "/home/ubuntu/ctk-env/bin/ctk", line 7, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/cratedb_toolkit/io/cli.py", line 69, in load_table
    cluster.load_table(source=source, target=target, transformation=transformation)
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/cratedb_toolkit/cluster/core.py", line 587, in load_table
    if mongodb_copy(
       ^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/cratedb_toolkit/io/mongodb/api.py", line 170, in mongodb_copy
    MongoDBFullLoad(
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/cratedb_toolkit/io/mongodb/copy.py", line 46, in __init__
    self.cratedb_adapter = DatabaseAdapter(str(self.cratedb_sqlalchemy_url), echo=False)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/cratedb_toolkit/util/database.py", line 69, in __init__
    self.connection = self.engine.connect()
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 3277, in connect
    return self._connection_cls(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 143, in __init__
    self._dbapi_connection = engine.raw_connection()
                             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 3301, in raw_connection
    return self.pool.connect()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 447, in connect
    return _ConnectionFairy._checkout(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 1264, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 711, in checkout
    rec = pool._do_get()
          ^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
    with util.safe_reraise():
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 224, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/impl.py", line 175, in _do_get
    return self._create_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 388, in _create_connection
    return _ConnectionRecord(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 673, in __init__
    self.__connect()
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 899, in __connect
    with util.safe_reraise():
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 224, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 895, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy/engine/create.py", line 661, in connect
    return dialect.connect(*cargs, **cparams)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/ctk-env/lib/python3.12/site-packages/sqlalchemy_cratedb/dialect.py", line 233, in connect
    return self.dbapi.connect(servers=servers, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Connection.__init__() got an unexpected keyword argument 'sslmode'

I’m already downgrade version

(ctk-env) ubuntu@ip-172-31-19-57:~/ctk-env$ ctk --version
ctk, version 0.0.36
(ctk-env) ubuntu@ip-172-31-19-57:~/ctk-env$ ctk load table --help
Usage: ctk load table [OPTIONS] URL

  Import data into CrateDB and CrateDB Cloud clusters.

Options:
  --cluster-id TEXT      CrateDB Cloud cluster identifier
  --cluster-name TEXT    CrateDB Cloud cluster name
  --cluster-url TEXT     CrateDB SQLAlchemy or HTTP URL
  --schema TEXT          Schema where to import the data
  --table TEXT           Table where to import the data
  --format TEXT          File format of the import resource
  --compression TEXT     Compression format of the import resource
  --transformation PATH  Path to Tikray transformation file
  --help                 Show this message and exit.

and I tried omit the ssl parameter before

Retrying (Retry(total=9, connect=None, read=0, redirect=None, status=None)) after connection broken by ‘ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7c8c51ca0c80>, ‘Connection to cratedb-dev.khacthienit.click timed out. (connect timeout=None)’)’: /

Hi again.

What happens if you define CRATEDB_CLUSTER_URL like this?

export CRATEDB_CLUSTER_URL="crate://u:p@h:p/cloudstore?ssl=true"

Yah , the error was disappeard , but it’s don’t work

etrying (Retry(total=9, connect=None, read=0, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fa1dc50bb30>, 'Connection to cratedb-dev.khacthienit.click timed out. (connect timeout=None)')': /

same error when I don’t use ssl argument

Thank you Võ, apologies that it didn’t work for you in an instant, and also not after exploring a few parameter permutations. Let me now have a look into the issue so we may come up with a new release and enhanced documentation that improves the situation for you.

Can you let me know when new document release ? I really need it now :frowning:

Dear Võ,

we’ve not evaluated your particular situation around SSL/TLS yet, but we can confirm the data transfer procedure works well when using CTK v0.0.36, as outlined below.

Maybe you can derive some insights from the ctk load table incantation you’ve missed before? We will promptly also exercise a situation using SSL/TLS and report back about it.

With kind regards,
Andreas.

MongoDB => CrateDB

Start MongoDB.

docker run --rm --name=mongodb --publish=27017:27017 mongo:7

Start CrateDB.

docker run --rm --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 \
  --env=CRATE_HEAP_SIZE=2g crate/crate:6.0.0 -Cdiscovery.type=single-node

Insert record into MongoDB collection; you can also do it twice or more.

mongosh --eval 'db.testdrive.insert({"temperature": 42.84, "humidity": 83.1})'

Invoke data transfer.

alias ctk='uvx --from=cratedb-toolkit[mongodb]==0.0.36 ctk'
ctk load table \
  "mongodb://localhost/test/testdrive" \
  --cluster-url="crate://localhost/doc/testdrive"

Inspect data in CrateDB.

crash -c "SELECT * FROM doc.testdrive"

Hi again,

We used a CrateDB Cloud CRFREE cluster on the destination end, and it works for us with CTK v0.0.36. Please note the ?ssl=true parameter on the CrateDB cluster URL.

alias ctk='uvx --from=cratedb-toolkit[mongodb]==0.0.36 ctk'
ctk load table \
  "mongodb://localhost/test/testdrive" \
  --cluster-url="crate://admin:{redacted}@testcluster.aks1.westeurope.azure.cratedb.net/doc/testdrive?ssl=true"

Please let us know if that works for you, or if you need further assistance.

With kind regards,
Andreas.

NB: When using Docker, this would be the appropriate overlay command for ad hoc use of CrateDB Toolkit v0.0.36, where uvx was used above.

alias ctk='docker run --rm -it --network=bridge "ghcr.io/crate/cratedb-toolkit:0.0.36" ctk'

NB: The other command syntax that goes for MongoDB like /collection?table=collection.table, coming from ingestr, does not fit that interface, i.e. is wrong, so please don’t use it. In the same spirit, we will remove it from the documentation, because it seems not to be ready yet. Sorry for the confusion and thanks for your patience.

Hi. CrateDB Toolkit v0.0.38, just released, fixes the problem so you will no longer need to use a specific previous release of CTK. Now, CTK uses the native MongoDB I/O adapter again. Please let us know if you observe any other problems, or if that works well for you now.