Hello,
According to https://crate.io/docs/crate/reference/en/4.6/sql/statements/copy-to.html:
The COPY TO command exports the contents of a table to one or more files into a given directory with unique filenames. Each node with at least one shard of the table will export its contents onto its local disk.
The created files are JSON formatted and contain one table row per line and, due to the distributed nature of CrateDB, will remain on the same nodes where the shards are.
Here’s an example:
cr> COPY quotes TO DIRECTORY '/tmp/' WITH (compression='gzip');
COPY OK, 3 rows affected …
I'm a bit confused about: "will remain on the same nodes where the shards are". Can you clarify with a few examples? If I export one big table with many shards, will I find more than one JSON file residing on different nodes?
Hi @Emre_Sevinc,
the COPY TO command will indeed create one file per shard. In the example you quoted from the documentation, the output COPY OK, 3 rows affected means that the table consists of three shards, and therefore three JSON files have been written. You will see files named quotes_0_.json, quotes_1_.json, and quotes_2_.json on the file system (0-2 indicate the shard numbers).
Let's assume that each shard is stored on a different node. Then on each node's local file system you will find only the JSON file belonging to the shard located on that node (node-1: /tmp/quotes_0_.json, node-2: /tmp/quotes_1_.json, node-3: /tmp/quotes_2_.json).
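If you want to check up front which node will end up holding which file, the shard-to-node assignment can be looked up in the sys.shards system table. A sketch, using the quotes table from the example above:

```sql
-- Show which node holds which primary shard of "quotes";
-- after COPY TO, shard N's file quotes_N_.json sits on that node.
SELECT id AS shard, node['name'] AS node_name
FROM sys.shards
WHERE table_name = 'quotes' AND "primary" = true
ORDER BY id;
```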
If you wish to export all files into one common directory, you have two choices:
- You mount a central network share on each node and use the mount point as the target for COPY TO. Since the files have unique names, this can be done without any filename conflicts.
- You use S3 as the target instead of the local file system. Each node then writes to the same S3 bucket, and all JSON files end up in the same place.
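For the S3 option, the target is given as an s3:// URI in place of a local directory path. A minimal sketch with a hypothetical bucket and prefix (credentials can also be embedded in the URI; see the COPY TO documentation for details):

```sql
-- Every node writes its shard files into the same (hypothetical) bucket
COPY quotes TO DIRECTORY 's3://my-bucket/quotes-export/'
WITH (compression = 'gzip');
```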
Hope that clarifies the behavior. Please let me know if there are still open questions.
Best
Niklas