If I use use COPY TO for a big table with many shards, will there be different files on different nodes?

Emre_Sevinc · September 13, 2021, 7:47am

Hello,

According to https://crate.io/docs/crate/reference/en/4.6/sql/statements/copy-to.html

The COPY TO command exports the contents of a table to one or more files into a given directory with unique filenames. Each node with at least one shard of the table will export its contents onto their local disk.

The created files are JSON formatted and contain one table row per line and, due to the distributed nature of CrateDB, will remain on the same nodes where the shards are.

Here’s an example:

cr> COPY quotes TO DIRECTORY ‘/tmp/’ with (compression=‘gzip’);

COPY OK, 3 rows affected …

I’m a bit confused about: “will remain on the same nodes where the shards are“. Can you clarify with a few examples? If I export one big table with many shards, will I find more than one JSON file residing on different nodes?

hammerhead · September 13, 2021, 9:04am

Hi @Emre_Sevinc,

the COPY TO command will indeed create one file per shard. In the example you quoted from the documentation, the output COPY OK, 3 rows affected means that the table consists of three shards, and therefore three JSON files have been written. You will see files named quotes_0_.json, quotes_1_.json, and quotes_2_.json on the file system (0 - 2 indicate the shard number).

Let’s assume that each shard is stored on a different node. Then you will find on each node’s local file system only the JSON file belonging to the shard located on that node (node-1: /tmp/quotes_0_.json, node-2: /tmp/quotes_1_.json, node-3: /tmp/quotes_2_.json).

If you wish to export all files to one common directory, then you will have two choices:

You mount on each node a central network share and use the mount point as a target for COPY TO. Since the files have unique names, this can be done without any file name conflicts.
You use S3 as a target instead of the local file system. Then each node will write to the same S3 bucket and all JSON files will be in the same place.

Hope that clarifies the behavior, please let me know if there are still open questions.

Best

Niklas

Topic		Replies	Views
Why does COPY TO create empty JSON files and how is data distributed to these files? SQL	1	654	November 12, 2021
Copying table results in corrupt data CrateDB	3	874	May 20, 2022
Weird behaviour with COPY FROM SQL	2	758	April 1, 2020
Insert data to local node only? CrateDB	6	815	March 31, 2022
Copy data within tables SQL	4	678	May 20, 2021

If I use use COPY TO for a big table with many shards, will there be different files on different nodes?

Related topics