COPY FROM is a handy command: it allows you to connect CrateDB with a filesystem and import data from local or remote files to a specified table. The syntax is fairly simple: often, you just need to specify the destination table and URI of the file:
COPY table_name FROM 'path_to_file';
To import data from a file into the table you will need the table with the same number and data type of columns as present in the file. However, importing data is not always straightforward: there are sometimes cases when files are not formatted in a way that CrateDB accepts with default settings. In this situation, you can see the successful execution of the command but an empty table as a result. So, how do we find out what the error is and how to solve it?
In this tutorial, we will illustrate how to effectively use the
COPY FROM statement in CrateDB and show you several options you should know in order to avoid common mistakes.
To get detailed information about possible failures when executing statements in CrateDB use the
RETURN SUMMARY clause. The clause is optional but very helpful: it reports information such as the node that processed the file, the number of rows that were inserted, the number of records that failed, and detailed information about all errors.
Now, let’s imagine having a simple
users table and a CSV file containing several records:
CREATE TABLE "doc"."users" ( "id" INTEGER, "name" TEXT, "country" TEXT )
If we try to import the following records:
"id","name","country" "1", "Ana", "DE" "2", "Sara", "DE" "\\", "Peter", "DE"
We will get the output as
COPY OK, 2 records affected (1.1 seconds)
But, we actually have three rows in our file. By running the
RETURN SUMMARY clause with the
COPY FROM statement and inspecting the
errors field in the output, we can see that the third row wasn’t imported due to the following error:
As we can see, we also get information about the number of records that failed with the error and the line numbers in the source file where the error occurred. We would strongly advise using this optional clause whenever you run your queries in CrateDB.
More information about the complete output of the
RETURN SUMMARY clause can be found in the official documentation.
CrateDB accepts files in JSON or CSV formats. Please have in mind that if the format is not specified, the file will proceed as JSON. Furthermore, the format specification can be provided as a
format option in the
When importing CSV files, it is important to consider the file formatting. For instance, the default delimiter in CrateDB is
, but it may be the case that the columns in the file are separated with a different delimiter. In the following sections, we will show you how you can import files with other, non-default formatting options.
Although CSV files usually separate values using commas, this is not always the case. Basically, any delimiter can be used to separate columns in CSV files. Some of the most common delimiters include the semicolon (
;), a new line(
\n), vertical line (
|), or tab(
\t). A file with a tab-separated value is also known as a TSV file and we will go into more details on how to import TSV files later in this tutorial.
To specify the delimiter character in the
COPY FROM statement, you should use the
delimiter option with a
WITH clause. For example, if the column values are separated by a semicolon, we need this additional definition:
COPY table_name FROM 'file_path/file_name.csv' WITH (delimiter=';', format='csv')
The first line of a CSV file indicates a header with the column name. CrateDB checks if the types from the import file match the data type of the column in the destination table. This casts the types and will always import the data as in the source file. However, if the import file does not contain the header, you can still import data by setting
header = false option in
COPY table_name FROM 'file_path/file_name.csv' WITH (header=false, format='csv')
The default value of the
header option is
true, and in this case, the first line in the CSV file must contain the column names. Instead of importing all data, you can use the optional column declaration to import a subset of the data:
COPY table_name(column1, column2) FROM 'file_path/file_name.csv' WITH (format='csv')
The example shows the case when you import the data only for the first two columns. The values of other columns in the table will be automatically set to
header option is set to
false, the CSV must not contain column names in the first line, and instead, the columns declared in the
COPY statement are used (e.g.,
COPY table_name(column1, column2, ...)). If no columns are declared in the statement, it will default to all columns present in the destination table.
A tab-separated values (TSV) file is a variation of the comma-separated values format. It is one of the most common files when transferring data between databases or applications. In TSV files, each row is one line of the text file and each field value is separated from the next by a tab character.
You can import data from TSV files in the same way as CSV files, but you need to provide two more options in
WITH clause: delimiter and format, as illustrated below:
COPY table_name FROM 'file_path/file_name.tsv' WITH (delimiter=e'\t', format='csv')
The command above shows that the file should be processed as a CSV file where the columns in the file are separated with a tab delimiter. For
\t sequence to be interpreted as a tab character, it must be specified with an escape token as illustrated.
If you want to import CSV or JSON files that are compressed in
gzip file format you should use the
compression option with the
COPY FROM command:
COPY table_name FROM 'file_path/file_name.csv' WITH (compression='gzip', format='csv')
The default value for the
compression option is
null, and it should be set to
gzip to read gzipped files. For now, CrateDB allows the import of compressed data only in
In this tutorial, we illustrated how to use the
COPY FROM command in CrateDB. We started with the
RETURN SUMMARY clause that gives you a better understanding of the errors that occurred during the import and discussed a couple of options for more effective use of
COPY FROM. A better understanding of these capabilities can help you deal with data imports from files that are formatted using non-standard options.