Error occurs while using COPY FROM json file when sequence of the records in a file mismatch

Version : crate-5.5.0

I am using below command to restore json backup in my table -

copy temp.my_table FROM '/tmp/crate-bkp/new/my_table_0_04732d9p60sj8e9o60o30c1g.json' RETURN SUMMARY;

It returns with the error - “Cannot cast value column 4 data to type bigint

However, restore of the same file was successful on the older version of crate-5.1.0 .

Table definition -

create table temp.my_table (
	id long,
	first_column long,
	second_column string,
	third_column long,
	fourth_column string,
	fifth_column long,
	sixth_column string,
	seventh_column string ,
	eighth_column TIMESTAMP WITH TIME ZONE,
	nineth_column TIMESTAMP WITH TIME ZONE,
	tenth_column string,
	object_column object as (
      	      sub_column_w long,
     	      sub_column_x ip,
      	      sub_column_y string,
              sub_column_z boolean
    ),
	primary key (eighth_column,fifth_column,id )
) PARTITIONED BY (eighth_column)
WITH (
   "translog.durability" = 'ASYNC'
);

JSON File sample -

{"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":125,"first_column":159000000,"fourth_column":"column 4 data","nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
 {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":137,"first_column":159000000,"fourth_column":"column 4 data","nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
 {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":138,"first_column":159000000,"fourth_column":"column 4 data","nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
 {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":149,"first_column":159000000,"fourth_column":"column 4 data","nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
  {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":150,"fourth_column":"column 4 data","first_column":159000000,"nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
 {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":152,"fourth_column":"column 4 data","first_column":159000000,"nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
 {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":156,"fourth_column":"column 4 data","first_column":159000000,"nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
 {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":165,"fourth_column":"column 4 data","first_column":159000000,"nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
 {"second_column":"column 2 data","third_column":159000000,"tenth_column":"column 10 data","id":167,"fourth_column":"column 4 data","first_column":159000000,"nineth_column":1590949800000,"object_column":{"sub_column_w":12101610,"sub_column_x":"104.248.34.219","sub_column_y":"sub_column_1","sub_column_z":true},"fifth_column":93,"sixth_column":"column 6 data","seventh_column":"column 7 data","eighth_column":1590949800000}
1 Like

Hi,
would it be possible that some line in the file is malformed and is missing

,"fourth_column":

?
Maybe run a count of occurrences of the above string?

Hi @hernanc
I checked the count of the columns in my data. It is exactly same as others. So data is not malformed.

Dear Sayali,

thank you for writing in and for reporting this problem.

Without knowing any more details about the root cause, I can confirm it is apparently a regression with CrateDB 5.5.

Importing the JSONL file using the schema you’ve outlined works with latest crate:5.4, but fails with crate:5.5.0 [1].

The full error message from the response is [2]:

{
  "Cannot cast value `column 4 data` to type `bigint`": {
    "count": 5,
    "line_numbers": [5, 6, 7, 8, 9]
  }
}

After coming up with a more minimal reproducer, when possible, let’s carry it over to a bug report at https://github.com/crate/crate.

With kind regards,
Andreas.


  1. Those labels refer to OCI image names on Docker Hub. ↩︎

  2. Well, I can see there are anomalies within the JSONL snippet you shared with us, specifically there is some leading whitespace on a few lines. However, that starts at line 2 already, and I haven’t been able to find any pattern when shuffling the apparant anomalous lines, and well, it worked with CrateDB <5.5, so I think it will be safe to classify as a regression. ↩︎

@amotl
The bug is reported Error occurs while using COPY FROM json file when sequence of the records in a file mismatch · Issue #15184 · crate/crate · GitHub
Thanks.

1 Like