create table myorder (
id long,
ordervalue long,
quantity long,
quantityStr string,
name string,
primary key (id)
);
Note: quantityStr is column with type string.
Now let’s say there are 2 records with values for quantityStr column as:
quantityStr
10
89
Now, lets execute query select * from myorder where TRY_CAST(quantitystr AS bigint) > 3;
My assumption is that since we are casting quantitystr AS bigint, it should do numeric evaluation.
But output for the above query returns only 1 record having value 89 for quantityStr column
i.e. it doesn’t select record with value 10 for quantityStr column.
It seems TRY_CAST is doing character by character comparison of numeric string.
This is used to work with 3.2.7 version.
Could you please provide any info for this behaviour?
Is it a bug introduced in the version after 3.2.7 version.
This indeed seems to be a bug and related how numeric queries on text values are processed. In this case CrateDB does not just do a simple full table scan converting all quantitystr fields to numeric values in an intermediate result, but rather tries to establish a filter, so that the text index can be used.
One can see that behaviour when using EXPLAIN ANALYZE
explain analyze SELECT * FROM myorder
WHERE TRY_CAST(quantitystr AS bigint) > 2 limit 100;
The way numeric values are indexed within CrateDB using BKD-Trees nowadays, while in the past also inverted (text-based) indexes where used with padded 0.
Generally speaking I’d consider using text indexes for numeric lookups nowadays bad practices. I will however raise a issue in crate/crate on GitHub.
You might just want to do the parsing on insert with a generating function
create table myorder (
id long,
ordervalue long,
quantity long,
quantityStr string,
quantityStr_parsed long GENERATED ALWAYS AS try_cast(quantityStr as long),
name string,
primary key (id)
);
A rather expensive workaround (table scan) would be an altering view (e.g. adding 0)
CREATE OR REPLACE VIEW v_myorder AS
SELECT quantity, try_cast(quantitystr as long)+0 AS quantitystr_long
FROM "doc"."myorder";
or also rather expensive with a UDF
CREATE FUNCTION text_gt (txt_val TEXT, num LONG)
RETURNS BOOLEAN
LANGUAGE JAVASCRIPT
AS 'function text_gt(txt_val, num) {
return Number(txt_val) > num
}';
SELECT * FROM myorder
WHERE text_gt(quantitystr,3);
I also had come up with quick fix similar to that only…
CREATE or replace FUNCTION my_try_cast( input string )
RETURNS bigint
LANGUAGE JAVASCRIPT
AS 'function my_try_cast(input) {
if (isNaN(input)) { //This will handle mix data (numeric or non numeric)
return null
} else {
return Number(input)
}
}';