Does comparison operator =(equal) case sensitive?

Valeriy_Dzhura · November 29, 2021, 8:30pm

Hello guys,

After reading this topic I have a question, does comparison operator =(equal) case sensitive for varchar or text values? And can we change this behavior for table/query/server.

I know that I can use ILIKE operator like solution, but I interest behavior like in MSSQL where operator =(equal) is case insensitive but can be case sensitive if we will change database/table collation i.e. we can change this behavior but could we do the same in the CrateDB?

Thanks

proddata · November 30, 2021, 6:10pm

HI @Valeriy_Dzhura

It is case sensitive yes. I am not aware of any table or general settings on cluster level.

ILIKE
using LOWER()
use a fulltext index with e.g. ‘simple’ analyze

CREATE TABLE tab (
  txt TEXT INDEX using FULLTEXT with (type='simple')
 );

INSERT INTO tab VALUES ('Hello');

SELECT txt FROM tab 
WHERE txt = 'hello'
--> 'Hello'

Valeriy_Dzhura · November 30, 2021, 10:53pm

Hello again,

I used next analyzer for my field:

create ANALYZER myAnalyzer(
  TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
  TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));

and

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)

And as you can see I used lowercase in TOKEN_FILTERS, but when I compare my field with this analyzer using operator =(equal) it is still case sensitive. Looks like lowercase not work for this field or maybe I do something wrong?

For example, these queries returns different results but should be the same:

select * from x.x where name = 'John'
select * from x.x where name = 'john'

Thanks!

proddata · December 1, 2021, 4:58pm

The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case

select * from x.x where name = 'john'

or use lower() in the comparison

select * from x.x where name = lower('John')

if you need to have both options you could create a 2nd index.

Valeriy_Dzhura · December 1, 2021, 5:43pm

Can you show me please an example how I can use two indexes for 1 field in my test case:

create ANALYZER myAnalyzer(
  TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
  TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)

Thanks!

proddata · December 1, 2021, 8:38pm

e.g. like that:

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX using plain,
INDEX "name_ft" USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)

also see Fulltext indices — CrateDB: Reference

-- exact match
select name from x.x where name = 'John'

-- using full_text index
select name from x.x where name_ft = 'john'

Valeriy_Dzhura · December 1, 2021, 9:47pm

Not sure that I understand you right, but why when I use lowercase in TOKEN_FILTERS my queries works wrong, for example you can see next statement:

create ANALYZER myAnalyzer(
  TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
  TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));

create table if not exists x.x(
  "name" text index using fulltext with (analyzer = 'myAnalyzer'),
  "test" varchar(8)
);

insert into x.x(name, test)
values('John','John'),('JOHN','JOHN'),('jOhn','jOhn'),('john','john');

select _score, name from x.x where name = 'john' limit 100; -- NO HIT
select _score, name from x.x where name = 'John' limit 100; -- NO HIT
select _score, name from x.x where name = 'JOHN' limit 100; -- NO HIT

In each select returns NO HIT but as you can see we have this records in the table and in each select I should get all four rows.

According to this:

The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case

My first query select _score, name from x.x where name = 'john' limit 100; -- NO HIT should return 1 row, but nothing. This is fully misconfused me.

proddata · December 1, 2021, 10:02pm

Ah … I overlooked that you used a ngram tokenizer.
Then you should use a MATCH instead Fulltext search — CrateDB: Reference

with

CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX using plain,
   INDEX "name_ft" USING FULLTEXT WITH (analyzer = 'myAnalyzer'),
"test" varchar(8)
)

you could do

select _score, name from x.x where name = 'john' limit 100;

or

select _score, name from x.x where MATCH(name_ft, 'john') limit 100;

Valeriy_Dzhura · December 1, 2021, 11:51pm

Thank you, but I still not understand how lowercase works and why it works like this. According to this:

The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case

In index it will be stored in lowercase but when I will use WHERE condition we will not look into index when will use =(equal) operator. =(equal) operator will look only in fields values, not in index, right? Or is it only for fulltext search?

Topic		Replies	Views
Case Insensitive searching in cratedb CrateDB	1	839	September 20, 2022
Strange behavior with ILIKE operator SQL	4	611	April 5, 2022
String function for lower case? SQL	1	904	December 7, 2018
ANALYZER with two and more TOKENIZER/TOKEN_FILTER CrateDB	3	559	November 23, 2021
I noticed that the Order by function doesn't work correctly in Turkish in the TEXT field CrateDB	3	344	May 16, 2023

Does comparison operator =(equal) case sensitive?

Related topics