After reading this topic I have a question, does comparison operator =(equal) case sensitive for varchar or text values? And can we change this behavior for table/query/server.
I know that I can use ILIKE operator like solution, but I interest behavior like in MSSQL where operator =(equal) is case insensitive but can be case sensitive if we will change database/table collation i.e. we can change this behavior but could we do the same in the CrateDB?
It is case sensitive yes. I am not aware of any table or general settings on cluster level.
ILIKE
using LOWER()
use a fulltext index with e.g. ‘simple’ analyze
CREATE TABLE tab (
txt TEXT INDEX using FULLTEXT with (type='simple')
);
INSERT INTO tab VALUES ('Hello');
SELECT txt FROM tab
WHERE txt = 'hello'
--> 'Hello'
create ANALYZER myAnalyzer(
TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));
and
CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)
And as you can see I used lowercase in TOKEN_FILTERS, but when I compare my field with this analyzer using operator =(equal) it is still case sensitive. Looks like lowercase not work for this field or maybe I do something wrong?
For example, these queries returns different results but should be the same:
select * from x.x where name = 'John'
select * from x.x where name = 'john'
Can you show me please an example how I can use two indexes for 1 field in my test case:
create ANALYZER myAnalyzer(
TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));
CREATE TABLE IF NOT EXISTS "x"."x" (
"name" TEXT INDEX USING FULLTEXT WITH (analyzer = 'myAnalyzer')
)
Not sure that I understand you right, but why when I use lowercase in TOKEN_FILTERS my queries works wrong, for example you can see next statement:
create ANALYZER myAnalyzer(
TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem));
create table if not exists x.x(
"name" text index using fulltext with (analyzer = 'myAnalyzer'),
"test" varchar(8)
);
insert into x.x(name, test)
values('John','John'),('JOHN','JOHN'),('jOhn','jOhn'),('john','john');
select _score, name from x.x where name = 'john' limit 100; -- NO HIT
select _score, name from x.x where name = 'John' limit 100; -- NO HIT
select _score, name from x.x where name = 'JOHN' limit 100; -- NO HIT
In each select returns NO HIT but as you can see we have this records in the table and in each select I should get all four rows.
According to this:
The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case
My first query select _score, name from x.x where name = 'john' limit 100; -- NO HIT should return 1 row, but nothing. This is fully misconfused me.
Thank you, but I still not understand how lowercase works and why it works like this. According to this:
The lowercase token filter lowercases the entries in the index. So you would need to search for the lower case
In index it will be stored in lowercase but when I will use WHERE condition we will not look into index when will use =(equal) operator. =(equal) operator will look only in fields values, not in index, right? Or is it only for fulltext search?