ANALYZER with two and more TOKENIZER/TOKEN_FILTER

Valeriy_Dzhura · November 22, 2021, 7:34pm

Hello guys,

Could someone please help me with the following question? I created a custom analyzer:

create ANALYZER bankruptcies_ngram_and_synonym (
TOKENIZER CustomTokenizer with (type='ngram', min_gram=2, max_gram=2, token_chars=['letter']),
TOKEN_FILTERS (my_synonyms WITH (type='synonym', synonyms_path='synonyms.txt'), lowercase, kstem)
);

but it does not work well. After populating the table, I ran this:

select * from <my table> WHERE MATCH (firstname,'WILLIMA') AND STATE = 'NY' limit 100;

or this:

select * from bankruptcies.bankruptcies WHERE MATCH (firstname,'william') AND STATE = 'NY' limit 100;

I get no hits, although this data definitely exists in the table. It looks like the lowercase filter and the ngram tokenizer are not being applied. Did I define the analyzer incorrectly, or do I need to take additional steps?

proddata · November 22, 2021, 7:40pm

Is your table defined to use the analyzer in a fulltext index?

create table <my table> (
  firstname TEXT INDEX using fulltext with (analyzer = 'bankruptcies_ngram_and_synonym')
);

or using a separate fulltext index:

create table <my table> (
  firstname TEXT,
  INDEX firstname_ft using fulltext(firstname) with (analyzer = 'bankruptcies_ngram_and_synonym')
);
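
If you are not sure how the table was created, the following is a rough sketch of how to check. `my_schema.my_table` is only a placeholder name (an assumption, replace it with your own table). `SHOW CREATE TABLE` prints the table definition including any fulltext indexes, and `information_schema.routines` should list the analyzers known to the cluster, including custom ones.

```sql
-- Print the full table definition, including any INDEX ... USING FULLTEXT
-- clauses and the analyzer each of them uses.
-- my_schema.my_table is a placeholder, not a name from this thread.
SHOW CREATE TABLE my_schema.my_table;

-- List analyzers, including those created with CREATE ANALYZER.
SELECT routine_name, routine_type
FROM information_schema.routines
WHERE routine_type = 'ANALYZER'
ORDER BY routine_name;
```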

Valeriy_Dzhura · November 22, 2021, 9:11pm

Second variant:

CREATE TABLE IF NOT EXISTS "bankruptcies"."test" (
   "bankruptciesid" BIGINT,
   "firstname" TEXT,
   "middlename" TEXT,
   "lastname" TEXT,
   PRIMARY KEY ("bankruptciesid"),
   INDEX firstname_ft USING FULLTEXT (firstname) WITH (analyzer = 'bankruptcies_ngram_and_synonym'),
   INDEX middlename_ft USING FULLTEXT (middlename) WITH (analyzer = 'bankruptcies_ngram_and_synonym'),
   INDEX lastname_ft USING FULLTEXT (lastname) WITH (analyzer = 'bankruptcies_ngram_and_synonym')
)

P.S. I just tried creating the table as in the first variant and it works, but what is the difference between the two?

proddata · November 23, 2021, 7:54am

If you define separate (named) indexes, you need to reference the index name instead of the column name in the query, i.e.:

select * from bankruptcies.bankruptcies WHERE MATCH (firstname_ft,'william') AND STATE = 'NY' limit 100;
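
To make the difference concrete, here is a minimal end-to-end sketch. The names `demo_bigram`, `demo_bigram_tokenizer` and `doc.people_demo` are made up for illustration, and the analyzer leaves out the synonym filter so that no `synonyms.txt` file is needed. Treat it as an outline under those assumptions rather than a drop-in replacement for your setup.

```sql
-- Hypothetical analyzer: bigrams over letters, then lowercasing.
CREATE ANALYZER demo_bigram (
  TOKENIZER demo_bigram_tokenizer WITH (
    type = 'ngram',
    min_gram = 2,
    max_gram = 2,
    token_chars = ['letter']
  ),
  TOKEN_FILTERS (lowercase)
);

-- Hypothetical table with a separate, named fulltext index on firstname.
CREATE TABLE doc.people_demo (
  id BIGINT PRIMARY KEY,
  firstname TEXT,
  state TEXT,
  INDEX firstname_ft USING FULLTEXT (firstname) WITH (analyzer = 'demo_bigram')
);

INSERT INTO doc.people_demo (id, firstname, state) VALUES (1, 'WILLIAM', 'NY');

-- Make the inserted row visible to the following search.
REFRESH TABLE doc.people_demo;

-- Reference the named index, not the column, so the custom analyzer is used.
SELECT id, firstname, _score
FROM doc.people_demo
WHERE MATCH (firstname_ft, 'william') AND state = 'NY'
ORDER BY _score DESC
LIMIT 100;
```

With the first variant (the analyzer declared directly on the column) there is no separate index name, so `MATCH (firstname, ...)` is the form to use there.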
