Hi all,
seems that there is an error in defining the analyzer in the tables defined for the Academy CrateDB Fundamentals course.
What have I to do?
Going on with the course accepting different results from the lesson?
What I discovered:
In the Hands-on: Full-Text Search lesson is reported that the community_areas table was created using the “english” analyzer
When the community_areas table was created, we told CrateDB to perform full-text indexing on it:
…
details OBJECT(STRICT) AS (
description TEXT INDEX USING fulltext WITH (analyzer='english'),
population BIGINT
), …
Let’s use the MATCH predicate to try some queries against the index that was created. Recall that, when using MATCH, we can select _score to retrieve a score for each result. This score is a measure of the quality of the match compared to other matching rows.
Execute this query to search for community areas containing either the term “railroad” or the term “tracks”:
SELECT name, _score, details['description'] AS description
FROM community_areas
WHERE MATCH(details['description'], 'railroad tracks')
ORDER BY _score DESC;
Take a moment to look at the results returned by this query. Note that Greater Grand Crossing and Burnside are the community areas with the highest scores.
Running the query I see different results: the first two community_areas are West Town and “Burnside” instead of "Greater Grand Crossing"and “Burnside”
Looking at the Hands-on: Your First CrateDB Cluster lesson I see that the table was created using this command
CREATE TABLE community_areas (
areanumber INTEGER PRIMARY KEY,
name TEXT,
details OBJECT(STRICT) AS (
description TEXT INDEX USING fulltext,
population BIGINT
),
boundaries GEO_SHAPE INDEX USING geohash WITH (PRECISION='1m', DISTANCE_ERROR_PCT=0.025)
);
But this means to use the “standard” analyzer
cr> SHOW CREATE TABLE community_areas
;
+------------------------------------------------------+
| SHOW CREATE TABLE doc.community_areas |
+------------------------------------------------------+
| CREATE TABLE IF NOT EXISTS "doc"."community_areas" ( |
| "areanumber" INTEGER NOT NULL, |
| "name" TEXT, |
| "details" OBJECT(STRICT) AS ( |
| "description" TEXT INDEX USING FULLTEXT WITH ( |
| analyzer = 'standard' |
| ), |
| "population" BIGINT |
| ), |
| "boundaries" GEO_SHAPE INDEX USING GEOHASH WITH ( |
| distance_error_pct = 0.025, |
| precision = '1m' |
| ), |
| PRIMARY KEY ("areanumber") |
| ) |
| CLUSTERED BY ("areanumber") INTO 6 SHARDS |
| WITH ( |
| "allocation.max_retries" = 5, |
| "blocks.metadata" = false, |
| "blocks.read" = false, |
| "blocks.read_only" = false, |
| "blocks.read_only_allow_delete" = false, |
| "blocks.write" = false, |
| codec = 'default', |
| column_policy = 'strict', |
| "mapping.total_fields.limit" = 1000, |
| max_ngram_diff = 1, |
| max_shingle_diff = 3, |
| number_of_replicas = '0-1', |
| "routing.allocation.enable" = 'all', |
| "routing.allocation.total_shards_per_node" = -1, |
| "store.type" = 'fs', |
| "translog.durability" = 'REQUEST', |
| "translog.flush_threshold_size" = 536870912, |
| "translog.sync_interval" = 5000, |
| "unassigned.node_left.delayed_timeout" = 60000, |
| "write.wait_for_active_shards" = '1' |
| ) |
+------------------------------------------------------+
SHOW 1 row in set (0.009 sec)
cr>
So using different analyzers we get different results.