Hi all,
before going too far on the wrong way I need some advices on how to manage metadata related to blobs.
If I understand correctly the documentation, the blob only queryable fields are digest and last_modified columns.
But I need to manage blob’s metadata.
What are the best practices to do this in CrateDB?
TL;DR
What I’ve tried so far.
In my use case I need to manage attachments to documents.
So I created a blob table
create blob table attachments with (blobs_path='/data/data0/myblobs/attachments');
But I need to associate to the blob at least a searchable name, a searchable description and the original file name.
So I created a table to contain the metadata related to the blobs
cr> SHOW CREATE TABLE attachments_meta;
+-------------------------------------------------------+
| SHOW CREATE TABLE doc.attachments_meta |
+-------------------------------------------------------+
| CREATE TABLE IF NOT EXISTS "doc"."attachments_meta" ( |
| "digest" TEXT NOT NULL, |
| "name" TEXT INDEX USING FULLTEXT WITH ( |
| analyzer = 'english' |
| ), |
| "description" TEXT INDEX USING FULLTEXT WITH ( |
| analyzer = 'english' |
| ), |
| "filename" TEXT, |
| PRIMARY KEY ("digest") |
| ) |
| CLUSTERED BY ("digest") INTO 6 SHARDS |
| WITH ( |
| "allocation.max_retries" = 5, |
| "blocks.metadata" = false, |
| "blocks.read" = false, |
| "blocks.read_only" = false, |
| "blocks.read_only_allow_delete" = false, |
| "blocks.write" = false, |
| codec = 'default', |
| column_policy = 'strict', |
| "mapping.total_fields.limit" = 1000, |
| max_ngram_diff = 1, |
| max_shingle_diff = 3, |
| number_of_replicas = '0-1', |
| "routing.allocation.enable" = 'all', |
| "routing.allocation.total_shards_per_node" = -1, |
| "store.type" = 'fs', |
| "translog.durability" = 'REQUEST', |
| "translog.flush_threshold_size" = 536870912, |
| "translog.sync_interval" = 5000, |
| "unassigned.node_left.delayed_timeout" = 60000, |
| "write.wait_for_active_shards" = '1' |
| ) |
+-------------------------------------------------------+
SHOW 1 row in set (0.001 sec)
cr>
The tables associations is done using the digest column.
For testing I loaded the tables with some example data
cr> SELECT * FROM BLOB.attachments;
+------------------------------------------+---------------+
| digest | last_modified |
+------------------------------------------+---------------+
| acfa51b963b5a1fb9eecaf7954ae39c9bea736cc | 1735995025468 |
| 135b6d193c9acba8ea180dea7424b863ce52858b | 1735994937133 |
| f83be20cb0439e5008242ec7a01cf1fe168f413b | 1735994988697 |
| 9a3993e2ecbbb3b1ea1435650cedd7fec2454e73 | 1735995055568 |
+------------------------------------------+---------------+
SELECT 4 rows in set (0.020 sec)
cr> SELECT * FROM attachments_meta;
+------------------------------------------+-----------------------------------------------+-----------------------------------------------------------------------------------------+----------------------------------------------+
| digest | name | description | filename |
+------------------------------------------+-----------------------------------------------+-----------------------------------------------------------------------------------------+----------------------------------------------+
| 135b6d193c9acba8ea180dea7424b863ce52858b | Insert CPU IN motherboard socket | Short video ON how TO INSERT a CPU IN the motherboard socket | 11537353-hd_1920_1080_30fps.mp4 |
| f83be20cb0439e5008242ec7a01cf1fe168f413b | Impress Guide v. 24.2 | LibreOffice Impress Guide version 24.2. A manual FOR creating presentations | IG24-ImpressGuide.pdf |
| acfa51b963b5a1fb9eecaf7954ae39c9bea736cc | ROG Maximus Z790 Hero BTF - Quick START guide | The quick START guide FOR the Asus Republic Of Gamers Maximus Z790 Hero BTF motherboard | Q23155_ROG_MAXIMUS_Z790_HERO_BTF_QSG_WEB.pdf |
| 9a3993e2ecbbb3b1ea1435650cedd7fec2454e73 | Top VIEW of ROG CROSSHAIR X670E HERO | The top VIEW image of the Asus Republic Of Gamers CROSSHAIR X670E HERO motherboard | ROG_CROSSHAIR_X670E_HERO_h732.png |
+------------------------------------------+-----------------------------------------------+-----------------------------------------------------------------------------------------+----------------------------------------------+
SELECT 4 rows in set (0.020 sec)
cr>
Using these tables I’m able to do a full text search for attachments and download the blobs found restoring the original filename.