SQL syntax for a left join where the matching clause is between a text_array and a text

(sorry for the stupid question, but I’m lost in the documentation)

I have these two tables:

cr> show create table processors;
+-----------------------------------------------------+
| SHOW CREATE TABLE doc.processors                    |
+-----------------------------------------------------+
| CREATE TABLE IF NOT EXISTS "doc"."processors" (     |
|    "identifier" TEXT NOT NULL,                      |
|    "name" TEXT INDEX USING FULLTEXT WITH (          |
|       analyzer = 'english'                          |
|    ),                                               |
|    "definition" TEXT INDEX USING FULLTEXT WITH (    |
|       analyzer = 'english'                          |
|    ),                                               |
|    "properties" OBJECT(DYNAMIC) AS (                |
|       "brand" TEXT,                                 |
|       "cores_theads" TEXT,                          |
|       "socket" TEXT                                 |
|    ),                                               |
|    "attachments" ARRAY(TEXT),                       |
|    PRIMARY KEY ("identifier")                       |
| )                                                   |
...

cr> show create table attachments_meta;
+-------------------------------------------------------+
| SHOW CREATE TABLE doc.attachments_meta                |
+-------------------------------------------------------+
| CREATE TABLE IF NOT EXISTS "doc"."attachments_meta" ( |
|    "digest" TEXT NOT NULL,                            |
|    "name" TEXT INDEX USING FULLTEXT WITH (            |
|       analyzer = 'english'                            |
|    ),                                                 |
|    "description" TEXT INDEX USING FULLTEXT WITH (     |
|       analyzer = 'english'                            |
|    ),                                                 |
|    "filename" TEXT,                                   |
|    PRIMARY KEY ("digest")                             |
| )                                                     |
...

The tables contain:

cr> select * from processors;
[
  {
    "identifier": "AMD_R9_7950X",
    "name": "Ryzen 9 7950X",
    "definition": "AMD Ryzen 9 7950X 16-Core, 32-Thread Unlocked Desktop Processor",
    "properties": {
      "cores_theads": "16/32",
      "socket": "AM5",
      "brand": "AMD"
    },
    "attachments": [
      "acfa51b963b5a1fb9eecaf7954ae39c9bea736cc",
      "9a3993e2ecbbb3b1ea1435650cedd7fec2454e73",
      "135b6d193c9acba8ea180dea7424b863ce52858b"
    ]
  },
  {
    "identifier": "AMD_R7_5700G",
    "name": "Ryzen 7 5700G",
    "definition": "Processore AMD Ryzen 7 5700G (scheda grafica integrata Radeon, 8 C/16 T, 65W TDP, AM4 Socket, 20MB cache, Boost di Frequenza fino a 4.6 Ghz max, dissipatore ad aria \"Wraith stealth cooler\"",
    "properties": {
      "cores_theads": "8/16",
      "socket": "AM4",
      "brand": "AMD"
    },
    "attachments": [
      "9a3993e2ecbbb3b1ea1435650cedd7fec2454e73",
      "135b6d193c9acba8ea180dea7424b863ce52858b"
    ]
  },
  {
    "identifier": "INT_i9_14900KS",
    "name": "Core i9 14900KS",
    "definition": "Intel\u00ae Core\u2122 i9 processor 14900KS (36M Cache- up to 6.20 GHz)",
    "properties": {
      "cores_theads": "8+16/32",
      "socket": "FCLGA1700",
      "brand": "Intel"
    },
    "attachments": []
  }
]
SELECT 3 rows in set (0.810 sec)
cr>    



cr> select * from attachments_meta;
[
  {
    "digest": "135b6d193c9acba8ea180dea7424b863ce52858b",
    "name": "Insert CPU IN motherboard socket",
    "description": "Short video ON how TO INSERT a CPU IN the motherboard socket",
    "filename": "11537353-hd_1920_1080_30fps.mp4"
  },
  {
    "digest": "f83be20cb0439e5008242ec7a01cf1fe168f413b",
    "name": "Impress Guide v. 24.2",
    "description": "LibreOffice Impress Guide version 24.2. A manual FOR creating presentations",
    "filename": "IG24-ImpressGuide.pdf"
  },
  {
    "digest": "acfa51b963b5a1fb9eecaf7954ae39c9bea736cc",
    "name": "ROG Maximus Z790 Hero BTF - Quick START guide",
    "description": "The quick START guide FOR the Asus Republic Of Gamers Maximus Z790 Hero BTF motherboard",
    "filename": "Q23155_ROG_MAXIMUS_Z790_HERO_BTF_QSG_WEB.pdf"
  },
  {
    "digest": "9a3993e2ecbbb3b1ea1435650cedd7fec2454e73",
    "name": "Top VIEW of ROG CROSSHAIR X670E HERO",
    "description": "The top VIEW image of the Asus Republic Of Gamers CROSSHAIR X670E HERO motherboard",
    "filename": "ROG_CROSSHAIR_X670E_HERO_h732.png"
  }
]
SELECT 4 rows in set (0.017 sec)
cr>

Now I need to get all the processors rows with the data from the attachments_meta table where any item in the processors “attachments” field (text_array) is equal to the “digest” field in the attachments meta table.

The “attachment” field can contain zero or more items.

The two syntax I used give different errors

cr> SELECT * FROM processors p LEFT JOIN attachments_meta a ON p.attachments = a.digest;
UnsupportedFunctionException[Unknown function: (p.attachments = a.digest), no overload found for matching argument types: (text_array, text). Possible candidates: op_=(E, E):boolean]

cr> SELECT * FROM processors p LEFT JOIN attachments_meta a WHERE p.attachments = a.digest;
SQLParseException[line 1:57: no viable alternative at input 'SELECT * FROM processors p LEFT JOIN attachments_meta a WHERE']

So my question: what is the syntax for a left join where the field of comparison is a text_array ?

Please see the single column vs array column available comparisons here:

You’d probably need:

SELECT * FROM processors p LEFT JOIN attachments_meta a ON a.digest IN (p.attachments);
1 Like

Thanks.

I’ll try ASAP

1 Like

Hi @matriv thanks again.

Your syntax works as expected; but I need something different.

The result repeats the processors row for every attachment in the vector.

See the screenshot

I need the opposite: one row for every processors and inside the row result all the attachment_meta rows that corresponds to the items in the p.attachments field.

Something like this:

Maybe I was wrong in using the ARRAY(TEXT) for the definition of p.attachments.

I’m new to CrateDB and I have more options than I was used to (I learned SQL on Oracle 4 approx in 1980 - then for a long period I never used a the SQL language).

All I need is to add the metadata of the 0 or more attachments (that are Blobs in another table) in the result when I query for a processor.

Is it possible? I’m learning so completely redesign tables is not a problem.

Thanks for your time.

1 Like

On top of executing the left join, you’d need to wrap it (as a subquery) into an aggregation which then uses the Aggregation - CrateDB: Reference aggregate function to accumulate the attachment meta fields into arrays, something like:
SELECT <col1>, <col2>, ..., array_agg(digest), array_agg(name).. FROM (SELECT * FROM processors p LEFT JOIN attachments_meta a ON a.digest IN (p.attachments)) as tt GROUP BY <col1>, <col2>, etc..;

1 Like

Thanks

I’ll try asap

Hi @matriv it almost worked. Thanks

This query

select definition, properties, array_agg(digest), array_agg(description), array_agg(filename) from (select * FROM processors p LEFT JOIN attachments_meta a ON a.digest IN (p.attachments)) as tt group by definition, properties limit 100

works correctly and gives this result

Now I have to solve a problem: in both tables there is a column called “name” but I was not able to distinguish the column. The syntax I tried gave me an error: tt.p.name or p.name or simply name

There is a syntax to distinguish the two columns?

1 Like

I just had a quick look, I assume that the error you got is AmbiguousColumnException?

You can quickly reproduce that with two tables like:
create table t1 (a integer)
create table t2 (a integer)

select
  outter.a
from
  (
    select
      *
    from
      t1
      left join t2 on t1.a = t2.a
  ) as outter
-- AmbiguousColumnException[Column "a" is ambiguous]

This happens because after the join, both a exists, so if we try to reference one as the error says, it’s ambiguous. If you see the result data you’ll see that you actually get both columns a with values.

To solve this you have to specifically give a name to each column, following my example:

select
  outter.t1a,
  outter.t2a
from
  (
    select
      t1.a as t1a,
      t2.a as t2a
    from
      t1
      left join t2 on t1.a = t2.a
  ) as outter
-- +-----+-----+
-- | t1a | t2a |
-- +-----+-----+
-- | 1  |  2  |
---+----+-----+

If that does not apply to your situation could you please share the query and the exact error? thanks!

1 Like

Thanks @surister, the error was AmbiguousColumnException but, trying to assign names to the ambiguous columns gives other errors.

I’ll try asap and post queries and results.

example based on your tables:

select pname, definition, properties, array_agg(digest), array_agg(description), array_agg(filename), array_agg(aname)
from (
    select p.definition, p.properties, p.name as pname, a.digest, a.description, a.filename, a.name as aname
    FROM processors p LEFT JOIN attachment s_meta a ON a.digest IN (p.attachments)
) as tt
group by pname, definition, properties limit 100;
2 Likes

Thanks @matriv and @surister this worked perfectly.

2 Likes