Searching DB with where clause on column whose data type is Array

Is there any better way to search on the table, if you want to search on the basis of a column whose type is Array.

WITH tag_sums AS (
    SELECT
        value,
        SUM(amount) AS total_amount
    FROM (
        SELECT 
            unnest(tags) AS value,
            amount
        FROM
            ezetap_demo.txn
    ) subquery
    WHERE
        value IN ('10460', '10461', '5284', '5285','1','2','3','1111176')
    GROUP BY
        value
)
select * from tag_sums order by total_amount ;

Issues: Current query iterates over dataset who size is aroud 8k rows. If i increase the partition date range , the query is taking too much and CPU usage hits 100 %. How can I resolve this issue and please suggest a better way to use WHERE clause on a column whose type is ARRAY( without using unnest).

Hi,
Depending on how selective this is, one way to improve performance could be by adding an additional WHERE clause to the subquery:

WHERE 8<>ARRAY_LENGTH(ARRAY_DIFFERENCE(['10460', '10461', '5284', '5285','1','2','3','1111176'],tags),1)

NB: this in addition to the WHERE clause you already have.

In the below code, if i want to group them by tags , it is taking huge time( 4-5sec) for 70K records. If I do not group them it is done in around 1 sec. Is there any way , to group them based on the value present in the array? Also, when I am grouping them , Let us say that array contains 3 elements: 102,103 ,104 and the array with which I am comparing contains [102,109,110,111,112,113]. As seen, only 102 is common. So once grouping , only 102 should come. However, when we are grouping 102,103,104 are coming in the results.But it should only show 102 . PFA query for your refernce.

WITH tag_sums AS (
            SELECT
                SUM(amount) AS total_amount,
                 value
            FROM (
                SELECT
                   unnest(tags) AS value,
                    amount
                FROM(
                    select tags,amount from ezetap_demo.txn where   createdts_generated  >  '2023-07-01'  and  createdts_generated  <  '2023-10-17'   and org_code =  'ARCHANAMERCHANT' and  partition_date  >=  '2307'
      and  partition_date  <=  '2310'   and  (1<>ARRAY_LENGTH(ARRAY_DIFFERENCE(['10629'],tags),1))
                ) AS tags_table
            ) subquery
      GROUP BY
                      value

        )
        select * from tag_sums;

Please suggest a better alternative to group by tag’s element in array and also to resolve the bug in presenting the array element multiple times.

Hi,
You are getting these additional records because the query is missing one WHERE clause. As mentioned in my previous comment, you need one WHERE clause to lookup the records that have arrays that have one of the values you are interested in, and after unnesting you need another WHERE clause to only pick up the entries you are actually interested in, as the array could have other entries.

I also think the additional step of the tags_table subquery, which was not present in the original example is not really necessary and you could try
removing it.

One more thing you may want to consider is to restructure the table to have the tags in an array of objects:


CREATE TABLE ezetap_demo.txn (tags ARRAY(OBJECT(STRICT) AS (tag int)),amount int);

INSERT INTO ezetap_demo.txn VALUES 
(	['{"tag":5}','{"tag":10460}','{"tag":6}']	,1	)
	,(	['{"tag":5}','{"tag":7}','{"tag":6}']	,1	)
	,(	['{"tag":8888}','{"tag":5284}']	,1	);

WITH tag_sums AS (
    SELECT
        value,
        SUM(amount) AS total_amount
    FROM (
        SELECT 
            unnest(tags)['tag'] AS value,
            amount
        FROM
            ezetap_demo.txn
			WHERE 10460=ANY(tags['tag']) 	OR 10461=ANY(tags['tag'])
			OR 5284=ANY(tags['tag'])		OR 5285=ANY(tags['tag'])
			OR 1=ANY(tags['tag'])			OR 2=ANY(tags['tag'])
			OR 3=ANY(tags['tag'])			OR 1111176=ANY(tags['tag'])
    ) subquery
    WHERE
        value IN (10460,10461,5284,5285,1,2,3,1111176)
    GROUP BY
        value
)
select * from tag_sums order by total_amount ;

Hi,
Thank you for your previous solution.

cr> select count(*),substr(array_to_string(tags,''),0,14) as temp from ezetap.txn where id>'230801'  and org_code='ATOS_PERFMERC_01' and ('AAAA-AAAD-AAAE'=ANY (tags) OR  'AAAA-AAAD-AAAF'=ANY(tags) OR 'AA
    AA-AAAD-AAAG' =ANY(tags) OR 'AAAA-AAAD-AAAH'=ANY(tags) OR 'AAAA-AAAD-AAAI'=ANY(tags) OR 'AAAA-AAAD-AAAJ' =ANY(tags) OR 'AAAA-AAAD-AAAK'=ANY(tags) OR  'AAAA-AAAD-AAAL'=ANY(tags) OR 'AAAA-AAAD-AAAM'=AN
    Y(tags) OR 'AAAA-AAAD-AAAN'=ANY(tags) OR 'AAAA-AAAD-AAAO' =ANY(tags) OR 'AAAA-AAAD-AAAP' =ANY(tags) OR 'AAAA-AAAD-AAAQ'=ANY(tags)  OR 'AAAA-AAAD-AAAR' =ANY(tags) OR 'AAAA-AAAD-AAAS'=ANY(tags)) and pa
    rtition_date  >=  '2308'  and  partition_date  <=  '2310' group by temp;
+----------+----------------+

As you can see , we are trying to optimize the above query. Is there any way we can turn Array into substring? ALso, is there any way where we can remove the OR ( many ORs are there) operator?
NOTE: the array elements size will be one.
Any suggestion if the above restrictions are there?
Please suggest

the array elements size will be one

Do you mean each element in the array in this case is one character?
If that is the case I would recommend just using a TEXT field instead of an ARRAY.