Performance issue

Hey, I have some queries that have become very slow (~70 seconds). Here is one of them:

WITH diff AS (
  SELECT
    ts,
    (
      fields['Acct-Input-Octets'] - COALESCE(
        LAG(fields['Acct-Input-Octets']) OVER (
          PARTITION BY tags['Acct-Session-Id']
          ORDER BY
            ts ASC
        ),
        0
      )
    ) AS input_octets_diff,
    (
      fields['Acct-Output-Octets'] - COALESCE(
        LAG(fields['Acct-Output-Octets']) OVER (
          PARTITION BY tags['Acct-Session-Id']
          ORDER BY
            ts ASC
        ),
        0
      )
    ) AS output_octets_diff
  FROM
    doc.accounting
  WHERE
    tags['Acct-Session-Id'] IS NOT NULL
    AND ts BETWEEN '2024-07-11T13:50:58.452Z' AND '2024-07-12T13:50:58.452Z'
    AND tags['NAS-Identifier'] ~ '(AA:BB:CC:11:22:33|...|44:55:66:DD:EE:FF)'
    AND tags['SSID'] ~ '(SSID1|SSID2)'
  ORDER BY
    tags['Acct-Session-Id'],
    ts ASC
),
sum_window AS (
  SELECT
date_bin('15 minutes'::INTERVAL, ts, 0) AS ts_15_min,
    SUM(input_octets_diff) AS input_octets_15_min,
    SUM(output_octets_diff) AS output_octets_15_min
  FROM
    diff
  GROUP BY
    ts_15_min
),
sum_window_15_min AS (
  SELECT
    ts_15_min,
    SUM(input_octets_15_min) OVER(
      ORDER BY
        ts_15_min ROWS BETWEEN UNBOUNDED PRECEDING
        AND CURRENT ROW
    ) AS input_octets_15_min_cum,
    SUM(output_octets_15_min) OVER(
      ORDER BY
        ts_15_min ROWS BETWEEN UNBOUNDED PRECEDING
        AND CURRENT ROW
    ) AS output_octets_15_min_cum
  FROM
    sum_window
  ORDER BY
    ts_15_min
)
SELECT
  ts_15_min as time,
  input_octets_15_min_cum,
  output_octets_15_min_cum,
  input_octets_15_min_cum + output_octets_15_min_cum as total
FROM
  sum_window_15_min;

As far as I can tell, the problem seems to be the number of tags['NAS-Identifier'] values I have to check. In this example I tried it with a regex, but the result seems to be the same if I use IN or ANY. How can I speed up my query with a large number of NAS-Identifier values to check (~230 in my current setup)?

regards, volker.

Hi,
If the other filters are not very selective and everything really boils down to the check on tags['NAS-Identifier'], maybe you could use a full-text index with a pattern analyzer and an appropriate regex to get every MAC address as a token? Just an idea.
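To illustrate, a minimal sketch of that idea, assuming CrateDB's pattern tokenizer accepts the Lucene-style group parameter and that tags['NAS-Identifier'] is duplicated into a hypothetical top-level column nas_identifier (all names below are made up):

CREATE ANALYZER nas_analyzer (
  TOKENIZER nas_tokenizer WITH (
    type = 'pattern',
    -- emit each colon-separated MAC address as a whole token;
    -- "group" = 0 keeps the full match instead of splitting on it
    pattern = '([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})',
    "group" = 0
  )
);

CREATE TABLE doc.accounting_ft (
  ts TIMESTAMP WITH TIME ZONE,
  tags OBJECT(DYNAMIC),
  fields OBJECT(DYNAMIC),
  nas_identifier TEXT,  -- copy of tags['NAS-Identifier']
  INDEX nas_identifier_ft USING FULLTEXT (nas_identifier) WITH (analyzer = 'nas_analyzer')
);

-- all ~230 identifiers collapse into a single MATCH (terms are OR-combined by default):
SELECT count(*)
FROM doc.accounting_ft
WHERE MATCH(nas_identifier_ft, 'AA:BB:CC:11:22:33 44:55:66:DD:EE:FF');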

Identifying the desired NAS-Identifier values does not seem to be the problem; the huge data set that is selected by this query is. Sometimes I get this error:

Data too large, data for [collect: 0] would be [644405822/614.5mb], which is larger than the limit of [644245094/614.3mb]

In combination with the given time slot that can be a lot of rows! Apart from pre-processing and pre-aggregating the data in a dedicated table, is there maybe an alternative way?

Hi,
From that error message it seems you may be running with a small heap size?
In general, pre-aggregation is not routinely necessary in CrateDB; with the right querying strategy it can be very fast, even on very large amounts of data.
If the filtering on NAS-Identifier were to narrow down the search scope and we could do it efficiently with FTS, that would also help.
Another option could be to unnest the data for all NAS identifiers into a temporary table and then continue the querying from there (a rough sketch follows below).
It may also be possible to simplify this from the current 3 steps into 2, but we would need to see some sample data to test and find the optimal approach here.
Maybe you want to schedule a call, or, if you are on CrateDB Cloud, raise a support ticket attaching some sample data directly?
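For the temporary-table idea, a rough sketch (the helper table doc.nas_filter and its column name are made up here):

CREATE TABLE doc.nas_filter (nas TEXT PRIMARY KEY);

INSERT INTO doc.nas_filter (nas) (
  SELECT * FROM UNNEST(['AA:BB:CC:11:22:33', '44:55:66:DD:EE:FF'])
);

-- then filter via a join instead of a long IN list or regex:
SELECT a.ts, a.tags['Acct-Session-Id'], a.fields['Acct-Input-Octets']
FROM doc.accounting a
JOIN doc.nas_filter f ON f.nas = a.tags['NAS-Identifier']
WHERE a.ts BETWEEN '2024-07-11T13:50:58.452Z' AND '2024-07-12T13:50:58.452Z';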

As a first countermeasure, I have increased my heap size to 6 GB, which has fixed some of the failing queries. However, performance is still a bit slow. I have a 4-node on-premises cluster (local SSD, 8 GB RAM, 8 CPUs per node), so scaling up is not an issue. But will it help?

Here is one of my current queries:

WITH diff AS (
  SELECT
    ts,
    tags['Acct-Session-Id'],
    CASE
      WHEN fields['Acct-Input-Octets'] >= LAG(fields['Acct-Input-Octets'], 1, 0) OVER (w_session_id_input)
        THEN fields['Acct-Input-Octets'] - LAG(fields['Acct-Input-Octets'], 1, 0) OVER (w_session_id_input)
      ELSE fields['Acct-Input-Octets'] + (
        4294967296 - LAG(fields['Acct-Input-Octets'], 1, 0) OVER (w_session_id_input)
      )
    END AS input_octets_diff,
    CASE
      WHEN fields['Acct-Output-Octets'] >= LAG(fields['Acct-Output-Octets'], 1, 0) OVER (w_session_id_output)
        THEN fields['Acct-Output-Octets'] - LAG(fields['Acct-Output-Octets'], 1, 0) OVER (w_session_id_output)
      ELSE fields['Acct-Output-Octets'] + (
        4294967296 - LAG(fields['Acct-Output-Octets'], 1, 0) OVER (w_session_id_output)
      )
    END AS output_octets_diff
  FROM
    doc.accounting
  WHERE
    tags['Acct-Session-Id'] IS NOT NULL
    AND ts BETWEEN '2024-07-09T11:10:13.829Z' AND '2024-07-16T11:10:13.829Z'
    AND tags['NAS-Identifier'] IN ('AA:BB:CC:11:22:33', ..., 'AA:BB:CC:44:55:66')
    AND tags['SSID'] IN ('SSID1', 'SSID2')
  WINDOW w_session_id_input AS (
      PARTITION BY tags['Acct-Session-Id']
      ORDER BY
        ts ASC,
        CASE tags['Acct-Status-Type']
          WHEN 'Start' THEN 1
          WHEN 'InterimUpdate' THEN 2
          WHEN 'Stop' THEN 3
        END,
        fields['Acct-Input-Octets']
    ),
    w_session_id_output AS (
      PARTITION BY tags['Acct-Session-Id']
      ORDER BY
        ts ASC,
        CASE tags['Acct-Status-Type']
          WHEN 'Start' THEN 1
          WHEN 'InterimUpdate' THEN 2
          WHEN 'Stop' THEN 3
        END,
        fields['Acct-Output-Octets']
    )
),
sum_window AS (
  SELECT
date_bin('15 minutes'::INTERVAL, ts, 0) AS ts_15_min,
    SUM(input_octets_diff) AS input_octets_15_min,
    SUM(output_octets_diff) AS output_octets_15_min
  FROM
    diff
  GROUP BY
    ts_15_min
),
sum_window_15_min AS (
  SELECT
    ts_15_min,
    SUM(input_octets_15_min) OVER(
      ORDER BY
        ts_15_min ROWS BETWEEN UNBOUNDED PRECEDING
        AND CURRENT ROW
    ) AS input_octets_15_min_cum,
    SUM(output_octets_15_min) OVER(
      ORDER BY
        ts_15_min ROWS BETWEEN UNBOUNDED PRECEDING
        AND CURRENT ROW
    ) AS output_octets_15_min_cum
  FROM
    sum_window
  ORDER BY
    ts_15_min
)
SELECT
  ts_15_min AS time,
  input_octets_15_min_cum,
  output_octets_15_min_cum,
  input_octets_15_min_cum + output_octets_15_min_cum AS total
FROM
  sum_window_15_min
ORDER BY
  ts_15_min;

This query takes 2-3 seconds with 63 NAS-Identifier values and one million rows. That is okay for me, but maybe you can see some no-gos or ways to improve this query?

regards, volker.

Hi,
Regarding this query there are a couple of things I would try:

  • Is there some field that, when set to a certain value, indicates that tags['Acct-Session-Id'] cannot be null? If so, I would replace the IS NOT NULL filter with an affirmative assertion on that other field instead.
  • Rendering Acct-Status-Type as 1, 2, 3 in a generated column may help; I would try that as well.
  • Instead of doing the CASE to differentiate between Acct-Input-Octets going up or down from the previous reading, I would check whether we get better performance by computing the raw delta in the first CTE, then summing positive deltas separately from negative ones and applying the 4294967296 correction to the negative part, if that makes sense. A sketch of the last two ideas follows this list.
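A minimal sketch of those two ideas, assuming the table can be recreated with a generated column (the column name acct_status_order is made up) and expressing the overflow correction equivalently as one 4294967296 added per negative delta:

-- hypothetical generated column rendering Acct-Status-Type as 1/2/3
CREATE TABLE doc.accounting (
  ts TIMESTAMP WITH TIME ZONE,
  tags OBJECT(DYNAMIC),
  fields OBJECT(DYNAMIC),
  acct_status_order SMALLINT GENERATED ALWAYS AS (
    CASE tags['Acct-Status-Type']
      WHEN 'Start' THEN 1
      WHEN 'InterimUpdate' THEN 2
      WHEN 'Stop' THEN 3
    END
  )
);

-- the first CTE keeps only the raw delta ...
WITH diff AS (
  SELECT
    ts,
    fields['Acct-Input-Octets']
      - LAG(fields['Acct-Input-Octets'], 1, 0) OVER (w) AS input_delta
  FROM doc.accounting
  WHERE tags['Acct-Session-Id'] IS NOT NULL
  WINDOW w AS (
    PARTITION BY tags['Acct-Session-Id']
    ORDER BY ts, acct_status_order, fields['Acct-Input-Octets']
  )
)
-- ... and the aggregation adds one 4294967296 per negative delta,
-- since each negative delta corresponds to exactly one 32-bit counter wrap
SELECT
  date_bin('15 minutes'::INTERVAL, ts, 0) AS ts_15_min,
  SUM(input_delta)
    + 4294967296 * SUM(CASE WHEN input_delta < 0 THEN 1 ELSE 0 END) AS input_octets_15_min
FROM diff
GROUP BY ts_15_min;

(The output octets would be handled the same way; like the original CASE, this assumes each negative delta represents a single wrap.)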
  1. The null check for Acct-Session-Id is just a safety measure; maybe it can be omitted completely. But why should I find another way to avoid a null check? Is a null check a bad idea?

  2. I will check whether a generated column helps…

  3. The CASE for Acct-Input-Octets is there to handle an overflow of the given 32-bit input counter. But sorry, I am not really sure I understand your approach?