CrateDB 4.7 adds a new DATE_BIN function, offering greater flexibility for grouping rows into time buckets. This article will show how to use
DATE_BIN to group rows into time buckets and resample the values.
In previous versions, DATE_TRUNC could be used for similar requirements but had its limitations. With
DATE_TRUNC, timestamps can be truncated on different intervals, such as seconds, minutes, hours, and so on. But it is not possible to create time buckets based on more complex intervals, such as 10 seconds, 5 minutes, or similar intervals.
We assume a simple time-series table as a basis:
CREATE TABLE sensor_readings ( time TIMESTAMP WITH TIME ZONE NOT NULL, sensor_id INTEGER NOT NULL, battery_level DOUBLE PRECISION, battery_status TEXT, battery_temperature DOUBLE PRECISION );
The table is filled with a number of rows. The rows follow no clear sampling rate (i.e. are not consistently appearing every minute).
As measurements in our table occur in irregular intervals, we now want to resample them in time buckets of five minutes. For each time bucket, one row should be returned. Different approaches are possible to achieve this downsampling. We will use the latest row in each time bucket as the representative, but other approaches (like calculating the average) are also possible.
This leads to the following query, which:
DATE_BINto generate time buckets of 5 minutes each
- Assigns every row within a time bucket a
ROW_NUMBER, ordered by timestamp descending. This means, that the latest row will be assigned the value
- The query is wrapped in a subquery so that the
row_numbercolumn can be referenced and filtered in the
SELECT ts_bin, battery_level, battery_status, battery_temperature FROM ( SELECT DATE_BIN('5 minutes'::INTERVAL, "time", 0) AS ts_bin, battery_level, battery_status, battery_temperature, ROW_NUMBER() OVER (PARTITION BY DATE_BIN('5 minutes'::INTERVAL, "time", 0) ORDER BY "time" DESC) AS "row_number" FROM doc.sensor_readings ) x WHERE "row_number" = 1 ORDER BY 1 ASC
The output shows that three time buckets have been created based on the table’s rows:
With the addition of
DATE_BIN in CrateDB 4.7, highly flexible intervals can now be used to define time buckets.
DATE_TRUNC can still be used as an alternative for simple time bucketing requirements, if sufficient.