Introduction
CrateDB 4.7 adds a new DATE_BIN function, offering greater flexibility for grouping rows into time buckets. This article will show how to use DATE_BIN
to group rows into time buckets and resample the values.
In previous versions, DATE_TRUNC could be used for similar requirements but had its limitations. With DATE_TRUNC
, timestamps can be truncated on different intervals, such as seconds, minutes, hours, and so on. But it is not possible to create time buckets based on more complex intervals, such as 10 seconds, 5 minutes, or similar intervals.
Data model
We assume a simple time-series table as a basis:
CREATE TABLE sensor_readings (
time TIMESTAMP WITH TIME ZONE NOT NULL,
sensor_id INTEGER NOT NULL,
battery_level DOUBLE PRECISION,
battery_status TEXT,
battery_temperature DOUBLE PRECISION
);
The table is filled with a number of rows. The rows follow no clear sampling rate (i.e. are not consistently appearing every minute).
Time-bucketing and resampling
As measurements in our table occur in irregular intervals, we now want to resample them in time buckets of five minutes. For each time bucket, one row should be returned. Different approaches are possible to achieve this downsampling. We will use the latest row in each time bucket as the representative, but other approaches (like calculating the average) are also possible.
This leads to the following query, which:
- Uses
DATE_BIN
to generate time buckets of 5 minutes each - Assigns every row within a time bucket a
ROW_NUMBER
, ordered by timestamp descending. This means, that the latest row will be assigned the value1
. - The query is wrapped in a subquery so that the
row_number
column can be referenced and filtered in theWHERE
clause
SELECT ts_bin,
battery_level,
battery_status,
battery_temperature
FROM (
SELECT DATE_BIN('5 minutes'::INTERVAL, "time", 0) AS ts_bin,
battery_level,
battery_status,
battery_temperature,
ROW_NUMBER() OVER (PARTITION BY DATE_BIN('5 minutes'::INTERVAL, "time", 0) ORDER BY "time" DESC) AS "row_number"
FROM doc.sensor_readings
) x
WHERE "row_number" = 1
ORDER BY 1 ASC
The output shows that three time buckets have been created based on the table’s rows:
Conclusion
With the addition of DATE_BIN
in CrateDB 4.7, highly flexible intervals can now be used to define time buckets. DATE_TRUNC
can still be used as an alternative for simple time bucketing requirements, if sufficient.
Want to learn more about working with time series data in CrateDB? Our Advanced Time Series course is a free, online course that will teach you all you need to know. Earn a certificate of completion to share on your LinkedIn profile! Enroll at learn.cratedb.com.