Resampling time-series data with DATE_BIN

Introduction

CrateDB 4.7 adds a new DATE_BIN function, offering greater flexibility for grouping rows into time buckets. This article will show how to use DATE_BIN to group rows into time buckets and resample the values.

In previous versions, DATE_TRUNC could be used for similar requirements but had its limitations. With DATE_TRUNC, timestamps can be truncated on different intervals, such as seconds, minutes, hours, and so on. But it is not possible to create time buckets based on more complex intervals, such as 10 seconds, 5 minutes, or similar intervals.

Data model

We assume a simple time-series table as a basis:

CREATE TABLE sensor_readings (
   time TIMESTAMP WITH TIME ZONE NOT NULL,
   sensor_id INTEGER NOT NULL,
   battery_level DOUBLE PRECISION,
   battery_status TEXT,
   battery_temperature DOUBLE PRECISION
);

The table is filled with a number of rows. The rows follow no clear sampling rate (i.e. are not consistently appearing every minute).

Time-bucketing and resampling

As measurements in our table occur in irregular intervals, we now want to resample them in time buckets of five minutes. For each time bucket, one row should be returned. Different approaches are possible to achieve this downsampling. We will use the latest row in each time bucket as the representative, but other approaches (like calculating the average) are also possible.

This leads to the following query, which:

  1. Uses DATE_BIN to generate time buckets of 5 minutes each
  2. Assigns every row within a time bucket a ROW_NUMBER, ordered by timestamp descending. This means, that the latest row will be assigned the value 1.
  3. The query is wrapped in a subquery so that the row_number column can be referenced and filtered in the WHERE clause
SELECT ts_bin,
       battery_level,
       battery_status,
       battery_temperature
FROM (
  SELECT DATE_BIN('5 minutes'::INTERVAL, "time", 0) AS ts_bin,
         battery_level,
         battery_status,
         battery_temperature,
         ROW_NUMBER() OVER (PARTITION BY DATE_BIN('5 minutes'::INTERVAL, "time", 0) ORDER BY "time" DESC) AS "row_number"
  FROM doc.sensor_readings
) x
WHERE "row_number" = 1
ORDER BY 1 ASC

The output shows that three time buckets have been created based on the table’s rows:

Conclusion

With the addition of DATE_BIN in CrateDB 4.7, highly flexible intervals can now be used to define time buckets. DATE_TRUNC can still be used as an alternative for simple time bucketing requirements, if sufficient.

3 Likes