What are the basic optimization techniques/parameters for a time series table in this example?

Emre_Sevinc · October 7, 2021, 11:17am

Hello,

I’m aware of Optimizing storage for historic time-series data, but in this specific question, I’m not interested data retention or moving old (cold) data to another table, etc.

I want to understand better what kind of basic optimizations/techniques should be used when creating a time series table with the following structure, stored on a 3-node CrateDB cluster, each node with 64 GB RAM, 16 CPU cores and 500 GB storage:

CREATE TABLE IF NOT EXISTS emre.test_table_001 (
  time timestamptz NOT NULL,
  location_id text NOT NULL,
  device_id text NOT NULL, 
  sensor_reading_01 double precision,
  sensor_reading_02 double precision,
  ...
  sensor_reading_50 double precision,
);

Combination of (time, location_id, device_id) is unique.

Let’s assume that:

this table stores sensor readings from various devices identified by device_id, residing at a geographical location identified by a location_id.
the sensor data is sent to a .NET application, and this .NET application INSERTs data into this table every few seconds.
other .NET applications SELECT from this table to show various dashboards to users, whenever users request these via a web application.
a user is generally interested in the time series data for 1 or more devices in a single location in a given web page request.
currently there are about ~80 devices scattered to ~50 different locations, sending sensor readings every second.
‘hot data’ should be available for about 3 months, meaning, currently hot data is about 100 million rows, but as new locations and devices are added, the amount of ‘hot data’ is estimated to be on the order a few billion rows.

Based on these assumptions and the ‘naive’ table CREATE command above, what would CrateDB experts say, with respect to the following topics and table parameters:

Indexing: as far as I understand CrateDB creates indexes automatically, but coming from a PostgreSQL background, I wonder if I should also create some other indexes manually based on the conditions specified in WHERE queries in .NET applications?
CLUSTERED parameter as described in CREATE TABLE
PARTITIONED BY parameter as described in CREATE TABLE
number-of-routing-shards, as described in CREATE TABLE

Topic		Replies	Views
What would be the basic optimization techniques/parameters for a time series table in this example? CrateDB	1	992	October 14, 2021
Optimizing storage for historic time-series data Tutorials performance , data-storage	10	3344	June 22, 2022
Starting out with cratedb CrateDB	5	1020	May 31, 2022
Sharding and partitioning guide for time-series data Tutorials sql , fundamentals , getting-started , performance	0	4882	July 2, 2021
Table creation considerations CrateDB	7	539	July 14, 2022

What are the basic optimization techniques/parameters for a time series table in this example?

Related topics