Hello,
I’m aware of Optimizing storage for historic time-series data, but in this specific question, I’m not interested data retention or moving old (cold) data to another table, etc.
I want to understand better what kind of basic optimizations/techniques should be used when creating a time series table with the following structure, stored on a 3-node CrateDB cluster, each node with 64 GB RAM , 16 CPU cores and 500 GB storage:
CREATE TABLE IF NOT EXISTS emre.test_table_001 (
time timestamptz NOT NULL,
location_id text NOT NULL,
device_id text NOT NULL,
sensor_reading_01 double precision,
sensor_reading_02 double precision,
...
sensor_reading_50 double precision,
);
Combination of (time, location_id, device_id)
is unique .
Let’s assume that:
- this table stores sensor readings from various devices identified by
device_id
, residing at a geographical location identified by alocation_id
. - the sensor data is sent to a .NET application, and this .NET application INSERTs data into this table every few seconds.
- other .NET applications SELECT from this table to show various dashboards to users, whenever users request these via a web application.
- a user is generally interested in the time series data for 1 or more devices in a single location in a given web page request.
- currently there are about ~80 devices scattered to ~50 different locations, sending sensor readings every second.
- ‘ hot data ’ should be available for about 3 months, meaning, currently hot data is about 100 million rows , but as new locations and devices are added, the amount of ‘ hot data ’ is estimated to be on the order a few billion rows .
Based on these assumptions and the ‘naive’ table CREATE command above, what would CrateDB experts say, with respect to the following topics and table parameters:
- Indexing: as far as I understand CrateDB creates indexes automatically, but coming from a PostgreSQL background, I wonder if I should also create some other indexes manually based on the conditions specified in WHERE queries in .NET applications?
- CLUSTERED parameter as described in CREATE TABLE — CrateDB: Reference
- PARTITIONED BY parameter as described in CREATE TABLE — CrateDB: Reference
- number-of-routing-shards, as described in CREATE TABLE — CrateDB: Reference