Sharding and partitioning guide for time-series data

The goal of this guide is to help you build a sharding and partitioning strategy for your time series data.

Let’s start with the time series data you want to store in a CrateDB table. You need to consider how to partition the data, how many shards to configure, and how many replicas to use.

Partitioning

Typically, you want to partition time series data along a time axis. In CrateDB, you can achieve this by applying the date_trunc function to a timestamp column.

An example table with monthly partitioning:

CREATE TABLE computed_parted_table (
  payload OBJECT,
  ts TIMESTAMP WITH TIME ZONE,
  ts_month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) PARTITIONED BY (ts_month);

Note that it’s not possible to change the partition column later.
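You can inspect the partitions of a table at any time. A minimal sketch, reusing the computed_parted_table from above: inserting a row creates the corresponding monthly partition on the fly, which you can then list via the information_schema.table_partitions table (the sample payload and timestamp are made up for illustration).

-- Insert a sample row; ts_month is computed automatically,
-- and a partition for 2023-06 is created on the fly
INSERT INTO computed_parted_table (payload, ts)
VALUES ({sensor = 'a1'}, '2023-06-15T12:00:00+00:00');

-- List the partitions of the table together with their partition values
SELECT table_name, partition_ident, "values"
FROM information_schema.table_partitions
WHERE table_name = 'computed_parted_table';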

Sharding

To fully utilize all nodes in your cluster, you should split your data into multiple shards. Shards are distributed evenly across the cluster, and queries run faster because the workload is spread over multiple nodes.

Your shard size should be neither too small nor too large. We suggest keeping the size per shard between 3 GB and 70 GB.
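To check where your shards currently stand, a hedged sketch of a size query against the sys.shards system table, which reports shard sizes in bytes:

-- Approximate size of each primary shard in GB, largest first
SELECT schema_name, table_name, id, size / (1024.0 * 1024 * 1024) AS size_gb
FROM sys.shards
WHERE "primary" = true
ORDER BY size DESC;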

An example table with monthly partitioning, clustered into 6 shards per partition:

CREATE TABLE computed_parted_table (
  payload OBJECT,
  ts TIMESTAMP WITH TIME ZONE,
  ts_month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) PARTITIONED BY (ts_month)
CLUSTERED INTO 6 SHARDS;

Replication

Replicating a table creates copies of its shards and stores them on other cluster nodes. These replica shards improve read performance and keep your data available in case one cluster node crashes.

CREATE TABLE computed_parted_table (
  payload OBJECT,
  ts TIMESTAMP WITH TIME ZONE,
  ts_month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) PARTITIONED BY (ts_month)
CLUSTERED INTO 6 SHARDS
WITH (number_of_replicas = 1);

We recommend keeping the default (0-1) or setting number_of_replicas = 1 for increased resiliency.
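Unlike the partition column, the replica count can be changed at any time. A minimal sketch, reusing the table from above:

-- Raise the replica count; CrateDB creates the additional copies in the background
ALTER TABLE computed_parted_table SET (number_of_replicas = 2);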

How do you choose the partitioning schema and decide on the number of shards?

When deciding on an effective sharding strategy for your use case, several factors come into play:

  • amount of time series data you currently write
  • amount of time series data you expect in the future
  • number of nodes in your CrateDB cluster
  • recommended max. 1000 shards per node
  • data retention period

Step 1: Decide on partitioning strategy

First, look at the amount of data you write in a given period. This gives you a first indication of how to partition your time series data.

For example, if you write 500 GB of data per year, half-year or quarter-year partitions would be useful. If you write 2.5 TB per year, consider monthly or bi-weekly partitions.

Please note that you can’t change the partition column later on, so factor future data growth into your partitioning decision (i.e. opt for a shorter time frame if you expect the data to double within the next year).
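For instance, a quarterly variant of the earlier example could look like this (the table and column names are hypothetical; date_trunc also accepts 'quarter' as an interval):

CREATE TABLE computed_parted_table_quarterly (
  payload OBJECT,
  ts TIMESTAMP WITH TIME ZONE,
  ts_quarter TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('quarter', ts)
) PARTITIONED BY (ts_quarter);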

Step 2: Decide on the number of shards

Calculate your expected partition size to determine an appropriate number of shards.

For example, with 2.5 TB of data per year and monthly partitioning, every partition will hold approximately 210 GB of data. The recommended shard size is 3 GB to 70 GB, so choose a number of shards that stays within this range: 6 shards give a per-shard size of 35 GB, 12 shards give 17.5 GB, and so on.

It’s also useful to consider the number of available nodes and CPUs in your cluster. Because query computation is typically distributed across shards, spreading them over all nodes lets you better utilize your cluster resources and speed up queries. To optimize for this, try to match the number of shards to the number of nodes.

Step 3: Validate that you stay within the 1000 shards per node limit

This limit is configurable (see Shard limits), but we recommend staying below 1000 shards per node, as higher numbers can negatively affect cluster stability.

Therefore, your total shard budget is number_of_nodes x max_shards_per_node (default 1000). For a 3-node cluster, the maximum number of shards would be 3000.
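To see how many shards your cluster currently holds, a sketch using sys.shards (which lists one row per allocated shard copy, primaries and replicas alike):

-- Total allocated shards in the cluster; compare against
-- number_of_nodes x max_shards_per_node
SELECT count(*) AS total_shards
FROM sys.shards;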

To calculate the expected number of shards for your time series data:

number_of_shards_per_partition x (1 + number_of_replicas) x number_of_partitions_in_retention_period (x number_of_partition_columns)

The replica factor is 1 + number_of_replicas because every replica is a full copy of each primary shard.

Example:
12 shards per partition, 1 replica, 12 months of data retention with monthly partitioning, partitioned by the single ts_month column:
12 x (1 + 1) x 12 x 1 = 288 shards
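Once data for all 12 monthly partitions exists, you can cross-check this number against the actual shard count. A minimal sketch, assuming the example table from above:

-- Should approach 288 once 12 monthly partitions exist
-- (12 shards x 2 copies x 12 partitions)
SELECT count(*) AS shards
FROM sys.shards
WHERE table_name = 'computed_parted_table';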

Scaling your table with data growth

When your ingested data size increases, you can increase the number of shards for new partitions to stay within the recommended shard size of 3 GB to 70 GB.

ALTER TABLE computed_parted_table SET (number_of_shards = 12);

Existing partitions are not affected by this statement.
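To confirm that new partitions pick up the new setting, you can inspect the per-partition shard count; a sketch assuming information_schema.table_partitions exposes number_of_shards, as in recent CrateDB versions:

-- Old partitions keep 6 shards, newly created ones get 12
SELECT partition_ident, number_of_shards
FROM information_schema.table_partitions
WHERE table_name = 'computed_parted_table';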

Scaling your cluster with additional nodes

If you expect to add nodes for scaling in the future, consider using a higher number of shards when creating the table.

For example, if you have a 3-node cluster and a table with 9 shards, every node stores 3 shards. When you later expand to a 9-node cluster, CrateDB automatically rebalances the shards so that each node stores 1 shard.
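To observe the distribution, a hedged sketch grouping sys.shards by node (assuming its node object column with a name key):

-- Number of shards currently allocated per node
SELECT node['name'] AS node_name, count(*) AS shards
FROM sys.shards
GROUP BY node['name']
ORDER BY shards DESC;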

Performance testing

We recommend always benchmarking your sharding and partitioning strategy and validating that it fulfills your performance requirements.

References

Please make sure to also check out the CrateDB documentation on this topic.
