By default, CrateDB uses LZ4 compression for document sources. In addition, doc values (the columnar store) are compressed using delta encoding, bit packing, and GCD compression. Tables can also use DEFLATE instead of LZ4 to reduce the storage requirements even further. All of that works without any “hackery” such as combining multiple rows into arrays or applying standard encoding techniques like delta-delta encoding.
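Switching a table from LZ4 to DEFLATE is done through the codec table setting. A minimal sketch, assuming the codec setting and an illustrative table name (stricter compression usually trades some write/read speed for smaller storage):

-- readings_compressed is a hypothetical name; 'default' uses LZ4, 'best_compression' uses DEFLATE
CREATE TABLE readings_compressed (
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id TEXT,
    battery_level DOUBLE PRECISION
) WITH (codec = 'best_compression');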
Some simple tests with Timescale-provided data sets showed that CrateDB typically performs better than Timescale when chunk compression is not used. That being said, that data was rather well suited for delta-delta encoding and XOR compression.
CREATE TABLE readings (
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id TEXT,
    battery_level DOUBLE PRECISION,
    battery_status TEXT,
    battery_temperature DOUBLE PRECISION,
    bssid TEXT,
    cpu_avg_1min DOUBLE PRECISION,
    cpu_avg_5min DOUBLE PRECISION,
    cpu_avg_15min DOUBLE PRECISION,
    mem_free DOUBLE PRECISION,
    mem_used DOUBLE PRECISION,
    rssi DOUBLE PRECISION,
    ssid TEXT
);
An index on device_id and time was used for all databases.
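For the PostgreSQL/Timescale side, such an index could look like the following sketch (the exact index definition used in the test is an assumption). CrateDB indexes all columns by default, so no extra statement is needed there:

-- Hypothetical index for PostgreSQL/TimescaleDB; CrateDB needs no equivalent
CREATE INDEX ON readings (device_id, time DESC);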
That being said, with chunk compression enabled Timescale is more efficient (~6 GB in this case) for use cases with very few indexes, but this quickly turns around with higher-cardinality data and the use of more indexes. Also, you can achieve many of the compression characteristics of Timescale by simply using arrays in CrateDB (remember, it is mostly a document store). So if you really want to squeeze your data a bit more, most of the Timescale “magic” can be achieved by moving data into a second table and using arrays, as sketched below. Also see Optimizing storage for historic time-series data.
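A minimal sketch of that approach; the readings_history table name, the daily grouping, and the column selection are illustrative assumptions:

-- Hypothetical history table: one array of values per device and day instead of one row per reading
CREATE TABLE readings_history (
    day TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id TEXT,
    time ARRAY(TIMESTAMP WITH TIME ZONE),
    battery_level ARRAY(DOUBLE PRECISION),
    cpu_avg_1min ARRAY(DOUBLE PRECISION)
);

-- Roll up old rows from the original table into arrays
INSERT INTO readings_history (day, device_id, time, battery_level, cpu_avg_1min)
SELECT date_trunc('day', time),
       device_id,
       array_agg(time),
       array_agg(battery_level),
       array_agg(cpu_avg_1min)
FROM readings
GROUP BY date_trunc('day', time), device_id;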
Another valid option is to use the snapshot mechanism of CrateDB and move old partitions of data, further compressed using gzip, to a low-cost blob store like S3 or Azure Storage. If you need that data again, you can restore it with a one-liner.
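A minimal sketch of that flow, assuming an S3 bucket named my-bucket and a partitioned table with a month partition column (both are illustrative, as are the repository and snapshot names):

-- Register a repository backed by a blob store
CREATE REPOSITORY cold_storage TYPE s3 WITH (bucket = 'my-bucket', compress = true);

-- Snapshot an old partition into the repository
CREATE SNAPSHOT cold_storage.readings_2020_01 TABLE readings PARTITION (month = '2020-01')
    WITH (wait_for_completion = true);

-- Drop the partition locally once the snapshot is complete
DELETE FROM readings WHERE month = '2020-01';

-- The "one-liner" to bring the data back when it is needed again
RESTORE SNAPSHOT cold_storage.readings_2020_01 TABLE readings PARTITION (month = '2020-01')
    WITH (wait_for_completion = true);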
The question might arise: if the techniques used are so simple, why don't we just integrate them? The short answer is that we want to offer our users flexibility in how they use CrateDB. The compression techniques used by Timescale limit what you can store and how you can store it. For example, JSON data can't be compressed using those techniques. Furthermore, updates, deletes, and inserts in compressed chunks are limited.
Nevertheless, we are looking further into storage optimisation, and recent results are promising, saving quite a lot of storage without any real performance caveats.