By default, CrateDB uses LZ4 compression for document sources. In addition, doc values (the columnar store) are compressed using delta encoding, bit packing, and GCD compression. Tables can also use DEFLATE instead of LZ4 to reduce the storage requirements even further. All of that works without any “hackery” such as combining multiple rows into arrays or applying standard encoding techniques like delta-delta encoding.
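Switching a table from LZ4 to DEFLATE is done through the codec table setting. A minimal sketch, assuming the codec setting and an illustrative table name (stricter compression usually trades some write/read speed for smaller storage):

-- readings_compressed is a hypothetical name; 'default' uses LZ4, 'best_compression' uses DEFLATE
CREATE TABLE readings_compressed (
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id TEXT,
    battery_level DOUBLE PRECISION
) WITH (codec = 'best_compression');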
Some simple tests with Timescale-provided data sets showed that CrateDB typically performs better than Timescale when chunk compression is not used. That being said, that data was rather well suited for delta-delta encoding and XOR compression.
CREATE TABLE readings (
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id TEXT,
    battery_level DOUBLE PRECISION,
    battery_status TEXT,
    battery_temperature DOUBLE PRECISION,
    bssid TEXT,
    cpu_avg_1min DOUBLE PRECISION,
    cpu_avg_5min DOUBLE PRECISION,
    cpu_avg_15min DOUBLE PRECISION,
    mem_free DOUBLE PRECISION,
    mem_used DOUBLE PRECISION,
    rssi DOUBLE PRECISION,
    ssid TEXT
);
An index on device_id and time was used for all databases.
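For the PostgreSQL/Timescale side, such an index could look like the following sketch (the exact index definition used in the test is an assumption). CrateDB indexes all columns by default, so no extra statement is needed there:

-- Hypothetical index for PostgreSQL/TimescaleDB; CrateDB needs no equivalent
CREATE INDEX ON readings (device_id, time DESC);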
That being said, with chunk compression enabled Timescale is more efficient (~6 GB in this case) for use cases with very few indexes, but this quickly turns around with higher-cardinality data and the use of more indexes. Also, you can achieve many of the compression characteristics of Timescale by simply using arrays in CrateDB (remember, it is mostly a document store). So if you really want to squeeze your data a bit more, most of the Timescale “magic” can be achieved by moving data into a second table and using arrays, as sketched below. Also see Optimizing storage for historic time-series data.
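A minimal sketch of that approach; the readings_history table name, the daily grouping, and the column selection are illustrative assumptions:

-- Hypothetical history table: one array of values per device and day instead of one row per reading
CREATE TABLE readings_history (
    day TIMESTAMP WITH TIME ZONE NOT NULL,
    device_id TEXT,
    time ARRAY(TIMESTAMP WITH TIME ZONE),
    battery_level ARRAY(DOUBLE PRECISION),
    cpu_avg_1min ARRAY(DOUBLE PRECISION)
);

-- Roll up old rows from the original table into arrays
INSERT INTO readings_history (day, device_id, time, battery_level, cpu_avg_1min)
SELECT date_trunc('day', time),
       device_id,
       array_agg(time),
       array_agg(battery_level),
       array_agg(cpu_avg_1min)
FROM readings
GROUP BY date_trunc('day', time), device_id;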
Another valid option is to use the snapshot mechanism of CrateDB and move old partitions of data, further compressed using gzip, to a low-cost blob store like S3 or Azure Storage. If you need that data again, you can restore it with a one-liner.
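A minimal sketch of that flow, assuming an S3 bucket named my-bucket and a partitioned table with a month partition column (both are illustrative, as are the repository and snapshot names):

-- Register a repository backed by a blob store
CREATE REPOSITORY cold_storage TYPE s3 WITH (bucket = 'my-bucket', compress = true);

-- Snapshot an old partition into the repository
CREATE SNAPSHOT cold_storage.readings_2020_01 TABLE readings PARTITION (month = '2020-01')
    WITH (wait_for_completion = true);

-- Drop the partition locally once the snapshot is complete
DELETE FROM readings WHERE month = '2020-01';

-- The "one-liner" to bring the data back when it is needed again
RESTORE SNAPSHOT cold_storage.readings_2020_01 TABLE readings PARTITION (month = '2020-01')
    WITH (wait_for_completion = true);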
The question might arise: if the techniques used are so simple, why don't we just integrate them? The short answer is that we want to offer our users flexibility in how they use CrateDB. The compression techniques used by Timescale limit what you can store and how you can store it. For example, JSON data can't be compressed using those techniques. Furthermore, updates, deletes, and inserts in compressed chunks are limited.
Nevertheless, we are looking further into storage optimisation, and recent results are promising, saving quite a lot of storage without any real performance caveats.