PostgreSQL time-related queries faster than the same queries in CrateDB

Hello,

We are considering to migrate to CrateDB from Postgres as data that we are using are from real-time sensors with timestamps and geospatial information. Crate seems very promising as, according to your homepage, it is very well suited and fast for use cases related to querying timeseries/geo data.

However, at first, I wanted to check performance difference between these two DBs. I’ve imported table with 9.5 millions of rows from Postgres, which contains data from one sensor with timestamps.

Create statement in crateDB:

create table sensor_table(
  id BIGINT,
  category varchar,
  category_id integer,
  speed float,
  entry_zone text,
  exit_zone text,
  ts timestamp,
  date_timestamp text,
  week GENERATED ALWAYS AS date_trunc('week',ts)
) PARTITIONED BY(week);

Create statement in Postgres:

CREATE TABLE IF NOT EXISTS public.sensor_table 
(
    id integer NOT NULL,
    category character varying COLLATE pg_catalog."default",
    category_id integer,
    speed numeric,
    entry_zone text COLLATE pg_catalog."default",
    exit_zone text COLLATE pg_catalog."default",
    ts timestamp without time zone,
    date_timestamp date,
    CONSTRAINT sensor_table_pkey PRIMARY KEY (id)
)

Then, I tried simple SELECT queries both on Crate and Postgres, to select all data by week/month/any data ranges, e.g.:

select * from sensor_table where ts >= '2021-08-20' and ts < '2021-09-20' limit 10000000;

Here it becomes interesting - in Crate (Admin UI console), this query takes ~7 seconds, while in Postgres (pgAdmin4) only ~2.4 seconds.

Crate vs. Postgres query speed:

Both queries were done on the same datasets, on the same local machine, on Windows OS. Only difference is that Crate in addition to other columns contains generated column week, by which table is partitioned (without partitioning, this query took more than 30 seconds btw.). Partitioning by month was slower compared to partition by week.

Postgres table contains no partitions nor indexes. No changes in the configuration after local installation were done.

In Crate, I was trying querying with heap size of 2gb, 4gb, then clustering into 2 shards and 4 shards - results were pretty same in every scenario.

So my question is - why is this happening? Is there some major configuration-related thing I forgot to set? Is this somehow related to OS? Do I have to query/create table in different way? Or do I need to use multi-node cluster to actually see the difference?

How to achieve results where it can be clearly seen that Crate outperforms Postgres (from the perspective of querying)?

As you can imagine, query speed is crucial for our use cases, so it must be at least as efficient as in postgres.
Thank you very much for any reaction.

Best regards,
Pavol

1 Like

Could you run an EXPLAIN ANALYZE before the query. I could imagine that returning 1.5 mio rows takes most of the time? CrateDB performs best when doing the calculation in the database. Returning lots of data to the client might be slower, as data generally is stored compressed in CrateDB.


by which table is partitioned (without partitioning, this query took more than 30 seconds btw.). Partitioning by month was slower compared to partition by week.

I don’t think this is directly related to partioning. It might be, that the table was not refreshed after inserting the data. If you run a query multiple times, you could see performance improvements. Partitioning is typically only needed for bigger datasets. A single shard can easily handle 30-50 GiB of data.

So my question is - why is this happening? Is there some major configuration-related thing I forgot to set?

CrateDB is best / fastest when the result set is small. E.g. doing aggregations probably would be significantly faster.

Or do I need to use multi-node cluster to actually see the difference?

A multi node cluster is definitely where CrateDB shines.

How to achieve results where it can be clearly seen that Crate outperforms Postgres (from the perspective of querying)?

Larger datasets typically. Also be aware that CrateDB indexes every column by default and uses columnar data structures, which can significantly improves search speed and aggregations.

As you can imagine, query speed is crucial for our use cases, so it must be at least as efficient as in postgres.
Thank you very much for any reaction.