Why Crate Sql is slower than elasticsearch http API

I am testing the crateDB performance.
There are 78164052 docs in CrateDB on 1 node server.

select “year”,count(hotel_code) as count from f_and_b group by “year” limit 100;
will take SELECT OK, 18 rows in set (6.020 sec)

but when I query same data from es api in the same crateDB:

{
“size”: 0,
“aggs”: {
“group_by_year”: {
“terms”: {
“field”: “year”
}
}
}
}
will response
Jietu20190703-150800

CrateDB uses its very own execution engine so the requests will be executed completely different. In order to make that more clear and avoid confusions (CrateDB is not on top of ES), the ES API shipped with CrateDB was deprecated with v3.1.0 and removed by v4.0.0.
Specially on how to execute aggregations, ES is very different from what CrateDB does.

Which makes the main difference in your case it that the CrateDB SQL is:

  • doing a additional aggregation on the hotel_code field while the ES request isn’t
  • is returning all rows (limit 100) while the ES request is returning 0 (size: 0).
  • it will aggregate all found year terms while ES’s terms aggregation will return only the top ten terms ordered by doc_count.

Beside of that, CrateDB’s execution engine it optimized for distributed execution, so adding nodes will increase performance a lot (which may not completely the same for ES aggregations).