Unix socket instead of port for HTTP/REST transport (and other performance suggestions)

Hello Crate community. First of all, congratulations on CrateDB. It’s unique.

I would like to know if it’s possible to use a Unix socket path instead of an HTTP port for the REST interface. The benchmarks in [1][2] suggest that a Unix domain socket can provide roughly half the latency and twice the throughput of a TCP port, with no changes to application logic on either the server or the client.
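For anyone who wants to check this kind of claim locally, here is a minimal sketch that compares the mean round-trip time of a tiny echo message over a Unix domain socket and over TCP loopback. It is only an illustration: it measures a raw echo server, not HTTP and not CrateDB, and the names `echo_server` and `mean_round_trip` are made up for this example. Actual numbers will vary by machine and kernel.

```python
import os
import socket
import tempfile
import threading
import time


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo every message back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        if conn.family == socket.AF_INET:
            # Avoid small-packet batching so TCP is not penalized unfairly.
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def mean_round_trip(family: int, address, rounds: int = 2000) -> float:
    """Return the mean round-trip time in seconds for a 4-byte echo message."""
    listener = socket.socket(family, socket.SOCK_STREAM)
    listener.bind(address)
    listener.listen(1)
    bound = listener.getsockname()  # resolves TCP port 0 to the real port
    worker = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    worker.start()

    client = socket.socket(family, socket.SOCK_STREAM)
    client.connect(bound)
    if family == socket.AF_INET:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    payload = b"ping"
    start = time.perf_counter()
    for _ in range(rounds):
        client.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = client.recv(64)
            if not chunk:
                raise ConnectionError("echo server closed the connection")
            received += chunk
    elapsed = time.perf_counter() - start

    client.close()   # lets the server loop see EOF and exit
    worker.join()
    listener.close()
    return elapsed / rounds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        uds = mean_round_trip(socket.AF_UNIX, os.path.join(tmp, "echo.sock"))
    tcp = mean_round_trip(socket.AF_INET, ("127.0.0.1", 0))
    print(f"unix socket: {uds * 1e6:.1f} us per round trip")
    print(f"tcp loopback: {tcp * 1e6:.1f} us per round trip")
```

This requires a platform where Python exposes `socket.AF_UNIX` (Linux, macOS, and other Unix-like systems).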

Also, I would appreciate your opinion on how the byte-serialized POJOs used internally in CrateDB compare to the different data serialization strategies presented in the FlatBuffers benchmarks.

If it does not fall under the “raw structs” strategy, CrateDB might benefit from using FlatBuffers, which the FlatBuffers benchmarks report as orders of magnitude faster than the other strategies.

The FlatBuffers C++ library takes up just 15 kB, and there are implementations in 14 languages, including Java.

[1] blog.myhro.info/2017/01/how-fast-are-unix-domain-sockets
[2] redis .io/docs/management/optimization/benchmarks/

Since I’m a new user, I cannot create a topic with more than 2 links


Dear Paulo,

thank you for writing in, welcome to the community, and apologies for the late reply.

Thank you for the kind words. :sparkles: [1]

UNIX sockets

I’ve consulted with the database team, and @Baur came back with this response:

ES issue for unix sockets and comment why they don’t want it.

Those two tickets discuss in more detail why Elasticsearch and OpenSearch do not support UNIX sockets. The same holds true for their sibling, CrateDB.

FlatBuffers

This sounds interesting, but I don’t know if that or something similar has been considered by the database team, or if it would be up for consideration. Maybe @smu or @matriv are able to answer this?

With kind regards,
Andreas.


  1. We are always happy to learn where and how CrateDB is used, especially from people who value its uniqueness. So, if you can share your use case or application, we are all ears. :rocket:

FlatBuffers sounds interesting! Up to now, we haven’t considered changing the serialization. I’ll bring it up for discussion with the team.

Thank you @paulocoghi!

@paulocoghi First of all, thank you again for your suggestions!
After some discussion with the team, we decided not to invest time in the near future to investigate serialization/de-serialization improvements.

The reason is that, so far, we haven’t noticed a bottleneck in this area that would justify the time investment. In terms of the FlatBuffers benchmarks you posted, CrateDB is closer to the raw struct case. From investigating slow queries and inserts/updates, we see that the time spent on serialization/deserialization is on the order of microseconds, whereas the bottlenecks in other areas, like the query execution engine or Lucene, are on the order of seconds. We do, however, have ideas for optimizing the content that we send around in certain cases, which could reduce both the serialization/deserialization time and the network bandwidth required.
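To give a rough feel for the orders of magnitude involved, here is a minimal sketch that times a raw-struct style encoding against JSON for a single small record. Everything in it is an assumption made for illustration: the record layout, the field names, and the helper functions are hypothetical, and this is not CrateDB’s internal wire format.

```python
import json
import struct
import timeit

# Hypothetical fixed-layout record: id (int64), reading (float64), flag (bool).
RECORD_FORMAT = struct.Struct("<qd?")
record = (42, 21.5, True)


def pack_raw(rec):
    """Encode the record as a fixed-size binary struct."""
    return RECORD_FORMAT.pack(*rec)


def unpack_raw(buf):
    """Decode a fixed-size binary struct back into a tuple."""
    return RECORD_FORMAT.unpack(buf)


def encode_json(rec):
    """Encode the same record as UTF-8 JSON with named fields."""
    doc = {"id": rec[0], "reading": rec[1], "flag": rec[2]}
    return json.dumps(doc).encode("utf-8")


def decode_json(buf):
    """Decode the JSON document back into a tuple."""
    doc = json.loads(buf)
    return (doc["id"], doc["reading"], doc["flag"])


def per_call_microseconds(func, arg, number=100_000):
    """Return the mean time of func(arg) in microseconds."""
    total = timeit.timeit(lambda: func(arg), number=number)
    return total / number * 1e6


if __name__ == "__main__":
    raw = pack_raw(record)
    doc = encode_json(record)
    print(f"raw struct: {len(raw)} bytes, JSON: {len(doc)} bytes")
    print(f"pack_raw:    {per_call_microseconds(pack_raw, record):.3f} us")
    print(f"unpack_raw:  {per_call_microseconds(unpack_raw, raw):.3f} us")
    print(f"encode_json: {per_call_microseconds(encode_json, record):.3f} us")
    print(f"decode_json: {per_call_microseconds(decode_json, doc):.3f} us")
```

Comparing per-record timings like these with the end-to-end duration of a query is one way to judge whether serialization is worth optimizing for a given workload.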