What do you recommend as a load balancer?


In your documentation at Clustering — CrateDB: Reference, I see the following architecture:

I have two questions:

  • Do you recommend a particular load balancer, e.g. http://www.haproxy.org ?
  • Unlike Figure 5, and similar to Figure 4 in the documentation above, would it be OK to put the Load Balancer between an “Application” instance and CrateDB nodes? I assume that should be OK, because if I understand correctly, any CrateDB node can access any part of the data thanks to replication and shared-nothing architecture, therefore a Load Balancer sending a query from the Application to any one of the CrateDB nodes shouldn’t matter. Is my understanding correct? (Note that this question can be generalized to multiple Application instances, any of them connecting to Load Balancer, and then Load Balancer selecting any one of the CrateDB nodes).

Typically, any round-robin TCP load balancer is sufficient.

  • For Kubernetes setups, we typically use the built-in Kubernetes load balancer.
  • HAProxy and nginx are used in production environments on-premises (and in cloud setups). HAProxy supports TCP load balancing directly; open-source nginx supports it through its stream module (available since 1.9.0), though afaik active health checks are only in NGINX Plus. Some customers only use the HTTP endpoint, though.
  • For VM setups on Azure / AWS / etc., the respective cloud load balancers can also be used.
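
To make "round-robin TCP load balancing" concrete, here is a minimal Python sketch of what such a balancer does when picking a node: it cycles through the nodes in order and skips any that do not accept a TCP connection. This is only an illustration, not something to run in place of a real load balancer. The hostnames, the `RoundRobinBalancer` class and the `tcp_reachable` helper are made up for this example; 5432 is CrateDB's default PostgreSQL wire protocol port (the HTTP endpoint defaults to 4200).

```python
import socket
from typing import Callable, Iterable, List, Tuple

Node = Tuple[str, int]

# Hypothetical node addresses, for illustration only.
NODES: List[Node] = [
    ("crate-1.example.internal", 5432),
    ("crate-2.example.internal", 5432),
    ("crate-3.example.internal", 5432),
]


def tcp_reachable(node: Node, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        return False


class RoundRobinBalancer:
    """Hands out nodes in rotation, skipping nodes that fail the health check."""

    def __init__(
        self,
        nodes: Iterable[Node],
        is_healthy: Callable[[Node], bool] = tcp_reachable,
    ) -> None:
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._is_healthy = is_healthy
        self._next = 0

    def pick(self) -> Node:
        # Try each node at most once per call, starting where we left off.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if self._is_healthy(node):
                return node
        raise RuntimeError("no reachable node")
```

With the node list above, `RoundRobinBalancer(NODES).pick()` would return the next reachable node on each call. Since any CrateDB node can handle the query, it does not matter which one is picked, which is why a plain round-robin strategy is usually enough.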