Multi-Threaded Inserts on CrateDB


I just wanted to ask/confirm the following: if we have a data loader application that starts multiple threads, each inserting, say, 100,000 records, will CrateDB load balance these inserts across the available CrateDB nodes, or do we have to assign a different node IP address per thread to achieve this?



Typically, a query is handled by the CrateDB node your client connects to.
For inserts, I'd say it makes almost no difference which node that is, as the actual query load is low and data storage and indexing are distributed across the nodes anyway (if the table uses multiple shards). For high query loads with complex queries, it definitely makes sense to distribute connections across multiple nodes.
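To illustrate why the connected node matters little for inserts, here is a conceptual sketch of hash-based shard routing: each row's routing value (typically its primary key) is hashed and taken modulo the number of shards, so rows end up spread across shards, and therefore nodes, no matter which node received the INSERT. This is a simplified model, not CrateDB's implementation: `zlib.crc32` stands in for CrateDB's internal hash function, and `NUM_SHARDS`, `route_to_shard`, and `shard_distribution` are hypothetical names.

```python
import zlib
from collections import Counter

NUM_SHARDS = 6  # hypothetical shard count for the target table


def route_to_shard(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a row's routing value (e.g. its primary key) to a shard number.

    zlib.crc32 is only a stand-in; CrateDB uses its own internal hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def shard_distribution(keys, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many rows would land on each shard."""
    return Counter(route_to_shard(k, num_shards) for k in keys)


if __name__ == "__main__":
    # 100,000 rows sent through a single coordinating node still spread out.
    dist = shard_distribution(f"record-{i}" for i in range(100_000))
    for shard, count in sorted(dist.items()):
        print(f"shard {shard}: {count} rows")
```

The point of the model is that the coordinating node only forwards rows; where they are stored and indexed depends on the shard each one hashes to.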

Load balancing across nodes also makes sense for automatic failover (HA).

Depending on the setup, this can be done with e.g. a Kubernetes load balancer, nginx, or HAProxy (or an AWS / Azure load balancer for cloud setups).
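If no external load balancer is in place, a data loader can spread its threads across nodes itself. Below is a minimal sketch, assuming a hypothetical `HostPool` helper and hypothetical node URLs: each call hands out the next host round-robin and skips hosts marked as down, which roughly mirrors what a load balancer does for distribution and failover. Health checking (deciding when to call `mark_down`/`mark_up`) is left out and would be up to the application.

```python
import itertools
import threading


class HostPool:
    """Thread-safe round-robin host selection that skips hosts marked down."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._down = set()
        self._cycle = itertools.cycle(self._hosts)
        self._lock = threading.Lock()

    def mark_down(self, host):
        with self._lock:
            self._down.add(host)

    def mark_up(self, host):
        with self._lock:
            self._down.discard(host)

    def next_host(self):
        """Return the next healthy host, or raise if every host is down."""
        with self._lock:
            # At most one full pass over the ring before giving up.
            for _ in range(len(self._hosts)):
                host = next(self._cycle)
                if host not in self._down:
                    return host
            raise RuntimeError("no healthy CrateDB hosts available")


if __name__ == "__main__":
    # Hypothetical node URLs; substitute your cluster's HTTP endpoints.
    pool = HostPool(["http://crate-node1:4200",
                     "http://crate-node2:4200",
                     "http://crate-node3:4200"])
    for thread_id in range(4):
        print(f"thread {thread_id} -> {pool.next_host()}")
```

The lock keeps the shared cycle consistent when many loader threads ask for a host at once; an external load balancer would make this helper unnecessary.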

Some client libraries already let you provide multiple hosts (e.g. crate-python accepts a list of hosts). Npgsql will also support this as of v6.0 (currently in preview).
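For the multi-threaded loader in the question, a sketch with crate-python might look like the following. As we understand the driver, passing a list of servers to `client.connect` lets it spread requests across them and skip unreachable ones, so threads do not need a different IP address each; check the crate-python documentation for the exact semantics. Assumptions: the node URLs, the table `doc.records (id TEXT PRIMARY KEY, payload TEXT)`, the batch size, and the helpers `batched`, `make_rows`, and `load` are all hypothetical. Each thread opens its own connection to avoid relying on a connection being safely shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical node URLs; replace with your cluster's HTTP endpoints.
CRATE_HOSTS = ["http://crate-node1:4200",
               "http://crate-node2:4200",
               "http://crate-node3:4200"]
BATCH_SIZE = 1_000         # rows per bulk request (tuning assumption)
ROWS_PER_THREAD = 100_000
NUM_THREADS = 4


def batched(iterable, size):
    """Yield lists of up to `size` items (itertools.batched needs Python 3.12+)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def make_rows(thread_id, count):
    """Generate hypothetical (id, payload) rows for one loader thread."""
    return ((f"t{thread_id}-{i}", f"payload {i}") for i in range(count))


def load(thread_id):
    """Insert this thread's rows in batches over its own connection."""
    # Imported here so the helpers above can be used without the driver installed.
    from crate import client

    conn = client.connect(CRATE_HOSTS)  # list of servers, not a single host
    try:
        cursor = conn.cursor()
        for chunk in batched(make_rows(thread_id, ROWS_PER_THREAD), BATCH_SIZE):
            # Bulk insert; crate-python uses the qmark (?) parameter style.
            cursor.executemany(
                "INSERT INTO doc.records (id, payload) VALUES (?, ?)", chunk
            )
        cursor.close()
    finally:
        conn.close()
    return thread_id


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
        for done in executor.map(load, range(NUM_THREADS)):
            print(f"thread {done} finished")
```

Batching with `executemany` keeps the per-request overhead low, which matters more for insert throughput than which node coordinates the request.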