Poor performance with five tables using join

We’re using CrateDB 5.5 with one node, and we’re testing a below sql, while it takes about than 75 seconds. Did I configure something wrong?Any suggestion will be appreciated.

Server info:

processor       : 79
model name      : Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz

$ free -g
              total        used        free      shared  buff/cache   available
Mem:            755          92          61           1         600         658
Swap:            15           0          15

Table size:


select count(1)
    from pgdb_scm.inv_Inv_Post_Detail pd 
    join pgdb_scm.inv_inv_post_line pl on pl.post_line_id = pd.post_line_id
    join pgdb_scm.inv_inv_shipping_dlvy_l dl on pl.dlvy_line_id = dl.line_id
    join pgdb_scm.inv_inv_shipping_notice_l nl on nl.line_id = dl.shipping_line_id
    join pgdb_scm.inv_inv_shipping_notice_h nh on nh.header_id = nl.header_id;


To best review this we would need to see the table definition of all these tables.
But one factor here is probably also going to be the high number of shards in relation to the size of the tables, please take a look at Sharding and partitioning guide for time-series data - Tutorials - CrateDB Community

Hi @hernanc, I changed the number of shards that the five tables to 4, while the query still slow. Please see the attachment included all table’s definition.

table definition

Hi, access to the file seems restricted. If you do not feel comfortable discussing the details publicly here in the community maybe you would like to schedule a call with a CrateDB Customer Engineer?

I’m sorry I forget to grant the permission to access the definition file. And now you can download it. I don’t feel uncomfortable discussing issues in the community, and I will provide you any details you need.

@hernanc Hi, did you have any suggestion on this issue? :grinning:

I reupload related files.
table_ddl.sql (7.5 KB)
crate.yml (25.5 KB)
sql_plan.json (167.0 KB)

Is this data coming from some other system where FK constraints are enforced?
Are the rows created in a certain order?
If so perhaps the count that you are looking for is actually the number of distinct shipping_line_id values (excluding null)?
If instead the result set that you are actually looking for is not a count but some aggregation then we could maybe write the query in a different way, happy to look at this if you want to share further details.
Thank you.