Caller=server.go:349 level=error msg="Failed to write data to CrateDB" err="error closing write batch: timeout: context deadline exceeded"

I have configured a single-node CrateDB cluster with cratedb-prometheus-adapter as a proof of concept for long-term storage (LTS) of Prometheus metrics.

The following error is produced after starting the adapter:

caller=server.go:349 level=error msg="Failed to write data to CrateDB" err="error closing write batch: timeout: context deadline exceeded"

Here are the versions in play:
Prometheus - 2.36.2
CrateDB Prometheus Adapter - 0.5.1
CrateDB version - 5.7.1-1

And here is the adapter config:

cratedb_endpoints:
  - host: xx.xx.xx.xx
    port: 5432
    user: admin
    password: "xxxxx"
    schema: ""
    max_connections: 0
    read_pool_size_max: 0
    write_pool_size_max: 0
    connect_timeout: 10
    read_timeout: 5
    write_timeout: 5
    enable_tls: false
    allow_insecure_tls: false

Any suggestions what can be tuned?

Could you check that:
[1] CrateDB is up and running
[2] CrateDB is reachable from where cratedb-prometheus-adapter is being run with the same credentials?

Hi, yes, CrateDB is up and reachable. We do have a large number of metrics already ingested; however, the adapter seems to baulk after approx 20 seconds.

  1. You start the program, it ingests correctly and after 20s you get the timeouts.
  2. While you are getting the timeouts, CrateDB is still reachable from the service's environment; for example, issuing "SELECT 1;" succeeds.

Are these two assertions correct?

While you are getting the timeouts, is there anything weird in the admin UI graphs? No peaks? Red shards or alerts?
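Shard health and failed checks can also be read over SQL from CrateDB's system tables, which is useful when the admin UI is not at hand. The following is a minimal sketch, assuming the connecting user is allowed to read the `sys` schema; as far as I know, `sys.checks` is what backs the cluster checks shown in the UI.

```sql
-- Table health: anything other than GREEN points at missing
-- or under-replicated shards.
SELECT table_schema, table_name, health, missing_shards, underreplicated_shards
FROM sys.health
ORDER BY severity DESC;

-- Cluster checks that are currently failing.
SELECT id, severity, description
FROM sys.checks
WHERE NOT passed
ORDER BY severity DESC;
```

Running these while the adapter is logging timeouts shows whether the errors coincide with unhealthy tables or failing checks.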

Dear Tony,

thank you for writing in. I think we had a similar report in issue #33, 'Failed to POST/GET data from CrateDB: Croaks with err="context deadline exceeded"', in the crate/cratedb-prometheus-adapter repository on GitHub. Therefore, we added the connect_timeout, read_timeout, and write_timeout options.

Can you try increasing the values of the timeout parameters so that the Prometheus adapter does not close the connection too early? If a query takes too long to evaluate, you may need to optimize it, or your data schema or cluster configuration.

Please let us know if that improves the situation for you.

With kind regards,
Andreas.

The assertions are correct. I've increased the connect_timeout, read_timeout and write_timeout values, which has helped massively.

The server running CrateDB and the adapter is behind multiple firewalls and I don’t have access to the UI. Are there SQL queries I can run to capture CrateDB performance stats?
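For reference, CrateDB exposes runtime statistics through system tables that can be queried over the same PostgreSQL port the adapter uses. The queries below are a rough sketch rather than a complete monitoring setup. They assume statistics collection is enabled (the `stats.enabled` setting); how far back `sys.jobs_log` reaches depends on `stats.jobs_log_size`.

```sql
-- Statements currently running, oldest first.
SELECT id, started, stmt
FROM sys.jobs
ORDER BY started ASC;

-- Most recently finished statements, with run time and any error.
SELECT started, ended, ended - started AS duration, stmt, error
FROM sys.jobs_log
ORDER BY ended DESC
LIMIT 20;

-- Per-node load average and heap usage.
SELECT name, load['1'] AS load_1m, heap['used'] AS heap_used, heap['max'] AS heap_max
FROM sys.nodes;
```

Long-running INSERT statements in `sys.jobs_log` around the time of a "context deadline exceeded" error would suggest that the writes themselves are slow, rather than the connection being lost.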


Thanks Andreas, increasing the connect_timeout, read_timeout and write_timeout values has improved things. I've only captured a single error in an hour with ingestion at the same rate. I'll continue to tweak the parameters and monitor.
