I’m currently running a 3-node CrateDB cluster for Prometheus long-term storage, using the CrateDB adapter provided by crateio. I’ve run into an issue where Prometheus attempts to POST data to the remote_write endpoint (the Crate adapter) but is unable to. The logs from the CrateDB adapter are below:
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
time="2020-04-17T08:46:31Z" level=error msg="Failed to POST inserts to Crate." err="context deadline exceeded" source="server.go:332"
Any help would be great. I’ve tried increasing the scrape_interval in Prometheus, but it doesn’t seem to help. I was thinking that CrateDB may be limiting the number of HTTP connections it accepts, but I can’t find any documentation to support that.
Can you confirm that all Docker containers are running? Is the config file for prometheus-adapter correctly applied? What do the logs of the cratedb-prometheus-adapter container show?
I just followed the tutorial from scratch and it works perfectly for me locally, so I’m a bit puzzled about what doesn’t work on your end.
Can you check if you are using the latest versions of Prometheus (v2.34.0) and CrateDB (v4.7.1)? Do you start all containers as superuser? Maybe this interferes with the containers being able to connect to each other.
How much data do you have in Prometheus?
The error message "context deadline exceeded" indicates some kind of connection or timeout issue. Maybe the containers can’t see each other, or maybe you are transmitting too much data and the connection times out.
Thank you for writing in. I deliberately moved your question to this discussion to keep related reports in one place.
After Ryan originally reported this issue, we took some actions to adjust the TCP keepalive interval and the TCP connect timeout settings, also making the latter configurable.
However, Ryan never reported back if those improvements have been helpful in any way. May I humbly ask you about this, @RyanWN4?
@Florencia_Artegoytia: Maybe you can increase the connect_timeout setting in your adapter’s config.yml as outlined within [1], bounce your containers and report back about any improvements you might be able to observe?
Can you check if you are using the latest versions of Prometheus (v2.34.0) and CrateDB (v4.7.1)?
Yes, I’m using the latest versions.
Do you start all containers as superuser?
Yes.
I think the problem is that the containers can’t see each other. I tried to ping one of the networks and got no response. My docker-compose file is the same as the one in the tutorial.
While it shouldn’t make much of a difference, can I humbly ask you to try again using the setup shared within the repository and also share the corresponding software versions of your environment with us?
We are currently testing CrateDB as a long-term storage backend for Prometheus. At the moment, we are remote_writing a small subset of metrics via cratedb-prometheus-adapter-0.4.0 into a stand-alone test instance of CrateDB 5.3.4.
After a few weeks, the DB holds about 835M records (130GiB) and so far it does not look too bad.
However, if we try to run a long query via Prometheus’ data explorer, we end up with a time-out after almost exactly one minute:
remote_read: remote server http://adapter:9268/read returned HTTP status 500 Internal Server Error: context deadline exceeded
At the moment, I’m unsure how to proceed/where to look for this specific time-out.
Thanks for any pointers,
Carsten
PS: If it matters: CrateDB is a bare metal installation using upstream’s Debian packages 5.3.4-1~bookworm, and the adapter was simply downloaded from GitHub and started with a systemd service file.
Thank you for writing in, and for your excellent report.
After evaluating your observations and revisiting this topic, I think what is missing in the CrateDB Prometheus Adapter is the ability to properly configure the TCP read timeout. Currently, only the TCP connect timeout is configurable.
It looks like unlocking SetReadDeadline and SetWriteDeadline on the net.Conn object would be the right approach for this.
From what little I understand, that seems to be the correct way, but given that I’ve never written any Go code and programming beyond simple scripting is not my core strength, I would defer that decision to you and/or other experts.
That being said, I should be able to quickly test any changes if you could provide a test binary (or I will start looking into how to build Go binaries from the git repo).
Thanks for your offer to be a canary in this regard ;]. I will get back to you as soon as there is something to test. I can’t promise anything for this week, but I will try to get into it next week at the latest.
There is no release or pre-release yet, but at least we have been able to modernize the code base a bit. Other than this, there are two patches specifically addressing performance topics which are probably relevant for you and @RyanWN4.
We addressed a few of the most prominent performance issues with the most recent release, CrateDB Prometheus Adapter 0.5.0; see also the release notes.
If you still want to give it a try, we will be happy to hear your feedback, and whether the improvements resolve the issues you have been running into. Thank you very much!