The documentation says to initialize a connection, create cursors, and then use them to execute queries and fetch data. Can multiple cursors be created and used to run jobs in parallel? Or is there another approach to parallelizing query execution?
The crate-python library is thread-safe only at the module level, not at the connection or cursor level. Additionally, since the DB API lacks an asynchronous specification, it does not yet support executing queries asynchronously. To use crate-python in parallel, you must open multiple connections.
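For illustration, here is a minimal sketch of running queries in parallel by giving each worker thread its own connection; the host, credentials, and queries are placeholders to adapt to your setup.

```python
from concurrent.futures import ThreadPoolExecutor

from crate import client

# Placeholder queries to run in parallel.
QUERIES = [
    "SELECT name FROM sys.nodes",
    "SELECT name FROM sys.cluster",
]


def run_query(sql):
    # Each worker opens its own connection, because crate-python
    # connections and cursors must not be shared across threads.
    connection = client.connect("localhost:4200", username="crate")
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()


with ThreadPoolExecutor(max_workers=4) as executor:
    for rows in executor.map(run_query, QUERIES):
        print(rows)
```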
If you need async execution, you might want to look into using asyncpg or psycopg3 via the PostgreSQL wire protocol port (5432).
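A minimal sketch of connecting with asyncpg, assuming CrateDB listens on the default PostgreSQL wire protocol port 5432 and the default `crate` user is available:

```python
import asyncio

import asyncpg


async def main():
    # Connect via CrateDB's PostgreSQL wire protocol port.
    conn = await asyncpg.connect(host="localhost", port=5432, user="crate")
    try:
        rows = await conn.fetch("SELECT name FROM sys.cluster")
        print(rows)
    finally:
        await conn.close()


asyncio.run(main())
```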
The documentation helped me to fetch some data from the database.
How would you set up a connection pool using the asyncpg package?
Some background: I use Streamlit in combination with CrateDB. Streamlit provides connectors for different database solutions. I would like to implement one for CrateDB, but I am not sure how it should be done properly.
Excellent. Thanks. Please let us know any time where details can be improved.
Sorry for not displaying a verbatim example demonstrating this use case. Are you able to make it work by combining relevant details appropriately, or do you observe any errors when doing so?
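If it helps, here is a rough sketch of what a connection pool with asyncpg might look like; the host, port, credentials, and pool sizes are assumptions to adapt to your environment:

```python
import asyncio

import asyncpg


async def main():
    # Create a small pool of connections; queries acquired from the
    # pool can run concurrently on separate connections.
    pool = await asyncpg.create_pool(
        host="localhost", port=5432, user="crate",
        min_size=2, max_size=10,
    )
    try:
        async with pool.acquire() as conn:
            rows = await conn.fetch("SELECT name FROM sys.nodes")
            print(rows)
    finally:
        await pool.close()


asyncio.run(main())
```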
We also would like to see CrateDB working well together with Streamlit, so we appreciate you sharing more details about your use case with us. Did you already try to make it work using Streamlit’s PostgreSQL database adapter?
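For illustration, a rough sketch of how that might look, using Streamlit's SQL connection with the `postgresql` dialect pointing at CrateDB's PostgreSQL wire protocol port; the connection name, credentials, and secrets layout are assumptions to verify against the Streamlit documentation:

```python
# Example contents of .streamlit/secrets.toml (values are placeholders):
#
# [connections.cratedb]
# dialect = "postgresql"
# host = "localhost"
# port = 5432
# username = "crate"
# password = ""
# database = "doc"

import streamlit as st

# Streamlit resolves the "cratedb" connection from secrets.toml.
conn = st.connection("cratedb", type="sql")
df = conn.query("SELECT name FROM sys.cluster", ttl=60)
st.dataframe(df)
```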
Thank you for sharing your outcome. This error message usually indicates that some other error happened where CrateDB does not behave exactly like PostgreSQL: the error is propagated to the application, which then issues a ROLLBACK; that ROLLBACK does not work either and finally bubbles up the stack to be presented to the user.
In other words, another query fails, and the error report is getting masked by the exception you’ve shared.
@proddata was so kind as to whip up a quick example at GitHub - proddata/streamlit-cratedb. It looks like it validates that CrateDB works successfully with Streamlit. You may want to try out this application and perhaps start from there where applicable.
On the asyncio side, because Streamlit uses SQLAlchemy to connect to SQL databases, this patch may well make a difference, provided Streamlit supports async operations at the database level: Dialect: Add support for `asyncpg` and `psycopg3` drivers by amotl · Pull Request #11 · crate/sqlalchemy-cratedb · GitHub. It hasn't been mainlined yet, but the PR description includes instructions on how to install the improvement directly from the feature branch. Documentation is still missing, but the change log item conveys the ingredients of the patch:
Added support for psycopg and asyncpg drivers, by introducing the crate+psycopg://, crate+asyncpg://, and crate+urllib3:// dialect identifiers. The asynchronous variant of psycopg is also supported.
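An untested sketch of how the `crate+asyncpg://` dialect identifier from that feature branch might be used with SQLAlchemy's asyncio extension; the URL, host, and credentials are assumptions:

```python
import asyncio

import sqlalchemy as sa
from sqlalchemy.ext.asyncio import create_async_engine


async def main():
    # Dialect identifier taken from the change log item above;
    # adjust host, port, and credentials to your environment.
    engine = create_async_engine("crate+asyncpg://crate@localhost:5432/")
    async with engine.connect() as conn:
        result = await conn.execute(sa.text("SELECT name FROM sys.cluster"))
        print(result.fetchall())
    await engine.dispose()


asyncio.run(main())
```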
Thanks for your help Georg and Andreas!
Not sure if I should split this topic and, if so, how it should be done… There are essentially two problems and two solutions.
You are welcome. Please ask further questions any time. Based on your suggestions, and @proddata’s investigations, we added those two resources to our documentation and code snippets.
No worries about splitting this. While I admit the conversation mixes different topics, I think it will be too difficult to split in this case. If you want, feel free to rephrase the title and fragments of your original post.
Do you know how to add multiple hosts so that a kind of load balancing is in place?
Or do I need to set up a load balancer manually for the cluster and use that as the access point?
I think connection() takes **kwargs and sqlalchemy-cratedb supports a servers argument, but I am not sure how they need to be provided. Maybe @amotl knows.
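Maybe something along these lines works; a sketch assuming the `crate://` engine URL and placeholder host names, with the servers list passed through SQLAlchemy's connect_args (please double-check the exact parameter handling):

```python
import sqlalchemy as sa

# Placeholder node addresses; the client distributes requests
# across the configured servers.
engine = sa.create_engine(
    "crate://",
    connect_args={
        "servers": ["node-1.example.com:4200", "node-2.example.com:4200"],
    },
)

with engine.connect() as connection:
    result = connection.execute(sa.text("SELECT name FROM sys.cluster"))
    print(result.fetchall())
```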
I just added the servers list to the TOML file, but host has to be included as well. Because of that, I have no information on whether any load balancing is happening or not. @amotl Andreas, is there any way to get that information? I found the active_servers() method within the source code of the HTTP client, but I think I have no access to it from within Streamlit:
So, effectively, just use connection.connection.dbapi_connection.client.active_servers, where the first connection symbol is a reference to an sa.engine.Connection object, as outlined in the example above.
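A minimal sketch of inspecting that attribute outside of Streamlit, using the attribute path described above; the host names are placeholders:

```python
import sqlalchemy as sa

engine = sa.create_engine(
    "crate://",
    connect_args={
        "servers": ["node-1.example.com:4200", "node-2.example.com:4200"],
    },
)

with engine.connect() as connection:
    # `connection` is an sa.engine.Connection; drill down to the
    # underlying crate-python client to see which servers are active.
    print(connection.connection.dbapi_connection.client.active_servers)
```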
This results in only one element, the one I specified in the TOML file under host.
The servers property is ignored, and if I try to add multiple hosts, this also fails on the Streamlit side. I also tried a stupid approach, adding servers with the append method:
Hi Andreas
There is a workaround, so it is not that important.
→ I think I will go for a load balancer and use that as the entry point for the cluster. Do you have a recommendation?
Regards