Trino (formerly known as Presto SQL) is a distributed query engine, that allows running analytical queries across different data sources via SQL. One of those data sources can be CrateDB and this article is going to look at how to configure the connection.
Prerequisites
We assume a Trino client/server installation is already in place as per Trino’s installation instructions.
For this post, I installed Trino on macOS using Homebrew with brew install trino
and my installation directory is /usr/local/Cellar/trino/375
. Depending on your installation method, there might be different ways to start the Trino server. For the sake of this post, I start it in my console from the installation directory with the command ./bin/trino-server run
. Your preferred way of starting might differ.
Connector configuration
Due to CrateDB’s PostgreSQL protocol compatibility, we can make use of Trino’s PostgreSQL connector. Create a new file /usr/local/Cellar/trino/375/libexec/etc/catalog/postgresql.properties
to configure the connection:
connector.name=postgresql
connection-url=jdbc:postgresql://<CrateDB hostname>:5432/
connection-user=<CrateDB username>
connection-password=<CrateDB password>
insert.non-transactional-insert.enabled=true
Please replace the placeholders for the CrateDB hostname, username, and password to match your setup. Besides the connection details, the configuration has two particularities:
- No database name: With PostgreSQL, a JDBC connection URL usually ends with a database name. We intentionally omit the database name when connecting to CrateDB for compatibility reasons.
CrateDB consists of a single database with multiple schemas, hence we do not specify a database name in theconnection-url
. If a database name is specified, you will run into an error message on certain operations (ERROR: Table with more than 2 QualifiedName parts is not supported. Only <schema>.<tableName> works
). - Disabling transactions: Being a database with eventual consistency, CrateDB doesn’t support transactions. By default, the PostgreSQL connector will wrap
INSERT
queries into transactions and attempt to create a temporary table. We disable this behavior with theinsert.non-transactional-insert.enabled
parameter.
Running queries against CrateDB
Once the PostgreSQL connector is configured, we can connect to the Trino server using its CLI:
# schema refers to an existing CrateDB schema
$ ./bin/trino --catalog postgresql --schema doc
trino:doc>
A SHOW TABLES
query should successfully list all existing tables in the specified CrateDB schema and you can proceed with querying them.
As CrateDB differs in some aspects from PostgreSQL, there are a few particularities to consider for your queries:
- Querying
OBJECT
columns: Columns of the data typeOBJECT
can usually be queried using the bracket notation, e.g.SELECT my_object_column['my_object_key'] FROM my_table
. In Trino’s SQL dialect, the identifier needs to be wrapped in double quotes, such asSELECT "my_object_column['my_object_key']" FROM my_table
. -
INSERT
queries: When inserting, Trino addresses tables withcatalog_name.schema_name.table_name
, which currently isn’t supported by CrateDB. Please see crate/crate#12658 on addressing this issue. - Data types: Not all of Trino’s data types can be mapped to CrateDB data types and vice versa.
- For creating tables, it can be advisable to run the
CREATE TABLE
statement directly in CrateDB. This approach is also recommended if you want to configure custom table settings, such as sharding, partitioning, or replication. - For querying tables, a strategy can be to create views preparing data in a Trino-compatible way. For example, when dealing with the
GEO_POINT
data type, using the functionsLONGITUDE
andLATITUDE
, splittingGEO_POINT
into two simple, numerical values. - Columns with data types that cannot be mapped are skipped by Trino when importing metadata. This means that such columns cannot be queried through Trino. Creating a view can be a workaround (see the previous bullet point).
- For creating tables, it can be advisable to run the
- There are limitations in Trino on what parts of a query are pushed down to the data source. Therefore, the performance of a query can decrease significantly when running it through Trino compared to running it on CrateDB directly.
Conclusion
With a few parameter tweaks, Trino can successfully connect to CrateDB. The information presented in this post is the result of a short compatibility test and is likely not exhaustive. If you use Trino with CrateDB and are aware of any additional aspects, please let us know!