Crate python package how to handle geo data?!?

Dear Community

I have a geopandas dataframe and would like to store it in crateDB. I failed with so many approaches and in the end I just did a simple test:

import sqlalchemy as sa
import pandas as pd

engine = sa.create_engine(dburi, echo=True)

with engine.connect() as conn:
    df = pd.read_sql(sql=sa.text("SELECT * FROM sys.summits"), con=conn)
    df.info()
    df.to_sql(schema="geo", name="test", con=conn, if_exists='replace', index=False)

which just result in a error…

ProgrammingError: (crate.client.exceptions.ProgrammingError) SQLParseException[Cannot cast value `[6.86444, 45.8325]` to type `text`]
[SQL: INSERT INTO geo.test (classification, coordinates, country, first_ascent, height, mountain, prominence, range, region) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ? ... 28774 characters truncated ... , ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: ('I/B-07.V-B', [6.86444, 45.8325], 'FR/IT', 1786.0, 4808, 'Mont Blanc', 4695, 'U-Savoy/Aosta', 'Mont Blanc massif', 'I/B-09.III-A', [7.86694, 45.93694], 'CH', 1855.0, 4634, 'Monte Rosa', 2165, 'Valais', 'Monte Rosa Alps', 'I/B-09.V-A', [7.85889, 46.09389], 'CH', 1858.0, 4545, 'Dom', 1046, 'Valais', 'Mischabel', 'I/B-09.III-A', [7.83556, 45.92222], 'CH/IT', 1861.0, 4527, 'Liskamm', 376, 'Valais/Aosta', 'Monte Rosa Alps', 'I/B-09.II-D', [7.71583, 46.10139], 'CH', 1861.0, 4506, 'Weisshorn', 1235, 'Valais', 'Weisshorn-Matterhorn', 'I/B-09.II-A', [7.65861, 45.97639], 'CH/IT', 1865.0, 4478 ... 8900 parameters truncated ... 2600, 'Monte Mars', 409, 'Aosta/Biella', 'Biellese Alps', 'II/A-17.II-E', [13.04167, 47.08917], 'AT', None, 2600, 'Silberpfennig', 374, 'Salzburg', 'Goldberg Group', 'II/A-18.II-A', [13.97694, 47.32], 'AT', None, 2599, 'Großer Knallstein', 548, 'Styria', 'Niedere Tauern', 'II/A-15.VIII-A', [9.58111, 47.05278], 'CH/LI', None, 2599, 'Vorder Grauspitz', 353, 'Graub./Liecht', 'Rätikon', 'II/B-22.II-C', [10.23278, 47.27056], 'AT/DE', 1853.0, 2599, 'Biberkopf', 337, 'N-Tyrol/Bavaria', 'Allgäu Alps', 'I/B-07.V-C', [7.11083, 46.05417], 'CH', None, 2598, 'Le Catogne', 1100, 'Valais', 'Mont Blanc massif')]
(Background on this error at: https://sqlalche.me/e/20/f405)

Well that said I am bit frustrated right now… How can I store and load geo data from a database? (would love to use geopandas)

Regards
Schabi

1 Like

Hi, and thank you for writing in about this topic. We are running a tracking discussion how to improve the situation over there, where I’ve just added your request.

We will respond on both spots (there and here) after looking at the problem at hand, if we can see any improvements to be made.

Hi,
Let me suggest a possible workaround.

geopandas.read_postgis wants a geometry in WKB representation.
If you have a geo_shape in CrateDB of type POLYGON we can convert it to a WKT with this UDF, then you could use another UDF with code along the lines of Boop Scripts: Convert between WKT and WKB · GitHub to convert it to WKB.

In the opposite direction, that same gist has javascript code for getting a WKT representation from a WKB, and WKTs can be directly casted to geo_shapewell_known_text::geo_shape

For to_sql to succeed with a geo_point you could create the table in advance with the right data types and then use if_exists='append', otherwise data can also be inserted in ARRAY or TEXT columns as appropriate and then we can have a derived geo_shape or geo_point column using Generated columns - CrateDB: Reference. Creating the table manually in advance in these cases is interesting anyway as you may want to go for specific geographical indexing options.

I hope this helps.

1 Like