Data path on s3 storage

Taoufik_Dachraoui · March 12, 2023, 12:03pm

Hi

Currently, the data path must be on a local filesystem. I would like to define the data path to be on S3 (or GCP) storage. This could be difficult to do, depending on the code. Do you think it is possible to add this option (configuring the data path to be stored on S3 storage)?

Any reference to the code on GitHub where data is written to disk is welcome.
Thanks

proddata · March 12, 2023, 12:18pm

I would not recommend doing this at all, for various reasons. One of them is that CrateDB persists a translog (WAL) for every operation. Another is that IOPS, and therefore performance, would most likely be terrible.

However, if you still want to try it, you could use something like S3FS-Fuse, or an equivalent for Kubernetes PVCs.
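
To get a rough feel for the cost before committing to such a setup, a small probe can append records and fsync after each one, which loosely mimics the write pattern of a write-ahead log. This is only a minimal sketch and not how CrateDB implements its translog; the function name `fsync_probe` and its parameters are made up for illustration. Run it once against a local directory and once against the mounted bucket, and compare the numbers.

```python
import os
import tempfile
import time


def fsync_probe(directory, n_ops=200, record_size=512):
    """Append n_ops small records to a scratch file in `directory`,
    calling fsync after every record. Returns operations per second.

    Hypothetical helper for illustration; not part of CrateDB.
    """
    payload = b"x" * record_size
    path = os.path.join(directory, "fsync_probe.tmp")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_ops):
            os.write(fd, payload)
            os.fsync(fd)  # force the record to stable storage, like a WAL would
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return n_ops / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Baseline on a local temporary directory; pass the path of an
    # s3fs mount point instead to compare.
    with tempfile.TemporaryDirectory() as local_dir:
        print(f"local: {fsync_probe(local_dir):.0f} fsync ops/s")
```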

Taoufik_Dachraoui · March 12, 2023, 1:14pm

Yes, I considered s3fs, but was wondering about performance. Did you try it (for large data)? Also, since s3fs has some limitations (e.g. random access, …), does CrateDB work without any issues on s3fs?

amotl · March 13, 2023, 6:06pm

Dear Taoufik,

CrateDB uses memory-mapped files for accessing the filesystem. While you could put the filesystem on a remote system and connect it over the network, it will suffer severely from latency problems, which is probably not the right thing to do when running a database ^[1].
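
To illustrate the access pattern, here is a minimal sketch that reads small chunks at random offsets through a memory map and reports the mean time per read. It is not taken from CrateDB; the helper `mmap_random_reads` and its parameters are hypothetical. Keep in mind that a freshly written local file is usually served from the page cache, so the local number is a best case.

```python
import mmap
import os
import random
import tempfile
import time


def mmap_random_reads(path, n_reads=1000, chunk=64, seed=0):
    """Read `chunk` bytes at random offsets of `path` through a memory map.

    Returns the mean number of seconds per read.
    Hypothetical helper for illustration; not part of CrateDB.
    The file must be at least `chunk` bytes long.
    """
    rng = random.Random(seed)
    size = os.path.getsize(path)
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = time.perf_counter()
        for _ in range(n_reads):
            offset = rng.randrange(0, size - chunk + 1)
            _ = mm[offset:offset + chunk]  # may trigger a page fault and a read
        return (time.perf_counter() - start) / n_reads


if __name__ == "__main__":
    # Create a 1 MiB scratch file locally; point `path` at a file on a
    # network mount instead to compare.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(1024 * 1024))
        print(f"mean read time: {mmap_random_reads(path) * 1e6:.2f} µs")
```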

Besides latency and throughput issues ^[2]^[3]^[4], there will probably also be severe concurrency issues, eventually leading to data corruption, because a database continuously writes to its files rather than only reading them.

If you are only looking at optimizing the read path to your data, you may want to look at solutions/technologies like using a sparse index or Zarr, but both are usually only applied to more specific data domains, and are not suitable for general purpose databases.

In general, we recommend using fast, locally attached SSDs for running CrateDB, to avoid any network roundtrips.

With kind regards,
Andreas.
