Auto increment primary key to identify rows insertion order

Hello guys,

CrateDB offers lots of options in terms of auto generated columns, system fields, text based row identifiers. This is great, but I didn’t find in the documentation how to identify the rows based on the insertion order.

The system field _ID is of type text, and if you query a table with ORDER BY _ID DESC you don’t get the oldest to newest rows inserted.

My use case is quite complex. Basically I have a cluster with x nodes that ingest data into CrateDB at it’s own pace. So I can’t relay on a timestamp field to paginate and sort the results from newest to oldest or oldest to newest because each node can insert multiple rows with the same timestamp. I referring to a cursor based pagination with next (pointer field value < cursor value) and previous (pointer field value > cursor value) conditions.

In the use case I presented above, caching the row ID on the client side, and increment it for each row in the batch insert request to CrateDB, works if the cluster has only one node active.

My question is:

It is possible to use something similar to SERIAL type in PostgreSQL when creating the tables?

Maybe use auto generated columns with auto increment capabilities, create a sequence on the server side to generate the next row ID, or use a trigger.

Any feedback, ideas, or alternatives to achieve this is highly appreciated.

Thank you

1 Like

CrateDB - as a highly scalable database designed to handle hundreds of nodes - does not support a global auto-increment or serial data type. This is because such a feature would necessitate synchronization among all nodes during data ingestion. The strength of CrateDB lies in its ability to allow nodes, and even individual shards within nodes, to operate largely independently and concurrently. A global counter would hinder this parallel processing, undermining the core advantages of CrateDB.

You could maybe use a generated timestamp field in CrateDB on a table level, like

  my_val DOUBLE,