I was fascinated by Cratedb and I’m trying it (as a complete newbie) to see if it’s possible to use it instead of postgresql, especially with time series.
I have a query that works perfectly in Postgresql but when run on CrateDB it throws an error.
SELECT DISTINCT ON (device_id) device_id, status, utime
FROM Mydb.logs
WHERE utime BETWEEN ‘2022-01-01 00:00:00’ and ‘2022-01-02 23:59:59’
‘utime’ is the datetime as CrateDB TIMESTAMP type.
The error is : SQLParseException[line 1:17: no viable alternative at input 'SELECT DISTINCT ON']
Hi Luca,
CrateDB does not currently support DISTINCT ON , but you can achieve the same results with a query like:
SELECT t.device_id,t.status,t.utime
FROM (
SELECT DISTINCT device_id,status,utime
,row_number() OVER (
PARTITION BY device_id
ORDER BY device_id,status,utime
) AS rn
FROM Mydb.logs
WHERE utime BETWEEN '2022-01-01 00:00:00' AND '2022-01-02 23:59:59'
) AS t
WHERE rn = 1
ORDER BY device_id,status,utime
Please note you can change the ORDER BY to meet your needs.
I hope this helps.
Thank you very much for the solution and for the speed with which you replied !!
I’m liking CrateDB more and more, thanks also to the community.
Sorry if I take advantage of your courtesy, would you be kind enough to tell me how to modify your solution to locate the record with the timestamp closest to the second date or closest to a single specific date, without defining a time frame via BETWEEN … AND …) ?
No worries at all, and apologies I got busy with something else and it took me a while to come back to you.
To get, for each device_id, the record with the utime closest to the upper boundary of the filter on utime, we could modify the ORDER BY in the row_number() function to ORDER BY utime desc
Starting in CrateDB 5.2.0 (which is currently in testing - not ready for production use - docker image crate/crate:5.2.0) there are new functions available (max_by and min_by) which allow to simplify the query to:
SELECT device_id,max_by(status,utime),max(utime)
FROM Mydb.logs
WHERE utime BETWEEN '2022-01-01 00:00:00' AND '2022-01-02 23:59:59'
GROUP BY device_id
ORDER BY 1,2,3;
Regarding distance to a specific point in time, would you be looking for the closest record regardless of it being before or after the point?
Hi,
I think a query like this may be what you are looking for:
WITH devices as (SELECT DISTINCT device_id FROM Mydb.logs),
latest_records_per_device AS (
SELECT device_id, (
SELECT _id
FROM Mydb.logs
WHERE logs.device_id=devices.device_id
AND logs.utime<=('2022-01-01 03:00:00'::TIMESTAMP)
ORDER BY logs.utime DESC
LIMIT 1) AS latestid
FROM devices)
SELECT logs.device_id,logs.status,logs.utime
FROM latest_records_per_device
LEFT JOIN Mydb.logs on logs._id=latestid
ORDER BY 1,2,3;
I am sorry to hear that. One thing I am not sure I understood, were you looking to combine two different filters on time intervals? (one to narrow down the number of devices and a different one to lookup the statuses), because if utime being behind the special point in time is the only filter, I think you could use the query with the row_number and the ORDER BY utime desc
Also the new version with the max_by function should be only a few days away now, and using this new function, besides making queries easier, it is also a lot faster in all my tests, so you may want to wait or test in the meanwhile with the version from the testing channel, and use a query with max_by like:
SELECT device_id, max_by(status,utime) as LatestStatus,max(utime) as LatestUTime
FROM Mydb.logs
WHERE logs.utime<=('2022-01-01 03:00:00'::TIMESTAMP)
GROUP BY device_id
ORDER BY 1,2,3;