I’m exploring CrateDB and found that it supports downsampling using DATE_BIN
and the LTTB algorithm. However, I couldn’t find direct support for materialized views where the results are stored in a table for efficient querying.
I came across an integration with dbt (link), but it seems dbt itself doesn’t support materialized views for CrateDB.
Given that many time-series databases offer materialized views, is there a recommended approach to achieve similar functionality in CrateDB? Any best practices or workarounds to persist aggregated/query results for faster access?
Thanks in advance!
1 Like
Hi Dennis,
Materialized views are in the roadmap,
in the meanwhile using dbt incremental models with the delete+insert
strategy is something that works, you may have been put off by the mention of “delete”, it is only records that need to be updated that are deleted before new versions are inserted, not all the historical data you already have in the table.
Another way to incrementally materialize results is using scheduled jobs to run appropriate INSERT .. SELECT … FROM … ON CONFLICT … DO UPDATE … commands. See Continuous Aggregates for an example.
1 Like
Thanks for your prompt response, Hernanc.
Could you please have a look into the below clarifications:
- We are considering an on-premise CrateDB deployment. In this case, can we integrate dbt Core with an incremental model as a solution?
- Does on-premise CrateDB support automation or scheduled jobs?
- Regarding data retention policies, I understand that there is no out-of-the-box support, and we would need to integrate third-party tools like Apache Airflow to manage retention. Could you confirm if this understanding is correct?
2 Likes
Hi Hernan,
Has this feature been released yet? If not, is there an estimated timeline for when it might be available?
Thanks
Hi,
Foundation work for materialized views has been done, but we do not have an ETA for the full feature to be released yet.
At the moment, using scheduled jobs or a external tool like dbt, as discussed above, is the way to go.
I can also recommend other tools if you prefer an alternative to dbt.
In general we see with many companies using CrateDB that doing the incremental updates in existing pipelines works well, as people often already have some tooling for transforms or stream processing anyway.