Guidelines on node sizing

Hi all,

We are currently looking into using crate.io on-premises as long-term storage for our Prometheus monitoring. Obviously, for first tests, any single node should just work, but my question is: what kind of nodes should I look for for a 3+ node cluster? So far, I’ve drawn blanks on node sizing specifics, except maybe that ext4/XFS are “good” and ZFS is not so good a file system (File system recommendation for production systems?).

Therefore, which direction should we take for a potentially multi-dozen-TByte long-term store: focus on CPU/RAM, use just a few local HDDs[1], and go nuts in terms of node count, or rather go for large RAID10 HDD md storage[2] with fewer nodes and fewer CPU cores?

Obviously, the final answer will be governed by our use cases, but without any initial guidelines I don’t know which direction to aim for. As long as ingestion is no problem, the most important metric IMHO is query speed from Prometheus/Grafana.

Obviously^2, using NVMe flash would be much, much faster, but I’m not sure I can get the purchase order out for that amount of storage just yet.

Thanks a lot in advance for any insight

CArsten

[1] e.g. 64-core nodes with 512 GB RAM but only 4 SATA disk slots
[2] e.g. 6-core nodes with 192 GB RAM but 24 SATA disks


While I think such questions would be better covered by talking to sales / solution engineering, it also comes down to what you really want to achieve. There are certain limitations (JVM memory related, number of files, …).

You can manage 100 TiB with rather few CPUs and a limited amount of memory.

So the biggest question would be:

  • How much do you plan to ingest per second (hour, day)?
  • How much data do you want to manage on average in the cluster?

Obviously^2, using NVMe flash would be much, much faster, but I’m not sure I can get the purchase order out for that amount of storage just yet.

In many/most cases NVMe-based storage is not really needed anyway. However, SSD-based storage (even consumer grade) is significantly better in terms of performance.

At the moment our Prometheus test instance has about 4M time series, with the potential to go up to 10 or 15M. Most of these are scraped every 15 or 30 s, depending on the sensor type.

At the moment this means about 3 TByte for a 60-day retention time.

Well, to keep query times sensible, and since I think it’s doubtful I would ever need to know how each of the O(90k) CPU cores behaved two years ago, I guess the initial data set can be weeded out/downsampled quite a bit. My guess would thus be to go down to, say, 1M time series, maybe sampled at 5-minute intervals, for 5-10 years. At 5-minute intervals that is 288 samples per day, so 10 years come to roughly 1M values for each series, or about 1T values in total - not quite sure about cardinality here, as we are still in hot discussions about which information should accompany each measurement.

Yeah, but I can more easily get servers with U.2/U.3 storage than ones with many M.2 interfaces :wink:

Thanks a lot already!

PS: My “killer” query at the moment (which completely “kills” the performance of InfluxDB, which we are testing in parallel) is simply: give me a list of all instances where the CPU(s) was/were more than 5% busy with iowait, in 5-minute intervals, over the past 60 h.

The background here is that a user could come in on Monday and tell me their jobs did not run smoothly over the weekend; I would suspect data access problems and want to correlate their jobs with the instances they ran on. But that is just the future goal, as I cannot even finish the initial query for 500 instances quickly, i.e. it returns after about a minute. This is partially because Prometheus only stores per-core counters and not even the total number of cores per system.
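For reference, a minimal sketch of what that query could look like in SQL, assuming the per-core iowait percentages have already been derived from the raw Prometheus counters into a hypothetical table cpu_iowait(ts, instance, cpu, iowait_pct) - the table and column names are illustrative only, not the adapter’s actual schema:

    -- Hypothetical table: cpu_iowait(ts TIMESTAMP, instance TEXT, cpu TEXT, iowait_pct DOUBLE PRECISION)
    -- Bucket into 5-minute intervals by casting the timestamp to epoch milliseconds.
    SELECT instance,
           (ts::bigint / 300000) * 300000 AS bucket_start_ms,
           avg(iowait_pct) AS avg_iowait_pct        -- averaged across all cores of an instance
    FROM cpu_iowait
    WHERE ts > now() - INTERVAL '60 hours'
    GROUP BY instance, (ts::bigint / 300000) * 300000
    HAVING avg(iowait_pct) > 5
    ORDER BY instance, bucket_start_ms;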


Hi,

Because of the way we index data in CrateDB, this kind of query should not be a problem, even with low specs, so the sizing would be mostly conditioned by the requirements around data ingestion.

This is an interesting article in case you have not seen it already:
How we scaled ingestion to one million rows per second (crate.io)

Would you be interested in having a call with us?
Contact Crate.io | Get in touch with the team behind CrateDB


Hopefully, I’ll come to that. Right now, I’ve repurposed an old “storage server” with 24 disks, 1 NVMe drive, but only 12 logical E5-1650v3 cores. Initially I ran the Prometheus adapter on the same node, which was a no-go; now, with it on a different system, I still cannot get data into that stand-alone system at close to real time.

So far I’ve tried mdadm/XFS, ZFS, and 24 individual XFS disks (the latter with 12 shards), but to no avail.

I’ve seen and browsed it but not followed it carefully yet. It looks promising, though, especially as I’m on a stand-alone set-up at the moment.

Maybe, but given that we have been burned quite a few times by “enterprise” vendors in the past, I’m not sure we are that interested in being tied to a specific product just yet. If you still think it would be worthwhile talking and not a waste of your time, I can fill out the form. :smiley:

Edit/Addendum: Just realized how much data we are currently ingesting from our Prometheus test instance:

min(timestamp): 1689673859.721
max(timestamp): 1689679279.070
count(timestamp): 268441622

which results in (count / (max - min)) close to 50k records/s (with quite a few warnings from the garbage collectors, and the adapter failing to fully write all data).
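For anyone wanting to reproduce that figure, a quick sketch of the calculation as a single query - assuming the adapter writes into a table called metrics with a timestamp column of type TIMESTAMP, and that casting it to BIGINT yields epoch milliseconds (both are assumptions, not necessarily the adapter’s actual schema):

    -- Rough ingestion-rate estimate over the whole table.
    SELECT count(*) AS records,
           count(*) / ((max("timestamp")::bigint - min("timestamp")::bigint) / 1000.0)
               AS records_per_second
    FROM metrics;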

I still cannot get data into that stand-alone system at close to real time.

Reviewing the clustering and partitioning strategy may help here; we also have several other options in the table definition that we can work with.

Sharding and partitioning guide for time-series data - Tutorials - CrateDB Community
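As a rough illustration of the kind of knobs the guide discusses (all names, types, and the shard/partition choices below are assumptions to be tuned to the actual workload), a table definition could look roughly like this:

    -- Sketch only: daily partitions plus an explicit shard count.
    CREATE TABLE IF NOT EXISTS metrics_custom (
        ts       TIMESTAMP WITH TIME ZONE NOT NULL,
        instance TEXT,
        labels   OBJECT(DYNAMIC),
        value    DOUBLE PRECISION,
        day      TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts)
    )
    CLUSTERED INTO 12 SHARDS   -- e.g. roughly one shard per logical core on the test box
    PARTITIONED BY (day);      -- keeps individual partitions at a manageable size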

I’m not sure we are that interested in being tied to a specific product just yet.

I understand. Some settings, like the sharding options, are of course CrateDB-specific, but as CrateDB is compatible with the PostgreSQL protocol and the SQL language, a lot of the work you are doing could be ported to another system if needed. And CrateDB can run on premises, so you are not locked into a specific cloud vendor.

If you still think it would be worthwhile talking and not a waste of your time, I can fill out the form.

I think a call would definitely be worthwhile. You seem to have an interesting use case on your hands, and we are always eager to learn more about what people are doing or trying to do with CrateDB. We want to help; the success of the community is also our success.
