Disk space issues on Prometheus integration

Hello!

I’m building a new monitoring cluster at my company (Zyte) and looking for clustered storage for Prometheus. Found CrateDB and loved it! Installed it, and everything was working smoothly from the beginning.

BUT… the disk space used is A LOT… I tried changing the schema and disabling indexing, but it’s still huge. For comparison, we’re storing 45 days of metrics in 800 GB in Prometheus. In CrateDB, 1 day is 1.1 TB!

I know that I’m probably doing something wrong here, but I’ve been searching and trying things with no luck. Anything else that I could be missing? I would love to keep CrateDB!
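
(For anyone who wants to compare the numbers, the CrateDB side can be checked with a query along these lines — a sketch using the size column of sys.shards, which reports per-shard sizes in bytes:)

SELECT table_name, sum(size) / 1024.0 / 1024 / 1024 AS size_gb
FROM sys.shards
GROUP BY table_name;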

Also, the cratedb-prometheus-adapter memory usage is increasing daily, and after some days it’s consuming up to 85% of the server’s memory (128 GB). I mention this just in case it’s an indication of something.

Thanks!

Hi!

  • What’s your current table definition? Could you share the output of SHOW CREATE TABLE table_name?
  • For the memory part, is there anything strange in the logs? Does the memory just keep going up? Do you maybe have a memory-over-time graph?
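
(For reference, assuming the adapter’s default metrics table in the doc schema, that statement would be:)

SHOW CREATE TABLE "doc"."metrics";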

Yes, sure. This is my latest test, where I decreased the sharding and disabled partitioning:

CREATE TABLE IF NOT EXISTS "doc"."metrics" (
   "timestamp" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
   "labels_hash" TEXT NOT NULL,
   "labels" OBJECT(DYNAMIC) AS (
      "instance" TEXT,
      "prometheus" TEXT,
      "job" TEXT,
      "__name__" TEXT,
      "quantile" TEXT,
      "le" TEXT,
      "dialer_name" TEXT,
      "version" TEXT,
      "reason" TEXT,
      "listener_name" TEXT,
      "slice" TEXT,
      "handler" TEXT,
      "code" TEXT,
      "goversion" TEXT,
      "goarch" TEXT,
      "goos" TEXT,
      "branch" TEXT,
      "revision" TEXT,
      "tags" TEXT,
      "alertmanager" TEXT,
      "remote_name" TEXT,
      "url" TEXT,
      "role" TEXT,
      "event" TEXT,
      "call" TEXT,
      "endpoint" TEXT,
      "name" TEXT,
      "config" TEXT,
      "interval" TEXT,
      "scrape_job" TEXT,
      "type" TEXT,
      "consumer" TEXT,
      "appid" TEXT,
      "interface" TEXT,
      "taskid" TEXT,
      "cpu" TEXT,
      "hash" TEXT,
      "core" TEXT,
      "package" TEXT,
      "device" TEXT,
      "mode" TEXT,
      "controller" TEXT,
      "csrow" TEXT,
      "fstype" TEXT,
      "mountpoint" TEXT,
      "chip" TEXT,
      "sensor" TEXT,
      "chip_name" TEXT,
      "label" TEXT,
      "broadcast" TEXT,
      "address" TEXT,
      "duplex" TEXT,
      "operstate" TEXT,
      "collector" TEXT,
      "nodename" TEXT,
      "machine" TEXT,
      "release" TEXT,
      "sysname" TEXT,
      "domainname" TEXT,
      "file" TEXT,
      "master" TEXT,
      "generation" TEXT,
      "major" TEXT,
      "minor" TEXT,
      "patchlevel" TEXT,
      "implementation" TEXT,
      "cluster" TEXT,
      "clientid" TEXT,
      "topic" TEXT,
      "delayedoperation" TEXT,
      "request" TEXT,
      "error" TEXT,
      "networkprocessor" TEXT,
      "partition" TEXT,
      "value" TEXT,
      "environment" TEXT,
      "exported_instance" TEXT,
      "table" TEXT,
      "mechanism" TEXT,
      "database" TEXT,
      "_target" TEXT,
      "state" TEXT,
      "stage" TEXT,
      "node" TEXT,
      "meta_package" TEXT,
      "meta_hostname" TEXT,
      "meta_ip_address" TEXT,
      "method" TEXT
   ),
   "value" DOUBLE PRECISION,
   "valueRaw" BIGINT,
   PRIMARY KEY ("timestamp", "labels_hash")
)
CLUSTERED INTO 3 SHARDS
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'default',
   column_policy = 'strict',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)

I initially created the table as the documentation suggested:

    "timestamp" TIMESTAMP,
    "labels_hash" STRING,
    "labels" OBJECT(DYNAMIC),
    "value" DOUBLE,
    "valueRaw" LONG,
    "day__generated" TIMESTAMP GENERATED ALWAYS AS date_trunc('day', "timestamp"),
    PRIMARY KEY ("timestamp", "labels_hash", "day__generated")
  ) PARTITIONED BY ("day__generated");

Another test was to define every field in my dynamic object with INDEX OFF, but the disk usage was pretty much the same (see the sketch below).
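
Abbreviated to only two of the label columns, that variant looked roughly like this:

CREATE TABLE IF NOT EXISTS "doc"."metrics" (
   "timestamp" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
   "labels_hash" TEXT NOT NULL,
   "labels" OBJECT(DYNAMIC) AS (
      "instance" TEXT INDEX OFF,
      "job" TEXT INDEX OFF
   ),
   "value" DOUBLE PRECISION,
   "valueRaw" BIGINT,
   PRIMARY KEY ("timestamp", "labels_hash")
);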

For the memory part, I don’t have anything, sorry. I removed the metrics table in CrateDB many times and also restarted both containers multiple times. The strange part is that I stopped Prometheus and the adapter was still consuming the same amount of memory and CPU!

Hi, thanks a lot for the kind words about CrateDB! I hope we can make things work for you.

  1. You might want to change the codec parameter from default to best_compression. By default, data is stored using LZ4 compression; best_compression changes it to DEFLATE. Depending on your data, it might halve the storage needs, but it comes at the expense of slower column value lookups.
  2. As of today, CrateDB stores data both in the columnar store and in a separate document representation, which is not ideal. We have been working on addressing that over the past few development cycles and are currently testing the changes. Please keep an eye on Don't store JSON source for table rows on disk · Issue #15548 · crate/crate · GitHub for an update on when it will be released.

The option Sergey mentions is https://cratedb.com/docs/crate/reference/en/latest/sql/statements/create-table.html#codec. If the table has data, you’ll most likely have open indices and will not be able to alter it like this:

ALTER TABLE
  metrics
SET
  (codec = 'best_compression')

You will have to recreate it:

CREATE TABLE "metrics" (
  "timestamp" TIMESTAMP,
  "labels_hash" STRING,
  "labels" OBJECT(DYNAMIC),
  "value" DOUBLE,
  "valueRaw" LONG,
  "day__generated" TIMESTAMP GENERATED ALWAYS AS date_trunc('day', "timestamp"),
  PRIMARY KEY ("timestamp", "labels_hash", "day__generated")
) PARTITIONED BY ("day__generated") WITH (codec = 'best_compression')
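
If you need to keep the rows you already ingested, one option could be to create the new table under a temporary name and copy the data over with an insert-from-query, roughly like this (a sketch; metrics_new is a placeholder name, and the generated day__generated column is computed automatically on insert):

INSERT INTO "metrics_new" ("timestamp", "labels_hash", "labels", "value", "valueRaw")
  (SELECT "timestamp", "labels_hash", "labels", "value", "valueRaw" FROM "metrics");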

As for the memory usage, I think it is clearly unexpected; at most, I’d expect only a few MB of memory to be used. The possible memory leak is being tracked at Posible memory leak · Issue #176 · crate/cratedb-prometheus-adapter · GitHub.

Hello again! I’m back!

I’ve been testing other backends as well, but I wanted to give CrateDB another try as I really liked it.

I started from scratch again and created the table as you described. After 15 minutes, disk usage is 400 MB in Prometheus and 8.5 GB in CrateDB! I think it may be related to storing the JSON source for each entry, as I have lots of entries per second!! So maybe I have to wait for that fix before using CrateDB in my setup :frowning:

Memory usage on the adapter started at 0.1% and is now at 0.3%. I’ll leave it running for some hours to see if I can give you some data on this other issue.

Thanks for your help!!

Hi @cyberplant, do you have INDEX OFF in the new setup?

Also, you have a lot of TEXT columns. If you don’t use them often in aggregations or grouping, you can disable column storage as well.

See Storage

some_column TEXT INDEX OFF STORAGE WITH (columnstore = false)
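
Applied to the labels object from the table above, that could look something like this (a sketch showing only two of the label columns):

"labels" OBJECT(DYNAMIC) AS (
   "instance" TEXT INDEX OFF STORAGE WITH (columnstore = false),
   "job" TEXT INDEX OFF STORAGE WITH (columnstore = false)
)

This should shave off some more disk usage, at the cost of slower aggregations and sorting on those columns.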