Disk space issues on Prometheus integration

Hello!

I’m building a new monitoring cluster at my company (Zyte) and looking for clustered storage for Prometheus. Found CrateDB and loved it! Installed it, and everything was working smoothly from the beginning.

BUT… the disk space used is A LOT… I tried changing the schema and disabling indexing, but it’s still huge. For comparison, we’re storing 45 days of metrics in 800 GB in Prometheus. In CrateDB, 1 day is 1.1 TB!

I know that I’m probably doing something wrong here, but I’ve been searching and trying things with no luck. Anything else that I could be missing? I would love to keep CrateDB!
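
(For anyone who wants to compare the numbers, the CrateDB side can be checked with a query along these lines — a sketch using the size column of sys.shards, which reports per-shard sizes in bytes:)

SELECT table_name, sum(size) / 1024.0 / 1024 / 1024 AS size_gb
FROM sys.shards
GROUP BY table_name;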

Also, the cratedb-prometheus-adapter memory usage is increasing daily, and after some days it’s consuming up to 85% of the server’s memory (128 GB). I mention this just in case it’s an indication of something.

Thanks!

Hi!

  • What’s your current table definition? Could you share the output of SHOW CREATE TABLE table_name?
  • For the memory part, is there anything strange in the logs? Does the memory just keep going up? Do you maybe have a memory-over-time graph?
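
(For reference, assuming the adapter’s default metrics table in the doc schema, that statement would be:)

SHOW CREATE TABLE "doc"."metrics";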

Yes, sure. This is my latest test, where I decreased the sharding and disabled partitioning:

CREATE TABLE IF NOT EXISTS "doc"."metrics" (
   "timestamp" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
   "labels_hash" TEXT NOT NULL,
   "labels" OBJECT(DYNAMIC) AS (
      "instance" TEXT,
      "prometheus" TEXT,
      "job" TEXT,
      "__name__" TEXT,
      "quantile" TEXT,
      "le" TEXT,
      "dialer_name" TEXT,
      "version" TEXT,
      "reason" TEXT,
      "listener_name" TEXT,
      "slice" TEXT,
      "handler" TEXT,
      "code" TEXT,
      "goversion" TEXT,
      "goarch" TEXT,
      "goos" TEXT,
      "branch" TEXT,
      "revision" TEXT,
      "tags" TEXT,
      "alertmanager" TEXT,
      "remote_name" TEXT,
      "url" TEXT,
      "role" TEXT,
      "event" TEXT,
      "call" TEXT,
      "endpoint" TEXT,
      "name" TEXT,
      "config" TEXT,
      "interval" TEXT,
      "scrape_job" TEXT,
      "type" TEXT,
      "consumer" TEXT,
      "appid" TEXT,
      "interface" TEXT,
      "taskid" TEXT,
      "cpu" TEXT,
      "hash" TEXT,
      "core" TEXT,
      "package" TEXT,
      "device" TEXT,
      "mode" TEXT,
      "controller" TEXT,
      "csrow" TEXT,
      "fstype" TEXT,
      "mountpoint" TEXT,
      "chip" TEXT,
      "sensor" TEXT,
      "chip_name" TEXT,
      "label" TEXT,
      "broadcast" TEXT,
      "address" TEXT,
      "duplex" TEXT,
      "operstate" TEXT,
      "collector" TEXT,
      "nodename" TEXT,
      "machine" TEXT,
      "release" TEXT,
      "sysname" TEXT,
      "domainname" TEXT,
      "file" TEXT,
      "master" TEXT,
      "generation" TEXT,
      "major" TEXT,
      "minor" TEXT,
      "patchlevel" TEXT,
      "implementation" TEXT,
      "cluster" TEXT,
      "clientid" TEXT,
      "topic" TEXT,
      "delayedoperation" TEXT,
      "request" TEXT,
      "error" TEXT,
      "networkprocessor" TEXT,
      "partition" TEXT,
      "value" TEXT,
      "environment" TEXT,
      "exported_instance" TEXT,
      "table" TEXT,
      "mechanism" TEXT,
      "database" TEXT,
      "_target" TEXT,
      "state" TEXT,
      "stage" TEXT,
      "node" TEXT,
      "meta_package" TEXT,
      "meta_hostname" TEXT,
      "meta_ip_address" TEXT,
      "method" TEXT
   ),
   "value" DOUBLE PRECISION,
   "valueRaw" BIGINT,
   PRIMARY KEY ("timestamp", "labels_hash")
)
CLUSTERED INTO 3 SHARDS
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'default',
   column_policy = 'strict',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)

I initially created the table as the documentation suggested:

    "timestamp" TIMESTAMP,
    "labels_hash" STRING,
    "labels" OBJECT(DYNAMIC),
    "value" DOUBLE,
    "valueRaw" LONG,
    "day__generated" TIMESTAMP GENERATED ALWAYS AS date_trunc('day', "timestamp"),
    PRIMARY KEY ("timestamp", "labels_hash", "day__generated")
  ) PARTITIONED BY ("day__generated");

Another test was to define every field in my dynamic object with INDEX OFF, but the disk usage was pretty much the same (see the sketch below).
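
Abbreviated to only two of the label columns, that variant looked roughly like this:

CREATE TABLE IF NOT EXISTS "doc"."metrics" (
   "timestamp" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
   "labels_hash" TEXT NOT NULL,
   "labels" OBJECT(DYNAMIC) AS (
      "instance" TEXT INDEX OFF,
      "job" TEXT INDEX OFF
   ),
   "value" DOUBLE PRECISION,
   "valueRaw" BIGINT,
   PRIMARY KEY ("timestamp", "labels_hash")
);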

For the memory part, I don’t have anything, sorry. I removed the metrics table in CrateDB many times and also restarted both containers multiple times. The strange part is that I stopped Prometheus and the adapter was still consuming the same amount of memory and CPU!

Hi, thanks a lot for the kind words about CrateDB! I hope we can make things work for you.

  1. You might want to change the codec parameter from default to best_compression. By default, data is stored using LZ4 compression; best_compression changes it to DEFLATE. Depending on your data, it might halve the storage needs, but it comes at the expense of slower column value lookups.
  2. As of today, CrateDB stores data both in the columnar store and in a separate document representation, which is not ideal. We have been working on addressing that over the past few development cycles and are currently testing the changes. Please keep an eye on Don't store JSON source for table rows on disk · Issue #15548 · crate/crate · GitHub for an update on when it will be released.

The option Sergey mentions is https://cratedb.com/docs/crate/reference/en/latest/sql/statements/create-table.html#codec. If the table has data, you’ll most likely have open indices and will not be able to alter it like this:

ALTER TABLE
  metrics
SET
  (codec = 'best_compression')

You will have to recreate it:

CREATE TABLE "metrics" (
  "timestamp" TIMESTAMP,
  "labels_hash" STRING,
  "labels" OBJECT(DYNAMIC),
  "value" DOUBLE,
  "valueRaw" LONG,
  "day__generated" TIMESTAMP GENERATED ALWAYS AS date_trunc('day', "timestamp"),
  PRIMARY KEY ("timestamp", "labels_hash", "day__generated")
) PARTITIONED BY ("day__generated") WITH (codec = 'best_compression')
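
If you need to keep the rows you already ingested, one option could be to create the new table under a temporary name and copy the data over with an insert-from-query, roughly like this (a sketch; metrics_new is a placeholder name, and the generated day__generated column is computed automatically on insert):

INSERT INTO "metrics_new" ("timestamp", "labels_hash", "labels", "value", "valueRaw")
  (SELECT "timestamp", "labels_hash", "labels", "value", "valueRaw" FROM "metrics");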

As for the memory usage, I think it is clearly unexpected; at most, I’d expect only a few MB of memory to be used. The possible memory leak is being tracked at Posible memory leak · Issue #176 · crate/cratedb-prometheus-adapter · GitHub.

Hello again! I’m back!

I’ve been testing other backends as well, but I wanted to give CrateDB another try as I really liked it.

I started from scratch again and created the table as you described. After 15 minutes, disk usage is 400 MB in Prometheus and 8.5 GB in CrateDB! I think it may be related to storing the JSON source for each entry, as I have lots of entries per second!! So maybe I have to wait for that fix before using CrateDB in my setup :frowning:

Memory usage on the adapter started at 0.1% and is now at 0.3%. I’ll leave it running for some hours to see if I can give you some data on this other issue.

Thanks for your help!!

Hi @cyberplant, do you have INDEX OFF in the new setup?

Also, you have a lot of TEXT columns. If you don’t use them often in aggregations or grouping, you can disable column storage as well.

See Storage

some_column TEXT INDEX OFF STORAGE WITH (columnstore = false)
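
Applied to the labels object from the table above, that could look something like this (a sketch showing only two of the label columns):

"labels" OBJECT(DYNAMIC) AS (
   "instance" TEXT INDEX OFF STORAGE WITH (columnstore = false),
   "job" TEXT INDEX OFF STORAGE WITH (columnstore = false)
)

This should shave off some more disk usage, at the cost of slower aggregations and sorting on those columns.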