Adaptive partitioning

CREATE TABLE projects_posts (
    project_id VARCHAR,
    labels ARRAY (VARCHAR),
    sentiment VARCHAR CHECK (sentiment IN ('negative', 'positive', 'neutral')),
    favorited_at TIMESTAMP WITH TIME ZONE,
    hidden_at TIMESTAMP WITH TIME ZONE,
    post_id VARCHAR,
    channel VARCHAR,
    external_id VARCHAR,
    published_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE,
    url VARCHAR NULL,
    title VARCHAR,
    content TEXT,
    transcript ARRAY (
        OBJECT AS (
            start DOUBLE PRECISION,
            text TEXT
        )
    ) NULL,
    media ARRAY (OBJECT (DYNAMIC)) NULL,
    language VARCHAR,
    hashtags ARRAY (VARCHAR) NULL,
    comment_count BIGINT,
    like_count BIGINT,
    view_count BIGINT,
    parent_external_id VARCHAR NULL,
    extras OBJECT(DYNAMIC) NULL,
    source OBJECT AS (
        id VARCHAR,
        description TEXT,
        name VARCHAR,
        url VARCHAR,
        domain VARCHAR,
        favicon VARCHAR,
        score DOUBLE PRECISION,
        monthly_traffic BIGINT,
        country_id VARCHAR,
        state_id VARCHAR,
        city_id VARCHAR,
        location VARCHAR
    ) NULL,
    profile OBJECT AS (
        id VARCHAR,
        avatar VARCHAR,
        name VARCHAR,
        username VARCHAR,
        description TEXT,
        url VARCHAR,
        links ARRAY(VARCHAR),
        location VARCHAR,
        joined_at TIMESTAMP,
        verification_status VARCHAR,
        gender VARCHAR,
        followers BIGINT,
        following BIGINT,
        post_count INTEGER
    ) NULL,
    INDEX keyword_ft USING FULLTEXT(title, content, transcript['text']) 
        WITH (analyzer = 'ilike_analyzer'),
    INDEX search_ft USING FULLTEXT(
        source['name'],
        source['url'],
        profile['name'],
        profile['username'],
        profile['url']
    ) WITH (analyzer = 'ilike_analyzer'),
    CHECK (source['location'] IN ('local', 'foreign')),
    CHECK (profile['verification_status'] IN ('unknown', 'unverified', 'verified')),
    CHECK (profile['gender'] IN ('male', 'female', 'unknown')),
    PRIMARY KEY (project_id, post_id)
) WITH (codec = 'best_compression');

Can we apply partitioning only when the number of posts under a project exceeds a certain threshold? I don’t want to partition all projects, because the number of posts varies drastically between them. Is there an article that covers this?

This kind of partitioning would effectively apply multitenancy for projects with a huge number of posts, while leaving the smaller ones together in the same partition.


CrateDB has no direct support for adaptive partitioning, but one could achieve something similar by calculating the partition key from an incrementing row id.

E.g. assuming the given `post_id` is an auto-incrementing integral value, the following partition key definition would create a new partition for every 1000 post entries:

CREATE TABLE t1 (project_id INT, post_id INT, p_key AS ceil(post_id / 1000)) PARTITIONED BY (p_key);

INSERT INTO t1 (project_id, post_id) VALUES (1, 1), (1, 1000), (1, 1001), (1, 2000);

SELECT values FROM information_schema.table_partitions WHERE table_name = 't1' ORDER BY partition_ident;
+--------------+
| values       |
+--------------+
| {"p_key": 0} |
| {"p_key": 1} |
| {"p_key": 2} |
+--------------+ 
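Building on that, the adaptive part — splitting out only the large projects — would have to happen on the client side, since a generated column cannot depend on aggregate state such as a project's total post count. A minimal client-side sketch of the idea (the function name, bucket size, and threshold are illustrative assumptions, not CrateDB features):

```python
# Route posts of "large" projects into their own rolling partitions,
# while all small projects share a single default partition.

BUCKET_SIZE = 1000         # posts per partition bucket (assumption)
LARGE_THRESHOLD = 100_000  # projects above this get dedicated partitions (assumption)

def partition_key(project_id: str, project_post_count: int, post_seq: int) -> str:
    """Compute the value the client would write into the PARTITIONED BY column."""
    if project_post_count < LARGE_THRESHOLD:
        # small projects all land in one shared partition
        return "shared"
    # large projects: one partition per BUCKET_SIZE posts, namespaced by project,
    # using the same floor-division bucketing as the SQL example above
    return f"{project_id}:{post_seq // BUCKET_SIZE}"
```

The client tracks the per-project post count (e.g. via a `SELECT count(*)` or a cached counter) and passes the computed key along with each insert.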

Due to its distributed nature, CrateDB does not support auto-incremented numbers itself, but providing one from the client, even though this may result in collisions, could be acceptable for this use case.
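A naive client-side sequence could look like the sketch below (hypothetical code; it is only unique within a single process, so multiple clients would still need coordination or collision-tolerant ids):

```python
import threading

class ClientSequence:
    """Thread-safe in-process counter for generating incrementing post ids.

    Note: values are unique only within this process; two independent
    clients can still hand out the same number.
    """

    def __init__(self, start: int = 0) -> None:
        self._lock = threading.Lock()
        self._value = start

    def next(self) -> int:
        with self._lock:
            self._value += 1
            return self._value
```

In practice the `start` value could be seeded from `SELECT max(post_id)` at client startup to continue an existing sequence.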

This is just an idea; I’m not sure whether it will work for your use case.
