Set up an Apache Superset development sandbox with CrateDB

:spiral_notepad: Introduction

This is a little walkthrough about how to quickly spawn a development sandbox with Apache Superset, in order to work on the CrateDB Python driver with live code reloading.

:battery: Prerequisites

You will need Bash, Docker, Git, and Python to be installed on your workstation. All other prerequisites will be installed into your working tree.

:hammer_and_wrench: Setup

CrateDB

Start CrateDB using Docker.

docker run --rm --publish=4200:4200 --publish=5432:5432 --name=cratedb --env CRATE_HEAP_SIZE=1g crate:latest -Cdiscovery.type=single-node

Create an example table and insert a single record.

docker run --interactive --rm --network=host crate:latest crash <<EOF
CREATE TABLE IF NOT EXISTS testdrive (
    ts TIMESTAMP,
    tstz TIMESTAMPTZ,
    val INTEGER CHECK (val >= 0),
    str TEXT NOT NULL,
    str2 TEXT,
    PRIMARY KEY(val, str, str2)
);
INSERT INTO testdrive (ts, tstz, val, str, str2) VALUES (now(), now(), 42, 'foobar', 'bazqux');
EOF

If you need more data to explore, follow how to load 2.6M records from the NYC Yellowcab dataset into CrateDB instead.

Sandbox

Install Apache Superset from source

You can copy this whole section verbatim into your terminal.

# Acquire sources.
git clone https://github.com/apache/superset --depth=1
cd superset

# Create Python virtualenv.
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements/local.txt

# Setup Node.js 16 with NPM 7.
export NODEJS_VERSION=16.15.1
export NPM_VERSION=7
source /dev/stdin <<<"$(curl -s https://raw.githubusercontent.com/cicerops/supernode/main/supernode)"

# Run provisioning steps for Apache Superset.
superset db upgrade
superset fab create-admin --username=admin --password=admin --firstname=admin --lastname=admin --email=admin@example.org
superset init

Link the SQLAlchemy dialect for CrateDB

In order to link the filesystem location of the Python driver into the sandbox environment, install the package in β€œeditable” mode.

pip install --editable=/path/to/sqlalchemy-cratedb

If you don’t have the sources yet, you can obtain them from the Git repository using git clone https://github.com/crate/sqlalchemy-cratedb.

Start backend

By using the --reload option, changes on the Python code will be automatically picked up.

# Invoke development web server with code reload machinery.
FLASK_ENV=development superset run -p 8088 --with-threads --reload --debugger

Build and start frontend

In another console, but also within the same virtualenv, you will need to build the frontend and run its development web server.

source .venv/bin/activate
cd superset-frontend
npm install
npm run dev-server

:arrow_forward: Usage

User interface

You should be ready to go. Now,

Create a database connection

For creating a database connection to CrateDB in Apache Superset, you can either use the user interface, or the HTTP API. Those steps will create the connection using the HTTP API, saving a few clicks and keystrokes.

# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)

# Acquire a CSRF token.
CSRF_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/csrf_token/ Authorization:"Bearer ${AUTH_TOKEN}" | jq -r .result)

# Create a data source item / database connection.
http --session=superset http://localhost:8088/api/v1/database/ database_name="CrateDB Testdrive" engine=crate sqlalchemy_uri=crate://crate@localhost:4200 Authorization:"Bearer ${AUTH_TOKEN}" X-CSRFToken:"${CSRF_TOKEN}"

Hacking

Now, you can just go ahead and edit code on the CrateDB Python driver located on your workstation. The application will notice your changes and pick them up by reloading the daemon environment. Please make sure to watch the output on the first console, where superset run has been invoked, for any anomalies or stacktraces.

:broom: Clean up

  1. Both development web servers of Apache Superset (backend and frontend) can be terminated by hitting CTRL+C.
  2. The CrateDB database instance running in a container can be terminated by invoking docker rm cratedb --force.
  3. The metadata database of Apache Superset, where user accounts and database connections are stored, can be deleted by invoking rm ~/.superset/superset.db.
1 Like