Quickly starting CrateDB with 2.5M records of the NYC Yellowcab dataset

amotl · June 29, 2022, 4:20pm

Introduction

This is a short note on how to quickly spawn a single-node CrateDB instance on your workstation using Docker and load it with a subset of the NYC Yellowcab dataset. The intention is to provide a single command that gets you ready for your own explorations, without having to invoke the individual commands interactively.

Prerequisites

You only need Bash and Docker installed on your workstation. The launcher program has been confirmed to work on Linux, macOS, and WSL environments. When running it, you need to be connected to the internet, because it acquires the dataset from an S3 bucket over HTTP.
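If you would like to verify the prerequisites up front, a minimal sketch could look like the following. The helper name missing_tools is made up for illustration, and including curl in the list is an assumption based on the download command in the Usage section.

```shell
#!/usr/bin/env bash

# Print the name of every given tool that is not found on the PATH.
# Hypothetical helper, not part of the launcher program.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Bash and Docker are named as prerequisites above; curl is an assumption,
# because the one-liner under "Usage" relies on it.
missing="$(missing_tools bash docker curl)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
else
  echo "All prerequisites found."
fi
```

Note that this only checks whether the commands exist. It does not check whether the Docker daemon is actually running.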

Usage

curl https://raw.githubusercontent.com/crate/cratedb-examples/main/operation/testbench-yellowcab/cratedb-import-nyc-yellowcab.sh | bash

Before running the program, you may want to inspect it at cratedb-import-nyc-yellowcab.sh.
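One way to inspect the program before it runs is to download it to a file first instead of piping it straight into Bash. The following is a sketch along those lines. The function name fetch_launcher and the local file name are illustrative choices, and the URL is the one from the command above.

```shell
#!/usr/bin/env bash

# URL of the launcher program, as given in the "Usage" section.
SCRIPT_URL="https://raw.githubusercontent.com/crate/cratedb-examples/main/operation/testbench-yellowcab/cratedb-import-nyc-yellowcab.sh"

# Local file name, derived from the last path segment of the URL.
SCRIPT_FILE="$(basename "$SCRIPT_URL")"

# Download the launcher without executing it (hypothetical helper).
# --fail makes curl exit non-zero on HTTP errors, --location follows redirects.
fetch_launcher() {
  curl --fail --silent --show-error --location \
    --output "$SCRIPT_FILE" "$SCRIPT_URL"
}

# Typical session, to be run by hand:
#   fetch_launcher
#   less "$SCRIPT_FILE"     # review what will be executed
#   bash "$SCRIPT_FILE"     # run it once you are satisfied
```

Keeping the download and the execution as separate steps also leaves you with a local copy that you can modify later.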

Clean up

The CrateDB database instance running in a container can be terminated by invoking:

docker rm cratedb --force

Reference

The program can also serve as a blueprint for your own explorations. Reuse the boilerplate code from the top half of the program and adjust the SQL statements in the bottom half to suit your needs. If you think your variants might be interesting to others as well, you are encouraged to share them back.
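For ad hoc queries outside of the program, CrateDB's HTTP endpoint at /_sql accepts SQL statements as JSON. The sketch below assumes that the container publishes port 4200 on localhost, which may differ from how the launcher sets things up. The helper names sql_payload and cratedb_sql are made up for illustration. Because the name of the imported table is not stated here, the example lists the tables of the default "doc" schema instead of guessing one.

```shell
#!/usr/bin/env bash

# Assumption: CrateDB's HTTP port 4200 is reachable on localhost.
CRATEDB_URL="${CRATEDB_URL:-http://localhost:4200}"

# Wrap a SQL statement into the JSON body expected by the /_sql endpoint.
# Deliberately simple: no JSON escaping is performed, so the statement
# must not contain double quotes or backslashes.
sql_payload() {
  printf '{"stmt": "%s"}' "$1"
}

# Send one SQL statement to CrateDB and print the JSON response.
cratedb_sql() {
  curl --fail --silent --show-error \
    --header 'Content-Type: application/json' \
    --data "$(sql_payload "$1")" \
    "$CRATEDB_URL/_sql"
}

# Example, requires the running instance:
#   cratedb_sql "SELECT table_name FROM information_schema.tables WHERE table_schema = 'doc'"
```

Once you know the table name, the same helper can run aggregations or any other statement you would otherwise place in the bottom half of the program.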
