Deploying a CrateDB cloud cluster has never been easier, simply follow our tutorial here and you can have a cluster up and running within minutes. We offer a CRFREE plan which offers up to 2 vCPUs, 2 GiB of memory, and 8 GiB of storage completely for free. Ideal for small-scale testing and evaluation purposes.
Once you have access to the Admin UI of your cluster, execute these statements in the console:
CREATE TABLE airports ( code character varying(3) NOT NULL, name character varying(100) NOT NULL, city character varying(50) NOT NULL, country character varying(50) NOT NULL, latitude double precision NOT NULL, longitude double precision NOT NULL, elevation integer NOT NULL );
CREATE TABLE flights ( year integer, month integer, day_of_month integer, day_of_week integer, dep_time integer, crs_dep_time integer, arr_time integer, crs_arr_time integer, unique_carrier varchar(6), flight_num integer, tail_num varchar(8), actual_elapsed_time integer, crs_elapsed_time integer, air_time integer, arr_delay integer, dep_delay integer, origin varchar(3), dest varchar(3), distance integer, taxi_in integer, taxi_out integer, cancelled integer, cancellation_code varchar(1), diverted varchar(1), carrier_delay integer, weather_delay integer, nas_delay integer, security_delay integer, late_aircraft_delay integer, dep_timestamp timestamp, arr_timestamp timestamp );
This creates 2 empty tables in your database.
airports, with the correct data types of the columns.
Now you should import the data into the tables. We will use Console “Import” feature in this example. Use the following links:
- airports - https://s3.amazonaws.com/crate.sampledata/flights/dataset-airports.csv.gz
- flights - https://s3.amazonaws.com/crate.sampledata/flights/dataset-flights.csv.gz
Make sure to use your pre-created tables in the “Table name” field, otherwise the column types may be created incorrectly. Do this for both .csv files:
After this your tables should no longer be empty.
airports contains 5876 records, and
flights 150 000 records.
Now that you have the dataset to visualize, you can connect your cluster to Cluvio.
In Cluvio, navigate to
Password will understandably differ for you.
After filling out the details for your cluster, press
Next: Test Connection. If the credentials are correct, you should see this success message:
A dashboard is the main point of Cluvio. It is a collection of interactive reports, giving great insight into any area of your data. These are the types of charts Cluvio offers:
- Table Chart
- Pivot / Cohort Table Chart
- Number Chart
- Pie Chart
- Line Chart
- Bar Chart
- Map Chart
- Gauge Chart
- XY / Bubble Chart
- Word Cloud Chart
- Histogram Chart
Now, let’s create some and see how Cluvio works. Head to Dashboards →
New Dashboard. After naming your Dashboard, you can create your first report. Click the
New report in the upper right.
The first piece information you might be interested in, for a given period, is the number of flights and average delays of departures and arrivals. This is the code for this report:
SELECT COUNT(*) AS "Number of flights", AVG(dep_delay) AS "Average Departure Delay", AVG(arr_delay) AS "Average Arrival Delay" FROM doc.flights ORDER BY 1
This is a pretty simple query that counts the number of rows in the
flights as the number of flights, and averages values in the
arr_delay for the departure delays and arrival delays respectively.
To see the information displayed this way, you need to switch to “Number” chart after running query.
This query looks at the country distribution in the
SELECT country, COUNT(1) FROM doc.airports GROUP BY country ORDER BY 2 DESC
In this one, it’s suitable to use pie chart to better see the distribution. We also used the
Value(%) option for the legend, and edited the legend to show up to 25 values (countries).
Filters offer a great way to quickly specify the condition under which you want to display your data.
flights table in
day_of_week column 1 represents Monday, 2 means Tuesday, etc. Using that, we can create a filter to display data for a specific day of the week without changing the SQL in our reports.
VALUES (1, 'Monday'), (2, 'Tuesday'), (3, 'Wednesday'), (4, 'Thursday'), (5, 'Friday'), (6, 'Saturday'), (7, 'Sunday') ORDER BY 1
Now we can filter the data by day of the week:
Find out more about filters here.
SQL snippets are small reusable pieces of code that can make your work easier within larger dataset. They are managed here.
We used them to create JOIN statements:
JOIN doc.airports AS origin_airport ON flights.origin = origin_airport.code JOIN doc.airports AS dest_airport ON flights.dest = dest_airport.code
This snippet creates two joins between the
airports tables, aliasing the
airports table as
dest_airport for the origin and destination airports, respectively.
Then create a report using the snippet:
SELECT flights.year, flights.month, origin_airport.city AS origin_city, dest_airport.city AS destination_city, COUNT(*) AS number_of_flights FROM doc.flights [join_airports] -- Reference to the SQL Snippet WHERE [flight_filters] -- Reference our filters GROUP BY flights.year, flights.month, origin_airport.city, dest_airport.city ORDER BY number_of_flights DESC LIMIT 100;
Using the SQL snippets and filters, we can quickly find out what is the most popular destination departing from Los Angeles (LAX) on a Tuesday. Pretty cool.
That’s it for this tutorial. If using Cluvio could help you make sense of your data, feel free to head to Cloud Console, connect your cluster to Cluvio and get started! Make sure to visit their documentation to explore all the features.