In this tutorial, you will learn the basics of CrateDB Objects. This tutorial is also available in the video format: CrateDB Video | Fundamentals: Getting Started with CrateDB Objects
First, I present a simple use case to demonstrate how CrateDB Objects can add clarity to data models.
Then, I give an overview of the Column Policies for CrateDB Objects: dynamic
, strict
, and ignored
. I also provide examples of how these policies affect INSERT
statements.
Finally, I show you how to INSERT
, UPDATE
and DELETE
records with the Object datatype.
Setup and Intro to Object Type
Something unique about CrateDB is that it has OBJECT
among the data types. Objects grant considerable versatility to data models and can be manipulated using SQL, thanks to CrateDB’s familiar PostgreSQL interface.
But when do I use OBJECT
? Let’s see an example where OBJECT
fits like a glove.
Let’s imagine a system with a shipping address and an invoice address. Each address has attributes like name, street, city, etc. Without Objects, I would probably set it up like this:
shipping_name
, shipping_street
, shipping_city
, and then also invoice_name
, invoice_street
, invoice_city
This looks a bit repetitive and disorganized.
Imagine you could group the attributes into an invoice and a shipping Object. The system would then look like this:
invoice: {name: ..., street: ..., city: ...}
and shipping: {name: ..., street: ..., city: ...}
looking much more neat and clear.
This is one example of how Objects can add clarity to data models. Throughout this tutorial, I’ll work extensively with Objects, so stay tuned for more examples!
But before anything else, let’s start CrateDB. In our previous tutorial, I showed you how to get started with CrateDB for the first time using Docker. Today I’m following that same method, so I open my terminal and run the following command:
docker run --publish=4200:4200 --publish=5433:5432 --env CRATE_HEAP_SIZE=1g crate:latest
If you wish to learn more about getting started with CrateDB, check out our Getting Started with CrateDB video and the Getting Started documentation at crate.io.
Now that CrateDB is running, I navigate to localhost:4200 and land on the CrateDB Admin UI.
So while I set up a table for this tutorial, I’ll give you an overview of the OBJECT
syntax for creating tables and executing queries.
First, I create a quotes table, where I’ll store some literary quotes and information about the title and protagonist.
CREATE TABLE quotes (
title TEXT,
quotation OBJECT,
protagonist OBJECT(STRICT) AS (
surname TEXT,
first_name TEXT,
details OBJECT AS (
age integer,
birthday TIMESTAMP
)
)
);
Let’s break up this statement:
- I set the title as a usual
TEXT
type - I put the quotation as an
OBJECT
- by default, set toDYNAMIC
- Finally, I store the protagonist as an
OBJECT
, but with theSTRICT
column policy and some pre-defined key attributes -surname
,first_name
,details
.
So, what is the difference between the quotation
and protagonist
objects? Let’s look at how objects and their column policies work in CrateDB.
Objects and the Column Policy
CrateDB structures OBJECT
as a collection of key-value pairs, where an OBJECT
can contain any other type, including child OBJECT
s. Moreover, an OBJECT
column can be schemaless or have a fixed sub-column structure.
In CrateDB, there are three possible policies: DYNAMIC
, STRICT
, or IGNORED
, and I’ll tackle them in the coming examples in the tutorial. But before that, what are the main differences between them?
- The default
DYNAMIC
policy dynamically allows inserts to add new subcolumns to the object definition. It grants the highest flexibility to theOBJECT
column, and new columns are usable like any other subcolumn. You can retrieve them, sort by them, and use them in where clauses. - The
STRICT
policy works together with a column definition. CrateDB will only accept inserts that strictly obey that definition. - Finally, the
IGNORED
policy will only index the sub-column values that match the definition and ignore those that don’t. I use this policy when there is a sub-column definition I want to keep, but inserts may have different sub-columns as previously defined.
So now that I have my quotes table ready and learned the basics of CrateDB OBJECT
, I can move on to the first INSERT
in the table.
INSERT
With the INSERT
statement, I can add data to my CrateDB tables. I can insert a single row of values, bulk-insert several rows at once, and even insert queried values directly.
First, I’ll populate my quotes
table for the first time with a single insert.
INSERT INTO quotes (
title,
quotation,
protagonist
) VALUES (
'Alice in Wonderland',
{
"words" = 'Curiouser and curiouser!',
"length" = 3
},
{
"surname" = 'Pleasance Liddell',
"first_name" = 'Alice',
"details" = {
"age" = 7,
"birthday" = '1852-05-04T00:00Z'::TIMESTAMPTZ
}
}
);
Here, I insert the quotation and protagonist values as JSON strings, but it’s worth saying they are NOT JSON objects.
The difference between inserting values for the quotation
object and the strict protagonist
object is evident here.
- In the
quotation
column, I insert an object with two sub-columns,words
andlength
, which were never defined before. CrateDB accepts this insert asquotation
has a DYNAMIC column policy and dynamically adds these subcolumns to theOBJECT
definition. - CrateDB accepts this insert for the
protagonist
column because it strictly follows theOBJECT
definition from theCREATE TABLE
statement.
Let’s see a different example for INSERT
that CrateDB accepts:
INSERT INTO quotes (
title,
quotation,
protagonist
) VALUES (
'Alice in Wonderland',
{
"words" = 'Curiouser and curiouser!',
"meaning" = 'Increasingly strange'
},
{
"first_name" = 'Alice',
"details" = {
"birthday" = '1852-05-04T00:00Z'::TIMESTAMPTZ
}
}
);
This INSERT
is successful in CrateDB, although it has some differences from my previous statement.
- I have added a
meaning
sub-column to the quotation, which works because thequotation
object has a DYNAMICpolicy. - Also, I have not given the protagonist object a
surname
or anage
, although these attributes were given on the originalSTRICT
object definition. This works because, in this case, CrateDB will provide aNULL
value for these sub-columns.
Finally, let’s do an INSERT
that CrateDB will decline:
INSERT INTO quotes (
title,
quotation,
protagonist
) VALUES (
'Alice in Wonderland',
{
"words" = 'Curiouser and curiouser!',
"meaning" = 'Increasingly strange'
},
{
"age" = '7',
"first_name" = 'Alice',
"details" = {
"birthday" = '1852-05-04T00:00Z'::TIMESTAMPTZ
}
}
);
In this example, I attempted to insert the age attribute outside the details sub-column, which is different from my STRICT
object definition. CrateDB returns the following error:
Error!
StrictDynamicMappingException[mapping set to strict, dynamic introduction of [age]
within [protagonist] is not allowed]
I will now query my quotes
table to check what it looks like so far:
SELECT
title AS title,
protagonist['first_name'] AS name,
date_format(
'%D %b %Y',
'GMT',
protagonist['details']['birthday']
) AS born,
quotation['words'] AS quote
FROM quotes limit 100;
And I will also create a VIEW
for this same query, as I’ll repeatedly use it during this tutorial. This will be very handy for checking the results from the following inserts.
create view general_information as
SELECT
title AS title,
protagonist['first_name'] AS name,
date_format(
'%D %b %Y',
'GMT',
protagonist['details']['birthday']
) AS born,
quotation['words'] AS quote
FROM quotes limit 100;
Now that the VIEW
is ready let’s check some other INSERT
options.
For instance, I can insert several values simultaneously with a bulk INSERT
. Each row of values is inside brackets, and commas separate the different rows.
INSERT INTO quotes (
title,
quotation,
protagonist
) VALUES
(
'Slaughterhouse-Five',
{
"words" = 'Everything was beautiful, and nothing hurt.'
},
{
"surname" = 'Pilgrim',
"first_name" = 'Billy',
"details" = {
"birthday" = '1922-07-04T00:00Z'
}
}
),
(
'The Complete Tales of Winnie-the-Pooh',
{
"words" = 'How lucky I am to have something that makes saying goodbye so hard.'
},
{
"first_name" = 'Winnie-the-Pooh',
"details" = {
"birthday" = '1926-10-14T00:00Z'
}
}
),
(
'The Complete Tales of Winnie-the-Pooh',
{
"words" = 'I am short, fat, and proud of that.'
},
{
"first_name" = 'Winnie-the-Pooh',
"details" = {
"birthday" = '1926-10-14T00:00Z'
}
}
);
Let’s now check the state of the quotes
table by querying the general_information
view.
SELECT title, name, born, quote
FROM "doc"."general_information"
LIMIT 100;
I can also insert data directly from a query. For instance, I want to create a table to store my favorite quotes.
CREATE TABLE favorite_quotes (
title TEXT,
quotation TEXT,
protagonist TEXT
);
And as a big Winnie-the-Pooh fan, I want to add all its quotes to my favorite_quotes
table. To do that, I can query the quotes
table, filter for Winnie-the-Pooh results, and then insert the result directly into the favorite_quotes
table.
INSERT INTO favorite_quotes (title, quotation, protagonist)
(SELECT title,
quotation['words'] AS quotation,
protagonist['first_name'] AS protagonist
FROM quotes
WHERE protagonist['first_name'] = 'Winnie-the-Pooh' );
UPDATE
While doing my research for this tutorial, I stumbled across two possible birth dates for “Winnie-the-Pooh.” October 14th, 1926, when the book “Winnie-the-Pooh” was first published, and on August 21st, 1921, Pooh was gifted to Christopher Robin on his first birthday.
I used the former date as the birthday, but now I want to update all Winnie-the-Pooh records to the other date. I can do that with the UPDATE
statement. Here, I specify the table and attribute to be updated and apply it to all records where the first name is “Winnie-the-Pooh.”
UPDATE quotes
SET protagonist['details']['birthday'] = '1921-08-21T00:00Z'
WHERE protagonist['first_name'] = 'Winnie-the-Pooh';
DELETE
Deleting rows in CrateDB is done with the DELETE
statement, where we specify the table and conditions to delete. For example, if I want to delete all records with the “Alice in Wonderland” title, I run the following statement:
DELETE
FROM quotes
WHERE title = 'Alice in Wonderland';