In this tutorial, you will learn how to use window functions in CrateDB.
CrateDB is an opensource distributed database designed for scalability and performance. One of the features of CrateDB is its support for window functions. Unlike regular aggregate functions like SUM
and COUNT
, which aggregate data across the entire result set, window functions operate on a subset of rows defined by the window specification. Window functions help perform calculations that depend on the rows’ order or involve comparisons with other rows in the same result set. Some everyday use cases for window functions include calculating running totals and averages and ranking rows based on their values.
In this tutorial, you will learn how to
 define a window function
 use partitioning and ordering to specify a window
 perform standard calculations and aggregations
So let’s dive in!
Defining a Window Function
To declare a function a window function, use the OVER
clause to define the window, or in other words, define which rows will take part in the function computation. If the OVER
clause is left empty, the window is defined as all the rows in the result set.
For example, let’s use COUNT(*)
as a window function in an employees
database. If you leave the OVER
clause empty, COUNT(*)
will be computed over the entire database of employees
.
SELECT first_name,
last_name,
designation,
COUNT(*) OVER()
FROM employees
ORDER BY last_name
LIMIT 100;
PARTITION BY
PARTITION BY
is a clause that divides the result set into partitions or groups based on the values of one or more columns. The window function is then applied to each partition separately rather than the entire result set.
Let’s say you want to calculate the number of employees with the same designation. Then, define the COUNT(*)
computation window as a partition with the same designation, using OVER(PARTITION BY designation)
.
SELECT first_name,
last_name,
designation,
COUNT(*) OVER(PARTITION BY designation)
FROM employees
ORDER BY last_name
LIMIT 100;
ORDER BY
With ORDER BY
, you can specify criteria for processing the rows within each partition, such as ascending or descending values of a particular column.
For instance, say you want to get the employee’s salary rank among people with the same job designation. You can keep the same designation partition as the previous example, but now order it by salary and use the DENSE_RANK()
function to get the employee’s salary rank in this window.
Moreover, you can define your window using the WINDOW
clause. Using named windows in your queries simplifies your code, allows you to reuse window specifications across multiple queries, and improves query readability.
SELECT first_name,
last_name,
designation,
salary,
DENSE_RANK() OVER w
FROM employees
WINDOW w as (PARTITION BY designation ORDER BY salary DESC)
ORDER BY last_name
LIMIT 100;
Performing standard calculations and aggregations
Now that you are familiar with defining windows, let’s have an overview of the functions you can use as window functions and then check out some practical examples.
Aggregations
Aggregate functions calculate aggregate values over the rows defined by the window. These functions include:

SUM
: Calculates the sum of a column over the set of rows in the window. 
AVG
: Calculates the average of a column over the set of rows in the window. 
MIN
: Returns the minimum value of a column over the set of rows in the window. 
MAX
: Returns the maximum value of a column over the set of rows in the window. 
COUNT
: Counts the number of rows in the window. 
MIN_BY
,MAX_BY
: allows you to find the minimum or maximum value in a given column based on the values in another column. Check out our blog post introducing this feature in CrateDB 5.2
Check out our Aggregation documentation for more details on all supported functions.
Generalpurpose functions
On the other hand, generalpurpose functions perform various operations on the set of rows defined by the window. These functions include:

ROW_NUMBER
: Returns the current row number within its window. 
FIRST_VALUE
: Returns the argument value to the window’s first row. 
LAST_VALUE
: Returns the argument value to the window’s last row. 
NTH_VALUE
: Returns the argument value to the window’s nth row, orNULL
if the nth row does not exist. 
RANK
: Returns the rank of each row in the window based on the values in theORDER BY
clause. Subsequent tied rows get the same rank, and the potential rank of the next row gets incremented by each tie. Because of this, ranks may not be sequential. 
DENSE_RANK
: Returns the dense rank of each row in the window based on the values in theORDER BY
clause. UnlikeRANK
, it always returns sequential rank values. 
LEAD
: Returns the value of a column from a row n rows ahead of the current row in the window. 
LAG
: Returns the value of a column from a row that is n rows behind the current row in the window.
Check out our windowfunctions documentation for more details on these functions.
Examples
 YearoverYear growth
Consider a table with the year
and the sales
for a company. Suppose you want to calculate the previous and next sales and the yearoveryear growth in sales. Here, the LAG
function comes in handy to figure out the previous sales, the LEAD
function returns the following sales, and you can use the LAG
function in calculating your YoY growth.
SELECT
year,
sales,
LAG(sales) OVER w prev_sales,
LEAD(sales) OVER w next_sales,
(sales / LAG(sales) OVER w  1)*100 as year_over_year_growth
FROM (VALUES
(2018, 100.5),
(2019, 150.2),
(2020, 200.7),
(2021, 180.1),
(2022, 220))
AS sales (year, sales)
WINDOW w AS (ORDER BY year)
LIMIT 100;
 Percentage of sales for each product in each year
To calculate the percentage of sales for each product each year, you can use the SUM
function to calculate the total sales on each year window and then divide the product’s sales by the total sales for that year.
 Percentage of total sales for each product over the entire period
Similarly, you can use SUM
to calculate the total sales of that product and divide the product’s sales by this result.
SELECT
year,
product,
sales,
(sales / SUM(sales) OVER(PARTITION BY year))*100 AS percentage_of_sales_on_year,
(sales / SUM(sales) OVER(PARTITION BY product))*100 AS percentage_of_total_sales_of_product
FROM (VALUES
(2018, 'A', 110.5),
(2019, 'A', 120),
(2020, 'A', 124.2),
(2018, 'B', 70),
(2019, 'B', 68.1),
(2020, 'B', 65.9),
(2018, 'C', 100),
(2019, 'C', 101.3),
(2020, 'C', 102))
AS product_sales (year, product, sales) LIMIT 100;
Wrap Up
In summary, window functions are a valuable tool for data analysis in CrateDB. By defining partitions and orderings, you can perform complex calculations and aggregations over subsets of data, making it easier to gain insights and understand patterns in our data.