Thoughts, summaries, and tutorials on a variety of topics.

SQL Queries: Window functions

Aggregation functions are used to collapse data according to group membership (as specified by columns in a GROUP BY clause). There are times when we want to perform calculations across rows without condensing those rows into a single output row; this is when window functions are useful.

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

I regularly have to search for this info so this article is written as a personal reference 😅.


To generate a column whose values are the result of a window function calculation, the syntax can be summarized as
function_name OVER (PARTITION BY col_name1, col_name2, ...
                    ORDER BY col_name3
                    frame_clause)

The OVER clause is mandatory -- it is what distinguishes a window function from a regular function or aggregate function. The PARTITION BY list, ORDER BY list, and frame_clause are each syntactically optional, although the parentheses are not; so they must be present, even if they're empty. It's also possible to FILTER rows before specification of the partition window.
Note The window (the code in the parentheses) can be aliased and reused (details).

  • If we omit PARTITION BY, the entire table becomes the window.
  • If we omit ORDER BY, all rows of the partition are included in the window frame, since all rows become peers of the current row.
  • If we omit frame_clause, the default window is the first row in the partition to the current row (inclusive).
PARTITION BY

A list of columns whose values are used to partition the table, i.e., define the partition window.
Note NULL is treated as its own value.

ORDER BY

A list of columns whose values are used to order the rows in the window before performing the calculation. Essentially determines which rows are in the same window (for functions where ordering matters).

  • Can order ASCending (default) or DESCending
  • Can specify NULLS FIRST or NULLS LAST in the ordering
    Note By default, null values sort as if larger than any non-null value; that is, NULLS FIRST is the default for DESC order, and NULLS LAST otherwise (details).
  • Can specify some other function to do the ordering using USING (rare, details)
frame_clause

This clause allows us to specify a subset of the partition window as the window frame, the set of rows (relative to the current row) that is used to perform the window function calculation. Michał summarizes the clause structure as

mode BETWEEN frame_start AND frame_end [ frame_exclusion ]

mode
Sets the way the database engine treats rows. Takes one of three values.
ROWS
endpoints are rows
GROUPS
endpoints are groups
RANGE
endpoints are values
frame_start
Defines where the window frame starts. Can take one of the following four values.
UNBOUNDED PRECEDING
first row of the partition
offset PRECEDING
depends on the mode, but offset specifies the number of rows/groups/difference before the current row.
CURRENT ROW
depends on the mode, but refers to the current row or group
offset FOLLOWING
depends on the mode, but offset specifies the number of rows/groups/difference after the curent row
frame_end
Defines where the window frame ends. Can be omitted (with BETWEEN); its value will default to the current row. Can take one of the following four values.
offset PRECEDING
depends on the mode, but offset specifies the number of rows/groups/difference before the current row.
CURRENT ROW
depends on the mode, but refers to the current row or group
offset FOLLOWING
depends on the mode, but offset specifies the number of rows/groups/difference after the current row
UNBOUNDED FOLLOWING
last row of the partition
frame_exclusion
Allows rows around the current row to be excluded from the frame, even if they would be included according to the frame start and frame end options. Can take one of the following four values.
EXCLUDE CURRENT ROW
excludes current row from the frame
EXCLUDE GROUP
excludes current row and rodering peers from the frame
EXCLUDE TIES
excludes ordering peers of the current row from the frame, but not the current row itself
EXCLUDE NO OTHERS
simply specifies explicitly the default behavior of not excluding the current row or its peers

Related links