SQL Queries: Window functions
Aggregation functions are used to collapse data according to group membership (as specified by columns in a GROUP BY
clause). There are times when we want to perform calculations across rows without condensing those rows into a single output row; this is when window functions are useful.
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
I regularly have to search for this info so this article is written as a personal reference 😅.
To generate a column whose values are the result of a window function calculation, the syntax can be summarized as
function_name OVER (PARTITION BY col_name1, col_name2, ...
ORDER BY col_name3
frame_clause)
The OVER
clause is mandatory -- it is what distinguishes a window function from a regular function or aggregate function. The PARTITION BY
list, ORDER BY
list, and frame_clause
are each syntactically optional, although the parentheses are not; so they must be present, even if they're empty. It's also possible to FILTER
rows before specification of the partition window.
Note The window (the code in the parentheses) can be aliased and reused (details).
- If we omit
PARTITION BY
, the entire table becomes the window. - If we omit
ORDER BY
, all rows of the partition are included in the window frame, since all rows become peers of the current row. - If we omit
frame_clause
, the default window is the first row in the partition to the current row (inclusive).
- PARTITION BY
-
A list of columns whose values are used to partition the table, i.e., define the partition window.
NoteNULL
is treated as its own value. - ORDER BY
-
A list of columns whose values are used to order the rows in the window before performing the calculation. Essentially determines which rows are in the same window (for functions where ordering matters).
-
Can order
ASC
ending (default) orDESC
ending -
Can specify
NULLS FIRST
orNULLS LAST
in the ordering
Note By default, null values sort as if larger than any non-null value; that is, NULLS FIRST is the default for DESC order, and NULLS LAST otherwise (details). - Can specify some other function to do the ordering using
USING
(rare, details)
-
Can order
- frame_clause
-
This clause allows us to specify a subset of the partition window as the window frame, the set of rows (relative to the current row) that is used to perform the window function calculation. Michał summarizes the clause structure as
mode BETWEEN frame_start AND frame_end [ frame_exclusion ]
- mode
- Sets the way the database engine treats rows. Takes one of three values.
- ROWS
- endpoints are rows
- GROUPS
- endpoints are groups
- RANGE
- endpoints are values
- frame_start
- Defines where the window frame starts. Can take one of the following four values.
- UNBOUNDED PRECEDING
- first row of the partition
- offset PRECEDING
- depends on the mode, but offset specifies the number of rows/groups/difference before the current row.
- CURRENT ROW
- depends on the mode, but refers to the current row or group
- offset FOLLOWING
- depends on the mode, but offset specifies the number of rows/groups/difference after the curent row
- frame_end
- Defines where the window frame ends. Can be omitted (with
BETWEEN
); its value will default to the current row. Can take one of the following four values.- offset PRECEDING
- depends on the mode, but offset specifies the number of rows/groups/difference before the current row.
- CURRENT ROW
- depends on the mode, but refers to the current row or group
- offset FOLLOWING
- depends on the mode, but offset specifies the number of rows/groups/difference after the current row
- UNBOUNDED FOLLOWING
- last row of the partition
- frame_exclusion
- Allows rows around the current row to be excluded from the frame, even if they would be included according to the frame start and frame end options. Can take one of the following four values.
- EXCLUDE CURRENT ROW
- excludes current row from the frame
- EXCLUDE GROUP
- excludes current row and rodering peers from the frame
- EXCLUDE TIES
- excludes ordering peers of the current row from the frame, but not the current row itself
- EXCLUDE NO OTHERS
- simply specifies explicitly the default behavior of not excluding the current row or its peers
Related links
- PostgreSQL window function syntax documentation
-
Window function examples (including
GROUPS
andRANGE
examples) - Listing of window functions + brief descriptions and examples (scroll to the middle)