14 August 2020 by Sarah Anoke
reading time: < 5 minutes
SQL queries don't execute in the order that they are written. It's important to understand execution order to be able to produce the tables we want, and understand errors or other undesirable behavior.
I regularly have to search for this listing so this article is written as a personal reference.
There's a more in-depth but concise write-up at SQLBolt.
is a link to the (PostgreSQL) documentation for that clause.
#
Clause
Additional notes
01
FROM
any subqueries will also execute
02
JOIN
any subqueries will also execute
good to limit and prefilter / preaggregate beforehand, to speed up query
can preaggregate with CTEs (using WITH)
03
WHERE
filters out rows that don't meet the constraint
processed after the join; this matters for outer joins
can do partial matches on strings using LIKE
select data with a region name like Melanesia and Micronesia: WHERE region LIKE 'M%esia'; (% matches 0+ characters)
select data with a region name that does not end in 's': WHERE region NOT LIKE '%s';
select IDs that match the pattern of a US social security ID: WHERE id LIKE '___-__-___'; (_ matches exactly 1 character)
column aliases not accessible
04
GROUP BY
groups data that's been filtered by WHERE
column aliases not accessible
05
HAVING
filters grouped rows created by GROUP BY
column aliases not accessible so use COUNT(cust_id) < 5 for example
06
SELECT
window functions are calculated here
if the query uses GROUP BY or HAVING, then the rows seen by the window functions are the group rows instead of the original rows from FROM/WHERE
CASE statements (used to create categorical vars) are calculated here
can access columns from JOINed tables
07
DISTINCT
of the remaining rows, those with duplicate values in the column(s) marked DISTINCT are discarded