Thoughts, summaries, and tutorials on a variety of topics.

SQL exercises from the Stanford DB course

These SQL tables and the questions asked of them are from the famed Stanford DB Course ( older link).
Solutions are my own, in PostgreSQL.

I learn by example and like to write code by starting from a previously-working template. The goal of this article is to have a reproducible example for later personal reference (and to make note of some personal corrections/updates).

Contents


Introduction

Students at your hometown high school have decided to organize their social network using databases. So far, they have collected information about sixteen students in four grades, 9-12. The schema is as follows.

highschooler
(id, name, grade)
Interpretation There is a high school student with a unique ID and a given first name is a certain grade.
friend
(id1, id2)
Interpretation The student with id1 is friends with the student with id2. Friendship is mutual, so if (123, 456) is in the friend table, so is (456, 123).
likes
(id1, id2)
Interpretation The student with id1 likes the student with id2. Liking someone is not necessarily mutual, so if (123, 456) is in the likes table, there is no guarantee that (456, 123) is also present.
Graph showing the various connections between the students in the database.
Graph showing the various connections between the students in the database.
Click to enlarge.
Credit: Stanford DB course on Lagunita"

On the right is a graph showing the various connections between the students in the database. 9th graders are blue, 10th graders are green, 11th graders are yellow, and 12th graders are purple. Undirected black edges indicate friendships, and directed red edges indicate that one student likes another student. Complete contents of the database can be viewed in the appendix.

Questions

The following questions have the respondent writing SQL queries to produce the solution.

Question 1 Find the names of all students who are friends with someone named Gabriel.

Expected output

Alexis
Andrew
Cassandra
Jessica
Jordan

Solution 1 This problem can be solved with a semi-join. In other words, we'll decide what values to keep in the left table by using values in the right.

/* LOGIC
 - fix id1 as Gabriel
 - we want to find all the id2 with id1 as Gabriel
 - we want to find the names associated with those id2
*/
SELECT h.name
  FROM friend f
LEFT JOIN highschooler h
  ON f.id2 = h.id
WHERE f.id1 IN (
    SELECT id
    FROM highschooler
    WHERE name = 'Gabriel'
)

Question 2 For every student who likes someone two or more grades younger than themselves, return that student's name and grade, and the name and grade of the student they like.

Expected output

John 12 Haley 10

Solution 2 We want to select rows of the likes table, filtered by whether the grade of the liker (e.g., John) is at least 2 grades more than the grade of the likee (Haley).

/* LOGIC
 - for every pair in the likes table,
   pull their name and grade
 - then filter for pairs where student1's grade is more
   then 2 levels greater than student 2
*/
SELECT
  h1.name AS student1,
  h1.grade AS student1_grade,
  h2.name AS student2,
  h2.grade AS student2_grade
FROM likes l
LEFT JOIN highschooler h1
  ON h1.id = l.id1
LEFT JOIN highschooler h2
  ON h2.id = l.id2
WHERE h1.grade >= h2.grade + 2

Question 3 For every pair of students who both like each other, return the name and grade of both students. Include each pair only once, with the two names in alphabetical order.

Expected output

Cassandra 9 Gabriel 9
Jessica 11 Kyle 12

Solution 3 We'll do a self-join with the likes table to identify pairs of students that like each other.

/* LOGIC
 - (in a subquery, w/ a self-join)
   first select pairs of students
   that like each other, by their ID numbers
 - in the outer query, attach names and grades to these IDs
 - filter duplicates / alphabetize in a WHERE statement
*/
SELECT
  h1.name  AS student1,
  h1.grade AS student1_grade,
  h2.name  AS student2,
  h2.grade AS student2_grade
FROM (
  -- select pairs of students that like each other
  SELECT
    l.id1,
    l.id2
  FROM likes l
  INNER JOIN likes as ll
     ON ll.id1 = l.id2
    AND ll.id2 = l.id1
) AS soln

LEFT JOIN highschooler h1
  ON h1.id = soln.id1
LEFT JOIN highschooler h2
  ON h2.id = soln.id2

WHERE h1.name < h2.name

ORDER BY student1

Question 4 For each student A who likes student B where the two are not friends, find if they have a friend C in common (who can introduce them!). For all such trios, return the name and grade of A, B, and C.

Expected output

Andrew 10 Cassandra 9 Gabriel 9
Austin 11 Jordan 12 Andrew 10
Austin 11 Jordan 12 Kyle 11

Solution 4 We know who likes who (unidirectional relationship) from the likes table. We'll filter down these pairs through a series of joins.

  1. For every pair in the likes table, we want to identify which pairs are not friends.
    • We'll do this by left joining friends to likes on both like-pair IDs.
    • There will be null values for friends columns associated with like-pairs that are not friends.
  2. For pairs A and B, we want to find out if A and B have a friend in common, and identify that friend.
  3. Finally, we attach names and grades to those ID numbers.
/* LOGIC
 - identify which pairs in the likes table are not friends
 - identify if any of these pairs have a friend in common
 - attach names and grades to those ID numbers
*/
SELECT
  h1.name, h1.grade,
  h2.name, h2.grade,
  h3.name, h3.grade
FROM (
    SELECT
      -- student A
      l.id1 AS student_A,
      -- student B
      l.id2 AS student_B,
      -- friends of student A
      -- (will later filter such that these are also
      --  friends of B)
      (CASE WHEN l.id1 = f.id1
            THEN f.id2 END) AS student_C
    FROM likes l

    -- table f: friends of student A
    LEFT JOIN friend f
      ON l.id1 = f.id1

    -- table f2: identifies whether A & B are friends
    --           (not friends if f2.id* is null)
    LEFT JOIN friend f2
      ON l.id1 = f2.id1 -- student A
     AND l.id2 = f2.id2 -- student B
   
    -- table f3: identifies student C 
    --  i.e., friends of B who are also friends of A
    INNER JOIN friend f3
      ON  l.id2 = f3.id1 -- student B
     AND f3.id2 =  f.id2 -- B's friend
   
    WHERE f2.id1 IS NULL -- filter 'like' pairs
                         -- that are not friends
) AS students

LEFT JOIN highschooler h1
  ON student_A = h1.id
LEFT JOIN highschooler h2
  ON student_B = h2.id
LEFT JOIN highschooler h3
  ON student_C = h3.id    

Question 5 Find the name and grade of all students who are liked by more than one other student.

Expected output

Cassandra 9
Kris 10

Solution 5 We'll aggregate the likes table and filter out those students that have less than two admirers.

SELECT h.name, h.grade

FROM (
    SELECT id2
    FROM likes
    GROUP BY id2
    HAVING COUNT(id1) > 1
) AS liked
    
LEFT JOIN highschooler h
    ON h.id = liked.id2

Appendix: Code to create tables

I used the code below to populate the highschooler, friend, and likes tables in a PostgreSQL CoderPad sandbox.