12 Refactoring Queries with Common Table Expressions
12.1 Module Introduction
This module explores how to use Common Table Expressions (CTEs) in Oracle SQL to write clearer, testable, and reusable queries. CTEs are defined using the WITH
clause and allow you to isolate logical parts of your query, improving readability and simplifying testing.
12.2 Explanation
12.2.1 Common Table Expressions (CTEs)
CTEs allow you to define temporary result sets for use within a larger query, often improving readability.
WITH high_scores AS (
SELECT student_id, section_id, numeric_grade
FROM grade
WHERE numeric_grade >= 90
)SELECT hs.student_id, s.last_name, hs.numeric_grade
FROM high_scores hs
JOIN student s ON hs.student_id = s.student_id;
This query first identifies high-scoring grades, then joins with student information for a readable result.
You can also chain multiple CTEs:
WITH base AS (
SELECT * FROM enrollment
),AS (
grouped SELECT section_id, COUNT(*) AS total_enrollments
FROM base
GROUP BY section_id
)SELECT * FROM grouped WHERE total_enrollments > 5;
This query demonstrates chaining CTEs to break down complex logic into manageable steps.
N.B. We highly encourage using CTEs whenever you need to write queries with more than a couple of steps. - Treat each CTE as a self-contained transformation of your data. - This makes debugging much easier; you will have created several checkpoints for yourself, which makes it easier for you to check your logic. - See the chapter on refactoring queries with CTEs for more advice on this topic.
Reference: Lab 17.1
12.2.2 Using CTEs to Improve Readability
CTEs separate complex logic into named blocks that read like building steps. This is helpful when working with long queries, nested subqueries, or aggregations.
Example: Find the locations of sections with more than 10 students enrolled.
WITH high_enrollments AS (
SELECT section_id
FROM enrollment
GROUP BY section_id
HAVING COUNT(*) > 10
)SELECT s.section_id, s.location
FROM section s
JOIN high_enrollments h ON s.section_id = h.section_id;
This query first identifies sections with high enrollments, then retrieves their locations, making it easier to understand each step.
Reference: Lab 17.1
12.2.3 CTEs for Testing
Use CTEs to isolate and verify subquery logic. This allows you to debug step-by-step rather than deciphering deeply nested SQL.
Example: Calculate average grade per student, then select only those with an average above 85.
WITH valid_grades AS (
SELECT student_id, ROUND(AVG(numeric_grade), 2) avg_grade
FROM grade
GROUP BY student_id
)SELECT *
FROM valid_grades
WHERE avg_grade > 85;
This CTE first computes the average grade for each student, then filters those with an average above 85, making it clear what each part of the query does.
Reference: Lab 17.1
12.2.4 CTEs and Reusability
CTEs allow modular design where intermediate results can be reused. They also support chaining: one CTE can build on another.
Example: Identify students enrolled in at least three sections, then list their names.
WITH enroll_count AS (
SELECT student_id, COUNT(*) AS total_courses
FROM enrollment
GROUP BY student_id
),AS (
active_students SELECT student_id
FROM enroll_count
WHERE total_courses >= 3
)SELECT s.first_name, s.last_name
FROM student s
JOIN active_students a ON s.student_id = a.student_id;
This example first counts enrollments per student, then filters those with three or more enrollments, and finally retrieves their names.
Reference: Lab 17.1
12.2.5 Thinking about CTEs as Variables
If you have any experience with more general programming languages like R or Python, you may notice some similarities between CTEs and the idea of a variable. It is worth taking a moment to compare the two.
Similarities
- Scope and Lifetime
- Both CTEs and variables have a defined scope and lifetime. CTEs are limited to the query they are defined in, while variables can have different scopes (local, global) depending on the programming language.
- Reusing Information Checkpoints
- Both are used to simplify complex operations. CTEs break down complex SQL queries, and variables store data for manipulation throughout a program.
- Readability and Maintenance
- Both can improve readability and maintainability. CTEs make SQL queries easier to understand by breaking them into named components, while variables use meaningful names to make code more readable.
Differences
- Performance
- CTEs: Can lead to performance issues if not optimized, especially with large datasets or complex recursive queries.
- Variables: Generally have minimal performance impact, but their usage in loops or recursive functions can affect program performance.
- Error Handling
- CTEs: Limited to the SQL query execution context. If a CTE fails, the entire query fails.
- Variables: Traditional programming languages offer robust error handling mechanisms (e.g., try-catch blocks) for more granular control over errors.
- Flexibility
- CTEs: Limited to SQL queries and primarily used for data manipulation and retrieval.
- Variables: Highly flexible and can be used in various contexts within a program, including arithmetic operations, string manipulation, and more.
CTEs are not quite as flexible as a variable in a more general language, but you may find it useful to think of them as “table variables.” Use this understanding to make it easier for yourself to write complicated queries!
12.3 Exercises
Basic CTE Syntax Create a CTE that selects all students who live in ZIP code ‘30303’.
Aggregate Inside a CTE Use a CTE to find the average course cost and select only courses that cost more than this average.
Testing a Join with CTE Use a CTE to join
enrollment
andgrade
onstudent_id
andsection_id
, and select the average grade per student.Chain Two CTEs First, use a CTE to identify students with more than one grade recorded. Then, in a second CTE, get those who have an average grade above 90.
Reuse Logic with CTE Use a CTE to calculate enrollments per student. Then use this CTE twice: once to filter those with 3+ enrollments, once to get those with fewer.
Refactor a Nested Query Refactor this nested query into CTEs:
SELECT student_id, first_name, last_name
FROM student
WHERE student_id IN (
SELECT student_id
FROM enrollment
GROUP BY student_id
HAVING COUNT(*) > 2
);
12.4 Q&A
- When should you use a CTE instead of a subquery?
- What are the tradeoffs of readability vs performance in CTEs?
- How do chained CTEs improve logic clarity?
- How can you test each step of a complex query using CTEs?
- How does using CTEs relate to writing modular code in other languages?
- What are the performance implications of using multiple CTEs in a single query?
- When might a CTE be overkill for a simple query?
12.5 Answers
1.
WITH zip_students AS (
SELECT * FROM student WHERE zip = '30303'
)SELECT * FROM zip_students;
2.
WITH avg_cost AS (
SELECT AVG(cost) AS avg_c FROM course
)SELECT *
FROM course
WHERE cost > (SELECT avg_c FROM avg_cost);
3.
WITH joined AS (
SELECT e.student_id, g.numeric_grade
FROM enrollment e
JOIN grade g ON e.student_id = g.student_id AND e.section_id = g.section_id
)SELECT student_id, AVG(numeric_grade) AS avg_grade
FROM joined
GROUP BY student_id;
4.
WITH multiple_grades AS (
SELECT student_id
FROM grade
GROUP BY student_id
HAVING COUNT(*) > 1
),AS (
excellent_students SELECT student_id, AVG(numeric_grade) avg_grade
FROM grade
WHERE student_id IN (SELECT student_id FROM multiple_grades)
GROUP BY student_id
HAVING AVG(numeric_grade) > 90
)SELECT * FROM excellent_students;
5.
WITH enrollment_counts AS (
SELECT student_id, COUNT(*) AS count
FROM enrollment
GROUP BY student_id
)SELECT * FROM enrollment_counts WHERE count >= 3;
For students with fewer enrollments:
WITH enrollment_counts AS (
SELECT student_id, COUNT(*) AS count
FROM enrollment
GROUP BY student_id
)SELECT * FROM enrollment_counts WHERE count < 3;
6.
WITH enrolled AS (
SELECT student_id
FROM enrollment
GROUP BY student_id
HAVING COUNT(*) > 2
)SELECT s.student_id, s.first_name, s.last_name
FROM student s
JOIN enrolled e ON s.student_id = e.student_id;