Subqueries
Subqueries
Imagine you're at a supermarket and ask the clerk: "What's the most expensive item in your store?" The clerk checks and says: "It's a MacBook Pro." Then you follow up: "Who are the customers that bought this product?"—What you just did is a subquery: using the answer to one question to ask the next. Subqueries in SQL work the same way: use the result of one query as a condition or data source for another query.
1. Core Concepts
What is a Subquery?
A subquery (also called an inner query or nested query) is a query embedded within another SQL statement. The outer query is called the main query, and the inner query is called the subquery. The subquery executes first, and its result is used by the main query.
-- Subquery: find the employee with the highest salary
SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);
Execution order: First execute SELECT MAX(salary) FROM employees to get 20000, then execute the outer query to find the employee(s) with salary = 20000.
Subquery Classification
Based on where the subquery appears, there are three main types:
| Type | Location | Purpose | Returns |
|---|---|---|---|
| WHERE subquery | In the WHERE clause | Filter data | Single value or list |
| FROM subquery (derived table) | In the FROM clause | Acts as a temporary table | Result set |
| SELECT subquery (scalar subquery) | In the SELECT column list | Acts as a computed column | Single value |
WHERE Subquery
The most common usage. The subquery appears in the WHERE clause, dynamically determining the filter condition.
-- Find employees whose salary is above average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
Subqueries can return:
- Single value (scalar subquery): used with comparison operators like
=,>,< - Multiple values (multi-row subquery): used with operators like
IN,ANY,ALL
-- Return multiple values: match with IN
SELECT name, salary
FROM employees
WHERE department_id IN (
SELECT id FROM departments WHERE location = 'Beijing'
);
FROM Subquery (Derived Table)
The subquery appears in the FROM clause, effectively creating a temporary table (also called a derived table). It must be given an alias.
-- First calculate each department's average salary, then filter from it
SELECT dept_name, avg_salary
FROM (
SELECT d.name AS dept_name, AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON e.department_id = d.id
GROUP BY d.name
) AS dept_avg
WHERE avg_salary > 12000;
AS dept_avg above), otherwise SQL will throw an error.
SELECT Subquery (Scalar Subquery)
The subquery appears in the SELECT column list as a computed column. Every time the main query processes a row, the subquery executes once.
-- Display each employee's salary and the company average salary
SELECT name, salary,
(SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;
Output:
name salary avg_salary
------ -------- ----------
Zhang San 15000.00 14000.00
Li Si 18000.00 14000.00
Wang Wu 12000.00 14000.00
Zhao Liu 13000.00 14000.00
Qian Qi 20000.00 14000.00
Sun Ba 11000.00 14000.00
Zhou Jiu 9000.00 14000.00
Wu Shi 14000.00 14000.00
EXISTS and NOT EXISTS
EXISTS checks whether a subquery returns at least one row. It doesn't care what values are returned—only whether "there are results."
-- Find departments that have employees
SELECT d.name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e WHERE e.department_id = d.id
);
-- Find departments with no employees
SELECT d.name
FROM departments d
WHERE NOT EXISTS (
SELECT 1 FROM employees e WHERE e.department_id = d.id
);
SELECT 1 in EXISTS is a conventional practice because EXISTS only cares about "whether rows exist," not the column values. Using SELECT * or SELECT NULL has the same effect.
Subquery vs JOIN Performance Comparison
| Comparison Item | Subquery | JOIN |
|---|---|---|
| Readability | Closer to natural language thinking | Requires understanding join logic |
| Performance | Little difference in simple cases; correlated subqueries may be slower | Generally better, with specialized optimizer |
| Flexibility | Can use aggregate function results as conditions | Requires GROUP BY |
| Recommended for | "Find the largest," "find non-existent" | "Display with association," "multi-table merge" |
EXPLAIN to analyze and optimize when performance issues arise.
2. Basic Syntax/Usage
WHERE Subquery Syntax
-- Scalar subquery (returns single value)
SELECT column_name FROM table_name
WHERE column_name comparison_operator (SELECT aggregate_function FROM table_name);
-- Multi-row subquery (returns list)
SELECT column_name FROM table_name
WHERE column_name IN (SELECT column_name FROM table_name WHERE condition);
FROM Subquery Syntax
SELECT column_name
FROM (SELECT ... FROM ... WHERE ...) AS alias
WHERE condition;
SELECT Scalar Subquery Syntax
SELECT column1, column2,
(SELECT aggregate_function FROM table_name WHERE condition) AS alias
FROM table_name;
EXISTS Syntax
SELECT column_name FROM table_A a
WHERE EXISTS (SELECT 1 FROM table_B b WHERE b.foreign_key = a.primary_key);
a.id), so the subquery executes once for each row in the outer query.
Example: Find Employees With Above-Average Salary (Difficulty ⭐)
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;
Output:
name salary
------ --------
Qian Qi 20000.00
Li Si 18000.00
Zhang San 15000.00
Wu Shi 14000.00
First calculate the average salary (approximately 14000), then filter employees above average.
Example: Find the Highest-Paid Employee in Each Department (Difficulty ⭐⭐)
SELECT e.name, d.name AS department, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary = (
SELECT MAX(e2.salary)
FROM employees e2
WHERE e2.department_id = e.department_id
);
Output:
name department salary
------ ---------- --------
Qian Qi Technology 20000.00
Wang Wu Marketing 12000.00
Wu Shi Finance 14000.00
e.department_id in the subquery references the current row of the outer query. The subquery executes once for each outer row, finding the highest salary in that employee's department, then checks if the current employee is the highest-paid.
Example: Use Derived Table to Analyze Department Salary Levels (Difficulty ⭐⭐⭐)
SELECT dept_name, emp_count, avg_salary,
CASE
WHEN avg_salary >= 15000 THEN 'High Salary'
WHEN avg_salary >= 12000 THEN 'Medium Salary'
ELSE 'Needs Improvement'
END AS level
FROM (
SELECT d.name AS dept_name,
COUNT(e.id) AS emp_count,
AVG(e.salary) AS avg_salary
FROM departments d
LEFT JOIN employees e ON d.id = e.department_id
GROUP BY d.id, d.name
) AS dept_stats
WHERE emp_count > 0
ORDER BY avg_salary DESC;
Output:
dept_name emp_count avg_salary level
--------- --------- ---------- ----------
Technology 3 17666.67 High Salary
Finance 2 13500.00 Medium Salary
Marketing 2 11500.00 Needs Improvement
3. Common Use Cases
Case 1: Find Products Never Ordered
SELECT p.name, p.category, p.price
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.product_id = p.id
);
Output (depends on data):
name category price
-------- -------- -------
Mac Mini Computer 4499.00
SELECT p.name, p.category, p.price
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
WHERE o.id IS NULL;
Both approaches have similar performance—choose the one with better readability.
Case 2: Find Customers Whose Order Amount Exceeds the Average
SELECT customer_name, total_spent
FROM (
SELECT o.customer_name,
SUM(o.quantity * p.price) AS total_spent
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY o.customer_name
) AS customer_totals
WHERE total_spent > (
SELECT AVG(o.quantity * p.price)
FROM orders o
JOIN products p ON o.product_id = p.id
);
Output:
customer_name total_spent
------------- -----------
Xiao Li 14397.00
Xiao Gang 17998.00
Xiao Wang 11998.00
❓ FAQ
Q: How many levels can subqueries be nested? A: There's no theoretical limit, but in practice, it's recommended not to exceed 3 levels. Too many levels indicate overly complex query logic—consider splitting into multiple steps or rewriting with JOINs.
Q: Should I use IN or EXISTS? A: When the subquery result set is small and the outer table is large, IN is more efficient; when the outer table is small and the subquery table is large, EXISTS is more efficient (because EXISTS stops at the first match). The difference is minimal for simple scenarios—use EXPLAIN to compare for complex cases.
Q: What happens if a scalar subquery returns multiple rows? A: The database will throw an error. A scalar subquery must return exactly one row and one column. If you're unsure about the row count, use
LIMIT 1or an aggregate function (likeMAX,MIN) to ensure a single value.
Q: What's the difference between correlated and non-correlated subqueries? A: A non-correlated subquery executes independently once (e.g., "find average salary"), and its result is used by the main query. A correlated subquery references columns from the outer query and executes once for each outer row (e.g., "find the highest salary in each department"). Correlated subqueries may be slow with large datasets—watch for performance.
📖 Summary
- A subquery is a query nested inside another SQL statement; the inner query executes first, and its result is used by the outer query
- WHERE subquery: Used for dynamic filtering with operators like
IN,=,> - FROM subquery (derived table): Acts as a temporary data source; must have an alias; suitable for "aggregate first, analyze later"
- SELECT scalar subquery: Acts as a computed column, executing once for each outer row
- EXISTS/NOT EXISTS: Only cares whether the subquery has results; suitable for "find existing/missing associations"
- Subquery vs JOIN: Prioritize readability; use EXPLAIN to analyze performance issues
📝 Exercises
Exercise 1 (⭐): Use a subquery to find all employees in the "Technology" department (hint: first use a subquery to find the Technology department's ID).
Exercise 2 (⭐⭐): Use a FROM subquery (derived table) to calculate each customer's order count and total spending, then filter customers whose total spending exceeds 5000.
Exercise 3 (⭐⭐⭐): Use EXISTS to find "customers who have ordered all products"—i.e., customers for whom no "unordered product" exists (hint: double NOT EXISTS logic, or implement another way).
Next Lesson
👉 10-set-operations - Set Operations: Learn UNION, UNION ALL, INTERSECT, and EXCEPT, and master set operations on multiple query results!



