Subqueries

Subqueries

Imagine you're at a supermarket and ask the clerk: "What's the most expensive item in your store?" The clerk checks and says: "It's a MacBook Pro." Then you follow up: "Who are the customers that bought this product?"—What you just did is a subquery: using the answer to one question to ask the next. Subqueries in SQL work the same way: use the result of one query as a condition or data source for another query.


1. Core Concepts

What is a Subquery?

A subquery (also called an inner query or nested query) is a query embedded within another SQL statement. The outer query is called the main query, and the inner query is called the subquery. The subquery executes first, and its result is used by the main query.

SQL
-- Subquery: find the employee with the highest salary
SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

Execution order: First execute SELECT MAX(salary) FROM employees to get 20000, then execute the outer query to find the employee(s) with salary = 20000.

Subquery Classification

Based on where the subquery appears, there are three main types:

Type Location Purpose Returns
WHERE subquery In the WHERE clause Filter data Single value or list
FROM subquery (derived table) In the FROM clause Acts as a temporary table Result set
SELECT subquery (scalar subquery) In the SELECT column list Acts as a computed column Single value

WHERE Subquery

The most common usage. The subquery appears in the WHERE clause, dynamically determining the filter condition.

SQL
-- Find employees whose salary is above average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Subqueries can return:

SQL
-- Return multiple values: match with IN
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT id FROM departments WHERE location = 'Beijing'
);

FROM Subquery (Derived Table)

The subquery appears in the FROM clause, effectively creating a temporary table (also called a derived table). It must be given an alias.

SQL
-- First calculate each department's average salary, then filter from it
SELECT dept_name, avg_salary
FROM (
    SELECT d.name AS dept_name, AVG(e.salary) AS avg_salary
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    GROUP BY d.name
) AS dept_avg
WHERE avg_salary > 12000;
💡 Tip: FROM subqueries must have an alias (like AS dept_avg above), otherwise SQL will throw an error.

SELECT Subquery (Scalar Subquery)

The subquery appears in the SELECT column list as a computed column. Every time the main query processes a row, the subquery executes once.

SQL
-- Display each employee's salary and the company average salary
SELECT name, salary,
       (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;

Output:

TEXT
name    salary    avg_salary
------  --------  ----------
Zhang San 15000.00  14000.00
Li Si   18000.00  14000.00
Wang Wu 12000.00  14000.00
Zhao Liu 13000.00  14000.00
Qian Qi 20000.00  14000.00
Sun Ba  11000.00  14000.00
Zhou Jiu 9000.00   14000.00
Wu Shi  14000.00  14000.00
💡 Tip: A scalar subquery must return exactly one row and one column, otherwise it will throw an error. It's suitable for adding "summary information" columns.

EXISTS and NOT EXISTS

EXISTS checks whether a subquery returns at least one row. It doesn't care what values are returned—only whether "there are results."

SQL
-- Find departments that have employees
SELECT d.name
FROM departments d
WHERE EXISTS (
    SELECT 1 FROM employees e WHERE e.department_id = d.id
);
SQL
-- Find departments with no employees
SELECT d.name
FROM departments d
WHERE NOT EXISTS (
    SELECT 1 FROM employees e WHERE e.department_id = d.id
);
💡 Tip: SELECT 1 in EXISTS is a conventional practice because EXISTS only cares about "whether rows exist," not the column values. Using SELECT * or SELECT NULL has the same effect.

Subquery vs JOIN Performance Comparison

Comparison Item Subquery JOIN
Readability Closer to natural language thinking Requires understanding join logic
Performance Little difference in simple cases; correlated subqueries may be slower Generally better, with specialized optimizer
Flexibility Can use aggregate function results as conditions Requires GROUP BY
Recommended for "Find the largest," "find non-existent" "Display with association," "multi-table merge"
💡 Tip: Modern databases (like PostgreSQL, MySQL 8.0) have intelligent optimizers, and the performance difference between subqueries and JOINs is negligible in many scenarios. Prioritize better readability, and use EXPLAIN to analyze and optimize when performance issues arise.


2. Basic Syntax/Usage

WHERE Subquery Syntax

SQL
-- Scalar subquery (returns single value)
SELECT column_name FROM table_name
WHERE column_name comparison_operator (SELECT aggregate_function FROM table_name);

-- Multi-row subquery (returns list)
SELECT column_name FROM table_name
WHERE column_name IN (SELECT column_name FROM table_name WHERE condition);

FROM Subquery Syntax

SQL
SELECT column_name
FROM (SELECT ... FROM ... WHERE ...) AS alias
WHERE condition;
💡 Tip: The alias for a FROM subquery is required. Different databases have different requirements, but adding an alias is the safest approach.

SELECT Scalar Subquery Syntax

SQL
SELECT column1, column2,
       (SELECT aggregate_function FROM table_name WHERE condition) AS alias
FROM table_name;
💡 Tip: If a scalar subquery needs to reference columns from the outer query (correlated subquery), you can use the table alias directly from the outer query.

EXISTS Syntax

SQL
SELECT column_name FROM table_A a
WHERE EXISTS (SELECT 1 FROM table_B b WHERE b.foreign_key = a.primary_key);
💡 Tip: EXISTS is a "correlated subquery"—the subquery references columns from the outer query (like a.id), so the subquery executes once for each row in the outer query.

💡 Tip: For large datasets, EXISTS is typically more efficient than IN, because EXISTS stops as soon as it finds the first match, while IN needs to return all results.


Example: Find Employees With Above-Average Salary (Difficulty ⭐)

SQL
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;
▶ Try it Yourself

Output:

TEXT
name    salary
------  --------
Qian Qi 20000.00
Li Si   18000.00
Zhang San 15000.00
Wu Shi  14000.00

First calculate the average salary (approximately 14000), then filter employees above average.


Example: Find the Highest-Paid Employee in Each Department (Difficulty ⭐⭐)

SQL
SELECT e.name, d.name AS department, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE e.salary = (
    SELECT MAX(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);
▶ Try it Yourself

Output:

TEXT
name    department  salary
------  ----------  --------
Qian Qi Technology  20000.00
Wang Wu Marketing   12000.00
Wu Shi  Finance     14000.00
💡 Key: This is a correlated subquerye.department_id in the subquery references the current row of the outer query. The subquery executes once for each outer row, finding the highest salary in that employee's department, then checks if the current employee is the highest-paid.


Example: Use Derived Table to Analyze Department Salary Levels (Difficulty ⭐⭐⭐)

SQL
SELECT dept_name, emp_count, avg_salary,
       CASE 
           WHEN avg_salary >= 15000 THEN 'High Salary'
           WHEN avg_salary >= 12000 THEN 'Medium Salary'
           ELSE 'Needs Improvement'
       END AS level
FROM (
    SELECT d.name AS dept_name, 
           COUNT(e.id) AS emp_count,
           AVG(e.salary) AS avg_salary
    FROM departments d
    LEFT JOIN employees e ON d.id = e.department_id
    GROUP BY d.id, d.name
) AS dept_stats
WHERE emp_count > 0
ORDER BY avg_salary DESC;
▶ Try it Yourself

Output:

TEXT
dept_name  emp_count  avg_salary  level
---------  ---------  ----------  ----------
Technology 3          17666.67    High Salary
Finance    2          13500.00    Medium Salary
Marketing  2          11500.00    Needs Improvement
💡 Approach: The FROM subquery first calculates each department's employee count and average salary, then the outer query classifies and filters these statistics. This "aggregate first, analyze later" pattern is very common in report development.


3. Common Use Cases

Case 1: Find Products Never Ordered

SQL
SELECT p.name, p.category, p.price
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.product_id = p.id
);

Output (depends on data):

TEXT
name      category  price
--------  --------  -------
Mac Mini  Computer  4499.00
💡 Comparison: The same requirement implemented with LEFT JOIN:

SQL
SELECT p.name, p.category, p.price
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
WHERE o.id IS NULL;

Both approaches have similar performance—choose the one with better readability.

Case 2: Find Customers Whose Order Amount Exceeds the Average

SQL
SELECT customer_name, total_spent
FROM (
    SELECT o.customer_name, 
           SUM(o.quantity * p.price) AS total_spent
    FROM orders o
    JOIN products p ON o.product_id = p.id
    GROUP BY o.customer_name
) AS customer_totals
WHERE total_spent > (
    SELECT AVG(o.quantity * p.price)
    FROM orders o
    JOIN products p ON o.product_id = p.id
);

Output:

TEXT
customer_name  total_spent
-------------  -----------
Xiao Li        14397.00
Xiao Gang      17998.00
Xiao Wang      11998.00
💡 Approach: The FROM subquery calculates each customer's total spending, the WHERE subquery calculates the average order amount, and the outer query filters customers above the average.


❓ FAQ

Q: How many levels can subqueries be nested? A: There's no theoretical limit, but in practice, it's recommended not to exceed 3 levels. Too many levels indicate overly complex query logic—consider splitting into multiple steps or rewriting with JOINs.

Q: Should I use IN or EXISTS? A: When the subquery result set is small and the outer table is large, IN is more efficient; when the outer table is small and the subquery table is large, EXISTS is more efficient (because EXISTS stops at the first match). The difference is minimal for simple scenarios—use EXPLAIN to compare for complex cases.

Q: What happens if a scalar subquery returns multiple rows? A: The database will throw an error. A scalar subquery must return exactly one row and one column. If you're unsure about the row count, use LIMIT 1 or an aggregate function (like MAX, MIN) to ensure a single value.

Q: What's the difference between correlated and non-correlated subqueries? A: A non-correlated subquery executes independently once (e.g., "find average salary"), and its result is used by the main query. A correlated subquery references columns from the outer query and executes once for each outer row (e.g., "find the highest salary in each department"). Correlated subqueries may be slow with large datasets—watch for performance.


📖 Summary


📝 Exercises

Exercise 1 (⭐): Use a subquery to find all employees in the "Technology" department (hint: first use a subquery to find the Technology department's ID).

Exercise 2 (⭐⭐): Use a FROM subquery (derived table) to calculate each customer's order count and total spending, then filter customers whose total spending exceeds 5000.

Exercise 3 (⭐⭐⭐): Use EXISTS to find "customers who have ordered all products"—i.e., customers for whom no "unordered product" exists (hint: double NOT EXISTS logic, or implement another way).


Next Lesson

👉 10-set-operations - Set Operations: Learn UNION, UNION ALL, INTERSECT, and EXCEPT, and master set operations on multiple query results!

Web-Tutorial.com

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

100%

🙏 帮我们做得更好

我们是刚上线的编程教程站,几个人的小团队,精力有限。页面虽经检查,难免还有疏漏——链接失效、排版错乱、内容有误、语言生硬……

如果您发现了,麻烦告诉我们,我们会在收到反馈后第一时间进行修复,再次感谢您的光临 🙏