Query Optimization
Query Optimization
💡 Life Analogy
Imagine you're looking for a book in a library:
- Full table scan: Search through every shelf in the library →slow
- Index: First check the catalog card, find the shelf number and go directly →fast
- Execution plan: The librarian tells you in advance "searching by author index is faster than by title" →EXPLAIN
- Slow query log: The library records "books that took more than 10 minutes to find" for shelf optimization →slow query log
- Query rewrite: Instead of asking "all books containing 'science'," first narrow down by category then search precisely →optimize the query approach
📖 Core Concepts
1. EXPLAIN Execution Plan
EXPLAIN is the core tool for query optimization. It tells you how the database executes a SQL statement.
EXPLAIN SELECT * FROM users WHERE username = 'alice';
Key Field Interpretation:
| Field | Meaning | Focus |
|---|---|---|
type |
Access type | ALL(full scan) →index →range →ref →eq_ref →const, the further right the better |
key |
Index actually used | NULL means no index used |
rows |
Estimated rows scanned | The smaller the better |
Extra |
Additional info | Using filesort(sorting needed), Using temporary(temp table needed) require attention |
possible_keys |
Potential indexes | Helps analyze whether indexes are being selected |
-- View execution plan
EXPLAIN SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name;
-- MySQL 8.0+ can view actual execution statistics
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 100;
Type Access Types Explained:
From worst to best:
ALL → Full table scan (must optimize)
index → Full index scan (slightly better than ALL)
range → Index range scan (WHERE id > 100)
ref → Non-unique index lookup (WHERE username = 'alice')
eq_ref → Unique index lookup (JOIN on primary key)
const → Constant lookup (WHERE id = 1, fastest)
system →System table (rarely appears)
2. Index Optimization Strategies
-- 1. Create indexes for commonly used WHERE, JOIN, ORDER BY fields
CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_order_user_created ON orders(user_id, created_at);
-- 2. Covering index: query fields are all in the index, no table lookup needed
-- If frequently executing: SELECT id, user_id, created_at FROM orders WHERE user_id = ?
CREATE INDEX idx_order_covering ON orders(user_id, created_at, id);
-- 3. Prefix index: index only the first N characters of long string fields
CREATE INDEX idx_user_name_prefix ON users(username(10));
-- 4. Composite index follows the leftmost prefix rule
CREATE INDEX idx_abc ON table_name(a, b, c);
-- Can match: WHERE a=1 | WHERE a=1 AND b=2 | WHERE a=1 AND b=2 AND c=3
-- Cannot match: WHERE b=2 | WHERE c=3 | WHERE b=2 AND c=3
-- 5. View index usage for a table
SHOW INDEX FROM orders;
-- 6. View index cardinality (higher cardinality = better discrimination)
SELECT
INDEX_NAME,
COLUMN_NAME,
CARDINALITY
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_NAME = 'orders';
3. Avoiding Full Table Scans
-- ✅Patterns that cause full table scans
-- 1. Using functions on indexed columns
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- ✅Rewrite as range query
SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- 2. Performing arithmetic on indexed columns
SELECT * FROM orders WHERE id + 1 = 100;
-- ✅Rewrite as
SELECT * FROM orders WHERE id = 99;
-- 3. Implicit type conversion (phone is VARCHAR, passing INT)
SELECT * FROM users WHERE phone = 13800138000;
-- ✅Rewrite as
SELECT * FROM users WHERE phone = '13800138000';
-- 4. LIKE with leading wildcard
SELECT * FROM users WHERE name LIKE '%alice%';
-- ✅If fuzzy search is necessary, consider full-text index
ALTER TABLE users ADD FULLTEXT INDEX ft_name(name);
SELECT * FROM users WHERE MATCH(name) AGAINST('alice' IN BOOLEAN MODE);
-- 5. OR conditions may cause index to fail
SELECT * FROM users WHERE status = 1 OR age > 25;
-- ✅Use UNION instead
SELECT * FROM users WHERE status = 1
UNION
SELECT * FROM users WHERE age > 25;
-- 6. NOT IN / NOT EXISTS may cause full table scan
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);
-- ✅Use LEFT JOIN + IS NULL instead
SELECT u.* FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL;
4. Query Rewriting Techniques
-- 1. Use EXISTS instead of IN (more efficient for large datasets)
-- Slow
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);
-- Fast
SELECT * FROM users u WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000
);
-- 2. Use UNION ALL instead of UNION (when deduplication isn't needed)
-- UNION deduplicates and sorts; UNION ALL does not
SELECT name FROM users_2023 UNION ALL SELECT name FROM users_2024;
-- 3. Avoid SELECT *, only select needed columns
-- Slow
SELECT * FROM orders WHERE user_id = 1;
-- Fast
SELECT id, order_no, total_amount, status FROM orders WHERE user_id = 1;
-- 4. Pagination optimization (deep pagination problem)
-- Slow (OFFSET 100000 requires scanning 100100 rows)
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 100000;
-- Fast (cursor pagination, remember the last id from the previous page)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 10;
-- 5. Batch operations instead of loop operations
-- Slow (loop inserts)
INSERT INTO logs (msg) VALUES ('a');
INSERT INTO logs (msg) VALUES ('b');
INSERT INTO logs (msg) VALUES ('c');
-- Fast (batch insert)
INSERT INTO logs (msg) VALUES ('a'), ('b'), ('c');
-- 6. Avoid using <> or != on indexed columns in WHERE clauses
SELECT * FROM users WHERE status != 0;
-- ✅If there are only a few status values, rewrite as
SELECT * FROM users WHERE status IN (1, 2, 3);
5. Slow Query Log
-- View slow query log configuration
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries exceeding 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON'; -- Log queries not using indexes
-- Analyze slow query log
-- Using mysqldumpslow tool
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow.log
-- Using pt-query-digest (Percona Toolkit)
-- pt-query-digest /var/log/mysql/slow.log
6. Tuning Checklist
| Step | Action | Tool |
|---|---|---|
| 1 | Enable slow query log, find slow SQL | slow_query_log |
| 2 | Analyze execution plan with EXPLAIN | EXPLAIN |
| 3 | Check if indexes are being hit | type field |
| 4 | Check for filesort/temporary | Extra field |
| 5 | Rewrite SQL or add indexes | DDL / SQL rewrite |
| 6 | Verify optimization results | Compare execution times |
7. Common Performance Pitfalls
| Pitfall | Description | Solution |
|---|---|---|
| SELECT * | Selects all columns, cannot use covering index | Select only needed columns |
| Large OFFSET | Deep pagination scans large amounts of data | Cursor pagination |
| N+1 queries | Querying related data one by one in a loop | Batch query or JOIN |
| JOIN without indexes | Large table joins without indexes | Add indexes on JOIN columns |
| Large transactions | Holding locks for too long | Reduce transaction scope |
| Wrong data type | Using VARCHAR for numbers | Choose appropriate types |
💡 Basic Syntax
-- EXPLAIN basic usage
EXPLAIN SELECT ...;
EXPLAIN ANALYZE SELECT ...; -- MySQL 8.0+
-- View query execution time
SET profiling = 1;
SELECT ...;
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;
-- View indexes
SHOW INDEX FROM table_name;
-- Force use of a specific index
SELECT * FROM orders FORCE INDEX (idx_user_id) WHERE user_id = 100;
-- Ignore index (for comparison testing)
SELECT * FROM orders IGNORE INDEX (idx_user_id) WHERE user_id = 100;
EXPLAIN is your best friend.
Database Dialect Comparison
Different databases have significant syntax differences. Here's a comparison of common operations:
LIMIT Pagination
-- MySQL / PostgreSQL / SQLite
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 20;
-- SQL Server
SELECT * FROM orders ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
-- Before SQL Server 2012
SELECT TOP 10 * FROM orders WHERE id NOT IN (SELECT TOP 20 id FROM orders ORDER BY id);
-- Oracle 12c+
SELECT * FROM orders ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
-- Before Oracle 12c
SELECT * FROM (SELECT o.*, ROWNUM rn FROM orders o WHERE ROWNUM <= 30) WHERE rn > 20;
Auto-Increment Primary Key
-- MySQL
CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50));
-- Or use default value
INSERT INTO users (name) VALUES ('Alice'); -- id auto-generated
-- PostgreSQL
CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR(50));
-- PostgreSQL 10+
CREATE TABLE users (id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name VARCHAR(50));
-- SQLite
CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
-- SQLite also supports ROWID
INSERT INTO users (name) VALUES ('Alice');
-- SQL Server
CREATE TABLE users (id INT IDENTITY(1,1) PRIMARY KEY, name NVARCHAR(50));
-- Oracle
CREATE TABLE users (id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name VARCHAR2(50));
String Functions
-- String concatenation
SELECT CONCAT('Hello', ' ', 'World'); -- MySQL, PostgreSQL
SELECT 'Hello' || ' ' || 'World'; -- PostgreSQL, SQLite, Oracle
SELECT name + ' ' + email FROM users; -- SQL Server
-- Substring
SELECT SUBSTRING('Hello World', 1, 5); -- MySQL, SQL Server
SELECT SUBSTR('Hello World', 1, 5); -- PostgreSQL, SQLite, Oracle
-- String length
SELECT LENGTH('Hello'); -- MySQL, PostgreSQL, SQLite
SELECT LEN('Hello'); -- SQL Server
-- Case conversion
SELECT UPPER('hello'), LOWER('HELLO'); -- All databases
SELECT UCASE('hello'), LCASE('HELLO'); -- MySQL additionally supports
-- Trim whitespace
SELECT TRIM(' Hello '); -- All databases
SELECT LTRIM(' Hello'), RTRIM('Hello '); -- MySQL, SQL Server, PostgreSQL
-- Replace
SELECT REPLACE('Hello World', 'World', 'SQL'); -- All databases
Date Functions
-- Current time
SELECT NOW(); -- MySQL, PostgreSQL
SELECT CURRENT_TIMESTAMP; -- All databases
SELECT GETDATE(); -- SQL Server
SELECT datetime('now'); -- SQLite
-- Date arithmetic
SELECT DATE_ADD('2024-01-01', INTERVAL 30 DAY); -- MySQL
SELECT '2024-01-01'::DATE + INTERVAL '30 days'; -- PostgreSQL
SELECT DATEADD(DAY, 30, '2024-01-01'); -- SQL Server
SELECT date('2024-01-01', '+30 days'); -- SQLite
-- Extract year/month/day
SELECT YEAR(created_at), MONTH(created_at), DAY(created_at) FROM orders; -- MySQL, SQL Server
SELECT EXTRACT(YEAR FROM created_at) FROM orders; -- PostgreSQL, MySQL 8.0+
SELECT strftime('%Y', created_at) FROM orders; -- SQLite
-- Date formatting
SELECT DATE_FORMAT(created_at, '%Y-%m-%d') FROM orders; -- MySQL
SELECT TO_CHAR(created_at, 'YYYY-MM-DD') FROM orders; -- PostgreSQL, Oracle
SELECT FORMAT(created_at, 'yyyy-MM-dd') FROM orders; -- SQL Server
SELECT strftime('%Y-%m-%d', created_at) FROM orders; -- SQLite
Conditional Expressions
-- IF expression
SELECT IF(score >= 60, 'Pass', 'Fail') FROM exams; -- MySQL
SELECT IIF(score >= 60, 'Pass', 'Fail') FROM exams; -- SQL Server
SELECT CASE WHEN score >= 60 THEN 'Pass' ELSE 'Fail' END FROM exams; -- All databases
-- COALESCE (returns the first non-NULL value)
SELECT COALESCE(nickname, username, 'Anonymous') FROM users; -- All databases
-- NULLIF (returns NULL when equal)
SELECT NULLIF(a, b); -- All databases
Boolean Type
-- MySQL: No native BOOLEAN, use TINYINT(1) instead
CREATE TABLE users (is_active TINYINT(1) DEFAULT 1);
SELECT * FROM users WHERE is_active = TRUE; -- TRUE equals 1
-- PostgreSQL: Native BOOLEAN
CREATE TABLE users (is_active BOOLEAN DEFAULT TRUE);
SELECT * FROM users WHERE is_active = TRUE;
-- SQLite: No native BOOLEAN, use INTEGER
CREATE TABLE users (is_active INTEGER DEFAULT 1);
SELECT * FROM users WHERE is_active = 1;
-- SQL Server: Native BIT
CREATE TABLE users (is_active BIT DEFAULT 1);
SELECT * FROM users WHERE is_active = 1;
UPSERT (Update if exists, insert if not)
-- MySQL
INSERT INTO stats (article_id, view_count) VALUES (1, 1)
ON DUPLICATE KEY UPDATE view_count = view_count + 1;
-- PostgreSQL
INSERT INTO stats (article_id, view_count) VALUES (1, 1)
ON CONFLICT (article_id) DO UPDATE SET view_count = stats.view_count + 1;
-- SQLite
INSERT INTO stats (article_id, view_count) VALUES (1, 1)
ON CONFLICT(article_id) DO UPDATE SET view_count = view_count + 1;
-- SQL Server
MERGE INTO stats AS target
USING (SELECT 1 AS article_id, 1 AS view_count) AS source
ON target.article_id = source.article_id
WHEN MATCHED THEN UPDATE SET view_count = target.view_count + 1
WHEN NOT MATCHED THEN INSERT (article_id, view_count) VALUES (source.article_id, source.view_count);
Window Function Support
-- All major databases support them (MySQL 8.0+, PostgreSQL, SQL Server, SQLite 3.25+)
SELECT
name,
score,
RANK() OVER (ORDER BY score DESC) AS ranking,
ROW_NUMBER() OVER (PARTITION BY class_id ORDER BY score DESC) AS class_rank
FROM students;
-- MySQL 5.7 and below don't support window functions; use variables to simulate
-- GROUP_CONCAT / STRING_AGG
SELECT category_id, GROUP_CONCAT(name SEPARATOR ',') FROM products GROUP BY category_id; -- MySQL
SELECT category_id, STRING_AGG(name, ',') FROM products GROUP BY category_id; -- PostgreSQL
Example: Locate and Optimize a Slow Query (Difficulty 🔥
Original slow query:
-- Query total order amount per user in the last 30 days (assume users table has 100K rows, orders table has 1M rows)
EXPLAIN
SELECT u.name, u.email, SUM(o.total_amount) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY u.id, u.name, u.email
ORDER BY total_spent DESC
LIMIT 20;
EXPLAIN analysis reveals:
type: ALL(full table scan on orders)rows: 1000000(scanning millions of rows)Extra: Using temporary; Using filesort
Optimization steps:
-- 1. Add composite index on orders table
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- 2. Optimize query: filter first, then join
SELECT u.name, u.email, sub.total_spent
FROM users u
INNER JOIN (
SELECT user_id, SUM(total_amount) AS total_spent
FROM orders
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 20
) sub ON u.id = sub.user_id
ORDER BY sub.total_spent DESC;
-- 3. EXPLAIN again to confirm optimization effect
EXPLAIN SELECT u.name, u.email, sub.total_spent ...
-- type: ref, rows: significantly reduced
Example: Deep Pagination Optimization (Difficulty ⭐⭐)
Problem query:
-- Query page 10000 (20 per page), OFFSET 200000
SELECT id, title, created_at
FROM articles
WHERE status = 1
ORDER BY created_at DESC
LIMIT 20 OFFSET 200000;
-- Even with an index, requires scanning 200020 rows, very slow
Solution 1: Cursor pagination (recommended)
-- Remember the last created_at and id from the previous page
-- Assume last item on previous page: created_at='2024-03-15 10:30:00', id=50001
SELECT id, title, created_at
FROM articles
WHERE status = 1
AND (created_at < '2024-03-15 10:30:00'
OR (created_at = '2024-03-15 10:30:00' AND id < 50001))
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Only scans 20 rows, extremely fast
Solution 2: Deferred join
-- First query primary keys, then join for full data
SELECT a.id, a.title, a.created_at
FROM articles a
INNER JOIN (
SELECT id FROM articles
WHERE status = 1
ORDER BY created_at DESC
LIMIT 20 OFFSET 200000
) b ON a.id = b.id;
-- Subquery uses covering index; main query fetches data by primary key
Solution 3: Business-layer optimization
-- If allowed, limit maximum pagination depth
-- Only allow viewing the first 1000 results, prompt users to use search to narrow down
SELECT id, title, created_at
FROM articles
WHERE status = 1
AND category_id = 5 -- Add filter conditions
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;
🔧 Scenario 1: E-commerce Product List Query Optimization
-- Original query: multi-condition filtering + sorting + pagination
SELECT p.id, p.name, p.price, p.sales_count, c.name AS category_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.id
WHERE p.status = 1
AND p.category_id IN (10, 11, 12, 13)
AND p.price BETWEEN 50 AND 500
ORDER BY p.sales_count DESC
LIMIT 20;
-- Optimization:
-- 1. Create composite index covering main filter conditions
CREATE INDEX idx_product_filter ON products(status, category_id, price, sales_count);
-- 2. If EXPLAIN shows filesort, adjust index order
CREATE INDEX idx_product_sort ON products(status, category_id, sales_count DESC, price);
-- 3. If categories table is small, denormalize category name into products table
ALTER TABLE products ADD COLUMN category_name VARCHAR(50);
-- Maintain during INSERT/UPDATE
🔧 Scenario 2: Statistical Report Query Optimization
-- Original query: monthly order statistics (large dataset, slow every time)
SELECT
DATE_FORMAT(created_at, '%Y-%m') AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM orders
WHERE created_at >= '2023-01-01'
GROUP BY month
ORDER BY month;
-- Optimization option 1: Pre-computed summary table
CREATE TABLE monthly_order_stats (
month_key VARCHAR(7) PRIMARY KEY COMMENT 'Format: 2024-01',
order_count INT NOT NULL DEFAULT 0,
revenue DECIMAL(15,2) NOT NULL DEFAULT 0,
updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Scheduled job updates daily (incremental calculation)
INSERT INTO monthly_order_stats (month_key, order_count, revenue)
SELECT
DATE_FORMAT(created_at, '%Y-%m'),
COUNT(*),
SUM(total_amount)
FROM orders
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
GROUP BY DATE_FORMAT(created_at, '%Y-%m')
ON DUPLICATE KEY UPDATE
order_count = VALUES(order_count),
revenue = VALUES(revenue),
updated_at = NOW();
-- Query summary table (millisecond response)
SELECT * FROM monthly_order_stats ORDER BY month_key;
✅FAQ
Q: Are more indexes always better? A: No. Indexes consume storage space and slow down INSERT/UPDATE/DELETE operations (every write operation must update indexes). Only create indexes for queries that truly need acceleration, and regularly clean up unused indexes.
Q: How should I choose the field order for composite indexes? A: Place high-cardinality fields first (e.g., user_id has higher cardinality than status); place equality query fields before range query fields; decide based on actual query condition combinations.
Q: My query is already fast, should I still optimize? A: If response time is within acceptable range, over-optimization isn't necessary. But be aware that performance may degrade as data grows; plan ahead with load testing and index strategy.
Q: Are the rows shown by EXPLAIN accurate? A: Not entirely accurate; they're estimates based on statistics. Actual execution may scan more or fewer rows. Running
ANALYZE TABLEto update statistics can improve estimate accuracy.
📖 Summary
This lesson systematically covered SQL query optimization methods:
- EXPLAIN is the core tool for analyzing execution plans; focus on
type,key,rows,Extra - Index optimization: Create indexes wisely, follow the leftmost prefix rule, use covering indexes
- Avoid full table scans: Don't use functions/arithmetic on indexed columns, watch for implicit type conversions, use LIKE %xxx% cautiously
- Query rewriting: EXISTS instead of IN, cursor pagination instead of large OFFSET, batch operations instead of loops
- Slow query logs: Identify bottleneck SQL for targeted optimization
- Database dialects: MySQL/PostgreSQL/SQLite/SQL Server have significant syntax differences; be aware when writing cross-database SQL
📝 Exercises
- Run EXPLAIN analysis on the following SQL and optimize it:SQL
SELECT * FROM orders WHERE YEAR(created_at) = 2024 AND user_id IN (SELECT id FROM users WHERE status = 1); - Design an appropriate index strategy for an article list page (supporting category filtering, time sorting, and pagination).
- Rewrite a query using
LIMIT 20 OFFSET 100000to use cursor pagination.
Next Lesson →28-project.md



