Query Optimization

Query Optimization

💡 Life Analogy

Imagine you're looking for a book in a library:

📖 Core Concepts

1. EXPLAIN Execution Plan

EXPLAIN is the core tool for query optimization. It tells you how the database executes a SQL statement.

SQL
EXPLAIN SELECT * FROM users WHERE username = 'alice';

Key Field Interpretation:

Field Meaning Focus
type Access type ALL(full scan) →indexrangerefeq_refconst, the further right the better
key Index actually used NULL means no index used
rows Estimated rows scanned The smaller the better
Extra Additional info Using filesort(sorting needed), Using temporary(temp table needed) require attention
possible_keys Potential indexes Helps analyze whether indexes are being selected
SQL
-- View execution plan
EXPLAIN SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name;

-- MySQL 8.0+ can view actual execution statistics
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 100;

Type Access Types Explained:

TEXT
From worst to best:
ALL         → Full table scan (must optimize)
index       → Full index scan (slightly better than ALL)
range       → Index range scan (WHERE id > 100)
ref         → Non-unique index lookup (WHERE username = 'alice')
eq_ref      → Unique index lookup (JOIN on primary key)
const       → Constant lookup (WHERE id = 1, fastest)
system      →System table (rarely appears)

2. Index Optimization Strategies

SQL
-- 1. Create indexes for commonly used WHERE, JOIN, ORDER BY fields
CREATE INDEX idx_user_email ON users(email);
CREATE INDEX idx_order_user_created ON orders(user_id, created_at);

-- 2. Covering index: query fields are all in the index, no table lookup needed
-- If frequently executing: SELECT id, user_id, created_at FROM orders WHERE user_id = ?
CREATE INDEX idx_order_covering ON orders(user_id, created_at, id);

-- 3. Prefix index: index only the first N characters of long string fields
CREATE INDEX idx_user_name_prefix ON users(username(10));

-- 4. Composite index follows the leftmost prefix rule
CREATE INDEX idx_abc ON table_name(a, b, c);
-- Can match: WHERE a=1 | WHERE a=1 AND b=2 | WHERE a=1 AND b=2 AND c=3
-- Cannot match: WHERE b=2 | WHERE c=3 | WHERE b=2 AND c=3

-- 5. View index usage for a table
SHOW INDEX FROM orders;

-- 6. View index cardinality (higher cardinality = better discrimination)
SELECT
    INDEX_NAME,
    COLUMN_NAME,
    CARDINALITY
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_NAME = 'orders';

3. Avoiding Full Table Scans

SQL
-- ✅Patterns that cause full table scans

-- 1. Using functions on indexed columns
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- ✅Rewrite as range query
SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- 2. Performing arithmetic on indexed columns
SELECT * FROM orders WHERE id + 1 = 100;
-- ✅Rewrite as
SELECT * FROM orders WHERE id = 99;

-- 3. Implicit type conversion (phone is VARCHAR, passing INT)
SELECT * FROM users WHERE phone = 13800138000;
-- ✅Rewrite as
SELECT * FROM users WHERE phone = '13800138000';

-- 4. LIKE with leading wildcard
SELECT * FROM users WHERE name LIKE '%alice%';
-- ✅If fuzzy search is necessary, consider full-text index
ALTER TABLE users ADD FULLTEXT INDEX ft_name(name);
SELECT * FROM users WHERE MATCH(name) AGAINST('alice' IN BOOLEAN MODE);

-- 5. OR conditions may cause index to fail
SELECT * FROM users WHERE status = 1 OR age > 25;
-- ✅Use UNION instead
SELECT * FROM users WHERE status = 1
UNION
SELECT * FROM users WHERE age > 25;

-- 6. NOT IN / NOT EXISTS may cause full table scan
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);
-- ✅Use LEFT JOIN + IS NULL instead
SELECT u.* FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.id IS NULL;

4. Query Rewriting Techniques

SQL
-- 1. Use EXISTS instead of IN (more efficient for large datasets)
-- Slow
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);
-- Fast
SELECT * FROM users u WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000
);

-- 2. Use UNION ALL instead of UNION (when deduplication isn't needed)
-- UNION deduplicates and sorts; UNION ALL does not
SELECT name FROM users_2023 UNION ALL SELECT name FROM users_2024;

-- 3. Avoid SELECT *, only select needed columns
-- Slow
SELECT * FROM orders WHERE user_id = 1;
-- Fast
SELECT id, order_no, total_amount, status FROM orders WHERE user_id = 1;

-- 4. Pagination optimization (deep pagination problem)
-- Slow (OFFSET 100000 requires scanning 100100 rows)
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 100000;
-- Fast (cursor pagination, remember the last id from the previous page)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 10;

-- 5. Batch operations instead of loop operations
-- Slow (loop inserts)
INSERT INTO logs (msg) VALUES ('a');
INSERT INTO logs (msg) VALUES ('b');
INSERT INTO logs (msg) VALUES ('c');
-- Fast (batch insert)
INSERT INTO logs (msg) VALUES ('a'), ('b'), ('c');

-- 6. Avoid using <> or != on indexed columns in WHERE clauses
SELECT * FROM users WHERE status != 0;
-- ✅If there are only a few status values, rewrite as
SELECT * FROM users WHERE status IN (1, 2, 3);

5. Slow Query Log

SQL
-- View slow query log configuration
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries exceeding 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';  -- Log queries not using indexes

-- Analyze slow query log
-- Using mysqldumpslow tool
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow.log

-- Using pt-query-digest (Percona Toolkit)
-- pt-query-digest /var/log/mysql/slow.log

6. Tuning Checklist

Step Action Tool
1 Enable slow query log, find slow SQL slow_query_log
2 Analyze execution plan with EXPLAIN EXPLAIN
3 Check if indexes are being hit type field
4 Check for filesort/temporary Extra field
5 Rewrite SQL or add indexes DDL / SQL rewrite
6 Verify optimization results Compare execution times

7. Common Performance Pitfalls

Pitfall Description Solution
SELECT * Selects all columns, cannot use covering index Select only needed columns
Large OFFSET Deep pagination scans large amounts of data Cursor pagination
N+1 queries Querying related data one by one in a loop Batch query or JOIN
JOIN without indexes Large table joins without indexes Add indexes on JOIN columns
Large transactions Holding locks for too long Reduce transaction scope
Wrong data type Using VARCHAR for numbers Choose appropriate types

💡 Basic Syntax

SQL
-- EXPLAIN basic usage
EXPLAIN SELECT ...;
EXPLAIN ANALYZE SELECT ...;  -- MySQL 8.0+

-- View query execution time
SET profiling = 1;
SELECT ...;
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1;

-- View indexes
SHOW INDEX FROM table_name;

-- Force use of a specific index
SELECT * FROM orders FORCE INDEX (idx_user_id) WHERE user_id = 100;

-- Ignore index (for comparison testing)
SELECT * FROM orders IGNORE INDEX (idx_user_id) WHERE user_id = 100;
💡 Tip: Measure before optimizing, measure after optimizing. Don't optimize by feeling; let data speak. EXPLAIN is your best friend.

Database Dialect Comparison

Different databases have significant syntax differences. Here's a comparison of common operations:

LIMIT Pagination

SQL
-- MySQL / PostgreSQL / SQLite
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 20;

-- SQL Server
SELECT * FROM orders ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
-- Before SQL Server 2012
SELECT TOP 10 * FROM orders WHERE id NOT IN (SELECT TOP 20 id FROM orders ORDER BY id);

-- Oracle 12c+
SELECT * FROM orders ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
-- Before Oracle 12c
SELECT * FROM (SELECT o.*, ROWNUM rn FROM orders o WHERE ROWNUM <= 30) WHERE rn > 20;

Auto-Increment Primary Key

SQL
-- MySQL
CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50));
-- Or use default value
INSERT INTO users (name) VALUES ('Alice');  -- id auto-generated

-- PostgreSQL
CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR(50));
-- PostgreSQL 10+
CREATE TABLE users (id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name VARCHAR(50));

-- SQLite
CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
-- SQLite also supports ROWID
INSERT INTO users (name) VALUES ('Alice');

-- SQL Server
CREATE TABLE users (id INT IDENTITY(1,1) PRIMARY KEY, name NVARCHAR(50));

-- Oracle
CREATE TABLE users (id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, name VARCHAR2(50));

String Functions

SQL
-- String concatenation
SELECT CONCAT('Hello', ' ', 'World');           -- MySQL, PostgreSQL
SELECT 'Hello' || ' ' || 'World';                -- PostgreSQL, SQLite, Oracle
SELECT name + ' ' + email FROM users;            -- SQL Server

-- Substring
SELECT SUBSTRING('Hello World', 1, 5);           -- MySQL, SQL Server
SELECT SUBSTR('Hello World', 1, 5);              -- PostgreSQL, SQLite, Oracle

-- String length
SELECT LENGTH('Hello');                           -- MySQL, PostgreSQL, SQLite
SELECT LEN('Hello');                              -- SQL Server

-- Case conversion
SELECT UPPER('hello'), LOWER('HELLO');           -- All databases
SELECT UCASE('hello'), LCASE('HELLO');           -- MySQL additionally supports

-- Trim whitespace
SELECT TRIM('  Hello  ');                         -- All databases
SELECT LTRIM('  Hello'), RTRIM('Hello  ');       -- MySQL, SQL Server, PostgreSQL

-- Replace
SELECT REPLACE('Hello World', 'World', 'SQL');   -- All databases

Date Functions

SQL
-- Current time
SELECT NOW();                                     -- MySQL, PostgreSQL
SELECT CURRENT_TIMESTAMP;                         -- All databases
SELECT GETDATE();                                 -- SQL Server
SELECT datetime('now');                           -- SQLite

-- Date arithmetic
SELECT DATE_ADD('2024-01-01', INTERVAL 30 DAY);  -- MySQL
SELECT '2024-01-01'::DATE + INTERVAL '30 days';  -- PostgreSQL
SELECT DATEADD(DAY, 30, '2024-01-01');            -- SQL Server
SELECT date('2024-01-01', '+30 days');             -- SQLite

-- Extract year/month/day
SELECT YEAR(created_at), MONTH(created_at), DAY(created_at) FROM orders;  -- MySQL, SQL Server
SELECT EXTRACT(YEAR FROM created_at) FROM orders;                          -- PostgreSQL, MySQL 8.0+
SELECT strftime('%Y', created_at) FROM orders;                             -- SQLite

-- Date formatting
SELECT DATE_FORMAT(created_at, '%Y-%m-%d') FROM orders;                    -- MySQL
SELECT TO_CHAR(created_at, 'YYYY-MM-DD') FROM orders;                      -- PostgreSQL, Oracle
SELECT FORMAT(created_at, 'yyyy-MM-dd') FROM orders;                       -- SQL Server
SELECT strftime('%Y-%m-%d', created_at) FROM orders;                       -- SQLite

Conditional Expressions

SQL
-- IF expression
SELECT IF(score >= 60, 'Pass', 'Fail') FROM exams;                        -- MySQL
SELECT IIF(score >= 60, 'Pass', 'Fail') FROM exams;                       -- SQL Server
SELECT CASE WHEN score >= 60 THEN 'Pass' ELSE 'Fail' END FROM exams;     -- All databases

-- COALESCE (returns the first non-NULL value)
SELECT COALESCE(nickname, username, 'Anonymous') FROM users;              -- All databases

-- NULLIF (returns NULL when equal)
SELECT NULLIF(a, b);                                                        -- All databases

Boolean Type

SQL
-- MySQL: No native BOOLEAN, use TINYINT(1) instead
CREATE TABLE users (is_active TINYINT(1) DEFAULT 1);
SELECT * FROM users WHERE is_active = TRUE;  -- TRUE equals 1

-- PostgreSQL: Native BOOLEAN
CREATE TABLE users (is_active BOOLEAN DEFAULT TRUE);
SELECT * FROM users WHERE is_active = TRUE;

-- SQLite: No native BOOLEAN, use INTEGER
CREATE TABLE users (is_active INTEGER DEFAULT 1);
SELECT * FROM users WHERE is_active = 1;

-- SQL Server: Native BIT
CREATE TABLE users (is_active BIT DEFAULT 1);
SELECT * FROM users WHERE is_active = 1;

UPSERT (Update if exists, insert if not)

SQL
-- MySQL
INSERT INTO stats (article_id, view_count) VALUES (1, 1)
ON DUPLICATE KEY UPDATE view_count = view_count + 1;

-- PostgreSQL
INSERT INTO stats (article_id, view_count) VALUES (1, 1)
ON CONFLICT (article_id) DO UPDATE SET view_count = stats.view_count + 1;

-- SQLite
INSERT INTO stats (article_id, view_count) VALUES (1, 1)
ON CONFLICT(article_id) DO UPDATE SET view_count = view_count + 1;

-- SQL Server
MERGE INTO stats AS target
USING (SELECT 1 AS article_id, 1 AS view_count) AS source
ON target.article_id = source.article_id
WHEN MATCHED THEN UPDATE SET view_count = target.view_count + 1
WHEN NOT MATCHED THEN INSERT (article_id, view_count) VALUES (source.article_id, source.view_count);

Window Function Support

SQL
-- All major databases support them (MySQL 8.0+, PostgreSQL, SQL Server, SQLite 3.25+)
SELECT
    name,
    score,
    RANK() OVER (ORDER BY score DESC) AS ranking,
    ROW_NUMBER() OVER (PARTITION BY class_id ORDER BY score DESC) AS class_rank
FROM students;

-- MySQL 5.7 and below don't support window functions; use variables to simulate
-- GROUP_CONCAT / STRING_AGG
SELECT category_id, GROUP_CONCAT(name SEPARATOR ',') FROM products GROUP BY category_id;  -- MySQL
SELECT category_id, STRING_AGG(name, ',') FROM products GROUP BY category_id;             -- PostgreSQL

Example: Locate and Optimize a Slow Query (Difficulty 🔥

Original slow query:

SQL
-- Query total order amount per user in the last 30 days (assume users table has 100K rows, orders table has 1M rows)
EXPLAIN
SELECT u.name, u.email, SUM(o.total_amount) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY u.id, u.name, u.email
ORDER BY total_spent DESC
LIMIT 20;
▶ Try it Yourself

EXPLAIN analysis reveals:

Optimization steps:

SQL
-- 1. Add composite index on orders table
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- 2. Optimize query: filter first, then join
SELECT u.name, u.email, sub.total_spent
FROM users u
INNER JOIN (
    SELECT user_id, SUM(total_amount) AS total_spent
    FROM orders
    WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
    GROUP BY user_id
    ORDER BY total_spent DESC
    LIMIT 20
) sub ON u.id = sub.user_id
ORDER BY sub.total_spent DESC;

-- 3. EXPLAIN again to confirm optimization effect
EXPLAIN SELECT u.name, u.email, sub.total_spent ...
-- type: ref, rows: significantly reduced

Example: Deep Pagination Optimization (Difficulty ⭐⭐)

Problem query:

SQL
-- Query page 10000 (20 per page), OFFSET 200000
SELECT id, title, created_at
FROM articles
WHERE status = 1
ORDER BY created_at DESC
LIMIT 20 OFFSET 200000;
-- Even with an index, requires scanning 200020 rows, very slow
▶ Try it Yourself

Solution 1: Cursor pagination (recommended)

SQL
-- Remember the last created_at and id from the previous page
-- Assume last item on previous page: created_at='2024-03-15 10:30:00', id=50001
SELECT id, title, created_at
FROM articles
WHERE status = 1
  AND (created_at < '2024-03-15 10:30:00'
       OR (created_at = '2024-03-15 10:30:00' AND id < 50001))
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Only scans 20 rows, extremely fast

Solution 2: Deferred join

SQL
-- First query primary keys, then join for full data
SELECT a.id, a.title, a.created_at
FROM articles a
INNER JOIN (
    SELECT id FROM articles
    WHERE status = 1
    ORDER BY created_at DESC
    LIMIT 20 OFFSET 200000
) b ON a.id = b.id;
-- Subquery uses covering index; main query fetches data by primary key

Solution 3: Business-layer optimization

SQL
-- If allowed, limit maximum pagination depth
-- Only allow viewing the first 1000 results, prompt users to use search to narrow down
SELECT id, title, created_at
FROM articles
WHERE status = 1
  AND category_id = 5  -- Add filter conditions
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;

🔧 Scenario 1: E-commerce Product List Query Optimization

SQL
-- Original query: multi-condition filtering + sorting + pagination
SELECT p.id, p.name, p.price, p.sales_count, c.name AS category_name
FROM products p
LEFT JOIN categories c ON p.category_id = c.id
WHERE p.status = 1
  AND p.category_id IN (10, 11, 12, 13)
  AND p.price BETWEEN 50 AND 500
ORDER BY p.sales_count DESC
LIMIT 20;

-- Optimization:
-- 1. Create composite index covering main filter conditions
CREATE INDEX idx_product_filter ON products(status, category_id, price, sales_count);

-- 2. If EXPLAIN shows filesort, adjust index order
CREATE INDEX idx_product_sort ON products(status, category_id, sales_count DESC, price);

-- 3. If categories table is small, denormalize category name into products table
ALTER TABLE products ADD COLUMN category_name VARCHAR(50);
-- Maintain during INSERT/UPDATE

🔧 Scenario 2: Statistical Report Query Optimization

SQL
-- Original query: monthly order statistics (large dataset, slow every time)
SELECT
    DATE_FORMAT(created_at, '%Y-%m') AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM orders
WHERE created_at >= '2023-01-01'
GROUP BY month
ORDER BY month;

-- Optimization option 1: Pre-computed summary table
CREATE TABLE monthly_order_stats (
    month_key VARCHAR(7) PRIMARY KEY COMMENT 'Format: 2024-01',
    order_count INT NOT NULL DEFAULT 0,
    revenue DECIMAL(15,2) NOT NULL DEFAULT 0,
    updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Scheduled job updates daily (incremental calculation)
INSERT INTO monthly_order_stats (month_key, order_count, revenue)
SELECT
    DATE_FORMAT(created_at, '%Y-%m'),
    COUNT(*),
    SUM(total_amount)
FROM orders
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
GROUP BY DATE_FORMAT(created_at, '%Y-%m')
ON DUPLICATE KEY UPDATE
    order_count = VALUES(order_count),
    revenue = VALUES(revenue),
    updated_at = NOW();

-- Query summary table (millisecond response)
SELECT * FROM monthly_order_stats ORDER BY month_key;

✅FAQ

Q: Are more indexes always better? A: No. Indexes consume storage space and slow down INSERT/UPDATE/DELETE operations (every write operation must update indexes). Only create indexes for queries that truly need acceleration, and regularly clean up unused indexes.

Q: How should I choose the field order for composite indexes? A: Place high-cardinality fields first (e.g., user_id has higher cardinality than status); place equality query fields before range query fields; decide based on actual query condition combinations.

Q: My query is already fast, should I still optimize? A: If response time is within acceptable range, over-optimization isn't necessary. But be aware that performance may degrade as data grows; plan ahead with load testing and index strategy.

Q: Are the rows shown by EXPLAIN accurate? A: Not entirely accurate; they're estimates based on statistics. Actual execution may scan more or fewer rows. Running ANALYZE TABLE to update statistics can improve estimate accuracy.

📖 Summary

This lesson systematically covered SQL query optimization methods:

📝 Exercises

  1. Run EXPLAIN analysis on the following SQL and optimize it:
    SQL
    SELECT * FROM orders WHERE YEAR(created_at) = 2024 AND user_id IN (SELECT id FROM users WHERE status = 1);
    
  2. Design an appropriate index strategy for an article list page (supporting category filtering, time sorting, and pagination).
  3. Rewrite a query using LIMIT 20 OFFSET 100000 to use cursor pagination.

Next Lesson →28-project.md

Web-Tutorial.com

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

100%

🙏 帮我们做得更好

我们是刚上线的编程教程站,几个人的小团队,精力有限。页面虽经检查,难免还有疏漏——链接失效、排版错乱、内容有误、语言生硬……

如果您发现了,麻烦告诉我们,我们会在收到反馈后第一时间进行修复,再次感谢您的光临 🙏