Database Design
Database Design
🎯 Life Analogy
Imagine you're opening a library:
- Requirements Analysis: First figure out what books to lend, who borrows them, and how to register them → understand business requirements
- ER Diagram: Draw a relationship diagram of "readers — borrowing — books" → entity-relationship modeling
- Normalization: Don't copy reader addresses repeatedly on library cards; store the address only in the reader's file → eliminate redundancy
- Denormalization: Calculate the popular book rankings and post them on the wall directly, instead of searching through borrowing records every time → trade space for time
- Naming Convention: Use a unified shelf number format like "A-01", not sometimes "Shelf 1" and sometimes "Row 1" → unified naming
📚 Core Concepts
1. Requirements Analysis
Before writing any SQL, answer these questions:
| Question | Example |
|---|---|
| What entities does the system manage? | Users, articles, comments, tags |
| What are the relationships between entities? | One user writes many articles (one-to-many) |
| What queries need to be supported? | Search by tag, sort by time |
| Estimated data volume? | Millions of articles, tens of millions of comments |
| Read-heavy or write-heavy? | Blog systems are read-heavy |
2. ER Diagram (Entity-Relationship Diagram)
ER diagrams describe data models using three core elements:
[Entity] —— Attribute1, Attribute2, Attribute3
|
(Relationship) cardinality: 1:1, 1:N, M:N
|
[Entity] —— Attribute1, Attribute2
Common Relationship Types:
| Relationship | Example | Implementation |
|---|---|---|
| One-to-One (1:1) | User ↔ User Details | Foreign key or merge into one table |
| One-to-Many (1:N) | User → Article | Add user_id foreign key in the articles table |
| Many-to-Many (M:N) | Article ↔ Tag | Junction table article_tag |
3. Normalization Theory
First Normal Form (1NF): Columns must be atomic
-- ❌ Violates 1NF: phone column stores multiple values
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(50),
phones VARCHAR(200) -- '138xxx,139xxx,137xxx'
);
-- ✅ Conforms to 1NF: each field stores a single atomic value
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE user_phones (
id INT PRIMARY KEY,
user_id INT,
phone VARCHAR(20)
);
Second Normal Form (2NF): Non-key columns must fully depend on the entire primary key
-- ❌ Violates 2NF: in order items, product_name depends only on product_id, not on order_id
CREATE TABLE order_items (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- Partial dependency
quantity INT,
PRIMARY KEY (order_id, product_id)
);
-- ✅ Conforms to 2NF: split into a products table
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
Third Normal Form (3NF): Non-key columns must not have transitive dependencies
-- ❌ Violates 3NF: city_name depends on city_id, city_id depends on id → transitive dependency
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(50),
city_id INT,
city_name VARCHAR(50) -- Transitive dependency
);
-- ✅ Conforms to 3NF: split into a cities table
CREATE TABLE cities (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(50),
city_id INT
);
BCNF (Boyce-Codd Normal Form)
Stricter than 3NF: every determinant must be a candidate key. In practice, satisfying 3NF is usually sufficient.
4. Denormalization
Sometimes for query performance, redundancy is intentionally introduced:
-- Store comment count redundantly in the articles table to avoid COUNT every time
ALTER TABLE articles ADD COLUMN comment_count INT DEFAULT 0;
-- Update the redundant field
UPDATE articles SET comment_count = (
SELECT COUNT(*) FROM comments WHERE comments.article_id = articles.id
);
When to denormalize:
- Reads far outnumber writes
- Aggregate queries are extremely frequent
- Index optimization alone doesn't meet performance requirements
5. Table Relationship Design
Primary Key Choices:
| Type | Pros | Cons | Use Cases |
|---|---|---|---|
Auto-increment AUTO_INCREMENT |
Ordered, compact, fast inserts | Predictable, sharding issues | Most business tables |
| UUID | Globally unique, unpredictable | Large size, unordered indexes are slow | Distributed systems |
| Composite primary key | Semantically clear | Foreign key references are cumbersome | Junction tables, association tables |
| Snowflake ID | Ordered, globally unique | Requires extra dependencies | Distributed high-concurrency |
-- Auto-increment primary key
CREATE TABLE users (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50)
);
-- UUID primary key
CREATE TABLE users (
id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
name VARCHAR(50)
);
-- Composite primary key (junction table)
CREATE TABLE article_tag (
article_id BIGINT,
tag_id BIGINT,
PRIMARY KEY (article_id, tag_id)
);
6. Field Naming Conventions
| Convention | ✅ Recommended | ❌ Avoid |
|---|---|---|
| Use lowercase + underscores | user_name |
UserName, username |
| Table names in plural | users |
user, tbl_user |
| Foreign key field names | user_id |
uid, userId |
| Boolean fields | is_deleted |
deleted, flag |
| Timestamp fields | created_at |
createTime, add_time |
| Monetary fields | DECIMAL(10,2) |
FLOAT, DOUBLE |
7. Design Best Practices
- Every table must have a primary key
- Add indexes on foreign key columns (even if the database doesn't enforce foreign key constraints)
- Choose the smallest sufficient data type:
TINYINTinstead ofINTfor status values - Use
DECIMALfor monetary values, not floating-point types - Use
DATETIMEorTIMESTAMPfor time fields, not strings - Reserve extension fields:
extra JSONorstatuswith bitwise operations - Soft delete instead of physical delete:
is_deleted TINYINT DEFAULT 0
💡 Basic Syntax
-- Basic template for creating a table
CREATE TABLE table_name (
id BIGINT AUTO_INCREMENT PRIMARY KEY COMMENT 'Primary key',
-- Business fields
name VARCHAR(100) NOT NULL COMMENT 'Name',
status TINYINT NOT NULL DEFAULT 1 COMMENT 'Status: 1-active 0-disabled',
-- Foreign key field
user_id BIGINT NOT NULL COMMENT 'User ID',
-- Timestamp fields
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Created time',
updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Updated time',
-- Indexes
INDEX idx_user_id (user_id),
INDEX idx_status (status)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Table comment';
Example: Design a Simple Student Course Selection System (Difficulty ⭐)
Requirements: Manage students, courses, and enrollment records.
-- Students table
CREATE TABLE students (
id BIGINT AUTO_INCREMENT PRIMARY KEY COMMENT 'Student ID',
student_no VARCHAR(20) NOT NULL UNIQUE COMMENT 'Student number',
name VARCHAR(50) NOT NULL COMMENT 'Name',
gender TINYINT NOT NULL DEFAULT 0 COMMENT 'Gender: 0-unknown 1-male 2-female',
enrollment_year INT NOT NULL COMMENT 'Enrollment year',
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Students table';
-- Courses table
CREATE TABLE courses (
id BIGINT AUTO_INCREMENT PRIMARY KEY COMMENT 'Course ID',
course_no VARCHAR(20) NOT NULL UNIQUE COMMENT 'Course number',
name VARCHAR(100) NOT NULL COMMENT 'Course name',
credit DECIMAL(3,1) NOT NULL COMMENT 'Credits',
teacher VARCHAR(50) COMMENT 'Instructor',
max_students INT NOT NULL DEFAULT 60 COMMENT 'Max enrollment',
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Courses table';
-- Enrollments table (junction table, many-to-many relationship)
CREATE TABLE enrollments (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
student_id BIGINT NOT NULL COMMENT 'Student ID',
course_id BIGINT NOT NULL COMMENT 'Course ID',
score DECIMAL(5,2) COMMENT 'Score',
enrolled_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Enrollment time',
UNIQUE KEY uk_student_course (student_id, course_id),
INDEX idx_course_id (course_id),
INDEX idx_student_id (student_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Enrollments table';
Example: Design Core Tables for an E-commerce Order System (Difficulty ⭐⭐)
-- Product categories table
CREATE TABLE categories (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50) NOT NULL COMMENT 'Category name',
parent_id BIGINT DEFAULT NULL COMMENT 'Parent category ID, NULL for top-level',
sort_order INT NOT NULL DEFAULT 0 COMMENT 'Sort order',
INDEX idx_parent_id (parent_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Product categories table';
-- Products table
CREATE TABLE products (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
category_id BIGINT NOT NULL COMMENT 'Category ID',
name VARCHAR(200) NOT NULL COMMENT 'Product name',
price DECIMAL(10,2) NOT NULL COMMENT 'Selling price',
stock INT NOT NULL DEFAULT 0 COMMENT 'Stock',
status TINYINT NOT NULL DEFAULT 1 COMMENT 'Status: 1-listed 0-unlisted',
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX idx_category (category_id),
INDEX idx_status_price (status, price)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Products table';
-- Orders table (denormalization: redundantly store shipping address snapshot)
CREATE TABLE orders (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
order_no VARCHAR(32) NOT NULL UNIQUE COMMENT 'Order number',
user_id BIGINT NOT NULL COMMENT 'User ID',
total_amount DECIMAL(12,2) NOT NULL COMMENT 'Total order amount',
status TINYINT NOT NULL DEFAULT 0 COMMENT 'Status: 0-pending payment 1-paid 2-shipped 3-completed 4-cancelled',
receiver_name VARCHAR(50) NOT NULL COMMENT 'Receiver name',
receiver_phone VARCHAR(20) NOT NULL COMMENT 'Receiver phone',
receiver_address VARCHAR(500) NOT NULL COMMENT 'Shipping address',
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
paid_at DATETIME COMMENT 'Payment time',
INDEX idx_user_id (user_id),
INDEX idx_status (status),
INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Orders table';
-- Order items table
CREATE TABLE order_items (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
order_id BIGINT NOT NULL COMMENT 'Order ID',
product_id BIGINT NOT NULL COMMENT 'Product ID',
product_name VARCHAR(200) NOT NULL COMMENT 'Product name (snapshot)',
product_price DECIMAL(10,2) NOT NULL COMMENT 'Unit price (snapshot)',
quantity INT NOT NULL COMMENT 'Quantity',
subtotal DECIMAL(12,2) NOT NULL COMMENT 'Subtotal',
INDEX idx_order_id (order_id),
INDEX idx_product_id (product_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Order items table';
Design Highlights:
- The orders table redundantly stores the shipping address (snapshot at order time), so historical orders are unaffected even if the user later changes their address
- The order items table redundantly stores product name and price, so existing orders are unaffected even if product prices change
- Monetary fields uniformly use
DECIMAL(12,2) - Status fields use
TINYINTwith comments explaining each value
🏢 Scenario 1: Design a Blog System Database
Requirements: Support user registration, writing articles, tagging, and posting comments.
-- Users table
CREATE TABLE blog_users (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(100) NOT NULL UNIQUE,
password_hash VARCHAR(255) NOT NULL,
avatar_url VARCHAR(500),
bio TEXT,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Blog users table';
-- Articles table
CREATE TABLE blog_articles (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
user_id BIGINT NOT NULL,
category_id BIGINT,
title VARCHAR(200) NOT NULL,
content TEXT NOT NULL,
status TINYINT NOT NULL DEFAULT 0 COMMENT '0-draft 1-published 2-unlisted',
view_count INT NOT NULL DEFAULT 0,
like_count INT NOT NULL DEFAULT 0,
published_at DATETIME,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_category (category_id),
INDEX idx_status_published (status, published_at DESC)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Articles table';
-- Tags table
CREATE TABLE blog_tags (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Tags table';
-- Article-tag junction table
CREATE TABLE blog_article_tag (
article_id BIGINT NOT NULL,
tag_id BIGINT NOT NULL,
PRIMARY KEY (article_id, tag_id),
INDEX idx_tag_id (tag_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Article-tag association table';
-- Comments table (supports nested comments)
CREATE TABLE blog_comments (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
article_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
parent_id BIGINT COMMENT 'Parent comment ID, NULL for top-level comments',
content TEXT NOT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX idx_article_id (article_id),
INDEX idx_parent_id (parent_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Comments table';
🏢 Scenario 2: Multi-Tenant SaaS System Design
Requirements: A single system serving multiple enterprise clients with data isolation.
-- Tenants table
CREATE TABLE tenants (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100) NOT NULL COMMENT 'Company name',
plan VARCHAR(20) NOT NULL DEFAULT 'free' COMMENT 'Plan: free/basic/pro',
max_users INT NOT NULL DEFAULT 10,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Tenants table';
-- Users table (each record belongs to one tenant)
CREATE TABLE tenant_users (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
tenant_id BIGINT NOT NULL COMMENT 'Tenant ID',
username VARCHAR(50) NOT NULL,
role VARCHAR(20) NOT NULL DEFAULT 'member',
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
UNIQUE KEY uk_tenant_username (tenant_id, username),
INDEX idx_tenant_id (tenant_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Tenant users table';
-- All business tables include tenant_id field
CREATE TABLE tenant_orders (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
tenant_id BIGINT NOT NULL,
user_id BIGINT NOT NULL,
amount DECIMAL(10,2) NOT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX idx_tenant_id (tenant_id),
INDEX idx_tenant_user (tenant_id, user_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Tenant orders table';
WHERE tenant_id = ? condition to prevent data leakage.
❓ FAQ
Q: When should I split tables, and when should I keep one large table? A: Split when certain fields are updated much more frequently than others (e.g., user basic info vs. login logs); keep one table when fields are frequently queried together.
Q: How do I choose between soft delete and physical delete? A: Use soft delete (
is_deleted) for core data like finances and orders; use physical delete for auxiliary data like logs and caches.
Q: What scenarios are JSON fields suitable for? A: Storing extension attributes with inconsistent structure and low query frequency, such as user preferences or product specifications. Don't put fields that need indexed queries into JSON.
Q: Which is better, auto-increment IDs or UUIDs? A: Prefer auto-increment IDs for single-server systems (ordered, compact); consider UUIDs or snowflake algorithms for distributed systems (globally unique).
📖 Summary
This lesson systematically covered the complete database design methodology:
- Requirements Analysis is the starting point of design, clarifying entities, relationships, and query patterns
- ER Diagrams help visualize one-to-one, one-to-many, and many-to-many relationships between entities
- Normalization (1NF→2NF→3NF→BCNF) progressively eliminates data redundancy
- Denormalization moderately introduces redundancy at performance bottlenecks
- Primary Key Selection requires weighing the pros and cons of auto-increment, UUID, and composite keys
- Naming Conventions ensure team collaboration consistency
📝 Exercises
- Design a database for an online exam system, including: users, exam papers, questions, options, answer records, and scores. Draw an ER diagram and write the CREATE TABLE statements.
- Normalize the following table:
orders(order_id, customer_name, customer_phone, product_name, product_price, quantity), splitting it to at least 3NF. - Choose an appropriate primary key strategy (auto-increment vs. UUID) for your project and explain the reasoning.
Next Lesson →26-sql-injection.md



