Redis HyperLogLog
HyperLogLog is a probabilistic data structure used for cardinality estimation. This lesson covers its usage.
What is HyperLogLog?
HyperLogLog features:
- Cardinality estimation: counts the number of unique elements
- Extremely small memory: each HyperLogLog uses at most about 12KB (small ones use an even more compact sparse encoding)
- Approximate counting: standard error of about 0.81%, suitable for large data
- Does not store the elements themselves: only stores statistical information
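To make "only stores statistical information" concrete, here is a simplified HyperLogLog in Python. It is a sketch, not Redis's implementation. It assumes SHA-1 (first 8 bytes) as the 64-bit hash, where Redis uses MurmurHash64A, and it uses the classic estimator with a linear-counting correction for small cardinalities. Redis also has a sparse encoding and a refined estimator. `ToyHyperLogLog` and its methods are hypothetical names. Like Redis, it keeps 2^14 = 16384 registers.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Simplified HyperLogLog (hypothetical, not Redis's code).

    The top p bits of each 64-bit hash pick a register; the register keeps the
    largest "rank" (leading zeros in the remaining bits, plus 1) seen so far.
    The elements themselves are never stored.
    """

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                 # 16384 registers for p = 14, as in Redis
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: SHA-1 truncated to 64 bits stands in for MurmurHash64A
        return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        q = 64 - self.p                 # bits left after the register index
        index = h >> q                  # top p bits choose a register
        rank = q - (h & ((1 << q) - 1)).bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small cardinalities: linear counting on the empty registers
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = ToyHyperLogLog()
for i in range(100_000):
    hll.add(f"user:{i}")
    hll.add(f"user:{i}")                # duplicates change nothing
print(hll.count())                      # close to 100000, but not exact
```

Registers only ever grow, so adding the same element twice leaves them unchanged. That is where the automatic deduplication comes from, and it is also why the original elements cannot be recovered.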
💡 Comparison:
- Using Set: stores all elements, large memory usage
- Using HyperLogLog: only 12KB, but results are approximate
Set: Store 1 million unique elements → tens of MB of memory (depends on element size and encoding)
HyperLogLog: Count 1 million unique elements → only 12KB memory
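The 12KB figure follows from Redis's dense layout: 2^14 = 16384 registers of 6 bits each. A quick check of the arithmetic (the small header Redis stores in front of the registers is left out):

```python
REGISTERS = 2 ** 14         # registers per HyperLogLog in Redis
BITS_PER_REGISTER = 6       # enough for a rank of at most 64 - 14 + 1 = 51

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8
print(dense_bytes)          # 12288
print(dense_bytes / 1024)   # 12.0, i.e. 12KB
```

The number of elements never appears in the formula, which is why the HyperLogLog size stays flat while a Set grows with every element.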
PFADD: Add Elements
Basic Usage
REDIS
# Add a single element
PFADD uv:20260623 "user:1"
(integer) 1 # Returns 1 if at least one internal register changed (the estimate may have changed)
# Add multiple elements
PFADD uv:20260623 "user:2" "user:3" "user:4"
(integer) 1
# Add an existing element
PFADD uv:20260623 "user:1"
(integer) 0 # Returns 0 if no register changed (user:1 was already counted)
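PFADD's reply reports whether an internal register was altered, not whether the element is new. The toy model below shows the consequence. It uses SHA-1 in place of Redis's hash, and `toy_pfadd` is a hypothetical helper, not Redis code. Once a HyperLogLog holds many elements, many genuinely new elements leave every register unchanged and would get a reply of 0.

```python
import hashlib

P = 14  # 2**14 registers, as in Redis


def toy_pfadd(registers: list[int], *items: str) -> int:
    """Mimics PFADD's reply on a toy register array: 1 if any register grew."""
    changed = 0
    for item in items:
        # Assumption: SHA-1 (first 8 bytes) instead of Redis's MurmurHash64A
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        q = 64 - P
        index, rank = h >> q, q - (h & ((1 << q) - 1)).bit_length() + 1
        if rank > registers[index]:
            registers[index] = rank
            changed = 1
    return changed


regs = [0] * (1 << P)
print(toy_pfadd(regs, "user:1"))   # 1: the first element always sets a register
print(toy_pfadd(regs, "user:1"))   # 0: same hash, same register, same rank

for i in range(100_000):
    toy_pfadd(regs, f"user:{i}")
# Count brand-new elements that nonetheless get a reply of 0
silent = sum(toy_pfadd(regs, f"visitor:{i}") == 0 for i in range(1_000))
print(silent)                      # typically most of the 1,000
```

So a reply of 0 should not be read as "this visitor was seen before"; HyperLogLog cannot answer membership questions.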
Automatic Deduplication
REDIS
# HyperLogLog auto-deduplicates
PFADD uv:20260623 "user:1" "user:1" "user:2"
(integer) 0 # user:1 and user:2 were both added earlier, so no register changed
⚠️ Note: HyperLogLog does not store the elements themselves; it only updates internal statistics. You cannot retrieve the actual element list.
PFCOUNT: Get Cardinality
REDIS
PFADD uv:20260623 "user:1" "user:2" "user:3" "user:4"
# Get the number of unique elements
PFCOUNT uv:20260623
(integer) 4
# Add one duplicate (user:1) and one new element (user:5)
PFADD uv:20260623 "user:1" "user:5"
# Cardinality increased by 1 (user:5)
PFCOUNT uv:20260623
(integer) 5
Counting the Union of Multiple HyperLogLogs
REDIS
# Create multiple HyperLogLogs
PFADD uv:20260622 "user:1" "user:2" "user:3"
PFADD uv:20260623 "user:2" "user:3" "user:4"
# Count the union of two days' UV
PFCOUNT uv:20260622 uv:20260623
(integer) 4 # user:1, user:2, user:3, user:4
💡 Use case: PFCOUNT can count the union of multiple HyperLogLogs without merging them first.
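Counting a union without the original elements works because each register only keeps a maximum. Taking the register-wise maximum of two HyperLogLogs therefore gives exactly the registers that a single HyperLogLog would have after receiving all elements. A toy check of that property (SHA-1 stands in for Redis's hash; `toy_registers` is hypothetical):

```python
import hashlib

P = 14  # 2**14 = 16384 registers, as in Redis


def toy_registers(items):
    """Toy HyperLogLog registers: per register, the max rank seen.
    Assumption: SHA-1 (first 8 bytes) replaces Redis's MurmurHash64A."""
    regs = [0] * (1 << P)
    for item in items:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        q = 64 - P
        index = h >> q                              # top P bits choose a register
        rank = q - (h & ((1 << q) - 1)).bit_length() + 1
        regs[index] = max(regs[index], rank)
    return regs


day1 = ["user:1", "user:2", "user:3"]
day2 = ["user:2", "user:3", "user:4"]

# Register-wise max of the two sketches...
union = [max(a, b) for a, b in zip(toy_registers(day1), toy_registers(day2))]
# ...equals the sketch of all elements added to one key
print(union == toy_registers(day1 + day2))  # True
```

Because the overlap (user:2, user:3) lands in the same registers, it is counted once, unlike adding two separate PFCOUNT results.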
PFMERGE: Merge HyperLogLogs
PFMERGE merges multiple HyperLogLogs into one.
Basic Usage
REDIS
# Create multiple HyperLogLogs
PFADD uv:20260622 "user:1" "user:2"
PFADD uv:20260623 "user:2" "user:3"
PFADD uv:20260624 "user:3" "user:4"
# Merge into a new HyperLogLog
PFMERGE uv:week uv:20260622 uv:20260623 uv:20260624
OK
# View the merged cardinality
PFCOUNT uv:week
(integer) 4 # user:1, user:2, user:3, user:4
Merging into an Existing HyperLogLog
REDIS
# Merge into an existing HyperLogLog: its current contents are kept and included in the union (not overwritten)
PFMERGE uv:20260622 uv:20260623
OK
PFCOUNT uv:20260622
(integer) 3 # user:1, user:2, user:3
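Per the Redis documentation, if the destination key already exists it is treated as one of the source sets. On registers that means an element-wise maximum over the destination and all sources, as in this toy model. `toy_pfmerge` is hypothetical, and the four-register lists are made up; real HyperLogLogs have 16384 registers.

```python
def toy_pfmerge(dest, *sources):
    """Toy model of PFMERGE on register lists (not Redis code).

    An existing destination counts as one more input, so the result is the
    element-wise max over the destination and every source.
    """
    inputs = [dest, *sources] if dest is not None else list(sources)
    return [max(values) for values in zip(*inputs)]


existing_dest = [1, 0, 3, 0]
source = [0, 2, 1, 0]
print(toy_pfmerge(existing_dest, source))  # [1, 2, 3, 0]
print(toy_pfmerge(None, source))           # [0, 2, 1, 0]  (destination did not exist)
```

Since registers can only grow, merging never lowers the destination's count. To get a fresh union, DEL the destination first or write to a new key.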
HyperLogLog Use Cases
Use Case 1: Website UV Statistics
REDIS
# Daily UV
PFADD uv:daily:20260623 "user:1"
PFADD uv:daily:20260623 "user:2"
PFADD uv:daily:20260623 "user:3"
# View today's UV
PFCOUNT uv:daily:20260623
(integer) 3
# Weekly UV
PFMERGE uv:weekly:2026w25 uv:daily:20260617 uv:daily:20260618 ... uv:daily:20260623
PFCOUNT uv:weekly:2026w25
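The weekly merge above takes one key per day. In application code those keys are usually generated from a date. A sketch, assuming the redis-py client, whose `pfmerge(dest, *sources)` and `pfcount(*sources)` methods send PFMERGE and PFCOUNT. `daily_uv_keys` and `weekly_uv` are hypothetical helpers using this lesson's `uv:daily:YYYYMMDD` key format.

```python
from datetime import date, timedelta


def daily_uv_keys(end: date, days: int = 7) -> list[str]:
    """Keys uv:daily:YYYYMMDD for the `days` days ending on `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    return [f"uv:daily:{(start + timedelta(days=d)).strftime('%Y%m%d')}"
            for d in range(days)]


def weekly_uv(client, dest: str, end: date) -> int:
    """Merge 7 daily HyperLogLogs into `dest` and return its estimated UV.
    Assumes `client` is a redis-py client (redis.Redis)."""
    client.pfmerge(dest, *daily_uv_keys(end))
    return client.pfcount(dest)


keys = daily_uv_keys(date(2026, 6, 23))
print(keys[0], keys[-1])  # uv:daily:20260617 uv:daily:20260623
```

Because PFMERGE includes whatever is already stored in `dest`, reusing one destination key for different weeks would mix them together; use one key per week or delete it first.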
Use Case 2: Article Reader Count
REDIS
# Article reader count
PFADD article:123:readers "user:1"
PFADD article:123:readers "user:2"
PFADD article:123:readers "user:3"
# View reader count
PFCOUNT article:123:readers
(integer) 3
Use Case 3: Search Keyword Statistics
REDIS
# Today's search keywords (unique)
PFADD search:keywords:20260623 "redis"
PFADD search:keywords:20260623 "mysql"
PFADD search:keywords:20260623 "redis"
# View unique keyword count
PFCOUNT search:keywords:20260623
(integer) 2
Use Case 4: Online User Count
REDIS
# Current online users
PFADD online:users "user:1"
PFADD online:users "user:2"
PFADD online:users "user:3"
# View online user count
PFCOUNT online:users
(integer) 3
# User goes offline: HyperLogLog cannot delete individual elements,
# so the whole key would have to be rebuilt (a Set fits a live online count better)
Use Case 5: API Call Statistics
REDIS
# API calling users
PFADD api:users:get:user 1001
PFADD api:users:get:user 1002
PFADD api:users:get:user 1003
# View unique caller count
PFCOUNT api:users:get:user
(integer) 3
HyperLogLog vs Set Comparison
Memory Comparison
| Data Size | Set Memory (rough) | HyperLogLog Memory (max) |
|---|---|---|
| 10K | ~800KB | 12KB |
| 100K | ~8MB | 12KB |
| 1M | ~80MB | 12KB |
| 10M | ~800MB | 12KB |
💡 Conclusion: HyperLogLog memory is capped at about 12KB, regardless of data size.
Accuracy Comparison
| Data Structure | Accuracy | Use Case |
|---|---|---|
| Set | Exact | Need exact counts, need element list |
| HyperLogLog | Approximate (about 0.81% standard error) | Large data, cardinality only, memory-sensitive |
Feature Comparison
| Feature | Set | HyperLogLog |
|---|---|---|
| Add elements | ✅ SADD | ✅ PFADD |
| Remove elements | ✅ SREM | ❌ Not supported |
| Get element list | ✅ SMEMBERS | ❌ Not supported |
| Check element existence | ✅ SISMEMBER | ❌ Not supported |
| Get cardinality | ✅ SCARD | ✅ PFCOUNT |
| Merge | ✅ SUNIONSTORE | ✅ PFMERGE |
HyperLogLog Accuracy Test
Test Code
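PYTHON
import redis  # assumes the redis-py package and a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379)
r.delete("test:hll")
# Add 1 million different elements, 10,000 per PFADD call
for start in range(0, 1_000_000, 10_000):
    r.pfadd("test:hll", *[f"user:{i}" for i in range(start, start + 10_000)])
# View the result
estimate = r.pfcount("test:hll")
print(estimate)  # e.g. 1000123: slightly off, about 0.012% error
print(f"error: {abs(estimate - 1_000_000) / 1_000_000:.3%}")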
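ℹ️ Note: The standard error of HyperLogLog is about 0.81%. That is the typical size of the error, so many results (like the one above) land much closer to the true count, while some land a bit farther.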
HyperLogLog Limitations
1. Cannot Remove Elements
REDIS
# HyperLogLog does not support removing individual elements
# If removal is needed, you must rebuild the entire HyperLogLog
2. Cannot Get Element List
REDIS
# HyperLogLog does not store elements themselves
# Cannot use SMEMBERS like a Set to retrieve all elements
3. Results Are Approximate
REDIS
# HyperLogLog returns an approximate cardinality
# Not exact: the standard error is about 0.81%
4. Cannot Check Element Existence
REDIS
# HyperLogLog does not support checking if an element exists
# Cannot use SISMEMBER like a Set
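How much should a single PFCOUNT result be trusted? One rough reading is a band of ± k standard errors around the estimate. This assumes the errors are approximately normally distributed, which is an assumption and not a Redis guarantee. The 0.81% itself comes from the HyperLogLog formula 1.04 / sqrt(m) with m = 16384 registers. `error_band` is a hypothetical helper.

```python
import math

REGISTERS = 2 ** 14                       # registers per Redis HyperLogLog
STD_ERROR = 1.04 / math.sqrt(REGISTERS)   # 0.008125: the "about 0.81%"


def error_band(estimate: int, k: float = 2.0) -> tuple[int, int]:
    """Range of +/- k standard errors around a PFCOUNT result.

    Assumes roughly normal errors, so k=1 covers about 68% of cases
    and k=2 about 95%.
    """
    delta = estimate * STD_ERROR * k
    return round(estimate - delta), round(estimate + delta)


print(f"{STD_ERROR:.4%}")          # 0.8125%
print(error_band(1_000_000, k=1))  # (991875, 1008125)
print(error_band(1_000_000))       # (983750, 1016250)
```

The relative width of the band is the same at any cardinality; only the absolute width grows with the count.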
When to Use HyperLogLog?
Use HyperLogLog When:
- Counting website UV (unique visitors)
- Counting article readers
- Counting search keywords
- Counting API users
- Large-scale cardinality estimation
- Memory-sensitive scenarios
Use Set When:
- Need exact counting
- Need to retrieve element list
- Need to delete elements
- Need to check element existence
- Data size is small
❓ FAQ
Q How large is the error of HyperLogLog?
A Standard error is about 0.81%. For 1 million elements, that is a typical deviation of about 8,100 either way. Actual error is often smaller.
Q How much memory does HyperLogLog use?
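A At most about 12KB, no matter how many elements are added (16,384 six-bit registers = 12,288 bytes, plus a small header). Small HyperLogLogs use a sparse encoding that needs much less.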
Q How many elements can HyperLogLog store?
A Redis HyperLogLog can estimate the cardinality of sets with up to 2^64 elements. The relative error stays around 0.81% across that range, so the absolute error grows with the data size.
Q How do I delete an element from HyperLogLog?
A Not supported. You can only delete the entire HyperLogLog (DEL command) or rebuild it.
Q How to choose between HyperLogLog and Set?
A Use Set for exact counting or when you need the element list. Use HyperLogLog for large-scale cardinality estimation.
📖 Summary
- HyperLogLog is for cardinality estimation, using at most about 12KB of memory
- PFADD adds elements with automatic deduplication
- PFCOUNT gets the cardinality, can count the union of multiple HyperLogLogs
- PFMERGE merges multiple HyperLogLogs
- Standard error is about 0.81%, suitable for large datasets
- Cannot delete elements, cannot get element list, results are approximate
- Use cases: UV statistics, reader counts, search keywords, online users
📝 Exercises
- UV statistics: Use HyperLogLog to track daily UV, simulate multiple user visits
- Multi-day statistics: Create multiple days' UV HyperLogLogs, use PFMERGE to get weekly UV
- Accuracy test: Add 1000 different elements, compare PFCOUNT result with actual count
- Comparison test: Compare memory usage between Set and HyperLogLog for the same data
Next Lesson
In the next lesson, we will learn Redis Pub/Sub, covering message publishing and subscription.



