Redis HyperLogLog

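HyperLogLog is a probabilistic data structure used for cardinality estimation, that is, estimating the number of unique elements in a data set. This lesson covers its commands (PFADD, PFCOUNT, PFMERGE), typical use cases, and limitations.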

What is HyperLogLog?

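HyperLogLog features:

  • Small, bounded memory: each key uses at most about 12KB
  • Approximate results: the standard error is about 0.81%
  • Does not store elements: you can count them, but not list, check, or remove them
  • Mergeable: several HyperLogLogs can be combined with PFMERGE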

💡 Comparison:

  • Using Set: stores all elements, large memory usage
  • Using HyperLogLog: at most about 12KB per key, but results are approximate
Set: Store 1 million unique elements → roughly 80MB memory (depends on element size)
HyperLogLog: Count 1 million unique elements → only 12KB memory
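
The 12KB figure and the 0.81% standard error quoted in this lesson both follow from the register layout of Redis's dense HyperLogLog encoding. The arithmetic below is a small sketch, assuming the usual parameters of 2^14 registers at 6 bits each and the standard HyperLogLog error formula 1.04 / sqrt(m). The variable names are illustrative only.

```python
import math

REGISTERS = 2 ** 14        # 16384 registers in the dense encoding
BITS_PER_REGISTER = 6      # each register holds a small counter

# Memory for the registers alone (Redis adds a small header on top).
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard error of a HyperLogLog with m registers is about 1.04 / sqrt(m).
standard_error = 1.04 / math.sqrt(REGISTERS)

print(dense_bytes)                 # 12288 bytes, i.e. 12KB
print(f"{standard_error:.4%}")     # 0.8125%
```

Small HyperLogLogs can take less than this, because Redis stores them in a more compact sparse encoding until they grow.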

PFADD: Add Elements

Basic Usage

REDIS
# Add a single element
PFADD uv:20260623 "user:1"
(integer) 1  # Returns 1 if cardinality may have changed

# Add multiple elements
PFADD uv:20260623 "user:2" "user:3" "user:4"
(integer) 1

# Add an existing element
PFADD uv:20260623 "user:1"
(integer) 0  # Returns 0 if cardinality hasn't changed

Automatic Deduplication

REDIS
# HyperLogLog auto-deduplicates
PFADD uv:20260623 "user:1" "user:1" "user:2"
(integer) 0  # user:1 and user:2 were both added above, so nothing changes
⚠️ Note: HyperLogLog does not store the elements themselves; it only updates internal statistics. You cannot retrieve the actual element list.
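
To make "only updates internal statistics" more concrete, here is a toy HyperLogLog in pure Python. It is a teaching sketch, not Redis's implementation: it assumes the same 2^14 registers, but uses SHA-1 as the hash and the textbook estimator with a simple small-range correction, whereas Redis uses a different hash function and a more refined estimator. The class name `ToyHLL` is hypothetical.

```python
import hashlib
import math


class ToyHLL:
    """Toy HyperLogLog with 2**14 registers (not Redis's actual code)."""

    P = 14          # number of hash bits used to pick a register
    M = 1 << P      # 16384 registers

    def __init__(self) -> None:
        self.registers = [0] * self.M

    def add(self, item: str) -> bool:
        """Update one register; return True if it changed (like PFADD returning 1)."""
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.P)                      # top 14 bits pick a register
        rest = h & ((1 << (64 - self.P)) - 1)           # remaining 50 bits
        rank = (64 - self.P) - rest.bit_length() + 1    # leading zeros + 1
        if rank > self.registers[index]:
            self.registers[index] = rank
            return True
        return False

    def count(self) -> int:
        """Estimate the number of distinct items added so far."""
        m = self.M
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting on empty registers).
            return round(m * math.log(m / zeros))
        return round(raw)


hll = ToyHLL()
for i in range(1000):
    hll.add(f"user:{i}")

first_estimate = hll.count()
changed = hll.add("user:1")     # duplicate: same hash, same register, no change
print(first_estimate, changed, hll.count())
```

The elements themselves are never kept: only the register array survives, which is why the structure cannot list, test, or remove individual elements.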

PFCOUNT: Get Cardinality

REDIS
PFADD uv:20260623 "user:1" "user:2" "user:3" "user:4"

# Get the number of unique elements
PFCOUNT uv:20260623
(integer) 4

# Add one duplicate (user:1) and one new element (user:5)
PFADD uv:20260623 "user:1" "user:5"

# Cardinality increased by 1 (user:5)
PFCOUNT uv:20260623
(integer) 5

Counting the Union of Multiple HyperLogLogs

REDIS
# Create multiple HyperLogLogs
PFADD uv:20260622 "user:1" "user:2" "user:3"
PFADD uv:20260623 "user:2" "user:3" "user:4"

# Count the union of two days' UV
PFCOUNT uv:20260622 uv:20260623
(integer) 4  # user:1, user:2, user:3, user:4
💡 Use case: PFCOUNT can count the union of multiple HyperLogLogs without merging them first.

PFMERGE: Merge HyperLogLogs

PFMERGE merges multiple HyperLogLogs into one.

Basic Usage

REDIS
# Create multiple HyperLogLogs
PFADD uv:20260622 "user:1" "user:2"
PFADD uv:20260623 "user:2" "user:3"
PFADD uv:20260624 "user:3" "user:4"

# Merge into a new HyperLogLog
PFMERGE uv:week uv:20260622 uv:20260623 uv:20260624
OK

# View the merged cardinality
PFCOUNT uv:week
(integer) 4  # user:1, user:2, user:3, user:4

Merging into an Existing HyperLogLog

REDIS
# Merge into an existing HyperLogLog: the destination's current data is included in the union
PFMERGE uv:20260622 uv:20260623
OK

PFCOUNT uv:20260622
(integer) 3  # user:1, user:2, user:3
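
The destination semantics of PFMERGE can be modeled with exact Python sets standing in for HyperLogLogs. This is only a model of the behavior shown above (the destination ends up holding the union of its own data and all sources), not of how Redis merges registers internally. The function name `pfmerge_model` and the `store` dictionary are hypothetical.

```python
def pfmerge_model(store: dict, dest: str, *sources: str) -> None:
    """Model PFMERGE with exact sets: dest becomes dest ∪ all sources."""
    merged = set(store.get(dest, set()))    # existing destination data is kept
    for key in sources:
        merged |= store.get(key, set())     # missing keys count as empty
    store[dest] = merged


store = {
    "uv:20260622": {"user:1", "user:2"},
    "uv:20260623": {"user:2", "user:3"},
}
pfmerge_model(store, "uv:20260622", "uv:20260623")
print(sorted(store["uv:20260622"]))   # ['user:1', 'user:2', 'user:3']
```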

HyperLogLog Use Cases

Use Case 1: Website UV (Unique Visitor) Statistics

REDIS
# Daily UV
PFADD uv:daily:20260623 "user:1"
PFADD uv:daily:20260623 "user:2"
PFADD uv:daily:20260623 "user:3"

# View today's UV
PFCOUNT uv:daily:20260623
(integer) 3

# Weekly UV
PFMERGE uv:weekly:2026w25 uv:daily:20260617 uv:daily:20260618 ... uv:daily:20260623
PFCOUNT uv:weekly:2026w25
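
In application code, the list of daily keys for the weekly merge is usually generated rather than typed out. The sketch below builds the seven key names ending on a given day, assuming the `uv:daily:YYYYMMDD` naming used above. The helper `daily_uv_keys` is hypothetical, and the resulting command string is only printed, not sent to a server.

```python
from datetime import date, timedelta


def daily_uv_keys(last_day: date, days: int = 7) -> list[str]:
    """Return the daily UV key names for the `days` days ending on last_day."""
    first_day = last_day - timedelta(days=days - 1)
    return [
        "uv:daily:" + (first_day + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days)
    ]


keys = daily_uv_keys(date(2026, 6, 23))
command = " ".join(["PFMERGE", "uv:weekly:2026w25", *keys])
print(command)
```

With a Redis client library, the same key list would be passed as the source arguments of its PFMERGE call.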

Use Case 2: Article Reader Count

REDIS
# Article reader count
PFADD article:123:readers "user:1"
PFADD article:123:readers "user:2"
PFADD article:123:readers "user:3"

# View reader count
PFCOUNT article:123:readers
(integer) 3

Use Case 3: Search Keyword Statistics

REDIS
# Today's search keywords (unique)
PFADD search:keywords:20260623 "redis"
PFADD search:keywords:20260623 "mysql"
PFADD search:keywords:20260623 "redis"

# View unique keyword count
PFCOUNT search:keywords:20260623
(integer) 2

Use Case 4: Online User Count

REDIS
# Current online users
PFADD online:users "user:1"
PFADD online:users "user:2"
PFADD online:users "user:3"

# View online user count
PFCOUNT online:users
(integer) 3

# User goes offline (needs to rebuild HyperLogLog, which is cumbersome)
# HyperLogLog does not support deleting individual elements
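
One possible workaround, offered here as an assumption rather than as part of the lesson, is to avoid deletion altogether: write each active user into a per-minute HyperLogLog, let old buckets expire with EXPIRE, and call PFCOUNT on the last few bucket keys. The sketch below only generates the bucket key names. The `online:users:YYYYMMDDHHMM` naming and both helper functions are hypothetical.

```python
from datetime import datetime, timedelta


def minute_bucket_key(ts: datetime) -> str:
    """Key of the per-minute HyperLogLog that a user active at `ts` is added to."""
    return "online:users:" + ts.strftime("%Y%m%d%H%M")


def recent_bucket_keys(now: datetime, window_minutes: int = 5) -> list[str]:
    """Keys covering the last `window_minutes` minutes, newest first."""
    return [minute_bucket_key(now - timedelta(minutes=i)) for i in range(window_minutes)]


now = datetime(2026, 6, 23, 12, 2)
print(minute_bucket_key(now))        # PFADD this key, then EXPIRE it
print(recent_bucket_keys(now, 3))    # pass these keys to a single PFCOUNT
```

Users who stop sending activity simply drop out of the count once their last bucket leaves the window.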

Use Case 5: API Call Statistics

REDIS
# API calling users
PFADD api:users:get:user 1001
PFADD api:users:get:user 1002
PFADD api:users:get:user 1003

# View unique caller count
PFCOUNT api:users:get:user
(integer) 3

HyperLogLog vs Set Comparison

Memory Comparison

| Data Size (unique elements) | Set Memory | HyperLogLog Memory |
|---|---|---|
| 10K | ~800KB | 12KB |
| 100K | ~8MB | 12KB |
| 1M | ~80MB | 12KB |
| 10M | ~800MB | 12KB |
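💡 Conclusion: HyperLogLog memory usage is capped at about 12KB per key, regardless of data size. Keys holding only a few elements use even less, because Redis stores them in a compact sparse encoding.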

Accuracy Comparison

| Data Structure | Accuracy | Use Case |
|---|---|---|
| Set | Exact | Need exact counts, need element list |
| HyperLogLog | Approximate (about 0.81% standard error) | Large data, cardinality only, memory-sensitive |

Feature Comparison

| Feature | Set | HyperLogLog |
|---|---|---|
| Add elements | ✅ SADD | ✅ PFADD |
| Remove elements | ✅ SREM | ❌ Not supported |
| Get element list | ✅ SMEMBERS | ❌ Not supported |
| Check element existence | ✅ SISMEMBER | ❌ Not supported |
| Get cardinality | ✅ SCARD | ✅ PFCOUNT |
| Merge | ✅ SUNIONSTORE | ✅ PFMERGE |

HyperLogLog Accuracy Test

Test Code

REDIS
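# Add 1 million different elements with a server-side Lua script
# (the script blocks Redis while it runs, so use a test instance)
EVAL "for i = 0, 999999 do redis.call('PFADD', KEYS[1], 'user:' .. i) end" 1 test:hll
(nil)

# View the result
PFCOUNT test:hll
(integer) 1000123  # Example output: slightly off, error about 0.012%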
ℹ️ Note: The standard error of HyperLogLog is about 0.81%, but actual error is often smaller.
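
To put the two percentages side by side, the short calculation below compares the error of the example result above with what a 0.81% standard error means at 1 million elements. The numbers come from this lesson's example output, not from a fresh measurement.

```python
true_count = 1_000_000
estimate = 1_000_123          # example PFCOUNT result from the test above

# Relative error of this particular estimate.
relative_error = abs(estimate - true_count) / true_count

# Typical absolute error implied by a 0.81% standard error.
typical_abs_error = round(0.0081 * true_count)

print(f"{relative_error:.4%}")   # 0.0123%
print(typical_abs_error)         # 8100
```

A single run landing well inside the standard error is unremarkable. The 0.81% figure describes the typical spread of estimates, not a guaranteed bound.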

HyperLogLog Limitations

1. Cannot Remove Elements

REDIS
# HyperLogLog does not support removing individual elements
# If removal is needed, you must rebuild the entire HyperLogLog

2. Cannot Get Element List

REDIS
# HyperLogLog does not store elements themselves
# Cannot use SMEMBERS like a Set to retrieve all elements

3. Results Are Approximate

REDIS
# HyperLogLog returns an approximate cardinality
# Not exact: the standard error is about 0.81%

4. Cannot Check Element Existence

REDIS
# HyperLogLog does not support checking if an element exists
# Cannot use SISMEMBER like a Set

When to Use HyperLogLog?

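Use HyperLogLog When:

  • The data set is large and memory is a concern
  • You only need the cardinality, not the elements themselves
  • An approximate count (about 0.81% standard error) is acceptable

Use Set When:

  • You need exact counts
  • You need the element list, membership checks, or removal of individual elements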

❓ FAQ

Q How large is the error of HyperLogLog?
A The standard error is about 0.81%. For 1 million unique elements, that corresponds to a typical error of about 8,100. The actual error of a given estimate is often smaller.
Q How much memory does HyperLogLog use?
A Each HyperLogLog uses at most about 12KB of memory, regardless of data size. Small HyperLogLogs use less, thanks to a compact sparse encoding.
Q How many elements can HyperLogLog store?
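A A HyperLogLog does not store elements, so there is no storage limit as such. It can count up to 2^64 unique elements. The relative standard error stays at about 0.81% whatever the size, so the absolute error grows with the data set.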
Q How do I delete an element from HyperLogLog?
A Not supported. You can only delete the entire HyperLogLog (DEL command) or rebuild it.
Q How to choose between HyperLogLog and Set?
A Use Set for exact counting or when you need the element list. Use HyperLogLog for large-scale cardinality estimation.

📖 Summary

📝 Exercises

  1. UV statistics: Use HyperLogLog to track daily UV, simulate multiple user visits
  2. Multi-day statistics: Create multiple days' UV HyperLogLogs, use PFMERGE to get weekly UV
  3. Accuracy test: Add 1000 different elements, compare PFCOUNT result with actual count
  4. Comparison test: Compare memory usage between Set and HyperLogLog for the same data

Next Lesson

In the next lesson, we will learn Redis Pub/Sub, covering message publishing and subscription.
