Data Chunks #
1. Chunk Overview #
1.1 What Is a Chunk #
A chunk is the basic unit in which TimescaleDB stores time-series data: each chunk holds the rows for one specific time range (and, when space partitioning is enabled, one space slice).
text
Chunk structure:
Hypertable
│
├── Chunk 1
│   ├── Time range: 2024-01-01 ~ 2024-01-07
│   ├── Space slice 1 (hash(device_id) % 4 = 0)
│   ├── Space slice 2 (hash(device_id) % 4 = 1)
│   ├── Space slice 3 (hash(device_id) % 4 = 2)
│   └── Space slice 4 (hash(device_id) % 4 = 3)
│
├── Chunk 2
│   ├── Time range: 2024-01-08 ~ 2024-01-14
│   └── ...
│
└── Chunk N
    └── ...
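The time-to-chunk mapping sketched above can be written down in a few lines. For fixed intervals TimescaleDB aligns chunk boundaries to the Unix epoch, so the covering range falls out of integer division; the function below is an illustration of that rule, not TimescaleDB's actual code.

```python
from datetime import datetime, timedelta, timezone

def chunk_range(ts: datetime, interval: timedelta) -> tuple[datetime, datetime]:
    """Return the [start, end) time range of the chunk covering `ts`.

    Illustrative model: fixed chunk intervals are aligned to the
    Unix epoch, so we count whole intervals since 1970-01-01.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    n = int((ts - epoch) / interval)   # whole intervals elapsed since the epoch
    start = epoch + n * interval
    return start, start + interval

# A 7-day interval does NOT necessarily start on the first inserted day:
start, end = chunk_range(datetime(2024, 1, 3, tzinfo=timezone.utc),
                         timedelta(days=7))
```

Note that the computed boundary for 2024-01-03 is 2023-12-28, not 2024-01-01: epoch alignment, not the first inserted row, decides where a chunk begins.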
1.2 Benefits of Chunking #
text
Benefits of chunking:
Query performance
├── Chunk exclusion - only the relevant chunks are scanned
├── Parallel queries - multiple chunks are processed in parallel
├── Index efficiency - each chunk keeps its own, smaller indexes
└── Cache efficiency - recent (hot) chunks stay in memory
Write performance
├── Parallel writes - inserts spread across chunks
├── Index maintenance - indexes are maintained per chunk
├── Lock granularity - locks are taken at the chunk level
└── WAL efficiency - a smaller working set per chunk
Operational convenience
├── Data lifecycle - drop whole chunks instead of row-by-row DELETE
├── Compression - compress chunk by chunk
├── Backup and restore - operate at the chunk level
└── Data migration - move individual chunks
1.3 Chunk Naming Convention #
text
Chunk naming convention:
Format: _hyper_<hypertable_id>_<chunk_id>_chunk (in the _timescaledb_internal schema)
Examples:
├── _hyper_1_1_chunk
├── _hyper_1_2_chunk
├── _hyper_1_3_chunk
└── ...
Components
├── _hyper_ - fixed prefix
├── <hypertable_id> - ID of the hypertable
└── <chunk_id> - ID of the chunk
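The naming convention above is regular enough to parse mechanically. This small helper is hypothetical (not part of any TimescaleDB client library); it just demonstrates the structure:

```python
import re

# _hyper_<hypertable_id>_<chunk_id>_chunk
CHUNK_NAME = re.compile(r"^_hyper_(\d+)_(\d+)_chunk$")

def parse_chunk_name(name: str) -> tuple[int, int]:
    """Extract (hypertable_id, chunk_id) from a chunk table name."""
    m = CHUNK_NAME.match(name)
    if not m:
        raise ValueError(f"not a chunk name: {name!r}")
    return int(m.group(1)), int(m.group(2))

parse_chunk_name("_hyper_1_3_chunk")  # → (1, 3)
```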
2. Chunk Creation #
2.1 Automatic Creation #
sql
-- Chunks are created automatically once a table becomes a hypertable
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ NOT NULL,
    sensor_id   INTEGER,
    temperature DOUBLE PRECISION
);

SELECT create_hypertable(
    'sensor_data',
    'time',
    chunk_time_interval => INTERVAL '7 days'
);

-- Inserting data creates chunks on demand
INSERT INTO sensor_data VALUES
    ('2024-01-01 00:00:00+00', 1, 25.5),
    ('2024-01-08 00:00:00+00', 1, 26.0);

-- Inspect the chunks that were created
SELECT
    chunk_name,
    range_start,
    range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
2.2 Manual Creation #
sql
-- Manual chunk creation is almost never needed: chunks are created on
-- demand when data is inserted. TimescaleDB does ship an internal
-- create_chunk() function (used by backup/restore tooling), but it is
-- not part of the public API and should not be called directly.

-- Inspect chunk information
SELECT * FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
2.3 Chunk Time Ranges #
sql
-- Inspect chunk time ranges
SELECT
    chunk_name,
    range_start,
    range_end,
    range_start::date AS start_date,
    range_end::date AS end_date
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start;

-- Example output (illustrative; actual boundaries are aligned to the
-- Unix epoch, not to the first inserted row):
-- chunk_name       | range_start            | range_end              | start_date | end_date
-- -----------------+------------------------+------------------------+------------+-----------
-- _hyper_1_1_chunk | 2024-01-01 00:00:00+00 | 2024-01-08 00:00:00+00 | 2024-01-01 | 2024-01-08
-- _hyper_1_2_chunk | 2024-01-08 00:00:00+00 | 2024-01-15 00:00:00+00 | 2024-01-08 | 2024-01-15
3. Chunk Management #
3.1 Inspecting Chunks #
sql
-- List all chunks
SELECT
    hypertable_name,
    chunk_name,
    is_compressed,
    range_start,
    range_end
FROM timescaledb_information.chunks
ORDER BY hypertable_name, range_start;

-- Chunk sizes (TimescaleDB 2.x API)
SELECT
    chunk_name,
    pg_size_pretty(total_bytes) AS size,
    total_bytes
FROM chunks_detailed_size('sensor_data')
ORDER BY total_bytes DESC;

-- Full per-chunk breakdown (table, index, and TOAST bytes)
SELECT * FROM chunks_detailed_size('sensor_data');

-- Per-chunk index sizes (via the PostgreSQL catalogs)
SELECT
    c.chunk_name,
    i.indexname,
    pg_size_pretty(pg_relation_size(format('%I.%I', i.schemaname, i.indexname)::regclass)) AS index_size
FROM timescaledb_information.chunks c
JOIN pg_indexes i
  ON i.schemaname = c.chunk_schema AND i.tablename = c.chunk_name
WHERE c.hypertable_name = 'sensor_data'
ORDER BY c.chunk_name;
3.2 Chunk Size Statistics #
sql
-- Total hypertable size
SELECT pg_size_pretty(hypertable_size('sensor_data'));

-- Chunk size statistics
SELECT
    count(*) AS chunk_count,
    pg_size_pretty(sum(total_bytes)) AS total_size,
    pg_size_pretty(avg(total_bytes)::bigint) AS avg_chunk_size,
    pg_size_pretty(min(total_bytes)) AS min_chunk_size,
    pg_size_pretty(max(total_bytes)) AS max_chunk_size
FROM chunks_detailed_size('sensor_data');

-- Statistics by time range
SELECT
    date_trunc('month', ch.range_start) AS month,
    count(*) AS chunk_count,
    pg_size_pretty(sum(c.total_bytes)) AS size
FROM chunks_detailed_size('sensor_data') c
JOIN timescaledb_information.chunks ch
  ON c.chunk_name = ch.chunk_name
GROUP BY date_trunc('month', ch.range_start)
ORDER BY month;
3.3 Merging Chunks #
sql
-- merge_chunks() (TimescaleDB 2.18+) is a procedure that merges two or
-- more adjacent chunks; there is no older_than/newer_than variant.
CALL merge_chunks('_timescaledb_internal._hyper_1_1_chunk',
                  '_timescaledb_internal._hyper_1_2_chunk');

-- Merge several adjacent chunks at once
CALL merge_chunks(ARRAY[
    '_timescaledb_internal._hyper_1_1_chunk',
    '_timescaledb_internal._hyper_1_2_chunk',
    '_timescaledb_internal._hyper_1_3_chunk'
]::regclass[]);

-- Verify the result
SELECT count(*) FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
3.4 Splitting Chunks #
sql
-- split_chunk() (TimescaleDB 2.20+) is a procedure; by default it
-- splits the chunk's time range at the midpoint.
CALL split_chunk('_timescaledb_internal._hyper_1_10_chunk');

-- Verify the result
SELECT
    chunk_name,
    range_start,
    range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start DESC
LIMIT 5;
4. Space Partitioning #
4.1 Creating Space Partitions #
sql
-- Create a hypertable with an additional space dimension
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   INTEGER NOT NULL,
    metric_name VARCHAR(100),
    value       DOUBLE PRECISION
);

SELECT create_hypertable(
    'metrics',
    'time',
    partitioning_column => 'device_id',
    number_partitions   => 4
);

-- Inspect the dimensions
SELECT
    hypertable_name,
    column_name,
    num_partitions
FROM timescaledb_information.dimensions;
4.2 How Space Partitioning Works #
text
How space partitioning works:
Time partitioning
├── Time interval 1 (2024-01-01 ~ 2024-01-07)
│   ├── Space slice 1 (hash(device_id) % 4 = 0)
│   ├── Space slice 2 (hash(device_id) % 4 = 1)
│   ├── Space slice 3 (hash(device_id) % 4 = 2)
│   └── Space slice 4 (hash(device_id) % 4 = 3)
│
└── Time interval 2 (2024-01-08 ~ 2024-01-14)
    ├── Space slice 1
    ├── Space slice 2
    ├── Space slice 3
    └── Space slice 4
Number of space slices = number_partitions
Total chunks = number of time intervals × number of space partitions
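The arithmetic above can be captured in a short sketch. The hash function here is a stand-in (TimescaleDB uses its own internal hash), so only the slice-assignment idea and the chunk-count formula carry over:

```python
import hashlib

def space_slice(device_id: int, num_partitions: int) -> int:
    """Map a device_id onto one of num_partitions hash slices.

    Illustrative only: the actual hash TimescaleDB uses differs,
    but the assignment is equally deterministic and uniform-ish.
    """
    digest = hashlib.sha256(str(device_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def total_chunks(time_intervals: int, num_partitions: int) -> int:
    """Every time interval is cut into num_partitions space slices."""
    return time_intervals * num_partitions

total_chunks(10, 4)  # → 40
```

The key property is determinism: a given device_id always lands in the same slice, so all of a device's rows for a time interval end up in one chunk.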
4.3 Benefits of Space Partitioning #
text
Benefits of space partitioning:
Write performance
├── Parallel writes - many devices write concurrently
├── Less lock contention - slices are independent
└── Spread I/O - parallelism across multiple disks/tablespaces
Query performance
├── Device-scoped queries - only the matching slice is scanned
├── Parallel queries - slices processed in parallel
└── Index efficiency - indexes are per slice
When to use it
├── Many devices writing concurrently
├── Device-scoped query patterns
├── Large-scale deployments
└── High-concurrency workloads
5. Chunk Exclusion #
5.1 Automatic Chunk Exclusion #
sql
-- Time-range query (chunks are excluded automatically)
EXPLAIN ANALYZE
SELECT * FROM sensor_data
WHERE time > NOW() - INTERVAL '7 days';

-- The plan shows a ChunkAppend node scanning only the relevant chunks
-- (index names are illustrative):
-- -> Custom Scan (ChunkAppend)
--    -> Index Scan using _hyper_1_1_chunk_time_idx on _hyper_1_1_chunk
--    -> Index Scan using _hyper_1_2_chunk_time_idx on _hyper_1_2_chunk

-- Device query (the space dimension is pruned as well)
EXPLAIN ANALYZE
SELECT * FROM metrics
WHERE device_id = 100
  AND time > NOW() - INTERVAL '1 day';
5.2 Controlling Chunk Exclusion #
sql
-- ChunkAppend-based exclusion is on by default
SET timescaledb.enable_chunk_append = on;

-- Disable it (only for debugging / plan comparison)
SET timescaledb.enable_chunk_append = off;

-- Which chunks would a 7-day window touch?
SELECT
    chunk_name,
    range_start,
    range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
  AND range_end > NOW() - INTERVAL '7 days'
  AND range_start < NOW();
5.3 Writing Exclusion-Friendly Queries #
sql
-- Use direct predicates on the time column
SELECT * FROM sensor_data
WHERE time BETWEEN '2024-01-01' AND '2024-01-31';

-- Expressions on the right-hand side are fine
SELECT * FROM sensor_data
WHERE time > time_bucket('1 day', NOW()) - INTERVAL '7 days';

-- Wrapping the time column itself in a function defeats exclusion
-- Not recommended:
SELECT * FROM sensor_data
WHERE EXTRACT(MONTH FROM time) = 1;

-- Recommended:
SELECT * FROM sensor_data
WHERE time >= '2024-01-01' AND time < '2024-02-01';
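Under the hood, chunk exclusion boils down to an interval-overlap test against the catalog's range_start/range_end values. A simplified model of the planner-side decision:

```python
from datetime import datetime

def exclude_chunks(chunks, q_start, q_end):
    """Keep a chunk only if its half-open range [range_start, range_end)
    overlaps the query window [q_start, q_end).

    `chunks` is a list of (name, range_start, range_end) tuples, mirroring
    timescaledb_information.chunks. Simplified model, not planner code.
    """
    return [name for (name, r_start, r_end) in chunks
            if r_start < q_end and r_end > q_start]

chunks = [
    ("_hyper_1_1_chunk", datetime(2024, 1, 1),  datetime(2024, 1, 8)),
    ("_hyper_1_2_chunk", datetime(2024, 1, 8),  datetime(2024, 1, 15)),
    ("_hyper_1_3_chunk", datetime(2024, 1, 15), datetime(2024, 1, 22)),
]
exclude_chunks(chunks, datetime(2024, 1, 10), datetime(2024, 1, 12))
# → ["_hyper_1_2_chunk"]
```

This is why the predicate must constrain the time column directly: `EXTRACT(MONTH FROM time) = 1` gives the planner no window to compare ranges against.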
6. Chunk Compression #
6.1 Compression Status #
sql
-- Per-chunk compression status
SELECT
    chunk_name,
    is_compressed,
    range_start,
    range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start;

-- Compression statistics (space saved, as a percentage)
SELECT
    chunk_name,
    before_compression_total_bytes,
    after_compression_total_bytes,
    round(100.0 * (before_compression_total_bytes - after_compression_total_bytes)
                 / before_compression_total_bytes, 2) AS space_saved_pct
FROM chunk_compression_stats('sensor_data')
WHERE after_compression_total_bytes IS NOT NULL;
6.2 Manual Compression #
sql
-- Compress a specific chunk (chunks live in _timescaledb_internal)
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');

-- Compress every uncompressed chunk older than 7 days
SELECT compress_chunk(format('%I.%I', c.chunk_schema, c.chunk_name)::regclass)
FROM timescaledb_information.chunks c
WHERE c.hypertable_name = 'sensor_data'
  AND c.range_end < NOW() - INTERVAL '7 days'
  AND NOT c.is_compressed;

-- Decompress a chunk
SELECT decompress_chunk('_timescaledb_internal._hyper_1_1_chunk');
6.3 Automatic Compression Policy #
sql
-- Enable compression on the hypertable, then add a policy
ALTER TABLE sensor_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby   = 'time DESC'
);

SELECT add_compression_policy(
    'sensor_data',
    INTERVAL '7 days'
);

-- Inspect the policy
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';
7. Dropping Chunks #
7.1 Dropping Old Chunks #
sql
-- Drop chunks older than a relative interval
SELECT drop_chunks(
    'sensor_data',
    older_than => INTERVAL '90 days'
);

-- Drop chunks entirely before a given date
SELECT drop_chunks(
    'sensor_data',
    older_than => '2024-01-01'::timestamptz
);

-- Verify the result
SELECT count(*) FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
7.2 Retention Policy #
sql
-- Add a retention policy
SELECT add_retention_policy(
    'sensor_data',
    INTERVAL '90 days'
);

-- Inspect the policy
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';

-- Remove the policy
SELECT remove_retention_policy('sensor_data');
8. Chunk Monitoring #
8.1 Chunk Statistics #
sql
-- Chunk counts per hypertable
SELECT
    hypertable_name,
    count(*) AS total_chunks,
    count(*) FILTER (WHERE is_compressed) AS compressed_chunks,
    count(*) FILTER (WHERE NOT is_compressed) AS uncompressed_chunks
FROM timescaledb_information.chunks
GROUP BY hypertable_name;

-- Chunk size trend over time
SELECT
    date_trunc('week', ch.range_start) AS week,
    count(*) AS chunk_count,
    pg_size_pretty(sum(c.total_bytes)) AS size
FROM chunks_detailed_size('sensor_data') c
JOIN timescaledb_information.chunks ch
  ON c.chunk_name = ch.chunk_name
GROUP BY date_trunc('week', ch.range_start)
ORDER BY week DESC;

-- Approximate per-chunk row counts (from planner statistics)
SELECT
    c.chunk_name,
    cl.reltuples::bigint AS approx_row_count
FROM timescaledb_information.chunks c
JOIN pg_namespace n ON n.nspname = c.chunk_schema
JOIN pg_class cl   ON cl.relnamespace = n.oid AND cl.relname = c.chunk_name
WHERE c.hypertable_name = 'sensor_data'
LIMIT 10;
8.2 Chunk Health Checks #
sql
-- Are chunk sizes reasonably uniform?
SELECT
    avg(total_bytes) AS avg_size,
    stddev(total_bytes) AS stddev_size,
    min(total_bytes) AS min_size,
    max(total_bytes) AS max_size,
    max(total_bytes)::float / nullif(min(total_bytes), 0) AS size_ratio
FROM chunks_detailed_size('sensor_data');

-- What fraction of chunks is compressed?
SELECT
    hypertable_name,
    count(*) AS total_chunks,
    count(*) FILTER (WHERE is_compressed) AS compressed,
    round(100.0 * count(*) FILTER (WHERE is_compressed) / count(*), 2) AS compression_rate
FROM timescaledb_information.chunks
GROUP BY hypertable_name;
9. Best Practices #
9.1 Chunk Size Guidelines #
text
Chunk size guidelines:
Target size
├── TimescaleDB's rule of thumb: active chunks plus their indexes should fit in ~25% of memory
├── In practice: roughly 1-50GB per chunk
└── Avoid chunks beyond ~100GB; maintenance operations become slow
Factors to consider
├── Ingest volume
├── Query patterns
├── Retention period
└── Compression strategy
How to adjust
├── Change chunk_time_interval
├── Merge small chunks
└── Split oversized chunks
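The adjustment loop above can start from a back-of-the-envelope calculation. This helper is a hypothetical rule of thumb, assuming roughly uniform ingest, that converts a daily ingest volume into a candidate chunk_time_interval:

```python
def suggest_interval_days(daily_ingest_gb: float,
                          target_chunk_gb: float = 25.0) -> int:
    """Suggest a chunk_time_interval (in whole days) so each chunk
    lands near target_chunk_gb.

    Hypothetical sizing sketch: assumes uniform ingest and ignores
    indexes and compression; clamps to at least one day.
    """
    return max(1, round(target_chunk_gb / daily_ingest_gb))

suggest_interval_days(5.0)    # → 5 (5 GB/day, 25 GB target)
suggest_interval_days(100.0)  # → 1 (heavy ingest: daily chunks)
```

Apply the result with `SELECT set_chunk_time_interval('sensor_data', INTERVAL '5 days');` (affects newly created chunks only).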
9.2 Chunk Count Guidelines #
text
Chunk count guidelines:
Active chunks
├── Keep under ~1000
├── Aim for roughly 100-500
└── Adjust based on query frequency
Historical chunks
├── Can be more numerous
├── Should be compressed
└── Clean up on a schedule
Metrics to watch
├── Query performance
├── Write performance
├── Management overhead
└── Backup duration
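Retention, chunk interval, and space partitioning together determine how many active chunks stay on disk. A quick estimator (an illustration of the arithmetic, not a TimescaleDB API):

```python
import math

def active_chunk_count(retention_days: int,
                       interval_days: int,
                       space_partitions: int = 1) -> int:
    """Estimate how many chunks a retention window keeps on disk:
    one chunk per (time interval x space partition)."""
    return math.ceil(retention_days / interval_days) * space_partitions

active_chunk_count(90, 7)     # → 13 (90-day retention, weekly chunks)
active_chunk_count(90, 7, 4)  # → 52 (same, with 4 space partitions)
```

If the estimate lands far above the ~1000-chunk guideline, widen the interval or shorten retention before the problem appears in production.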
9.3 Space Partitioning Guidelines #
text
Space partitioning guidelines:
Number of partitions
├── 2-4x the number of CPU cores
├── No more than the number of devices
└── Tune to the write concurrency
Good fit
├── Many devices writing concurrently
├── Device-scoped queries
├── Large-scale deployments
└── High-concurrency workloads
Poor fit
├── A single data source
├── Mostly global aggregate queries
├── Small data volumes
└── Low-concurrency workloads
10. Common Problems #
10.1 Too Many Chunks #
sql
-- Check the chunk count
SELECT count(*) FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';

-- Fix 1: widen the chunk interval (affects newly created chunks only)
SELECT set_chunk_time_interval('sensor_data', INTERVAL '14 days');

-- Fix 2: merge existing adjacent chunks (TimescaleDB 2.18+)
CALL merge_chunks('_timescaledb_internal._hyper_1_1_chunk',
                  '_timescaledb_internal._hyper_1_2_chunk');
10.2 Uneven Chunk Sizes #
sql
-- Check the size distribution
SELECT
    chunk_name,
    pg_size_pretty(total_bytes) AS size
FROM chunks_detailed_size('sensor_data')
ORDER BY total_bytes DESC
LIMIT 10;

-- Fix: adjust the chunk interval to match the ingest rate
SELECT set_chunk_time_interval('sensor_data', INTERVAL '1 day');
10.3 Poor Query Performance #
sql
-- Verify that chunk exclusion is happening
EXPLAIN ANALYZE
SELECT * FROM sensor_data
WHERE time > NOW() - INTERVAL '7 days';

-- Make sure the time predicate appears directly in the WHERE clause
-- and the time column is not wrapped in a function.
11. Summary #
Key chunk-management operations:
| Operation | Command | Notes |
|---|---|---|
| Inspect chunks | timescaledb_information.chunks | Chunk metadata |
| Merge chunks | merge_chunks() | Combine small adjacent chunks (2.18+) |
| Split chunks | split_chunk() | Split oversized chunks (2.20+) |
| Drop chunks | drop_chunks() | Remove old data |
| Compress chunks | compress_chunk() | Compress data |
Best practices:
- Chunk size: keep chunks roughly in the 1-50GB range
- Chunk count: keep active chunks under ~1000
- Space partitioning: add it only when write concurrency calls for it
- Chunk exclusion: always filter directly on the time column
- Monitoring: check chunk status regularly
Next up: a closer look at time partitioning!
Last updated: 2026-03-27