数据压缩 #
一、压缩概述 #
1.1 什么是数据压缩 #
TimescaleDB的数据压缩功能可以将历史数据转换为列式存储格式,大幅减少存储空间。
text
压缩原理:
原始数据(行式存储)
├── time | sensor_id | temp | humidity
├── 10:00 | 1 | 25.5 | 60.0
├── 10:01 | 1 | 25.6 | 60.5
├── 10:02 | 1 | 25.4 | 59.8
└── ...
压缩数据(列式存储)
├── time: [10:00, 10:01, 10:02, ...]
├── sensor_id: [1, 1, 1, ...]
├── temp: [25.5, 25.6, 25.4, ...]
└── humidity: [60.0, 60.5, 59.8, ...]
压缩优势
├── 列式存储
├── 相似数据压缩
├── 减少存储空间
└── 降低成本
1.2 压缩优势 #
text
压缩优势:
存储优化
├── 减少存储空间 90%+
├── 降低存储成本
├── 减少备份时间
└── 提高缓存效率
查询性能
├── 列式查询优化
├── 减少I/O
├── 提高聚合性能
└── 支持压缩数据查询
管理便利
├── 自动压缩策略
├── 透明化操作
├── 无需手动干预
└── 支持解压操作
1.3 压缩限制 #
text
压缩限制:
更新限制
├── 压缩数据不能直接更新
├── 需要先解压再更新
└── 建议避免更新压缩数据
删除限制
├── 可以删除整个分片
├── 不能删除单行数据
└── 需要解压后删除
索引限制
├── 压缩后索引不同
├── 使用压缩索引
└── 查询性能可能变化
二、启用压缩 #
2.1 基本配置 #
sql
-- 创建超表
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time');
-- 启用压缩
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time DESC'
);
-- 查看压缩配置
SELECT
hypertable_name,
compression_state,
compressed_hypertable_name
FROM timescaledb_information.hypertables
WHERE hypertable_name = 'sensor_data';
2.2 压缩参数 #
sql
-- 完整压缩配置
ALTER TABLE sensor_data SET (
timescaledb.compress, -- 启用压缩
timescaledb.compress_segmentby = 'sensor_id, location', -- 分段列
timescaledb.compress_orderby = 'time DESC, temperature ASC' -- 排序列
);
-- 参数说明
-- compress_segmentby: 分段列,相同值的数据压缩在一起
-- compress_orderby: 排序列,压缩数据内部排序
2.3 选择分段列 #
text
分段列选择原则:
查询模式
├── 选择常用过滤列
├── 选择低基数列
└── 避免高基数列
常用选择
├── device_id
├── sensor_id
├── location
├── type
└── 不超过3列
示例
├── 单列: sensor_id
├── 多列: sensor_id, location
└── 不推荐: time(基数太高)
三、压缩策略 #
3.1 添加压缩策略 #
sql
-- 添加自动压缩策略
SELECT add_compression_policy(
'sensor_data',
INTERVAL '7 days' -- 压缩7天前的数据
);
-- 查看压缩策略
SELECT
job_id,
schedule_interval,
config
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';
-- 查看策略详情
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';
3.2 修改压缩策略 #
sql
-- 修改压缩时间
SELECT remove_compression_policy('sensor_data');
SELECT add_compression_policy(
'sensor_data',
INTERVAL '14 days'
);
-- 或使用alter_job
SELECT alter_job(
job_id,
schedule_interval => INTERVAL '1 day'
) FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression'
AND hypertable_name = 'sensor_data';
3.3 删除压缩策略 #
sql
-- 删除压缩策略
SELECT remove_compression_policy('sensor_data');
-- 验证删除
SELECT * FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression'
AND hypertable_name = 'sensor_data';
四、手动压缩 #
4.1 压缩分片 #
sql
-- 查看未压缩的分片
SELECT
chunk_name,
range_start,
range_end,
is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
AND is_compressed = false;
-- 压缩单个分片
SELECT compress_chunk('_hyper_1_1_chunk');
-- 压缩多个分片
SELECT compress_chunk(chunk_name)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
AND is_compressed = false
AND range_end < NOW() - INTERVAL '7 days';
-- 压缩指定时间范围
SELECT compress_chunk(c.chunk_name)
FROM timescaledb_information.chunks c
WHERE c.hypertable_name = 'sensor_data'
AND c.is_compressed = false
AND c.range_end < '2024-01-01'::timestamptz;
4.2 解压分片 #
sql
-- 解压单个分片
SELECT decompress_chunk('_hyper_1_1_chunk');
-- 解压多个分片
SELECT decomcompress_chunk(chunk_name)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
AND is_compressed = true
AND range_start > NOW() - INTERVAL '30 days';
-- 解压指定时间范围
SELECT decomcompress_chunk(c.chunk_name)
FROM timescaledb_information.chunks c
WHERE c.hypertable_name = 'sensor_data'
AND c.is_compressed = true
AND c.range_start BETWEEN '2024-01-01' AND '2024-01-31';
五、压缩监控 #
5.1 查看压缩状态 #
sql
-- 查看分片压缩状态
SELECT
chunk_name,
is_compressed,
range_start,
range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start DESC;
-- 查看压缩统计
SELECT
hypertable_name,
count(*) as total_chunks,
sum(CASE WHEN is_compressed THEN 1 ELSE 0 END) as compressed_chunks,
round(100.0 * sum(CASE WHEN is_compressed THEN 1 ELSE 0 END) / count(*), 2) as compression_rate
FROM timescaledb_information.chunks
GROUP BY hypertable_name;
5.2 查看压缩比例 #
sql
-- 查看单个分片压缩比例
SELECT
chunk_name,
before_compression_total_bytes,
after_compression_total_bytes,
pg_size_pretty(before_compression_total_bytes) as before_size,
pg_size_pretty(after_compression_total_bytes) as after_size,
round(100.0 * (before_compression_total_bytes - after_compression_total_bytes)
/ before_compression_total_bytes, 2) as compression_ratio
FROM chunk_compression_stats('sensor_data');
-- 查看整体压缩比例
SELECT
hypertable_name,
pg_size_pretty(sum(before_compression_total_bytes)) as total_before,
pg_size_pretty(sum(after_compression_total_bytes)) as total_after,
round(100.0 * (sum(before_compression_total_bytes) - sum(after_compression_total_bytes))
/ sum(before_compression_total_bytes), 2) as compression_ratio
FROM chunk_compression_stats('sensor_data')
GROUP BY hypertable_name;
5.3 查看压缩作业状态 #
sql
-- 查看压缩作业执行情况
SELECT
job_id,
last_run,
next_run,
last_run_success,
total_runs,
total_failures
FROM timescaledb_information.job_stats
WHERE job_id IN (
SELECT job_id FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression'
);
-- 查看压缩错误
SELECT * FROM timescaledb_information.job_errors
WHERE job_id IN (
SELECT job_id FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression'
);
六、查询压缩数据 #
6.1 透明查询 #
sql
-- 压缩数据查询与普通查询相同
SELECT * FROM sensor_data
WHERE time > NOW() - INTERVAL '30 days'
AND sensor_id = 1;
-- TimescaleDB自动处理压缩数据
-- 无需特殊语法
6.2 查询性能 #
text
压缩数据查询性能:
优势
├── 列式存储
├── 减少I/O
├── 提高聚合性能
└── 更好的缓存效率
考虑因素
├── 分段列过滤
├── 排序列范围查询
└── 避免全表扫描
6.3 查询优化 #
sql
-- 使用分段列过滤
SELECT * FROM sensor_data
WHERE sensor_id = 1
AND time > NOW() - INTERVAL '30 days';
-- 使用排序列范围查询
SELECT * FROM sensor_data
WHERE time BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY time DESC;
-- 避免全表扫描
-- 不推荐
SELECT * FROM sensor_data WHERE temperature > 30;
-- 推荐
SELECT * FROM sensor_data
WHERE sensor_id = 1
AND temperature > 30
AND time > NOW() - INTERVAL '30 days';
七、压缩与更新 #
7.1 更新压缩数据 #
sql
-- 压缩数据不能直接更新
-- 需要先解压
-- 解压分片
SELECT decompress_chunk('_hyper_1_1_chunk');
-- 执行更新
UPDATE sensor_data
SET temperature = 26.0
WHERE time BETWEEN '2024-01-01' AND '2024-01-07';
-- 重新压缩
SELECT compress_chunk('_hyper_1_1_chunk');
7.2 删除压缩数据 #
sql
-- 删除整个分片(推荐)
SELECT drop_chunks(
'sensor_data',
older_than => INTERVAL '90 days'
);
-- 删除特定数据(需要解压)
SELECT decompress_chunk('_hyper_1_1_chunk');
DELETE FROM sensor_data WHERE ...;
SELECT compress_chunk('_hyper_1_1_chunk');
八、压缩最佳实践 #
8.1 压缩时机 #
text
压缩时机选择:
数据热度
├── 热数据(最近7天)- 不压缩
├── 温数据(7-30天)- 考虑压缩
└── 冷数据(30天以上)- 必须压缩
查询模式
├── 频繁查询 - 延迟压缩
├── 偶尔查询 - 正常压缩
└── 很少查询 - 尽早压缩
存储成本
├── 存储紧张 - 尽早压缩
├── 存储充足 - 延迟压缩
└── 平衡性能和成本
8.2 分段列选择 #
text
分段列选择最佳实践:
选择原则
├── 低基数列
├── 常用过滤列
├── 查询模式匹配
└── 不超过3列
推荐选择
├── device_id
├── sensor_id
├── location
├── type
└── region
不推荐
├── time(基数太高)
├── id(基数太高)
└── timestamp(基数太高)
8.3 压缩策略配置 #
sql
-- 推荐配置
-- 1. 启用压缩
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time DESC'
);
-- 2. 添加压缩策略
SELECT add_compression_policy(
'sensor_data',
INTERVAL '7 days'
);
-- 3. 监控压缩效果
SELECT * FROM chunk_compression_stats('sensor_data');
九、常见问题 #
9.1 压缩失败 #
sql
-- 问题:压缩失败
-- 检查错误日志
SELECT * FROM timescaledb_information.job_errors
WHERE job_id IN (
SELECT job_id FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression'
);
-- 常见原因
-- 1. 内存不足
-- 2. 磁盘空间不足
-- 3. 配置错误
-- 解决方案
-- 1. 增加work_mem
SET work_mem = '256MB';
-- 2. 清理磁盘空间
-- 3. 检查压缩配置
SELECT * FROM timescaledb_information.compression_settings
WHERE hypertable_name = 'sensor_data';
9.2 压缩比例低 #
sql
-- 问题:压缩比例低
-- 检查数据特征
SELECT
count(DISTINCT sensor_id) as distinct_sensors,
count(*) as total_rows
FROM sensor_data;
-- 原因分析
-- 1. 数据基数高
-- 2. 分段列选择不当
-- 3. 数据类型不适合压缩
-- 解决方案
-- 1. 重新选择分段列
ALTER TABLE sensor_data SET (
timescaledb.compress_segmentby = 'sensor_id, location'
);
-- 2. 重新压缩
SELECT decomcompress_chunk(chunk_name)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data' AND is_compressed = true;
SELECT compress_chunk(chunk_name)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data' AND is_compressed = false;
9.3 查询性能下降 #
sql
-- 问题:压缩后查询性能下降
-- 检查查询计划
EXPLAIN ANALYZE
SELECT * FROM sensor_data
WHERE sensor_id = 1 AND time > NOW() - INTERVAL '30 days';
-- 原因分析
-- 1. 未使用分段列过滤
-- 2. 全表扫描
-- 3. 压缩索引不匹配
-- 解决方案
-- 1. 优化查询,使用分段列
-- 2. 检查压缩配置
-- 3. 考虑解压部分数据
十、总结 #
压缩操作要点:
| 操作 | 命令 | 说明 |
|---|---|---|
| 启用压缩 | ALTER TABLE SET | 配置压缩参数 |
| 添加策略 | add_compression_policy | 自动压缩 |
| 手动压缩 | compress_chunk | 压缩分片 |
| 解压 | decompress_chunk | 解压分片 |
最佳实践:
- 压缩时机:根据数据热度选择
- 分段列:选择低基数、常用过滤列
- 监控效果:定期检查压缩比例
- 查询优化:使用分段列过滤
下一步,让我们学习保留策略!
最后更新:2026-03-27