超函数 #
一、超函数概述 #
1.1 什么是超函数 #
超函数(Hyperfunctions)是TimescaleDB提供的专门用于时序数据分析的函数库。
text
超函数分类:
时间桶函数
├── time_bucket
├── time_bucket_ng
└── time_bucket_gapfill
统计分析函数
├── stats_agg
├── percentile_agg
└── counter_agg
时间分析函数
├── time_weight
├── state_agg
└── interval_agg
近似计算函数
├── approx_percentile
├── approx_row_count
└── distinct_count
1.2 启用超函数 #
sql
-- 超函数已内置在TimescaleDB 2.6+
-- 无需额外安装
-- 检查超函数是否可用
SELECT * FROM pg_proc
WHERE proname LIKE '%time_bucket%'
LIMIT 5;
-- 查看超函数扩展
SELECT * FROM pg_extension
WHERE extname LIKE '%timescaledb%';
二、时间桶函数 #
2.1 time_bucket #
sql
-- 基本用法
SELECT
time_bucket('1 hour', time) AS hour,
AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
-- 带偏移量
SELECT
time_bucket('1 hour', time, '30 minutes'::interval) AS hour,
AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY hour;
-- 带起始时间
SELECT
time_bucket('1 day', time, NULL, '2024-01-01'::timestamptz) AS day,
AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY day;
2.2 time_bucket_ng #
sql
-- time_bucket_ng(Next Generation)
-- 支持更多时间单位和origin参数
SELECT
time_bucket_ng('1 hour', time) AS hour,
AVG(temperature) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
-- 使用origin参数
SELECT
time_bucket_ng('1 week', time, origin => '2024-01-01'::timestamptz) AS week,
AVG(temperature) AS avg_temp
FROM sensor_data
GROUP BY week;
2.3 time_bucket_gapfill #
sql
-- 填充时间间隙
SELECT
time_bucket_gapfill('1 hour', time, NOW() - INTERVAL '24 hours', NOW()) AS hour,
sensor_id,
locf(AVG(temperature)) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
AND sensor_id = 1
GROUP BY hour, sensor_id
ORDER BY hour;
-- gapfill参数
-- start: 开始时间
-- end: 结束时间
-- 填充缺失的时间桶
2.4 填充方法 #
sql
-- locf: Last Observation Carried Forward
SELECT
time_bucket_gapfill('1 hour', time, start, end) AS hour,
locf(AVG(temperature)) AS avg_temp
FROM sensor_data
GROUP BY hour;
-- interpolate: 线性插值
SELECT
time_bucket_gapfill('1 hour', time, start, end) AS hour,
interpolate(AVG(temperature)) AS avg_temp
FROM sensor_data
GROUP BY hour;
三、统计分析函数 #
3.1 stats_agg #
sql
-- 创建统计聚合
SELECT
time_bucket('1 hour', time) AS hour,
stats_agg(temperature) AS stats
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour;
-- 提取统计值
SELECT
time_bucket('1 hour', time) AS hour,
average(stats_agg(temperature)) AS avg_temp,
stddev(stats_agg(temperature)) AS std_temp,
variance(stats_agg(temperature)) AS var_temp,
min(stats_agg(temperature)) AS min_temp,
max(stats_agg(temperature)) AS max_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
3.2 percentile_agg #
sql
-- 计算百分位数
SELECT
time_bucket('1 hour', time) AS hour,
percentile_agg(temperature) AS percentiles
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour;
-- 提取百分位数
SELECT
time_bucket('1 hour', time) AS hour,
approx_percentile(0.5, percentile_agg(temperature)) AS median,
approx_percentile(0.95, percentile_agg(temperature)) AS p95,
approx_percentile(0.99, percentile_agg(temperature)) AS p99
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
3.3 counter_agg #
sql
-- 计数器聚合(处理重置)
SELECT
time_bucket('1 hour', time) AS hour,
counter_agg(time, counter_value) AS counter_stats
FROM counter_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour;
-- 提取计数器值
SELECT
time_bucket('1 hour', time) AS hour,
delta(counter_agg(time, counter_value)) AS delta_value,
rate(counter_agg(time, counter_value)) AS rate_value
FROM counter_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
四、时间加权函数 #
4.1 time_weight #
sql
-- 时间加权平均
SELECT
time_bucket('1 hour', time) AS hour,
time_weight('Linear', time, temperature) AS tw_avg
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour;
-- 提取时间加权值
SELECT
time_bucket('1 hour', time) AS hour,
average(time_weight('Linear', time, temperature)) AS tw_avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
4.2 时间加权类型 #
sql
-- Linear: 线性加权
SELECT
time_bucket('1 hour', time) AS hour,
average(time_weight('Linear', time, temperature)) AS avg_temp
FROM sensor_data
GROUP BY hour;
-- LOCF: Last Observation Carried Forward
SELECT
time_bucket('1 hour', time) AS hour,
average(time_weight('LOCF', time, temperature)) AS avg_temp
FROM sensor_data
GROUP BY hour;
五、状态分析函数 #
5.1 state_agg #
sql
-- 状态聚合
SELECT
time_bucket('1 hour', time) AS hour,
state_agg(time, status) AS state_stats
FROM status_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour;
-- 提取状态持续时间
SELECT
time_bucket('1 hour', time) AS hour,
duration_in(state_agg(time, status), 'active') AS active_duration,
duration_in(state_agg(time, status), 'inactive') AS inactive_duration
FROM status_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
5.2 interval_agg #
sql
-- 区间聚合
SELECT
time_bucket('1 day', time) AS day,
interval_agg(time, end_time) AS interval_stats
FROM interval_data
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY day;
-- 提取区间统计
SELECT
time_bucket('1 day', time) AS day,
range(interval_agg(time, end_time)) AS total_range,
duration(interval_agg(time, end_time)) AS total_duration
FROM interval_data
GROUP BY day;
六、近似计算函数 #
6.1 approx_percentile #
sql
-- 近似百分位数
SELECT
time_bucket('1 hour', time) AS hour,
approx_percentile(0.5, temperature) AS median,
approx_percentile(0.95, temperature) AS p95,
approx_percentile(0.99, temperature) AS p99
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
6.2 approx_row_count #
sql
-- 近似行数
SELECT approx_row_count('sensor_data');
-- 比较精确计数和近似计数
SELECT
count(*) as exact_count,
approx_row_count('sensor_data') as approx_count;
6.3 distinct_count #
sql
-- 近似去重计数
SELECT
time_bucket('1 day', time) AS day,
distinct_count(sensor_id) AS unique_sensors
FROM sensor_data
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;
七、聚合组合 #
7.1 多级聚合 #
sql
-- 小时聚合
WITH hourly_stats AS (
SELECT
time_bucket('1 hour', time) AS hour,
sensor_id,
stats_agg(temperature) AS temp_stats
FROM sensor_data
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY hour, sensor_id
)
-- 日聚合
SELECT
time_bucket('1 day', hour) AS day,
sensor_id,
average(rollup(temp_stats)) AS avg_temp,
stddev(rollup(temp_stats)) AS std_temp
FROM hourly_stats
GROUP BY day, sensor_id
ORDER BY day, sensor_id;
7.2 滚动聚合 #
sql
-- 使用rollup组合多个聚合
SELECT
time_bucket('1 day', time) AS day,
sensor_id,
average(rollup(stats_agg(temperature))) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY day, sensor_id
ORDER BY day, sensor_id;
八、实际应用示例 #
8.1 监控仪表板 #
sql
-- 系统监控统计
SELECT
time_bucket('5 minutes', time) AS bucket,
average(stats_agg(cpu_usage)) AS avg_cpu,
approx_percentile(0.95, percentile_agg(cpu_usage)) AS p95_cpu,
max(cpu_usage) AS max_cpu,
average(stats_agg(memory_usage)) AS avg_memory
FROM system_metrics
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket;
8.2 IoT数据分析 #
sql
-- 传感器数据分析
SELECT
time_bucket('1 hour', time) AS hour,
sensor_id,
average(time_weight('Linear', time, temperature)) AS avg_temp,
approx_percentile(0.5, percentile_agg(temperature)) AS median_temp,
approx_percentile(0.95, percentile_agg(temperature)) AS p95_temp,
stddev(stats_agg(temperature)) AS std_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour, sensor_id
ORDER BY hour, sensor_id;
8.3 金融数据分析 #
sql
-- 股票价格分析
SELECT
time_bucket('1 day', time) AS day,
symbol,
min(price) AS low,
max(price) AS high,
average(time_weight('Linear', time, price)) AS vwap,
sum(volume) AS total_volume
FROM stock_data
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY day, symbol
ORDER BY day, symbol;
九、性能优化 #
9.1 使用连续聚合 #
sql
-- 创建连续聚合预计算
CREATE MATERIALIZED VIEW hourly_sensor_stats
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS hour,
sensor_id,
stats_agg(temperature) AS temp_stats,
percentile_agg(temperature) AS temp_percentiles
FROM sensor_data
GROUP BY hour, sensor_id;
-- 查询预计算结果
SELECT
hour,
sensor_id,
average(temp_stats) AS avg_temp,
approx_percentile(0.95, temp_percentiles) AS p95_temp
FROM hourly_sensor_stats
WHERE hour > NOW() - INTERVAL '24 hours'
ORDER BY hour, sensor_id;
9.2 索引优化 #
sql
-- 为超函数查询创建索引
CREATE INDEX idx_sensor_time ON sensor_data (sensor_id, time DESC);
-- 使用时间范围查询
SELECT
time_bucket('1 hour', time) AS hour,
average(stats_agg(temperature)) AS avg_temp
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
AND sensor_id = 1
GROUP BY hour;
十、总结 #
超函数要点:
| 函数类别 | 函数名 | 用途 |
|---|---|---|
| 时间桶 | time_bucket | 时间分组 |
| 统计 | stats_agg | 统计分析 |
| 百分位 | percentile_agg | 百分位数 |
| 时间加权 | time_weight | 时间加权平均 |
| 状态 | state_agg | 状态分析 |
最佳实践:
- 时间桶:使用time_bucket进行时间分组
- 统计分析:使用stats_agg进行统计计算
- 连续聚合:预计算超函数结果
- 索引优化:为查询创建合适的索引
恭喜你完成TimescaleDB学习之旅!
最后更新:2026-03-27