聚合函数 #

一、聚合函数概述 #

1.1 支持的聚合函数 #

函数 描述
COUNT 计数
SUM 求和
AVG 平均值
MIN 最小值
MAX 最大值

1.2 基本语法 #

sql
SELECT aggregate_function(column_name)
FROM table_name
[WHERE condition]
[GROUP BY column_name];

二、COUNT函数 #

2.1 基本计数 #

sql
-- 计算总行数
SELECT COUNT(*) FROM users;

-- 计算特定列非空值数量
SELECT COUNT(email) FROM users;

-- 带条件计数
SELECT COUNT(*) FROM orders 
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000;

2.2 分组计数 #

sql
-- 按分区键分组计数
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id;

-- 按聚簇列分组计数
SELECT user_id, order_date, COUNT(*) as daily_count
FROM orders
GROUP BY user_id, order_date;

三、SUM函数 #

3.1 基本求和 #

sql
-- 计算总和
SELECT SUM(amount) FROM orders 
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000;

-- 多列求和
SELECT SUM(amount), SUM(quantity) FROM order_items 
WHERE order_id = ?;

3.2 分组求和 #

sql
-- 按用户分组求和
SELECT user_id, SUM(amount) as total_amount
FROM orders
GROUP BY user_id;

-- 按日期分组求和
SELECT user_id, order_date, SUM(amount) as daily_total
FROM orders
GROUP BY user_id, order_date;

四、AVG函数 #

4.1 基本平均值 #

sql
-- 计算平均值
SELECT AVG(age) FROM users;

SELECT AVG(price) FROM products;

-- 带条件平均值
SELECT AVG(amount) FROM orders 
WHERE user_id = ? AND order_date >= '2024-01-01';

4.2 分组平均值 #

sql
-- 按产品类别分组平均价格
SELECT category_id, AVG(price) as avg_price
FROM products
GROUP BY category_id;

五、MIN/MAX函数 #

5.1 基本使用 #

sql
-- 最小值和最大值
SELECT MIN(age), MAX(age) FROM users;

SELECT MIN(price), MAX(price) FROM products;

-- 时间范围
SELECT MIN(created_at), MAX(created_at) FROM users;

-- 带条件
SELECT MIN(amount), MAX(amount) FROM orders 
WHERE user_id = ?;

5.2 分组使用 #

sql
-- 按用户分组获取订单金额范围
SELECT user_id, MIN(amount), MAX(amount)
FROM orders
GROUP BY user_id;

六、GROUP BY #

6.1 基本分组 #

sql
-- 按分区键分组
SELECT user_id, COUNT(*), SUM(amount)
FROM orders
GROUP BY user_id;

-- 按聚簇列分组
SELECT user_id, order_date, COUNT(*), SUM(amount)
FROM orders
GROUP BY user_id, order_date;

6.2 分组规则 #

text
GROUP BY规则:

必须按主键顺序
├── 分区键必须完整
├── 聚簇列按顺序使用
└── 不能跳过主键列

示例(PRIMARY KEY (user_id, order_date, order_id)):
✓ GROUP BY user_id
✓ GROUP BY user_id, order_date
✓ GROUP BY user_id, order_date, order_id
✗ GROUP BY order_date(缺少分区键)
✗ GROUP BY user_id, order_id(跳过order_date)

6.3 分组聚合组合 #

sql
-- 完整分组聚合
SELECT 
    user_id,
    order_date,
    COUNT(*) as order_count,
    SUM(amount) as total_amount,
    AVG(amount) as avg_amount,
    MIN(amount) as min_amount,
    MAX(amount) as max_amount
FROM orders
GROUP BY user_id, order_date;

七、组合使用 #

7.1 多聚合函数 #

sql
-- 组合多个聚合函数
SELECT 
    COUNT(*) as total,
    SUM(amount) as total_amount,
    AVG(amount) as avg_amount,
    MIN(amount) as min_amount,
    MAX(amount) as max_amount
FROM orders
WHERE user_id = ?;

7.2 聚合与分组 #

sql
-- 统计报表
SELECT 
    user_id,
    order_date,
    COUNT(*) as order_count,
    SUM(amount) as total_amount,
    AVG(amount) as avg_amount
FROM orders
WHERE user_id = ?
AND order_date >= '2024-01-01'
AND order_date <= '2024-01-31'
GROUP BY user_id, order_date;

八、限制与注意事项 #

8.1 不支持的功能 #

text
不支持的功能:

HAVING子句
├── 不能过滤聚合结果
└── 需要在应用层处理

嵌套聚合
├── 不能嵌套聚合函数
└── 如 AVG(SUM(amount))

自定义聚合函数
├── 不支持UDAF
└── 使用内置函数

复杂分组
├── 不支持ROLLUP
├── 不支持CUBE
└── 不支持GROUPING SETS

8.2 性能考虑 #

text
聚合性能考虑:

分区聚合
├── 单分区聚合性能好
└── 数据量可控

全表聚合
├── 需要扫描所有分区
├── 性能较差
└── 可能超时

建议
├── 限制聚合范围
├── 使用分区键过滤
└── 大数据量考虑预聚合

九、预聚合设计 #

9.1 预聚合表 #

sql
-- 创建预聚合表
CREATE TABLE daily_order_stats (
    stat_date DATE,
    user_id UUID,
    order_count COUNTER,
    total_amount COUNTER,
    PRIMARY KEY (stat_date, user_id)
);

-- 更新统计数据
UPDATE daily_order_stats 
SET order_count = order_count + 1,
    total_amount = total_amount + ?
WHERE stat_date = ? AND user_id = ?;

-- 查询统计数据
SELECT * FROM daily_order_stats 
WHERE stat_date = '2024-01-15';

9.2 物化视图 #

sql
-- 创建物化视图用于聚合
CREATE MATERIALIZED VIEW orders_by_date AS
SELECT user_id, order_date, order_id, amount
FROM orders
WHERE user_id IS NOT NULL 
AND order_date IS NOT NULL
AND order_id IS NOT NULL
PRIMARY KEY (order_date, user_id, order_id);

-- 按日期查询
SELECT user_id, COUNT(*), SUM(amount)
FROM orders_by_date
WHERE order_date = '2024-01-15'
GROUP BY user_id;

十、总结 #

聚合函数要点:

  1. 基本函数:COUNT、SUM、AVG、MIN、MAX
  2. GROUP BY:必须按主键顺序分组
  3. 不支持HAVING:需要在应用层过滤
  4. 性能考虑:限制聚合范围
  5. 预聚合:大数据量考虑预聚合表
  6. 计数器:使用计数器表实现实时统计

下一步,让我们学习排序与分页!

最后更新:2026-03-27