聚合函数 #
一、聚合函数概述 #
1.1 什么是聚合函数 #
聚合函数对一组值进行计算,返回单个汇总值。
text
聚合函数特点:
├── 对多行数据进行汇总
├── 返回单个结果
├── 常与GROUP BY配合使用
├── 忽略NULL值(除COUNT(*)外)
└── 可与DISTINCT配合使用
1.2 可用聚合函数 #
| 函数 | 说明 |
|---|---|
| COUNT() | 计数 |
| SUM() | 求和 |
| AVG() | 平均值 |
| MIN() | 最小值 |
| MAX() | 最大值 |
| COLLECT() | 收集为列表 |
| PERCENTILE_CONT() | 百分位数 |
| PERCENTILE_DISC() | 离散百分位数 |
| STDEV() | 标准差 |
| STDEVP() | 总体标准差 |
二、COUNT函数 #
2.1 基本计数 #
cypher
MATCH (p:Person)
RETURN count(p) AS person_count
2.2 COUNT(*) #
cypher
MATCH (p:Person)-[r:KNOWS]->(f:Person)
RETURN count(*) AS relationship_count
2.3 COUNT(DISTINCT) #
cypher
MATCH (p:Person)-[:KNOWS]->(f:Person)
RETURN count(DISTINCT p) AS unique_persons
2.4 条件计数 #
cypher
MATCH (p:Person)
RETURN count(CASE WHEN p.age > 30 THEN 1 END) AS over_30_count
2.5 分组计数 #
cypher
MATCH (p:Person)
RETURN p.city, count(p) AS count
ORDER BY count DESC
2.6 关系计数 #
cypher
MATCH (p:Person)
RETURN p.name, size((p)-[:KNOWS]->()) AS friend_count
三、SUM函数 #
3.1 基本求和 #
cypher
MATCH (p:Person)
RETURN sum(p.age) AS total_age
3.2 SUM(DISTINCT) #
cypher
MATCH (p:Person)
RETURN sum(DISTINCT p.age) AS unique_age_sum
3.3 分组求和 #
cypher
MATCH (p:Person)
RETURN p.city, sum(p.age) AS total_age
ORDER BY total_age DESC
3.4 条件求和 #
cypher
MATCH (p:Person)
RETURN sum(CASE WHEN p.age > 30 THEN p.age ELSE 0 END) AS over_30_age_sum
3.5 计算金额 #
cypher
MATCH (o:Order)
RETURN sum(o.totalAmount) AS total_revenue
四、AVG函数 #
4.1 基本平均值 #
cypher
MATCH (p:Person)
RETURN avg(p.age) AS avg_age
4.2 AVG(DISTINCT) #
cypher
MATCH (p:Person)
RETURN avg(DISTINCT p.age) AS unique_avg_age
4.3 分组平均值 #
cypher
MATCH (p:Person)
RETURN p.city, avg(p.age) AS avg_age
ORDER BY avg_age DESC
4.4 四舍五入 #
cypher
MATCH (p:Person)
RETURN round(avg(p.age), 2) AS avg_age_rounded
五、MIN和MAX函数 #
5.1 MIN最小值 #
cypher
MATCH (p:Person)
RETURN min(p.age) AS min_age
5.2 MAX最大值 #
cypher
MATCH (p:Person)
RETURN max(p.age) AS max_age
5.3 同时获取 #
cypher
MATCH (p:Person)
RETURN min(p.age) AS min_age, max(p.age) AS max_age
5.4 字符串MIN/MAX #
cypher
MATCH (p:Person)
RETURN min(p.name) AS first_name, max(p.name) AS last_name
5.5 日期MIN/MAX #
cypher
MATCH (e:Event)
RETURN min(e.date) AS earliest, max(e.date) AS latest
5.6 分组MIN/MAX #
cypher
MATCH (p:Person)
RETURN p.city, min(p.age) AS min_age, max(p.age) AS max_age
六、COLLECT函数 #
6.1 基本收集 #
cypher
MATCH (p:Person)
RETURN collect(p.name) AS names
6.2 收集对象 #
cypher
MATCH (p:Person)
RETURN collect(p {.*}) AS persons
6.3 分组收集 #
cypher
MATCH (p:Person)
RETURN p.city, collect(p.name) AS names
6.4 收集去重 #
cypher
MATCH (p:Person)
RETURN collect(DISTINCT p.city) AS cities
6.5 收集关系 #
cypher
MATCH (p:Person)-[r:KNOWS]->(f:Person)
RETURN p.name, collect(f.name) AS friends
6.6 收集并限制 #
cypher
MATCH (p:Person)-[r:KNOWS]->(f:Person)
RETURN p.name, collect(f.name)[0..5] AS top_5_friends
七、分组 #
7.1 隐式分组 #
Cypher中非聚合字段自动成为分组键:
cypher
MATCH (p:Person)
RETURN p.city, count(p) AS count
7.2 多字段分组 #
cypher
MATCH (p:Person)
RETURN p.city, p.status, count(p) AS count
7.3 分组后过滤(HAVING) #
使用WITH子句实现HAVING功能:
cypher
MATCH (p:Person)
WITH p.city AS city, count(p) AS count
WHERE count > 10
RETURN city, count
ORDER BY count DESC
7.4 分组后排序 #
cypher
MATCH (p:Person)
RETURN p.city, count(p) AS count
ORDER BY count DESC
7.5 分组后限制 #
cypher
MATCH (p:Person)
RETURN p.city, count(p) AS count
ORDER BY count DESC
LIMIT 10
八、PERCENTILE函数 #
8.1 PERCENTILE_CONT #
连续百分位数:
cypher
MATCH (p:Person)
RETURN percentileCont(p.age, 0.5) AS median_age
8.2 PERCENTILE_DISC #
离散百分位数:
cypher
MATCH (p:Person)
RETURN percentileDisc(p.age, 0.5) AS median_age
8.3 多百分位数 #
cypher
MATCH (p:Person)
RETURN
percentileCont(p.age, 0.25) AS p25,
percentileCont(p.age, 0.50) AS p50,
percentileCont(p.age, 0.75) AS p75
九、STDEV函数 #
9.1 标准差 #
cypher
MATCH (p:Person)
RETURN stdev(p.age) AS age_stdev
9.2 总体标准差 #
cypher
MATCH (p:Person)
RETURN stdevp(p.age) AS age_stdevp
十、组合聚合 #
10.1 多聚合函数 #
cypher
MATCH (p:Person)
RETURN
count(p) AS total,
avg(p.age) AS avg_age,
min(p.age) AS min_age,
max(p.age) AS max_age,
sum(p.age) AS total_age
10.2 分组多聚合 #
cypher
MATCH (p:Person)
RETURN
p.city,
count(p) AS count,
avg(p.age) AS avg_age,
min(p.age) AS min_age,
max(p.age) AS max_age
ORDER BY count DESC
10.3 嵌套聚合 #
cypher
MATCH (p:Person)
WITH p.city AS city, collect(p.age) AS ages
RETURN
city,
size(ages) AS count,
avg([x IN ages WHERE x > 30]) AS avg_over_30
十一、聚合与WITH #
11.1 聚合后继续处理 #
cypher
MATCH (p:Person)
WITH p.city AS city, count(p) AS count, avg(p.age) AS avg_age
WHERE count > 5
RETURN city, count, avg_age
ORDER BY avg_age DESC
11.2 多阶段聚合 #
cypher
MATCH (p:Person)
WITH p.city AS city, count(p) AS city_count
WITH avg(city_count) AS avg_city_count
RETURN avg_city_count
11.3 聚合后连接 #
cypher
MATCH (p:Person)
WITH p, p.city AS city
MATCH (c:City {name: city})
RETURN c.name, count(p) AS population
十二、条件聚合 #
12.1 CASE聚合 #
cypher
MATCH (p:Person)
RETURN
count(CASE WHEN p.age < 18 THEN 1 END) AS minors,
count(CASE WHEN p.age >= 18 AND p.age < 60 THEN 1 END) AS adults,
count(CASE WHEN p.age >= 60 THEN 1 END) AS seniors
12.2 分组条件聚合 #
cypher
MATCH (p:Person)
RETURN
p.city,
count(CASE WHEN p.status = 'active' THEN 1 END) AS active_count,
count(CASE WHEN p.status = 'inactive' THEN 1 END) AS inactive_count
12.3 条件求和 #
cypher
MATCH (o:Order)
RETURN
sum(CASE WHEN o.status = 'completed' THEN o.amount ELSE 0 END) AS completed_total,
sum(CASE WHEN o.status = 'pending' THEN o.amount ELSE 0 END) AS pending_total
十三、聚合最佳实践 #
13.1 使用索引 #
cypher
CREATE INDEX FOR (p:Person) ON (p.city)
MATCH (p:Person)
WHERE p.city = 'New York'
RETURN count(p)
13.2 避免重复计算 #
cypher
MATCH (p:Person)
WITH count(p) AS total, avg(p.age) AS avg_age
RETURN total, avg_age
13.3 使用WITH优化 #
cypher
MATCH (p:Person)
WHERE p.status = 'active'
WITH p
RETURN count(p) AS active_count
13.4 性能建议 #
text
建议:
├── 使用索引加速过滤
├── 合理使用DISTINCT
├── 避免不必要的聚合
├── 使用WITH分阶段处理
└── 限制聚合数据量
十四、实际应用示例 #
14.1 用户统计 #
cypher
MATCH (u:User)
RETURN
count(u) AS total_users,
count(CASE WHEN u.isActive THEN 1 END) AS active_users,
avg(u.loginCount) AS avg_logins,
max(u.lastLogin) AS last_activity
14.2 销售统计 #
cypher
MATCH (o:Order)
WHERE o.createdAt >= datetime() - duration('P30D')
RETURN
count(o) AS order_count,
sum(o.totalAmount) AS total_revenue,
avg(o.totalAmount) AS avg_order_value,
min(o.totalAmount) AS min_order,
max(o.totalAmount) AS max_order
14.3 分类统计 #
cypher
MATCH (p:Product)-[:BELONGS_TO]->(c:Category)
RETURN
c.name AS category,
count(p) AS product_count,
avg(p.price) AS avg_price,
collect(p.name)[0..5] AS sample_products
ORDER BY product_count DESC
14.4 社交分析 #
cypher
MATCH (u:User)
OPTIONAL MATCH (u)-[:FOLLOWS]->(f:User)
WITH u, count(f) AS following_count
OPTIONAL MATCH (u)<-[:FOLLOWS]-(f:User)
RETURN
avg(following_count) AS avg_following,
min(following_count) AS min_following,
max(following_count) AS max_following,
percentileCont(following_count, 0.5) AS median_following
14.5 时间序列聚合 #
cypher
MATCH (o:Order)
WHERE o.createdAt >= datetime() - duration('P7D')
WITH o.createdAt.date AS date, count(o) AS order_count, sum(o.totalAmount) AS daily_revenue
RETURN date, order_count, daily_revenue
ORDER BY date
十五、总结 #
聚合函数要点:
| 函数 | 用途 | 示例 |
|---|---|---|
| COUNT | 计数 | count(p) |
| SUM | 求和 | sum(p.age) |
| AVG | 平均值 | avg(p.age) |
| MIN | 最小值 | min(p.age) |
| MAX | 最大值 | max(p.age) |
| COLLECT | 收集列表 | collect(p.name) |
最佳实践:
- 使用索引加速聚合查询
- 使用WITH进行分阶段聚合
- 合理使用DISTINCT
- 使用条件聚合实现复杂统计
- 注意NULL值处理
下一步,让我们学习高级查询!
最后更新:2026-03-27