Elasticsearch Aggregations #
1. Aggregation Overview #
1.1 Aggregation Types #
text
Aggregation types
├── Metrics aggregations
│   ├── Single-value metrics
│   │   ├── avg, sum, min, max
│   │   └── cardinality, value_count
│   └── Multi-value metrics
│       ├── stats, extended_stats
│       ├── percentiles, percentile_ranks
│       ├── top_hits
│       └── scripted_metric
├── Bucket aggregations
│   ├── terms, filter, filters
│   ├── range, date_range, histogram
│   └── nested, reverse_nested
└── Pipeline aggregations
    ├── avg_bucket, sum_bucket
    ├── max_bucket, min_bucket
    └── stats_bucket, extended_stats_bucket
1.2 Aggregation Structure #
json
{
  "aggs": {
    "<aggregation_name>": {
      "<aggregation_type>": {
        <aggregation_body>
      },
      "aggs": {
        "<sub_aggregation>": {}
      }
    }
  }
}
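The skeleton above can also be assembled programmatically. The sketch below builds the same nested shape as a plain Python dict. The helper name `agg` and the field names `brand` and `price` are illustrative assumptions and are not part of any client library.

```python
import json

def agg(agg_type, body, sub_aggs=None):
    """Return one aggregation definition: {<type>: <body>, "aggs": {...}}."""
    definition = {agg_type: body}
    if sub_aggs:
        definition["aggs"] = sub_aggs
    return definition

# A terms bucket per brand, with an avg metric nested inside each bucket.
request_body = {
    "size": 0,
    "aggs": {
        "brands": agg(
            "terms",
            {"field": "brand"},
            sub_aggs={"avg_price": agg("avg", {"field": "price"})},
        )
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted as the body of a `_search` request.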
2. Metrics Aggregations #
2.1 Average (avg) #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
Response:
json
{
  "aggregations": {
    "avg_price": {
      "value": 899.5
    }
  }
}
2.2 Sum (sum) #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "total_sales": {
      "sum": {
        "field": "sales"
      }
    }
  }
}
2.3 Maximum and Minimum #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    },
    "min_price": {
      "min": {
        "field": "price"
      }
    }
  }
}
2.4 Statistics (stats) #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}
Response:
json
{
  "aggregations": {
    "price_stats": {
      "count": 100,
      "min": 99.0,
      "max": 1999.0,
      "avg": 899.5,
      "sum": 89950.0
    }
  }
}
2.5 Extended Statistics (extended_stats) #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_extended_stats": {
      "extended_stats": {
        "field": "price"
      }
    }
  }
}
The response includes count, min, max, avg, sum, sum_of_squares, variance, std_deviation, and std_deviation_bounds.
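To make the extra fields concrete, here is a minimal pure-Python sketch of what they mean for a small list of numbers. It assumes `variance` is the population variance (my understanding of what the field reports), and the sample prices are made up.

```python
import math

def extended_stats(values):
    """Emulate the main extended_stats fields for a non-empty list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    # Population variance: mean squared distance from the average.
    variance = sum((v - avg) ** 2 for v in values) / count
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }

prices = [100.0, 200.0, 300.0]  # made-up sample values
result = extended_stats(prices)
print(result)
```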
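2.6 Cardinality (cardinality) #
Counts the number of distinct values. The result is an approximation; `precision_threshold` sets the count below which results are expected to be close to exact, at the cost of more memory:
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "unique_brands": {
      "cardinality": {
        "field": "brand",
        "precision_threshold": 100
      }
    }
  }
}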
2.7 Value Count (value_count) #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "total_ratings": {
      "value_count": {
        "field": "rating"
      }
    }
  }
}
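`value_count` counts field values rather than documents, so the two numbers differ when a field is multi-valued or missing. The sketch below emulates that on made-up documents; the function name and the sample data are assumptions for illustration.

```python
# Made-up documents: one has three ratings, one has a single rating, one has none.
docs = [
    {"name": "A", "rating": [5, 4, 4]},
    {"name": "B", "rating": 3},
    {"name": "C"},
]

def value_count(docs, field):
    """Count every value of `field`, including duplicates; skip docs without it."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        total += len(value) if isinstance(value, list) else 1
    return total

print("documents:", len(docs))
print("rating values:", value_count(docs, "rating"))
```

Duplicates are counted, which is what separates `value_count` from `cardinality`.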
2.8 Percentiles (percentiles) #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}
2.9 top_hits #
Returns the top-ranked documents within each bucket:
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "top_products": {
      "terms": {
        "field": "brand"
      },
      "aggs": {
        "top_sales": {
          "top_hits": {
            "sort": [
              { "sales": "desc" }
            ],
            "size": 3,
            "_source": ["name", "price", "sales"]
          }
        }
      }
    }
  }
}
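The request above can be read as "group by brand, sort each group by sales, keep three". A minimal in-memory sketch of that logic follows; the documents and the helper name `top_hits_per_brand` are made up for illustration.

```python
from collections import defaultdict

# Made-up sample documents, for illustration only.
docs = [
    {"name": "Phone X", "brand": "Apple", "price": 999, "sales": 500},
    {"name": "Tablet Y", "brand": "Apple", "price": 799, "sales": 300},
    {"name": "Watch Z", "brand": "Apple", "price": 399, "sales": 700},
    {"name": "Laptop W", "brand": "Apple", "price": 1999, "sales": 100},
    {"name": "Phone S", "brand": "Samsung", "price": 899, "sales": 400},
]

def top_hits_per_brand(docs, size, source_fields):
    """Group docs by brand, then keep the `size` best sellers in each group."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["brand"]].append(doc)
    result = {}
    for brand, items in buckets.items():
        ranked = sorted(items, key=lambda d: d["sales"], reverse=True)[:size]
        # Mimic "_source" filtering by keeping only the requested fields.
        result[brand] = [{f: d[f] for f in source_fields} for d in ranked]
    return result

top = top_hits_per_brand(docs, size=3, source_fields=["name", "price", "sales"])
for brand, hits in top.items():
    print(brand, [h["name"] for h in hits])
```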
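3. Bucket Aggregations #
3.1 terms Aggregation #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
Parameters:
| Parameter | Description |
|---|---|
| size | Number of buckets to return (default 10) |
| order | Bucket sort order (default: `_count` descending) |
| min_doc_count | Minimum number of documents a bucket needs in order to be returned (default 1) |
| missing | Value to use for documents that lack the field |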
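How these parameters interact is easier to see on a small list. The following sketch emulates them in plain Python on made-up brand values, where `None` stands for a document without the field. Breaking ties by key in ascending order is an assumption about the default ordering.

```python
from collections import Counter

def terms_buckets(values, size=10, min_doc_count=1, missing=None):
    """Emulate terms bucketing with size, min_doc_count and missing."""
    # Substitute the `missing` value for absent fields; drop them if none is given.
    resolved = [missing if v is None else v for v in values]
    counts = Counter(v for v in resolved if v is not None)
    # Order by doc count descending, then by key ascending (assumed tie-break).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [(key, count) for key, count in ordered if count >= min_doc_count]
    return [{"key": key, "doc_count": count} for key, count in kept[:size]]

# Made-up field values.
brands = ["Apple", "Samsung", "Apple", None, "Sony", "Apple", "Samsung"]

print(terms_buckets(brands, size=2))
print(terms_buckets(brands, min_doc_count=2))
print(terms_buckets(brands, missing="N/A"))
```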
3.2 filter Aggregation #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "apple_products": {
      "filter": {
        "term": { "brand": "Apple" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
3.3 filters Aggregation #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "filters": {
        "filters": {
          "apple": { "term": { "brand": "Apple" } },
          "samsung": { "term": { "brand": "Samsung" } },
          "other": { "bool": { "must_not": [
            { "terms": { "brand": ["Apple", "Samsung"] } }
          ]}}
        }
      }
    }
  }
}
3.4 range Aggregation #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 500, "key": "cheap" },
          { "from": 500, "to": 1000, "key": "medium" },
          { "from": 1000, "key": "expensive" }
        ]
      }
    }
  }
}
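Each range includes its `from` value and excludes its `to` value, which is why 500 lands in "medium" and not in "cheap". The sketch below emulates that rule on made-up prices; the function name is hypothetical.

```python
def range_buckets(values, ranges):
    """Count values per range: `from` is inclusive, `to` is exclusive."""
    buckets = {r["key"]: 0 for r in ranges}
    for value in values:
        for r in ranges:
            lower_ok = "from" not in r or value >= r["from"]
            upper_ok = "to" not in r or value < r["to"]
            if lower_ok and upper_ok:
                buckets[r["key"]] += 1
    return buckets

ranges = [
    {"to": 500, "key": "cheap"},
    {"from": 500, "to": 1000, "key": "medium"},
    {"from": 1000, "key": "expensive"},
]
prices = [99, 499.99, 500, 999, 1000, 1999]  # made-up sample values

print(range_buckets(prices, ranges))
```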
3.5 date_range Aggregation #
Each range includes `from` and excludes `to`, so every quarter ends at the first day of the next one:
bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "date_ranges": {
      "date_range": {
        "field": "@timestamp",
        "format": "yyyy-MM-dd",
        "ranges": [
          { "from": "2024-01-01", "to": "2024-04-01", "key": "Q1" },
          { "from": "2024-04-01", "to": "2024-07-01", "key": "Q2" },
          { "from": "2024-07-01", "to": "2024-10-01", "key": "Q3" },
          { "from": "2024-10-01", "to": "2025-01-01", "key": "Q4" }
        ]
      }
    }
  }
}
3.6 histogram Aggregation #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "price",
        "interval": 200,
        "min_doc_count": 1
      }
    }
  }
}
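A histogram assigns each value to the bucket whose key is the value rounded down to a multiple of `interval`. The sketch below emulates that with no offset and shows how `min_doc_count` decides whether empty buckets between the lowest and highest key appear. The prices are made up.

```python
import math
from collections import Counter

def histogram(values, interval, min_doc_count=0):
    """Return (bucket_key, doc_count) pairs for fixed-width buckets."""
    # Bucket key: the value rounded down to a multiple of the interval.
    counts = Counter(math.floor(v / interval) * interval for v in values)
    if not counts:
        return []
    keys = range(min(counts), max(counts) + interval, interval)
    return [(key, counts[key]) for key in keys if counts[key] >= min_doc_count]

prices = [99, 150, 200, 399, 400, 1250]  # made-up sample values

print(histogram(prices, interval=200, min_doc_count=1))  # empty buckets dropped
print(histogram(prices, interval=200, min_doc_count=0))  # gaps kept with count 0
```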
3.7 date_histogram Aggregation #
bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
Interval options:
| Parameter | Description |
|---|---|
| calendar_interval | Calendar-aware interval (minute, hour, day, week, month, quarter, year) |
| fixed_interval | Fixed-length interval (e.g. 1h, 30m) |
3.8 nested Aggregation #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "comments": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "avg_rating": {
          "avg": {
            "field": "comments.rating"
          }
        }
      }
    }
  }
}
4. Pipeline Aggregations #
4.1 avg_bucket #
bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      }
    },
    "avg_logs": {
      "avg_bucket": {
        "buckets_path": "logs_per_day>_count"
      }
    }
  }
}
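The `buckets_path` value names a sibling aggregation and, after `>`, the per-bucket number to read from it (`_count` for the document count). The sketch below resolves only that simple two-segment form against a made-up response fragment and then averages the result; the function names are hypothetical and the real path syntax supports more than this.

```python
# Made-up fragment shaped like the "aggregations" part of a search response.
sample_aggregations = {
    "logs_per_day": {
        "buckets": [
            {"key_as_string": "2024-01-01", "doc_count": 120},
            {"key_as_string": "2024-01-02", "doc_count": 80},
            {"key_as_string": "2024-01-03", "doc_count": 100},
        ]
    }
}

def resolve_bucket_values(aggregations, buckets_path):
    """Resolve a two-segment path such as 'agg_name>_count' or 'agg_name>metric'."""
    agg_name, metric = buckets_path.split(">")
    values = []
    for bucket in aggregations[agg_name]["buckets"]:
        if metric == "_count":
            values.append(bucket["doc_count"])
        else:
            values.append(bucket[metric]["value"])
    return values

def avg_bucket(aggregations, buckets_path):
    """Average one number per bucket, as a sibling pipeline aggregation would."""
    values = resolve_bucket_values(aggregations, buckets_path)
    return sum(values) / len(values)

print(avg_bucket(sample_aggregations, "logs_per_day>_count"))
```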
4.2 sum_bucket #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": { "field": "brand" },
      "aggs": {
        "total_sales": {
          "sum": { "field": "sales" }
        }
      }
    },
    "total_all_sales": {
      "sum_bucket": {
        "buckets_path": "brands>total_sales"
      }
    }
  }
}
4.3 derivative (Derivative) #
bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_change": {
          "derivative": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}
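A first-order derivative over a histogram is the difference between each bucket and the one before it, so the first bucket has no value. A minimal sketch on made-up daily counts, with a hypothetical function name:

```python
def derivative(counts):
    """Difference between consecutive buckets; the first bucket has no value."""
    if not counts:
        return []
    return [None] + [cur - prev for prev, cur in zip(counts, counts[1:])]

daily_counts = [120, 80, 100]  # made-up doc counts for three consecutive days

print(derivative(daily_counts))
```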
4.4 cumulative_sum (Cumulative Sum) #
bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "cumulative": {
          "cumulative_sum": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}
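The cumulative sum attaches to each bucket the running total of all buckets up to and including it. The same computation on made-up daily counts, using only the standard library:

```python
from itertools import accumulate

daily_counts = [120, 80, 100]  # made-up doc counts for three consecutive days

# Running total: each entry is the sum of all counts up to that day.
running_total = list(accumulate(daily_counts))

print(running_total)
```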
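4.5 moving_fn (Moving Average) #
The `moving_avg` aggregation was removed in Elasticsearch 8.0; `moving_fn` with `MovingFunctions.unweightedAvg` replaces it:
bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "moving_avg": {
          "moving_fn": {
            "buckets_path": "_count",
            "window": 7,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    }
  }
}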
5. Ordering Aggregations #
5.1 Ordering by Document Count #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "order": { "_count": "desc" }
      }
    }
  }
}
5.2 Ordering by a Metric #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "order": { "avg_price": "desc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
5.3 Ordering by Key #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "order": { "_key": "asc" }
      }
    }
  }
}
6. Combining Aggregations with Queries #
6.1 Global Aggregation #
bash
GET /products/_search
{
  "size": 0,
  "query": {
    "term": { "brand": "Apple" }
  },
  "aggs": {
    "all_products": {
      "global": {},
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
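Aggregations normally run over the documents matched by the query, while a `global` aggregation ignores the query and runs over every document in the index. The sketch below contrasts the two scopes on made-up documents.

```python
# Made-up sample documents, for illustration only.
docs = [
    {"brand": "Apple", "price": 1000},
    {"brand": "Apple", "price": 2000},
    {"brand": "Samsung", "price": 900},
    {"brand": "Sony", "price": 500},
]

def avg_price(items):
    """Average of the price field over a non-empty list of documents."""
    return sum(d["price"] for d in items) / len(items)

# Query scope: only documents matching the term query on brand.
matched = [d for d in docs if d["brand"] == "Apple"]
query_scoped_avg = avg_price(matched)

# Global scope: every document, regardless of the query.
global_avg = avg_price(docs)

print("query-scoped avg:", query_scoped_avg)
print("global avg:", global_avg)
```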
6.2 Restricting the Aggregation Scope with a Filter #
bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "apple_products": {
      "filter": {
        "term": { "brand": "Apple" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
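7. Aggregation Best Practices #
7.1 Performance Optimization #
text
Performance tips
├── Set size: 0
│   └── Return only aggregation results, no documents
├── Limit the number of buckets
│   └── Choose a sensible size parameter
├── Use filter
│   └── Filter first, then aggregate
└── Avoid deep aggregations
    └── Keep the nesting depth under control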
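7.2 Memory Management #
text
Memory optimization
├── Control the number of buckets
│   └── Avoid bucketing on high-cardinality fields
├── Use cardinality
│   └── Estimate the number of distinct values
├── Set shard_size
│   └── Control the number of buckets returned per shard
└── Use pre-aggregation
    └── Reduce computation at query time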
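The `shard_size` setting matters because each shard returns only its own top buckets before the results are merged, so a small value can produce wrong counts or even the wrong top term. The sketch below is a simplified model of that merge with made-up per-shard counts; it is an illustration and not how Elasticsearch is implemented internally.

```python
from collections import Counter

def merged_terms(shards, size, shard_size):
    """Merge each shard's top `shard_size` terms, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        ranked = sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))
        for key, count in ranked[:shard_size]:
            merged[key] += count
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]

# Made-up term counts on two shards. Across both shards, "c" is the true top term (17).
shard_a = {"a": 10, "b": 9, "c": 8}
shard_b = {"c": 9, "d": 8, "a": 1}

print(merged_terms([shard_a, shard_b], size=1, shard_size=1))  # misses "c"
print(merged_terms([shard_a, shard_b], size=1, shard_size=3))  # finds "c"
```

A larger `shard_size` improves accuracy at the price of more memory and network traffic per request.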
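8. Summary #
This chapter covered Elasticsearch aggregations:
- Metrics aggregations compute statistics
- Bucket aggregations group documents
- Pipeline aggregations compute values from the results of other aggregations
- Aggregations can be nested
- Tune parameters sensibly to optimize performance
- Keep an eye on memory usage
Next, we will look at highlighting.
Last updated: 2026-03-27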