Elasticsearch聚合分析 #

一、聚合概述 #

1.1 聚合类型 #

text
聚合类型
├── 指标聚合(Metrics)
│   ├── 单值指标
│   │   ├── avg, sum, min, max
│   │   ├── cardinality, value_count
│   │   └── stats, extended_stats
│   └── 多值指标
│       ├── percentiles, percentile_ranks
│       ├── top_hits
│       └── scripted_metric
├── 桶聚合(Bucket)
│   ├── terms, filter, filters
│   ├── range, date_range, histogram
│   └── nested, reverse_nested
└── 管道聚合(Pipeline)
    ├── avg_bucket, sum_bucket
    ├── max_bucket, min_bucket
    └── stats_bucket, extended_stats_bucket

1.2 聚合结构 #

json
{
  "aggs": {
    "<aggregation_name>": {
      "<aggregation_type>": {
        <aggregation_body>
      },
      "aggs": {
        "<sub_aggregation>": {}
      }
    }
  }
}

二、指标聚合 #

2.1 平均值(avg) #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

响应:

json
{
  "aggregations": {
    "avg_price": {
      "value": 899.5
    }
  }
}

2.2 总和(sum) #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "total_sales": {
      "sum": {
        "field": "sales"
      }
    }
  }
}

2.3 最大值和最小值 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    },
    "min_price": {
      "min": {
        "field": "price"
      }
    }
  }
}

2.4 统计信息(stats) #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}

响应:

json
{
  "aggregations": {
    "price_stats": {
      "count": 100,
      "min": 99.0,
      "max": 1999.0,
      "avg": 899.5,
      "sum": 89950.0
    }
  }
}

2.5 扩展统计(extended_stats) #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_extended_stats": {
      "extended_stats": {
        "field": "price"
      }
    }
  }
}

响应包含:count, min, max, avg, sum, sum_of_squares, variance, std_deviation等。

2.6 基数统计(cardinality) #

统计唯一值数量:

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "unique_brands": {
      "cardinality": {
        "field": "brand",
        "precision_threshold": 100
      }
    }
  }
}

2.7 文档计数(value_count) #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "total_ratings": {
      "value_count": {
        "field": "rating"
      }
    }
  }
}

2.8 百分位数(percentiles) #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}

2.9 top_hits #

获取每个桶中的顶部文档:

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "top_products": {
      "terms": {
        "field": "brand"
      },
      "aggs": {
        "top_sales": {
          "top_hits": {
            "sort": [
              { "sales": "desc" }
            ],
            "size": 3,
            "_source": ["name", "price", "sales"]
          }
        }
      }
    }
  }
}

三、桶聚合 #

3.1 terms聚合 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

参数说明

参数 说明
size 返回桶数量
order 排序方式
min_doc_count 最小文档数
missing 缺失值处理

3.2 filter聚合 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "apple_products": {
      "filter": {
        "term": { "brand": "Apple" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

3.3 filters聚合 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "filters": {
        "filters": {
          "apple": { "term": { "brand": "Apple" } },
          "samsung": { "term": { "brand": "Samsung" } },
          "other": { "bool": { "must_not": [
            { "terms": { "brand": ["Apple", "Samsung"] } }
          ]}}
        }
      }
    }
  }
}

3.4 range聚合 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 500, "key": "cheap" },
          { "from": 500, "to": 1000, "key": "medium" },
          { "from": 1000, "key": "expensive" }
        ]
      }
    }
  }
}

3.5 date_range聚合 #

bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "date_ranges": {
      "date_range": {
        "field": "@timestamp",
        "format": "yyyy-MM-dd",
        "ranges": [
          { "from": "2024-01-01", "to": "2024-03-31", "key": "Q1" },
          { "from": "2024-04-01", "to": "2024-06-30", "key": "Q2" },
          { "from": "2024-07-01", "to": "2024-09-30", "key": "Q3" },
          { "from": "2024-10-01", "to": "2024-12-31", "key": "Q4" }
        ]
      }
    }
  }
}

3.6 histogram聚合 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "price",
        "interval": 200,
        "min_doc_count": 1
      }
    }
  }
}

3.7 date_histogram聚合 #

bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

时间间隔选项

参数 说明
calendar_interval 日历间隔(minute, hour, day, week, month, year)
fixed_interval 固定间隔(如 1h, 30m)

3.8 nested聚合 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "comments": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "avg_rating": {
          "avg": {
            "field": "comments.rating"
          }
        }
      }
    }
  }
}

四、管道聚合 #

4.1 avg_bucket #

bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      }
    },
    "avg_logs": {
      "avg_bucket": {
        "buckets_path": "logs_per_day>_count"
      }
    }
  }
}

4.2 sum_bucket #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": { "field": "brand" },
      "aggs": {
        "total_sales": {
          "sum": { "field": "sales" }
        }
      }
    },
    "total_all_sales": {
      "sum_bucket": {
        "buckets_path": "brands>total_sales"
      }
    }
  }
}

4.3 derivative(导数) #

bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_change": {
          "derivative": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}

4.4 cumulative_sum(累计求和) #

bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "cumulative": {
          "cumulative_sum": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}

4.5 moving_avg(移动平均) #

bash
GET /logs/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "moving_avg": {
          "moving_avg": {
            "buckets_path": "_count",
            "window": 7
          }
        }
      }
    }
  }
}

五、聚合排序 #

5.1 按文档数排序 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "order": { "_count": "desc" }
      }
    }
  }
}

5.2 按指标排序 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "order": { "avg_price": "desc" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

5.3 按键排序 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "brands": {
      "terms": {
        "field": "brand",
        "order": { "_key": "asc" }
      }
    }
  }
}

六、聚合与查询组合 #

6.1 全局聚合 #

bash
GET /products/_search
{
  "size": 0,
  "query": {
    "term": { "brand": "Apple" }
  },
  "aggs": {
    "all_products": {
      "global": {},
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

6.2 过滤聚合范围 #

bash
GET /products/_search
{
  "size": 0,
  "aggs": {
    "apple_products": {
      "filter": {
        "term": { "brand": "Apple" }
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}

七、聚合最佳实践 #

7.1 性能优化 #

text
性能优化建议
├── 设置size: 0
│   └── 不返回文档,只返回聚合结果
├── 限制桶数量
│   └── 合理设置size参数
├── 使用filter
│   └── 先过滤再聚合
└── 避免深度聚合
    └── 控制聚合嵌套层级

7.2 内存管理 #

text
内存优化
├── 控制桶数量
│   └── 避免高基数字段
├── 使用cardinality
│   └── 估算唯一值数量
├── 设置shard_size
│   └── 控制分片级别桶数量
└── 使用预聚合
    └── 减少实时计算

八、总结 #

本章介绍了Elasticsearch聚合分析:

  1. 指标聚合计算统计值
  2. 桶聚合对数据进行分组
  3. 管道聚合基于其他聚合结果计算
  4. 聚合可以嵌套使用
  5. 合理设置参数优化性能
  6. 注意内存使用和性能

下一步,我们将学习高亮显示。

最后更新:2026-03-27