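Preface
This is the third post in the Elasticsearch study notes series. It covers the ES aggregation framework: metric aggregations, bucket aggregations, nested aggregations, and practical statistics scenarios. Prerequisite: document operations and search (part two).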
1. Aggregation Overview
1.1 What Is an Aggregation
An aggregation computes statistics and analytics over your data, much like GROUP BY combined with aggregate functions in SQL.
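SQL:
  SELECT brand, COUNT(*), AVG(price)
  FROM products
  WHERE category = '手机'
  GROUP BY brand
  ORDER BY COUNT(*) DESC

The equivalent ES aggregation (the category value '手机' means "mobile phone" and is kept as stored in the index):
  GET /products/_search
  {
    "size": 0,
    "query": { "term": { "category": "手机" } },
    "aggs": {
      "by_brand": {
        "terms": { "field": "brand" },
        "aggs": {
          "avg_price": { "avg": { "field": "price" } }
        }
      }
    }
  }
  // terms buckets are sorted by document count, descending, by default,
  // which matches ORDER BY COUNT(*) DESC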
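To make the correspondence concrete, here is a minimal local sketch in plain Python of what the terms + avg combination computes. It does not talk to Elasticsearch; the sample documents, the category values "phone" and "tablet", and the helper name terms_with_avg are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents; field names follow the example above.
products = [
    {"brand": "A", "category": "phone", "price": 5000.0},
    {"brand": "A", "category": "phone", "price": 7000.0},
    {"brand": "B", "category": "phone", "price": 9000.0},
    {"brand": "B", "category": "tablet", "price": 3000.0},
]


def terms_with_avg(docs, category):
    """Mimic WHERE category = ? GROUP BY brand with COUNT(*) and AVG(price)."""
    groups = defaultdict(list)
    for doc in docs:
        if doc["category"] == category:  # the "query" part
            groups[doc["brand"]].append(doc["price"])
    buckets = [
        {"key": brand, "doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for brand, prices in groups.items()
    ]
    # Like the terms aggregation, order buckets by document count, descending.
    buckets.sort(key=lambda b: b["doc_count"], reverse=True)
    return buckets


result = terms_with_avg(products, "phone")
for bucket in result:
    print(bucket)
```

Each dictionary plays the role of one bucket: key is the group value, doc_count is COUNT(*), and avg_price is the nested metric.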
1.2 Aggregation Types
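Metric aggregations: compute numeric metrics (avg, sum, max, min)
Bucket aggregations: group documents by a rule, one bucket per group (terms, date_histogram)
Pipeline aggregations: aggregate again over the output of other aggregations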
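Pipeline aggregations are the least intuitive of the three, so here is a small sketch. It builds a request body that uses avg_bucket (a sibling pipeline aggregation that reads another aggregation's buckets through buckets_path) and then reproduces the arithmetic locally. The aggregation names by_month, avg_price and avg_monthly_price, and the sample prices, are hypothetical; the fields created_at and price are the ones used elsewhere in this post.

```python
import json

# Request body sketch: average price per month, then the average of those
# monthly averages computed by the avg_bucket pipeline aggregation.
body = {
    "size": 0,
    "aggs": {
        "by_month": {
            "date_histogram": {"field": "created_at", "calendar_interval": "month"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        },
        # Sibling pipeline aggregation: consumes the buckets of by_month.
        "avg_monthly_price": {"avg_bucket": {"buckets_path": "by_month>avg_price"}},
    },
}

# Local illustration of the same arithmetic on hypothetical prices.
monthly_prices = {"2026-01": [1000.0, 3000.0], "2026-02": [6000.0]}
monthly_avg = {month: sum(p) / len(p) for month, p in monthly_prices.items()}
avg_of_avgs = sum(monthly_avg.values()) / len(monthly_avg)

all_prices = [p for prices in monthly_prices.values() for p in prices]
overall_avg = sum(all_prices) / len(all_prices)

print(json.dumps(body, indent=2))
print(monthly_avg, avg_of_avgs, overall_avg)
```

Note that the average of the monthly averages is not the same number as the average over all documents, because every month carries equal weight regardless of how many documents it holds.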
2. Metric Aggregations
2.1 Basic Statistics
// Single-value metrics (several can be requested in one call)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": { "avg": { "field": "price" } },
    "max_price": { "max": { "field": "price" } },
    "min_price": { "min": { "field": "price" } },
    "sum_price": { "sum": { "field": "price" } }
  }
}

// stats returns several metrics at once
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": { "stats": { "field": "price" } }
  }
}
// Returns: count, min, max, avg, sum
2.2 Distinct Count
GET /products/_search
{
  "size": 0,
  "aggs": {
    "unique_brands": { "cardinality": { "field": "brand" } }
  }
}
// Similar to COUNT(DISTINCT brand) in SQL
// Note: cardinality is approximate (it uses the HyperLogLog++ algorithm)
// precision_threshold trades memory for accuracy (default 3000, maximum 40000)
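As a rough sketch of how the threshold is set, the request body below raises precision_threshold, below which counts are expected to be close to exact. The value 10000 is an arbitrary illustration, not a recommendation. The second half shows, on hypothetical values, the exact distinct count that cardinality approximates.

```python
import json

# Request body sketch: a higher precision_threshold costs more memory
# but keeps the estimate close to exact up to that many distinct values.
body = {
    "size": 0,
    "aggs": {
        "unique_brands": {
            "cardinality": {"field": "brand", "precision_threshold": 10000}
        }
    },
}

# What COUNT(DISTINCT brand) means exactly, on hypothetical values.
brands = ["A", "B", "A", "C", "B"]
exact_distinct = len(set(brands))

print(json.dumps(body, indent=2))
print(exact_distinct)
```

An exact count needs to remember every distinct value, which is why Elasticsearch settles for an estimate on large data sets.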
2.3 Percentiles
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [25, 50, 75, 95, 99]
      }
    }
  }
}
// Returns the price at each requested percentile (the values are approximate)
2.4 Counting Values
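GET /products/_search
{
  "size": 0,
  "aggs": {
    "value_count": { "value_count": { "field": "price" } }
  }
}
// Number of price values found. This equals the number of documents that have
// a price only when the field is single-valued; a document with two prices counts twice.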
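The difference between counting values and counting documents only shows up with multi-valued or missing fields. A minimal local sketch, using hypothetical documents and a hypothetical helper named value_count:

```python
# Hypothetical documents: one single-valued, one multi-valued, one without price.
docs = [
    {"name": "p1", "price": 100},
    {"name": "p2", "price": [200, 250]},
    {"name": "p3"},
]


def value_count(documents, field):
    """Count the values of a field across documents, like value_count does."""
    total = 0
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field contribute nothing
        total += len(value) if isinstance(value, list) else 1
    return total


docs_with_price = sum(1 for doc in docs if "price" in doc)
print(value_count(docs, "price"), docs_with_price)
```

Here three values are spread over two documents, so the two counts differ.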
3. Bucket Aggregations
3.1 terms (Group by Field Value)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": {
        "field": "brand",
        "size": 20
      }
    }
  }
}
// Groups by brand; each bucket reports its document count
// size controls how many buckets are returned (default 10)

// Sorting
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": { "_key": "asc" }          // sort by brand name
        // "order": { "_count": "desc" }    // sort by document count (default)
      }
    }
  }
}
3.2 date_histogram (Group by Time)
// Number of products per month
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      }
    }
  }
}

// Number of products per day
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2026-01-01",
          "max": "2026-06-30"
        }
      }
    }
  }
}
// min_doc_count: 0 also returns dates that have no data (zero-filled buckets)
// extended_bounds forces the histogram to span the given range even where there
// are no documents; it does not filter out documents that fall outside the range
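The interplay of min_doc_count: 0 and extended_bounds can be sketched locally. The function below is a hypothetical stand-in, not Elasticsearch code: it counts dates per day and emits zero-count buckets across the requested bounds, widening the range if documents fall outside it.

```python
from collections import Counter
from datetime import date, timedelta


def daily_histogram(days, lower, upper):
    """Count per day, emitting empty buckets across at least [lower, upper]."""
    counts = Counter(days)
    # Bounds only extend the range; data outside them still gets buckets.
    start = min([lower, *days])
    end = max([upper, *days])
    buckets = []
    current = start
    while current <= end:
        buckets.append({"key_as_string": current.isoformat(), "doc_count": counts[current]})
        current += timedelta(days=1)
    return buckets


created = [date(2026, 1, 2), date(2026, 1, 2), date(2026, 1, 4)]
buckets = daily_histogram(created, date(2026, 1, 1), date(2026, 1, 5))
for bucket in buckets:
    print(bucket)
```

Without the bounds, the series would start on the first day that has data and end on the last one, which makes charts with a fixed time axis harder to draw.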
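Interval options:
calendar_interval:
  "minute", "hour", "day", "week", "month", "quarter", "year"
  (calendar-aware: the length of a unit such as a month varies)
fixed_interval:
  "30s", "1m", "5m", "1h", "12h", "1d"
  (a fixed duration that ignores the calendar; suited to exact intervals)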
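The difference between the two interval kinds can be illustrated with plain datetime arithmetic. This is a sketch under an assumption: fixed-interval buckets are taken to be aligned to multiples of the interval counted from the Unix epoch in UTC, with no offset or time zone configured. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def calendar_month_key(ts):
    """Bucket start for a calendar month: the first instant of ts's month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def fixed_interval_key(ts, interval):
    """Bucket start for a fixed interval (assumed epoch-aligned, UTC)."""
    steps = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    return EPOCH + steps * interval


jan_31 = datetime(2026, 1, 31, 12, tzinfo=timezone.utc)
feb_01 = datetime(2026, 2, 1, 12, tzinfo=timezone.utc)

# Calendar months split these two timestamps into different buckets.
print(calendar_month_key(jan_31), calendar_month_key(feb_01))

# A fixed 30-day interval knows nothing about month boundaries.
thirty_days = timedelta(days=30)
print(fixed_interval_key(jan_31, thirty_days), fixed_interval_key(feb_01, thirty_days))
```

With calendar_interval the bucket boundaries follow the calendar, so buckets have unequal lengths; with fixed_interval every bucket has the same length but the boundaries drift relative to months.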
3.3 histogram (Group by Numeric Interval)
// Count products per price interval (one bucket per 2000 yuan)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "histogram": {
        "field": "price",
        "interval": 2000,
        "min_doc_count": 0
      }
    }
  }
}
// Returns buckets keyed 0, 2000, 4000, ...; each covers [key, key + 2000)
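The bucket a value lands in follows from rounding down to a multiple of the interval. A minimal sketch of that rule, with a hypothetical helper name and hypothetical prices:

```python
import math


def histogram_key(value, interval, offset=0.0):
    """Lower bound of the histogram bucket that contains value."""
    return math.floor((value - offset) / interval) * interval + offset


prices = [1999.0, 2000.0, 5765.0]
keys = [histogram_key(price, 2000) for price in prices]
print(keys)
```

A price of exactly 2000 belongs to the bucket keyed 2000, not to the one keyed 0, because the upper bound of each bucket is exclusive.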
3.4 range (Custom Ranges)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_bands": {
      "range": {
        "field": "price",
        "ranges": [
          { "key": "low", "to": 2000 },
          { "key": "mid-low", "from": 2000, "to": 5000 },
          { "key": "mid-high", "from": 5000, "to": 10000 },
          { "key": "high", "from": 10000 }
        ]
      }
    }
  }
}
// from is inclusive and to is exclusive, so a price of 2000 falls in "mid-low"
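The boundary behaviour (from inclusive, to exclusive) is easy to get wrong, so here is a small local sketch. The range keys and the helper range_bucket are hypothetical, and the helper returns only the first match, which is sufficient because these ranges do not overlap.

```python
# Hypothetical price bands mirroring the shape of a range aggregation.
RANGES = [
    {"key": "low", "to": 2000},
    {"key": "mid-low", "from": 2000, "to": 5000},
    {"key": "mid-high", "from": 5000, "to": 10000},
    {"key": "high", "from": 10000},
]


def range_bucket(price, ranges):
    """Return the key of the first range with from <= price < to."""
    for band in ranges:
        lower_ok = "from" not in band or price >= band["from"]
        upper_ok = "to" not in band or price < band["to"]
        if lower_ok and upper_ok:
            return band["key"]
    return None


for price in (1999.99, 2000, 9999, 10000):
    print(price, range_bucket(price, RANGES))
```

Because each upper bound is exclusive and the next lower bound is inclusive, adjacent ranges that share a boundary neither overlap nor leave a gap.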
3.5 filter (Single Filter Bucket)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "in_stock_count": {
      "filter": {
        "term": { "in_stock": true }
      }
    }
  }
}
// Counts the products that are in stock
3.6 filters (Multiple Filter Buckets)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "stock_status": {
      "filters": {
        "filters": {
          "in_stock": { "term": { "in_stock": true } },
          "out_of_stock": { "term": { "in_stock": false } }
        }
      }
    }
  }
}
4. Nested Aggregations
4.1 Metric Aggregations Inside Buckets
// Group by brand; compute the average price and highest rating per group
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand", "size": 20 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "max_rating": { "max": { "field": "rating" } },
        "product_count": { "value_count": { "field": "name.keyword" } }
      }
    }
  }
}
Response:
"buckets": [
  {
    "key": "华为",
    "doc_count": 3,
    "avg_price": { "value": 5765.67 },
    "max_rating": { "value": 4.8 },
    "product_count": { "value": 3 }
  },
  {
    "key": "苹果",
    "doc_count": 2,
    "avg_price": { "value": 11999.0 },
    "max_rating": { "value": 4.9 },
    "product_count": { "value": 2 }
  }
]
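On the client side, each bucket is a small dictionary whose sub-aggregations appear under the names chosen in the request. The sketch below flattens a fragment shaped like the response above into rows; in a full search response the same structure would sit under aggregations, then by_brand. The helper name bucket_rows is hypothetical.

```python
# Fragment shaped like the response above (values copied from it).
by_brand = {
    "buckets": [
        {
            "key": "华为",
            "doc_count": 3,
            "avg_price": {"value": 5765.67},
            "max_rating": {"value": 4.8},
        },
        {
            "key": "苹果",
            "doc_count": 2,
            "avg_price": {"value": 11999.0},
            "max_rating": {"value": 4.9},
        },
    ]
}


def bucket_rows(agg):
    """Flatten terms buckets into (brand, count, avg_price, max_rating) rows."""
    return [
        (b["key"], b["doc_count"], b["avg_price"]["value"], b["max_rating"]["value"])
        for b in agg["buckets"]
    ]


rows = bucket_rows(by_brand)
for brand, count, avg_price, max_rating in rows:
    print(f"{brand}\t{count}\t{avg_price:.2f}\t{max_rating}")
```

Single-value metrics such as avg and max are always wrapped in an object with a value key, which is why the extra lookup is needed.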
4.2 Multi-Level Nesting
// Group by brand, then by category within each brand, then compute the average price
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand" },
      "aggs": {
        "by_category": {
          "terms": { "field": "category" },
          "aggs": {
            "avg_price": { "avg": { "field": "price" } }
          }
        }
      }
    }
  }
}
4.3 Group by Brand + Monthly Trend
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand" },
      "aggs": {
        "by_month": {
          "date_histogram": {
            "field": "created_at",
            "calendar_interval": "month",
            "format": "yyyy-MM"
          }
        }
      }
    }
  }
}
5. Practical Scenarios
5.1 E-commerce Analytics
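// Requirement: within the 手机 (mobile phone) category, get each brand's product
// count (the basis for its share), average price, and price distribution.
// Note: doc_count counts product documents, not units sold.
GET /products/_search
{
  "size": 0,
  "query": {
    "term": { "category": "手机" }
  },
  "aggs": {
    "brands": {
      "terms": { "field": "brand", "size": 10 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "price_distribution": {
          "histogram": {
            "field": "price",
            "interval": 1000
          }
        }
      }
    }
  }
}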
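The request returns counts rather than percentages, so each brand's share has to be derived on the client. A minimal sketch, assuming a terms result that includes sum_other_doc_count (the documents belonging to brands cut off by size); the numbers and the helper name brand_shares are hypothetical.

```python
def brand_shares(terms_agg):
    """Share of documents per returned bucket, including cut-off brands in the total."""
    buckets = terms_agg["buckets"]
    total = sum(b["doc_count"] for b in buckets) + terms_agg.get("sum_other_doc_count", 0)
    if total == 0:
        return {}
    return {b["key"]: b["doc_count"] / total for b in buckets}


# Hypothetical terms aggregation result.
brands = {
    "sum_other_doc_count": 5,
    "buckets": [
        {"key": "华为", "doc_count": 3},
        {"key": "苹果", "doc_count": 2},
    ],
}

shares = brand_shares(brands)
for brand, share in shares.items():
    print(f"{brand}: {share:.0%}")
```

Leaving sum_other_doc_count out of the total would overstate every share whenever there are more brands than the size of the terms aggregation.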
5.2 Log Analysis
// Assumes indices named like logs-2026.05

// 1. Error count per service
GET /logs-*/_search
{
  "size": 0,
  "query": {
    "term": { "level": "ERROR" }
  },
  "aggs": {
    "by_service": {
      "terms": { "field": "service", "size": 20 }
    }
  }
}

// 2. Error count over time (hourly)
GET /logs-*/_search
{
  "size": 0,
  "query": {
    "term": { "level": "ERROR" }
  },
  "aggs": {
    "errors_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1h",
        "format": "yyyy-MM-dd HH:mm"
      }
    }
  }
}

// 3. Top 5 services by error count, with the error type breakdown for each
GET /logs-*/_search
{
  "size": 0,
  "query": {
    "term": { "level": "ERROR" }
  },
  "aggs": {
    "top_services": {
      "terms": { "field": "service", "size": 5 },
      "aggs": {
        "by_error_type": {
          "terms": { "field": "error_type", "size": 10 }
        }
      }
    }
  }
}
5.3 User Behavior Analysis
// Assumes an index named user_actions

// Daily active users (an approximate distinct count of user_id per day)
GET /user_actions/_search
{
  "size": 0,
  "aggs": {
    "daily_active": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "unique_users": {
          "cardinality": { "field": "user_id" }
        }
      }
    }
  }
}

// Count per action type (divide each doc_count by the total to get its share)
GET /user_actions/_search
{
  "size": 0,
  "aggs": {
    "by_action": {
      "terms": { "field": "action" }
    }
  }
}
6. Aggregation Caveats
6.1 text Fields Cannot Be Aggregated Directly
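text fields are analyzed into tokens and have fielddata disabled by default, so a terms aggregation on them fails.
To aggregate on a text field, use its keyword sub-field:
  Wrong:   "field": "name"
  Correct: "field": "name.keyword"
The keyword sub-field has to exist in the mapping. Dynamic mapping creates one for string fields by default; in an explicit mapping, define it as a multi-field:
  "name": {
    "type": "text",
    "fields": {
      "keyword": { "type": "keyword" }
    }
  }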
6.2 Effect of size
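The top-level size of a search controls how many documents (hits) are returned.
The size inside a bucket aggregation such as terms controls how many buckets are returned.
// size: 0 returns no documents, only the aggregation results
GET /products/_search
{
  "size": 0,
  "aggs": { ... }
}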
6.3 Performance Tips
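1. Set size: 0 on aggregation-only requests (you need the statistics, not the documents)
2. Put conditions in filter context (bool.filter) instead of a scoring query to skip relevance scoring
3. Limit the number of buckets (keep the size of terms aggregations modest)
4. Avoid deeply nested aggregations (2-3 levels is a reasonable rule of thumb)
5. Keep doc_values enabled on fields you aggregate on (on by default for keyword fields)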
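Tip 2 can be sketched as a request body in which the condition sits under bool.filter, so it narrows the document set without computing relevance scores. The helper name filtered_aggregation is hypothetical; the field and aggregation names reuse the e-commerce example from section 5.1.

```python
import json


def filtered_aggregation(field, value, aggs):
    """Build an aggregation-only request whose condition runs in filter context."""
    return {
        "size": 0,  # no hits, only aggregation results
        "query": {"bool": {"filter": [{"term": {field: value}}]}},
        "aggs": aggs,
    }


body = filtered_aggregation(
    "category",
    "手机",
    {"brands": {"terms": {"field": "brand", "size": 10}}},
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted after GET /products/_search in a console. Since size is 0 and no hits are returned, scores would never be used anyway, so computing them is wasted work.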
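7. Summary
This post covered aggregations:
- Aggregation types (metric, bucket, pipeline)
- Metric aggregations (avg, sum, max, min, cardinality, percentiles)
- Bucket aggregations (terms, date_histogram, histogram, range, filter)
- Nested aggregations (metrics inside buckets and multi-level nesting)
- Practical scenarios (e-commerce analytics, log analysis, user behavior analysis)
The next post covers advanced queries and optimization: deep pagination, analyzers, Chinese word segmentation, and search tuning.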