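Elasticsearch Study Notes (3): Aggregation Analysis

Preface

This is the third article in the Elasticsearch Study Notes series. It covers the ES aggregation framework: metric aggregations, bucket aggregations, nested aggregations, and practical statistics scenarios. Prerequisite: document operations and search (Part 2).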


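1. Aggregation Overview

1.1 What Is an Aggregation

An aggregation computes statistics over the documents matched by a query, much like GROUP BY combined with aggregate functions in SQL.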

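SQL:
  SELECT brand, COUNT(*), AVG(price)
  FROM products
  WHERE category = '手机'      -- '手机' = mobile phone
  GROUP BY brand
  ORDER BY COUNT(*) DESC

Equivalent ES aggregation (terms buckets are sorted by doc_count descending by default):
  GET /products/_search
  {
    "size": 0,
    "query": { "term": { "category": "手机" } },
    "aggs": {
      "by_brand": {
        "terms": { "field": "brand" },
        "aggs": {
          "avg_price": { "avg": { "field": "price" } }
        }
      }
    }
  }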
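
To make the correspondence concrete, here is a minimal pure-Python sketch of what the query, the terms bucket, and the nested avg metric compute together. The sample documents and the function name terms_with_avg are assumptions made up for illustration; nothing here talks to a real cluster.

```python
from collections import defaultdict

# Hypothetical sample documents; field names follow the example above.
products = [
    {"brand": "A", "category": "phone", "price": 3000.0},
    {"brand": "A", "category": "phone", "price": 5000.0},
    {"brand": "B", "category": "phone", "price": 8000.0},
    {"brand": "B", "category": "laptop", "price": 9000.0},
]

def terms_with_avg(docs, category):
    """Filter by category, group by brand, then count and average the price per group."""
    groups = defaultdict(list)
    for doc in docs:
        if doc["category"] == category:                # the "query" part
            groups[doc["brand"]].append(doc["price"])  # the "terms" bucket
    buckets = [
        {"key": brand, "doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for brand, prices in groups.items()
    ]
    # Mirror the default terms ordering: largest doc_count first.
    buckets.sort(key=lambda b: b["doc_count"], reverse=True)
    return buckets

result = terms_with_avg(products, "phone")
```

Each entry of result plays the role of one bucket in the ES response: a key, a doc_count, and the nested metric.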

1.2 Types of Aggregations

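Metric aggregations:   compute numeric values (avg, sum, max, min)
Bucket aggregations:   group documents by a rule, one bucket per group (terms, date_histogram)
Pipeline aggregations: aggregate over the output of other aggregations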
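
The difference between the three kinds is easiest to see on a tiny data set. The sketch below uses made-up (brand, price) pairs and plain Python. The last step imitates what a pipeline aggregation such as avg_bucket does, namely working on the per-bucket results rather than on the documents.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (brand, price) pairs.
prices_by_brand = [("A", 3000.0), ("A", 5000.0), ("B", 8000.0)]

# Metric: one number computed over all documents.
overall_avg = mean(price for _, price in prices_by_brand)

# Bucket: documents grouped by a rule (here: by brand).
buckets = defaultdict(list)
for brand, price in prices_by_brand:
    buckets[brand].append(price)

# Metric nested inside each bucket: average price per brand.
avg_per_brand = {brand: mean(prices) for brand, prices in buckets.items()}

# Pipeline: computed from the output of other aggregations, not from documents.
avg_of_brand_avgs = mean(avg_per_brand.values())
```

Note that overall_avg and avg_of_brand_avgs differ, because the pipeline step weighs every brand equally regardless of how many products it has.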

2. Metric Aggregations

2.1 Basic Statistics

// Individual metrics, one aggregation per metric
GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": { "avg": { "field": "price" } },
    "max_price": { "max": { "field": "price" } },
    "min_price": { "min": { "field": "price" } },
    "sum_price": { "sum": { "field": "price" } }
  }
}

// stats returns several metrics from a single aggregation
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_stats": { "stats": { "field": "price" } }
  }
}
// Returns: count, min, max, avg, sum

2.2 Distinct Count

GET /products/_search
{
  "size": 0,
  "aggs": {
    "unique_brands": { "cardinality": { "field": "brand" } }
  }
}
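// Similar to COUNT(DISTINCT brand) in SQL
// Note: cardinality returns an approximate value (it is based on the HyperLogLog++ algorithm)
// precision_threshold trades memory for accuracy (default 3000, maximum 40000)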
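
To give a feel for why the result is approximate, below is a minimal sketch of plain HyperLogLog in Python. It is not the HyperLogLog++ variant that Elasticsearch uses and makes no claim about how precision_threshold maps to register counts. The class name, the SHA-1 based hash, and the default of 2**12 registers are all assumptions chosen for illustration.

```python
import hashlib
import math

class HyperLogLog:
    """Plain HyperLogLog with 2**p registers (no HLL++ refinements)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant, valid for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:         # small-range correction
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)

hll = HyperLogLog()
for i in range(20000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # duplicates never raise a register, so they are not double counted
approx = hll.count()
```

The sketch keeps only 4096 small registers no matter how many values are added, which is the trade-off behind the approximation: fixed memory in exchange for an error of a few percent.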

2.3 Percentiles

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [25, 50, 75, 95, 99]
      }
    }
  }
}
// Returns the price at each requested percentile (values are approximate on large data sets)
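
As a reference point for reading the output, this is a small sketch of an exact percentile computed with the nearest-rank method. The function name and the sample prices are illustrative assumptions. Elasticsearch computes percentiles approximately and interpolates, so its numbers can differ slightly from an exact calculation like this one.

```python
import math

def percentile_nearest_rank(values, percent):
    """Smallest value such that at least percent% of the data is at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

prices = list(range(1, 101))   # hypothetical prices 1..100
summary = {p: percentile_nearest_rank(prices, p) for p in [25, 50, 75, 95, 99]}
```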

2.4 Value Count

GET /products/_search
{
  "size": 0,
  "aggs": {
    "value_count": { "value_count": { "field": "price" } }
  }
}
// Number of price values; for a single-valued field this equals the number of documents that have a price
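
value_count counts values rather than documents, which only matters once a field holds several values. A minimal sketch with made-up documents and a hypothetical multi-valued tags field:

```python
# Hypothetical documents; "tags" is a multi-valued field.
docs = [
    {"name": "p1", "tags": ["5G", "OLED"]},   # two values
    {"name": "p2", "tags": ["5G"]},           # one value
    {"name": "p3"},                           # field missing
]

def value_count(docs, field):
    """Count every value of the field; documents without the field contribute nothing."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        total += len(value) if isinstance(value, list) else 1
    return total

tag_values = value_count(docs, "tags")
docs_with_tags = sum(1 for d in docs if "tags" in d)
```

Here tag_values is larger than docs_with_tags because the first document contributes two values.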

3. Bucket Aggregations

3.1 terms (Group by Field Value)

GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": {
        "field": "brand",
        "size": 20
      }
    }
  }
}
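// Groups by brand and returns the document count for each brand
// size controls how many buckets are returned (default 10)

// Sorting
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": { "_key": "asc" }
      }
    }
  }
}
// "order": { "_key": "asc" } sorts the buckets by brand name
// The default is "order": { "_count": "desc" } (sort by document count)

3.2 date_histogram (Group by Time)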

// Product count per month
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      }
    }
  }
}

// Product count per day
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2026-01-01",
          "max": "2026-06-30"
        }
      }
    }
  }
}
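// min_doc_count: 0   - buckets with no documents are returned too (zero-filled)
// extended_bounds    - forces buckets from min to max to exist even where no documents fall; it does not filter out documents outside that range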
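
The combined effect of min_doc_count: 0 and extended_bounds can be pictured as zero-filling a sparse series. The sketch below does this client-side in plain Python; the function name and the sample counts are assumptions for illustration only.

```python
from datetime import date, timedelta

def fill_daily_buckets(counts, start, end):
    """Return one (day, doc_count) pair per day in [start, end], using 0 for missing days."""
    filled = []
    day = start
    while day <= end:
        key = day.isoformat()
        filled.append((key, counts.get(key, 0)))
        day += timedelta(days=1)
    return filled

sparse = {"2026-01-02": 3, "2026-01-04": 1}   # hypothetical per-day counts
buckets = fill_daily_buckets(sparse, date(2026, 1, 1), date(2026, 1, 5))
```

Letting Elasticsearch fill the gaps is usually simpler than doing it in the client, since charts can then consume the buckets directly.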
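Interval options:
  calendar_interval:
    "minute", "hour", "day", "week", "month", "quarter", "year"
    (calendar-aware; only a single unit is allowed, and bucket lengths vary, e.g. months of 28 to 31 days)

  fixed_interval:
    "30s", "1m", "5m", "1h", "12h", "1d"
    (a fixed multiple of ms, s, m, h or d that ignores the calendar; suited to exact intervals)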
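
The two interval styles round a timestamp down in different ways. A minimal sketch, assuming UTC timestamps and fixed buckets counted from the Unix epoch; both function names are made up for this example:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def calendar_month_key(ts):
    """Start of the calendar month containing ts (months vary in length)."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def fixed_interval_key(ts, interval):
    """Start of the fixed-length bucket containing ts, counted from the epoch."""
    return EPOCH + ((ts - EPOCH) // interval) * interval

ts = datetime(2026, 3, 15, 10, 37, tzinfo=timezone.utc)
month_key = calendar_month_key(ts)                           # 2026-03-01 00:00 UTC
hour_key = fixed_interval_key(ts, timedelta(hours=1))        # 2026-03-15 10:00 UTC
half_day_key = fixed_interval_key(ts, timedelta(hours=12))   # 2026-03-15 00:00 UTC
```

A calendar month cannot be expressed as a fixed interval because its length changes from month to month, which is why the two settings are kept separate.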

3.3 histogram (Group by Numeric Interval)

// Product count per price interval (one bucket per 2000 yuan)
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "histogram": {
        "field": "price",
        "interval": 2000,
        "min_doc_count": 0
      }
    }
  }
}
// Bucket keys: 0, 2000, 4000, ... (each bucket covers [key, key + 2000))
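
Each value is assigned to a bucket by rounding it down to a multiple of the interval. The sketch below reproduces that rule in Python. The function name is an assumption, and offset corresponds to the optional offset setting of the histogram aggregation (0 when not set).

```python
import math

def histogram_key(value, interval, offset=0):
    """Lower bound of the bucket that contains value."""
    return math.floor((value - offset) / interval) * interval + offset

# Hypothetical prices mapped to their 2000-wide buckets.
keys = [histogram_key(p, 2000) for p in [0, 1999, 2000, 5999.99, 11999]]
```

A price of exactly 2000 lands in the bucket keyed 2000, not in the one keyed 0, because the upper edge of every bucket is exclusive.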

3.4 range (Custom Ranges)

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_bands": {
      "range": {
        "field": "price",
        "ranges": [
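          { "key": "low", "to": 2000 },
          { "key": "mid_low", "from": 2000, "to": 5000 },
          { "key": "mid_high", "from": 5000, "to": 10000 },
          { "key": "high", "from": 10000 }
        ]
      }
    }
  }
}
// from is inclusive and to is exclusive, so a price of exactly 2000 falls into the second range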
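
Boundary handling is the usual source of surprises with range: from is inclusive and to is exclusive. The sketch below applies that rule in Python with the same boundaries as the request above. The bucket keys are illustrative English labels and the function name is an assumption.

```python
def matching_ranges(value, ranges):
    """Keys of every range with from <= value < to; a missing bound is open-ended."""
    keys = []
    for r in ranges:
        lower_ok = "from" not in r or value >= r["from"]
        upper_ok = "to" not in r or value < r["to"]
        if lower_ok and upper_ok:
            keys.append(r["key"])
    return keys

bands = [
    {"key": "low", "to": 2000},
    {"key": "mid_low", "from": 2000, "to": 5000},
    {"key": "mid_high", "from": 5000, "to": 10000},
    {"key": "high", "from": 10000},
]

at_boundary = matching_ranges(2000, bands)
```

The function returns a list because ranges are allowed to overlap, in which case one document is counted in several buckets.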

3.5 filter (Single Filter Bucket)

GET /products/_search
{
  "size": 0,
  "aggs": {
    "in_stock_count": {
      "filter": {
        "term": { "in_stock": true }
      }
    }
  }
}
// Counts the products that are in stock

3.6 filters (Multiple Filter Buckets)

GET /products/_search
{
  "size": 0,
  "aggs": {
    "stock_status": {
      "filters": {
        "filters": {
          "in_stock":  { "term": { "in_stock": true } },
          "out_of_stock": { "term": { "in_stock": false } }
        }
      }
    }
  }
}

4. Nested Aggregations

4.1 Metric Aggregations Inside Buckets

// Group by brand; compute the average price, highest rating and product count per group
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand", "size": 20 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "max_rating": { "max": { "field": "rating" } },
        "product_count": { "value_count": { "field": "name.keyword" } }
      }
    }
  }
}

Response (华为 = Huawei, 苹果 = Apple):

"buckets": [
  {
    "key": "华为",
    "doc_count": 3,
    "avg_price": { "value": 5765.67 },
    "max_rating": { "value": 4.8 },
    "product_count": { "value": 3 }
  },
  {
    "key": "苹果",
    "doc_count": 2,
    "avg_price": { "value": 11999.0 },
    "max_rating": { "value": 4.9 },
    "product_count": { "value": 2 }
  }
]
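
On the client side, each bucket usually becomes one row of a report. A minimal sketch that flattens the sample buckets above into plain dictionaries; the function name bucket_rows is an assumption, and the data is copied from the sample response.

```python
# The buckets from the sample response above.
buckets = [
    {"key": "华为", "doc_count": 3, "avg_price": {"value": 5765.67},
     "max_rating": {"value": 4.8}, "product_count": {"value": 3}},
    {"key": "苹果", "doc_count": 2, "avg_price": {"value": 11999.0},
     "max_rating": {"value": 4.9}, "product_count": {"value": 2}},
]

def bucket_rows(buckets, metrics):
    """Turn each bucket into a flat row: key, doc_count, and the named metric values."""
    rows = []
    for bucket in buckets:
        row = {"key": bucket["key"], "doc_count": bucket["doc_count"]}
        for name in metrics:
            row[name] = bucket[name]["value"]   # single-value metrics live under "value"
        rows.append(row)
    return rows

rows = bucket_rows(buckets, ["avg_price", "max_rating"])
```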

4.2 Multi-Level Nesting

// Group by brand, then by category, then compute the average price
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand" },
      "aggs": {
        "by_category": {
          "terms": { "field": "category" },
          "aggs": {
            "avg_price": { "avg": { "field": "price" } }
          }
        }
      }
    }
  }
}
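
In the response, every brand bucket carries its own by_category object with a nested buckets list. The sketch below walks such a structure and yields one row per (brand, category) pair. The response fragment is hypothetical and its values are made up; only the nesting shape follows the request above.

```python
# Hypothetical response fragment for the request above (values are made up).
by_brand = {
    "buckets": [
        {"key": "A", "doc_count": 3, "by_category": {"buckets": [
            {"key": "phone", "doc_count": 2, "avg_price": {"value": 4000.0}},
            {"key": "tablet", "doc_count": 1, "avg_price": {"value": 3500.0}},
        ]}},
        {"key": "B", "doc_count": 1, "by_category": {"buckets": [
            {"key": "phone", "doc_count": 1, "avg_price": {"value": 8000.0}},
        ]}},
    ]
}

def flatten(by_brand):
    """Yield (brand, category, doc_count, avg_price) for every inner bucket."""
    for brand in by_brand["buckets"]:
        for cat in brand["by_category"]["buckets"]:
            yield (brand["key"], cat["key"], cat["doc_count"], cat["avg_price"]["value"])

rows = list(flatten(by_brand))
```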

4.3 Group by Brand with a Monthly Trend

GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand" },
      "aggs": {
        "by_month": {
          "date_histogram": {
            "field": "created_at",
            "calendar_interval": "month",
            "format": "yyyy-MM"
          }
        }
      }
    }
  }
}

5. Practical Scenarios

5.1 E-commerce Analytics

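// Goal: for the 手机 (mobile phone) category, each brand's product count (the basis for its share), average price and price distribution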
GET /products/_search
{
  "size": 0,
  "query": {
    "term": { "category": "手机" }
  },
  "aggs": {
    "brands": {
      "terms": { "field": "brand", "size": 10 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "price_distribution": {
          "histogram": {
            "field": "price",
            "interval": 1000
          }
        }
      }
    }
  }
}
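
The request returns counts, not percentages, so the share per brand has to be derived afterwards. A minimal client-side sketch with made-up numbers follows. It assumes total_docs is taken from hits.total.value in the response, which may be capped unless track_total_hits is enabled; adding up the bucket doc_count values plus sum_other_doc_count is an alternative.

```python
def brand_shares(buckets, total_docs):
    """Share of each bucket in the total number of matching documents."""
    return {b["key"]: b["doc_count"] / total_docs for b in buckets}

# Hypothetical numbers: 5 matching products, split 3 / 2 across two brands.
shares = brand_shares(
    [{"key": "A", "doc_count": 3}, {"key": "B", "doc_count": 2}],
    total_docs=5,
)
```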

5.2 Log Analysis

// Assumes indices named like logs-2026.05, matched by the logs-* pattern

// 1. Error count per service
GET /logs-*/_search
{
  "size": 0,
  "query": {
    "term": { "level": "ERROR" }
  },
  "aggs": {
    "by_service": {
      "terms": { "field": "service", "size": 20 }
    }
  }
}

// 2. Error count over time (hourly)
GET /logs-*/_search
{
  "size": 0,
  "query": {
    "term": { "level": "ERROR" }
  },
  "aggs": {
    "errors_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1h",
        "format": "yyyy-MM-dd HH:mm"
      }
    }
  }
}

// 3. Top 5 services by error count, with the error type breakdown for each
GET /logs-*/_search
{
  "size": 0,
  "query": {
    "term": { "level": "ERROR" }
  },
  "aggs": {
    "top_services": {
      "terms": { "field": "service", "size": 5 },
      "aggs": {
        "by_error_type": {
          "terms": { "field": "error_type", "size": 10 }
        }
      }
    }
  }
}

5.3 User Behavior Analysis

// Assumes an index named user_actions

// Daily active users
GET /user_actions/_search
{
  "size": 0,
  "aggs": {
    "daily_active": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "unique_users": {
          "cardinality": { "field": "user_id" }
        }
      }
    }
  }
}

// Document count per action (divide by the total to get each action's share)
GET /user_actions/_search
{
  "size": 0,
  "aggs": {
    "by_action": {
      "terms": { "field": "action" }
    }
  }
}

6. Aggregation Caveats

6.1 text Fields Cannot Be Aggregated Directly

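text fields are analyzed into tokens and have no doc_values, so aggregating on them is rejected by default
(enabling fielddata works, but it is memory-hungry and buckets by token rather than by whole value).
To aggregate on a text field, use its keyword subfield:

Wrong:    "field": "name"
Correct:  "field": "name.keyword"

Default dynamic mapping creates the keyword subfield for strings; with an explicit mapping, define the multi-field yourself: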
  "name": {
    "type": "text",
    "fields": {
      "keyword": { "type": "keyword" }
    }
  }
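
The difference between the two field types shows up in which terms end up as bucket keys. The sketch below imitates it in Python with made-up product names, using lowercasing plus whitespace splitting as a crude stand-in for a real analyzer.

```python
from collections import Counter

# Hypothetical product names.
names = ["Galaxy S24 Ultra", "Galaxy S24", "Pixel 9"]

# keyword: each whole value is a single term, so buckets are full names.
keyword_buckets = Counter(names)

# text: values are split into tokens first, so buckets would be individual words.
text_buckets = Counter(token for name in names for token in name.lower().split())
```

Bucketing by token is rarely what a report needs, which is why the keyword subfield is the usual choice for aggregations.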

6.2 The Two size Parameters

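The top-level size of a search controls how many documents (hits) are returned.
The size inside an aggregation such as terms controls how many buckets are returned.

// size: 0 returns no documents, only the aggregation results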
GET /products/_search
{
  "size": 0,
  "aggs": { ... }
}

6.3 Performance Tips

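1. Set size: 0 on aggregation-only requests (only the statistics are needed, not the documents)
2. Put conditions in bool.filter rather than in a scoring query, which skips relevance scoring and lets results be cached
3. Limit the number of buckets (keep the terms size modest)
4. Avoid deep nesting (2-3 levels at most); the bucket count multiplies at each level
5. Keep doc_values enabled on aggregated fields (on by default for keyword, numeric and date fields)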
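
Tips 1 and 2 can be combined in a single request body. The sketch below builds one as a Python dictionary, reusing the level and service fields from the log analysis example. The function name and its parameters are assumptions, and the body would still need to be sent with whatever HTTP or Elasticsearch client is in use.

```python
import json

def error_count_by_service(level="ERROR", size=20):
    """Build a search body that filters without scoring and returns only aggregations."""
    return {
        "size": 0,   # no hits, aggregations only
        "query": {
            "bool": {
                # filter context: matching documents are selected without relevance scoring
                "filter": [{"term": {"level": level}}]
            }
        },
        "aggs": {
            "by_service": {"terms": {"field": "service", "size": size}}
        },
    }

body = error_count_by_service()
request_json = json.dumps(body, indent=2)
```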

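7. Summary

This article covered aggregation analysis:

  • Aggregation types (metric, bucket, pipeline)
  • Metric aggregations (avg, sum, max, min, stats, cardinality, percentiles, value_count)
  • Bucket aggregations (terms, date_histogram, histogram, range, filter, filters)
  • Nested aggregations (metrics inside buckets and multi-level nesting)
  • Practical scenarios (e-commerce analytics, log analysis, user behavior analysis)

The next article covers advanced queries and optimization: deep pagination, analyzers, Chinese word segmentation, and search tuning.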