Elasticsearch Study Notes (Part 2): Document Operations and Search

Before You Start

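This is the second post in the Elasticsearch study notes series. It covers document CRUD operations and the main search queries. Prerequisite: the basics covered in Part 1.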


1. Document CRUD

1.1 Preparing Test Data

// Note: ik_max_word is provided by the IK analysis plugin, which has to be installed on the cluster first
PUT /products
{
  "mappings": {
    "properties": {
      "name":        { "type": "text", "analyzer": "ik_max_word", "fields": { "keyword": { "type": "keyword" } } },
      "brand":       { "type": "keyword" },
      "price":       { "type": "float" },
      "category":    { "type": "keyword" },
      "in_stock":    { "type": "boolean" },
      "rating":      { "type": "float" },
      "created_at":  { "type": "date", "format": "yyyy-MM-dd" },
      "description": { "type": "text", "analyzer": "ik_max_word" },
      "tags":        { "type": "keyword" }
    }
  }
}

1.2 Index (Create or Overwrite)

// Index with an explicit ID (replaces the whole document if the ID already exists)
PUT /products/_doc/1
{
  "name": "华为 Mate 60 Pro",
  "brand": "华为",
  "price": 6999,
  "category": "手机",
  "in_stock": true,
  "rating": 4.8,
  "created_at": "2026-01-15",
  "description": "华为旗舰手机,搭载麒麟芯片,支持卫星通信",
  "tags": ["5G", "旗舰", "国产"]
}

// Let Elasticsearch generate the ID
POST /products/_doc
{
  "name": "小米 14",
  "brand": "小米",
  "price": 3999,
  "category": "手机",
  "in_stock": true
}

1.3 Get (Retrieve)

// Fetch by ID
GET /products/_doc/1

// Return only _source (no metadata)
GET /products/_source/1

// Return only selected fields
GET /products/_doc/1?_source=name,price

// Check whether the document exists (no body is returned)
HEAD /products/_doc/1

1.4 Update

// Partial update (only the listed fields change)
POST /products/_update/1
{
  "doc": {
    "price": 6499,
    "in_stock": false
  }
}

// Scripted update
POST /products/_update/1
{
  "script": {
    "source": "ctx._source.price -= params.discount",
    "params": {
      "discount": 500
    }
  }
}

// Upsert: if the document exists, apply doc; if not, index the upsert body as a new document
POST /products/_update/1
{
  "doc": { "price": 5999 },
  "upsert": {
    "name": "华为 Mate 60 Pro",
    "brand": "华为",
    "price": 5999,
    "category": "手机"
  }
}
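
The upsert rule can be modelled in a few lines of Python. This is only a sketch against an in-memory dict, not the Elasticsearch client: the `update` helper and the `store` dict are hypothetical names, and the merge is shallow, whereas a real partial update also merges nested objects.

```python
def update(store, doc_id, doc, upsert=None):
    """Toy model of _update with `doc` and `upsert` on an in-memory dict (hypothetical helper)."""
    if doc_id in store:
        # Existing document: merge the partial doc into it (shallow merge only).
        store[doc_id] = {**store[doc_id], **doc}
        return "updated"
    if upsert is None:
        raise KeyError(f"document {doc_id} is missing and no upsert was given")
    # Missing document: the upsert body is stored as-is; `doc` is not applied.
    store[doc_id] = dict(upsert)
    return "created"


store = {}
first = update(store, "1", {"price": 5999},
               upsert={"name": "华为 Mate 60 Pro", "price": 5999})
second = update(store, "1", {"price": 5499},
                upsert={"name": "ignored", "price": 0})
print(first, second, store["1"])
```

The second call shows that once the document exists, the upsert body is ignored and only `doc` is merged.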

1.5 Delete

// Delete by ID
DELETE /products/_doc/1

// Delete every document that matches a query
POST /products/_delete_by_query
{
  "query": {
    "match": {
      "brand": "已下架"
    }
  }
}

1.6 Bulk (Batch Operations)

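// Bulk request: newline-delimited JSON, one object per line with no blank lines in between; the body must end with a newline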
POST /_bulk
{"index":{"_index":"products","_id":"2"}}
{"name":"小米 14","brand":"小米","price":3999,"category":"手机","in_stock":true,"rating":4.5,"created_at":"2026-02-01","tags":["5G","性价比"]}
{"index":{"_index":"products","_id":"3"}}
{"name":"苹果 iPhone 15 Pro","brand":"苹果","price":8999,"category":"手机","in_stock":true,"rating":4.7,"created_at":"2026-01-20","tags":["5G","旗舰"]}
{"index":{"_index":"products","_id":"4"}}
{"name":"华为 MatePad Pro","brand":"华为","price":3299,"category":"平板","in_stock":true,"rating":4.6,"created_at":"2026-03-01","tags":["平板","办公"]}
{"index":{"_index":"products","_id":"5"}}
{"name":"MacBook Pro 14","brand":"苹果","price":14999,"category":"笔记本","in_stock":true,"rating":4.9,"created_at":"2026-02-15","tags":["办公","创作"]}
{"update":{"_index":"products","_id":"1"}}
{"doc":{"price":6499}}
{"delete":{"_index":"products","_id":"6"}}
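
Bulk supports these actions:
  index   - create or overwrite a document
  create  - create only (fails if the ID already exists)
  update  - partial update
  delete  - delete (no source line follows the action line)

Performance tips:
  - Start with roughly 1,000-5,000 documents per batch
  - Keep each request at around 5-15 MB
  - Avoid oversized batches; they put memory pressure on the cluster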
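
To make the line-based format concrete, here is a small Python sketch that serializes actions into a bulk body and splits a large list into batches. It uses only the standard library; `build_bulk_body` and `chunked` are hypothetical helper names, and sending the body over HTTP is left out.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    `source` is None for actions that have no source line (delete).
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # One JSON object per line, no blank lines, and a trailing newline.
    return "\n".join(lines) + "\n"


def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


actions = [
    ("index", {"_index": "products", "_id": "2"}, {"name": "小米 14", "price": 3999}),
    ("update", {"_index": "products", "_id": "1"}, {"doc": {"price": 6499}}),
    ("delete", {"_index": "products", "_id": "6"}, None),
]
body = build_bulk_body(actions)
print(body, end="")
```

Batching by document count is the simplest option; a production loader would probably also cap the byte size of each batch.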

2. Full-Text Search

2.1 match (the Most Common Query)

// Full-text search: the query string is analyzed, then the resulting terms are matched
GET /products/_search
{
  "query": {
    "match": {
      "name": "华为旗舰"
    }
  }
}
// With ik_max_word, "华为旗舰" is typically split into terms such as ["华为", "旗舰"]
// Documents whose name contains "华为" OR "旗舰" match (OR is the default operator)

// AND operator
GET /products/_search
{
  "query": {
    "match": {
      "name": {
        "query": "华为旗舰",
        "operator": "and"
      }
    }
  }
}
// Both "华为" and "旗舰" must be present
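
The difference between the two operators comes down to "any term" versus "all terms". The sketch below works on token lists that are assumed to be already analyzed (Python has no IK analyzer here), and the `match` function is a hypothetical stand-in that ignores scoring.

```python
def match(doc_tokens, query_tokens, operator="or"):
    """Toy match query on pre-analyzed tokens: 'or' needs any term, 'and' needs all."""
    found = [token in doc_tokens for token in query_tokens]
    return all(found) if operator == "and" else any(found)


# Assumed analyzer output for the name "华为 Mate 60 Pro" and the query "华为旗舰".
name_tokens = ["华为", "mate", "60", "pro"]
query_tokens = ["华为", "旗舰"]

or_result = match(name_tokens, query_tokens)
and_result = match(name_tokens, query_tokens, operator="and")
print(or_result, and_result)
```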

2.2 match_phrase (Phrase Matching)

// Phrase search: the analyzed terms must appear next to each other and in the same order
GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": "华为 Mate"
    }
  }
}
// "华为 Mate" must appear as a contiguous phrase; "Mate 华为" does not match

// Allow gaps between terms (slop)
GET /products/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "华为 Pro",
        "slop": 2
      }
    }
  }
}
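// slop 2 lets the terms sit up to 2 positions away from an exact phrase (for example, with two other tokens in between)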

2.3 multi_match (Searching Several Fields)

// Search across several fields
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "华为旗舰",
      "fields": ["name", "description", "tags"]
    }
  }
}

// Per-field boost (the number after ^ is the boost factor)
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "华为旗舰",
      "fields": ["name^3", "description^2", "tags"]
    }
  }
}
// Scores from name are multiplied by 3, scores from description by 2, and tags keeps the default boost of 1

2.4 query_string

// Query written in Lucene query-string syntax (a syntax error fails the request, so avoid passing raw user input)
GET /products/_search
{
  "query": {
    "query_string": {
      "query": "(华为 OR 苹果) AND 旗舰",
      "fields": ["name", "description"]
    }
  }
}

3. Exact-Value Search

3.1 term (Exact Match)

// term: the value is not analyzed and must equal the indexed term exactly (use it on keyword fields)
GET /products/_search
{
  "query": {
    "term": {
      "brand": "华为"
    }
  }
}
// Note: avoid term on text fields; a text field is indexed as analyzed tokens, so the original string usually matches nothing

// terms: match any of several values (like SQL IN)
GET /products/_search
{
  "query": {
    "terms": {
      "brand": ["华为", "苹果"]
    }
  }
}
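
Why term misbehaves on text fields is easier to see with a toy inverted index. The sketch below is an illustration only: `analyze` is a hypothetical whitespace-and-lowercase analyzer (not IK), and the index is a plain dict from term to document IDs.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (not the IK analyzer)."""
    return text.lower().split()


def build_index(docs, field, analyzed):
    """Map each indexed term to the set of document IDs that contain it."""
    index = {}
    for doc_id, doc in docs.items():
        terms = analyze(doc[field]) if analyzed else [doc[field]]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index, value):
    """term lookup: the value is compared with indexed terms as-is, with no analysis."""
    return index.get(value, set())


docs = {"1": {"name": "华为 Mate 60 Pro"}, "4": {"name": "华为 MatePad Pro"}}
text_index = build_index(docs, "name", analyzed=True)      # behaves like a text field
keyword_index = build_index(docs, "name", analyzed=False)  # behaves like a keyword field

print(term_query(text_index, "华为 Mate 60 Pro"))     # no hit: only tokens are indexed
print(term_query(keyword_index, "华为 Mate 60 Pro"))  # hit: the whole string is one term
print(term_query(text_index, "pro"))                  # hit: a single lowercased token
```

This mirrors why the mapping above gives name a keyword sub-field: name.keyword keeps the whole string as one term for exact lookups.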

3.2 range (Range Query)

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 3000,
        "lte": 7000
      }
    }
  }
}

// Date range
GET /products/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "2026-01-01",
        "lt": "2026-04-01"
      }
    }
  }
}

Operators:
  gt   - greater than
  gte  - greater than or equal to
  lt   - less than
  lte  - less than or equal to

3.3 exists (and Missing Fields)

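// Field exists (has an indexed, non-null value)
// The old missing query has been removed; to find documents without the field, put exists inside bool.must_not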
GET /products/_search
{
  "query": {
    "exists": {
      "field": "description"
    }
  }
}

3.4 prefix, wildcard, regexp

// Prefix match
GET /products/_search
{
  "query": {
    "prefix": {
      "name.keyword": {
        "value": "华为"
      }
    }
  }
}

// Wildcard match (slow; use sparingly)
GET /products/_search
{
  "query": {
    "wildcard": {
      "brand": {
        "value": "华*"
      }
    }
  }
}

// Regular-expression match (slow; use sparingly)
GET /products/_search
{
  "query": {
    "regexp": {
      "brand": "华.*"
    }
  }
}

4. Compound Queries (bool)

4.1 Structure of a bool Query

GET /products/_search
{
  "query": {
    "bool": {
      "must": [],
      "should": [],
      "must_not": [],
      "filter": []
    }
  }
}
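
must      - must match; contributes to the score
should    - optional when must or filter is present (matches raise the score); if the bool query has only should clauses, at least one must match
must_not  - must not match; does not affect the score
filter    - must match; does not affect the score (faster, and frequently used filters can be cached)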

4.2 A Practical Combination

// Requirement: phones from 华为 (Huawei), priced between 3000 and 7000, in stock
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "华为" } }
      ],
      "filter": [
        { "term": { "category": "手机" } },
        { "range": { "price": { "gte": 3000, "lte": 7000 } } },
        { "term": { "in_stock": true } }
      ],
      "must_not": [
        { "term": { "brand": "已下架" } }
      ]
    }
  }
}

4.3 must vs. filter

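must:
  - Contributes to the relevance score (_score)
  - How well a document matches affects its rank
  - Good for: the keywords a user typed

filter:
  - Does not contribute to the score (a query made only of filter clauses scores every hit 0)
  - A pure yes/no test
  - Usually faster (Elasticsearch can cache frequently used filters)
  - Good for: exact conditions (brand, category, price range, date range)

Best practice:
  user keywords      → must
  filtering criteria → filter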
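
The must/filter split can be applied mechanically when a request body is assembled in application code. The following Python sketch builds such a body as a plain dict. `build_search` and its parameters are hypothetical names, and it hard-codes the name and price fields from the test mapping.

```python
import json


def build_search(keyword, term_filters=None, price_range=None):
    """Put the scored keyword in must and the yes/no conditions in filter."""
    filters = [{"term": {field: value}}
               for field, value in (term_filters or {}).items()]
    if price_range is not None:
        low, high = price_range
        filters.append({"range": {"price": {"gte": low, "lte": high}}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": keyword}}],
                "filter": filters,
            }
        }
    }


body = build_search("华为", {"category": "手机", "in_stock": True},
                    price_range=(3000, 7000))
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted under GET /products/_search in the console.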

4.4 How should Behaves

// With must/filter present: should clauses are optional (they only add to the score)
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "手机" } }
      ],
      "should": [
        { "term": { "brand": "华为" } },
        { "term": { "brand": "苹果" } }
      ]
    }
  }
}
// "手机" must match; documents whose brand is 华为 or 苹果 rank higher

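// Without must/filter: at least one should clause has to match (minimum_should_match defaults to 1 in this case; it is spelled out here)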
GET /products/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "brand": "华为" } },
        { "term": { "brand": "苹果" } }
      ],
      "minimum_should_match": 1
    }
  }
}
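
The two cases can be checked with a toy evaluator. This Python sketch models only the yes/no side of a bool query (no scoring) on plain dicts. `term` and `bool_matches` are hypothetical helpers, and the default for `minimum_should_match` follows the rule described above.

```python
def term(field, value):
    """Build a clause that checks doc[field] == value (list fields match any element)."""
    def clause(doc):
        actual = doc.get(field)
        return value in actual if isinstance(actual, list) else actual == value
    return clause


def bool_matches(doc, must=(), filters=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Toy yes/no evaluation of a bool query; scoring is ignored."""
    if minimum_should_match is None:
        # Default: 1 when there are should clauses but no must/filter, otherwise 0.
        minimum_should_match = 1 if should and not (must or filters) else 0
    if not all(c(doc) for c in must) or not all(c(doc) for c in filters):
        return False
    if any(c(doc) for c in must_not):
        return False
    return sum(1 for c in should if c(doc)) >= minimum_should_match


xiaomi = {"brand": "小米", "category": "手机"}
huawei = {"brand": "华为", "category": "手机"}
brands = [term("brand", "华为"), term("brand", "苹果")]

with_must = bool_matches(xiaomi, must=[term("category", "手机")], should=brands)
should_only = bool_matches(xiaomi, should=brands)
should_hit = bool_matches(huawei, should=brands)
print(with_must, should_only, should_hit)
```

With a must clause the 小米 document still matches; with should alone it does not, because none of the listed brands applies.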

5. Sorting

5.1 Basic Sorting

// Sort by price, ascending
GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "price": "asc" }
  ]
}

// Sort by several fields
GET /products/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "in_stock": "desc" },
    { "price": "asc" }
  ]
}
// In-stock items come first (with desc, true sorts before false), then price ascending

5.2 Sorting by Relevance

// Default: sorted by _score descending (most relevant first)
GET /products/_search
{
  "query": {
    "match": {
      "name": "华为"
    }
  }
}

// Explicit: _score first, then price as a tiebreaker
GET /products/_search
{
  "query": {
    "match": {
      "name": "华为"
    }
  },
  "sort": [
    "_score",
    { "price": "asc" }
  ]
}

6. Pagination

6.1 from / size (Basic Pagination)

// Page 1, 10 hits per page
GET /products/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10
}

// Page 2
GET /products/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}

from  - number of hits to skip (starts at 0)
size  - number of hits to return (default 10)

Limit: from + size cannot exceed index.max_result_window (10,000 by default).
Beyond that, use search_after or scroll.
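
Turning a 1-based page number into from/size, and rejecting pages past the window, is plain arithmetic. A minimal Python sketch follows. `page_to_from_size` is a hypothetical helper, and the 10,000 limit is the default mentioned above, which an index may override.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_to_from_size(page, page_size=10, max_window=MAX_RESULT_WINDOW):
    """Convert a 1-based page number into from/size, enforcing from + size <= window."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    if offset + page_size > max_window:
        raise ValueError(
            f"from + size = {offset + page_size} exceeds {max_window}; use search_after"
        )
    return {"from": offset, "size": page_size}


print(page_to_from_size(1))     # first page
print(page_to_from_size(2))     # second page
print(page_to_from_size(1000))  # last page that still fits in the default window
```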

6.2 search_after (Deep Pagination)

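// First request
// Note: the Elasticsearch docs advise against sorting on _id, and recent versions may reject it by default;
// a unique keyword field with doc_values makes a safer tiebreaker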
GET /products/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}

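// Next page: copy the sort array of the last hit on the previous page into search_after
// (the values shown are illustrative; sort values for date fields come back as epoch milliseconds)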
GET /products/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2026-03-01", "4"]
}
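
The cursor logic behind search_after can be simulated on an in-memory list. This is a sketch, not client code: `search_after_page` is a hypothetical function, the documents carry only `id` and `created_at`, ID "7" is made up to create a tie on the date, and dates are compared as ISO strings.

```python
def search_after_page(docs, size, search_after=None):
    """Return one page sorted by created_at desc, then id asc (toy model)."""
    # Two stable sorts: tiebreaker first, then the primary key in reverse.
    ordered = sorted(docs, key=lambda d: d["id"])
    ordered = sorted(ordered, key=lambda d: d["created_at"], reverse=True)
    if search_after is not None:
        last_date, last_id = search_after
        # Keep only documents that sort strictly after the cursor.
        ordered = [
            d for d in ordered
            if d["created_at"] < last_date
            or (d["created_at"] == last_date and d["id"] > last_id)
        ]
    return ordered[:size]


docs = [
    {"id": "2", "created_at": "2026-02-01"},
    {"id": "3", "created_at": "2026-01-20"},
    {"id": "4", "created_at": "2026-03-01"},
    {"id": "5", "created_at": "2026-02-15"},
    {"id": "7", "created_at": "2026-02-15"},  # hypothetical document, same date as "5"
]

seen, cursor = [], None
while True:
    page = search_after_page(docs, size=2, search_after=cursor)
    if not page:
        break
    seen.extend(d["id"] for d in page)
    last = page[-1]
    cursor = (last["created_at"], last["id"])  # sort values of the last hit
print(seen)
```

The tie on 2026-02-15 shows why a unique tiebreaker is needed: without the id comparison, document "7" would be skipped or repeated at the page boundary.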

6.3 Comparing the Pagination Options

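from/size     - simple; fine for shallow paging (roughly the first 100 pages)
                 slow for deep paging (with from=9990, each shard has to collect and sort from + size hits before the first 9,990 are thrown away)

search_after  - suited to deep paging over live data
                 each request continues from the last hit of the previous page
                 no jumping to an arbitrary page, only moving forward

scroll        - suited to bulk export of a full result set
                 takes a snapshot and pulls it batch by batch
                 not suited to real-time queries (the snapshot does not see later changes)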

7. Highlighting

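// The query only targets name. By default only fields the query matched are highlighted,
// so require_field_match is set to false to let description be highlighted as well.
GET /products/_search
{
  "query": {
    "match": {
      "name": "华为旗舰"
    }
  },
  "highlight": {
    "require_field_match": false,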
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "fields": {
      "name": {},
      "description": {}
    }
  }
}

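Each hit in the response then carries a highlight section (illustrative output; the actual fragments depend on the indexed text and the analyzer):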

"highlight": {
  "name": ["<em>华为</em> Mate 60 <em>旗舰</em> 手机"],
  "description": ["<em>华为</em><em>旗舰</em>手机,搭载麒麟芯片"]
}

8. _source Filtering

// Return only the listed fields
GET /products/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "price", "brand"]
}

// Exclude the listed fields
GET /products/_search
{
  "query": { "match_all": {} },
  "_source": {
    "excludes": ["description", "tags"]
  }
}

// Wildcards in field names
GET /products/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "price*"]
}
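
The include/exclude rules can be mimicked on a plain dict to see which fields survive. The Python sketch below handles top-level fields only. `filter_source` is a hypothetical helper, and `fnmatchcase` from the standard library stands in for Elasticsearch's own wildcard matching (it also accepts `?` and `[...]`, which this example does not rely on).

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Toy _source filtering on top-level fields: apply includes, then excludes."""
    def matches(name, patterns):
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    result = {}
    for name, value in source.items():
        if includes and not matches(name, includes):
            continue  # not selected by any include pattern
        if excludes and matches(name, excludes):
            continue  # explicitly excluded
        result[name] = value
    return result


doc = {"name": "小米 14", "brand": "小米", "price": 3999, "tags": ["5G", "性价比"]}

included = filter_source(doc, includes=["name", "price*"])
excluded = filter_source(doc, excludes=["description", "tags"])
print(included)
print(excluded)
```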

9. count (Counting Documents)

// Number of documents that match the query
GET /products/_count
{
  "query": {
    "term": { "brand": "华为" }
  }
}

10. Query Cheat Sheet

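Full-text search:
  match             - analyzed search (the most common)
  match_phrase      - phrase match (terms must be adjacent and in order)
  multi_match       - search across several fields

Exact-value search:
  term              - exact match (keyword fields)
  terms             - match any of several values (IN)
  range             - range query
  exists            - field has a value

Compound queries:
  bool.must         - must match (scored)
  bool.filter       - must match (not scored, faster)
  bool.should       - optional score boost
  bool.must_not     - must not match

Sorting: sort
Pagination: from/size, search_after
Highlighting: highlight
Field filtering: _source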

11. Summary

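This post covered document operations and search queries:

  • Document CRUD (Index, Get, Update, Delete, Bulk)
  • Full-text search (match, match_phrase, multi_match)
  • Exact-value search (term, terms, range, exists)
  • Compound queries (bool: must/filter/should/must_not)
  • Sorting, pagination, and highlighting
  • _source filtering and counting

The next post covers aggregations: metric aggregations, bucket aggregations, and practical statistics.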