Elasticsearch Document Indexing

1. Single-Document Indexing

1.1 Indexing with an Explicit ID

bash
PUT /products/_doc/1
{
  "name": "iPhone 15",
  "price": 999,
  "brand": "Apple",
  "category": "Electronics",
  "in_stock": true
}

Response:

json
{
  "_index": "products",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

1.2 Auto-Generated IDs

bash
POST /products/_doc
{
  "name": "iPhone 15",
  "price": 999,
  "brand": "Apple"
}

Response:

json
{
  "_index": "products",
  "_id": "W0tpsmIBdLvYHbW0xQqC",
  "_version": 1,
  "result": "created"
}
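When you send these requests from application code instead of the console, the two forms differ only in method and path. The sketch below builds the request with Python's standard `urllib` but does not send it. The base URL `http://localhost:9200` and the helper name `build_index_request` are assumptions made for illustration. Production code would normally use an official Elasticsearch client.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of a local, unsecured cluster; adjust for your environment.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc, doc_id=None):
    """Build, but do not send, an index request for one document.

    With doc_id:    PUT  /<index>/_doc/<doc_id>  (create or replace)
    Without doc_id: POST /<index>/_doc           (Elasticsearch generates the _id)
    """
    body = json.dumps(doc).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if doc_id is None:
        method, path = "POST", f"/{index}/_doc"
    else:
        # Percent-encode the ID so characters such as "/" cannot alter the path.
        method, path = "PUT", f"/{index}/_doc/{urllib.parse.quote(str(doc_id), safe='')}"
    return urllib.request.Request(ES_URL + path, data=body, headers=headers, method=method)
```

To send the request, pass the returned object to `urllib.request.urlopen`. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses such as a 409.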

1.3 Using _create to Prevent Overwrites

bash
PUT /products/_create/1
{
  "name": "iPhone 15",
  "price": 999
}

If a document with that ID already exists, the request fails with a 409 error:

json
{
  "error": {
    "type": "version_conflict_engine_exception",
    "reason": "[1]: version conflict, document already exists"
  },
  "status": 409
}

1.4 Using the op_type Parameter

bash
PUT /products/_doc/1?op_type=create
{
  "name": "iPhone 15",
  "price": 999
}
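`_create` and `op_type=create` express the same rule: write only if the ID is not already in use. The toy in-memory model below sketches how the `index` and `create` op types differ in outcome and in `_version`. It assumes a single node with no concurrency. `ToyIndex` and `VersionConflict` are hypothetical names, and this is not Elasticsearch code.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 version_conflict_engine_exception."""


class ToyIndex:
    """Toy in-memory model of op_type semantics for one index (not Elasticsearch code)."""

    def __init__(self):
        self.docs = {}      # _id -> _source
        self.versions = {}  # _id -> _version

    def index(self, doc_id, source, op_type="index"):
        exists = doc_id in self.docs
        if op_type == "create" and exists:
            # _create / op_type=create refuses to overwrite an existing document.
            raise VersionConflict(f"[{doc_id}]: version conflict, document already exists")
        # op_type=index creates the document or replaces it entirely.
        self.docs[doc_id] = source
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return {"_id": doc_id, "_version": self.versions[doc_id],
                "result": "updated" if exists else "created"}
```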

2. Indexing Parameters

2.1 Common Parameters

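| Parameter | Description | Default |
|---|---|---|
| op_type | Operation type: `create` or `index` | `index` (`create` when no ID is given) |
| refresh | Whether and how to refresh so the change becomes searchable | `false` |
| routing | Custom routing value | Document `_id` |
| timeout | Operation timeout (e.g. waiting for active shards) | `1m` |
| version | Explicit version number (with `external` / `external_gte` versioning) | - |
| version_type | Version type | `internal` |
| pipeline | Ingest pipeline used to preprocess the document | - |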

2.2 The refresh Parameter

bash
PUT /products/_doc/1?refresh=true
{
  "name": "iPhone 15"
}

PUT /products/_doc/1?refresh=wait_for
{
  "name": "iPhone 15"
}
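
| Value | Description |
|---|---|
| true | Refresh the affected shards immediately, so the document is searchable as soon as the request returns |
| false | Take no refresh action (default); the document becomes searchable after the next periodic refresh |
| wait_for | Do not force a refresh; wait until a refresh makes the change visible, then respond |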

2.3 The routing Parameter

bash
PUT /products/_doc/1?routing=apple
{
  "name": "iPhone 15",
  "brand": "Apple"
}

When fetching the document by ID, you must pass the same routing value:

bash
GET /products/_doc/1?routing=apple

2.4 The timeout Parameter

bash
PUT /products/_doc/1?timeout=5s
{
  "name": "iPhone 15"
}

3. Bulk Indexing

3.1 Bulk API Format

text
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
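The body is newline-delimited JSON (NDJSON). An `index`, `create`, or `update` action line is followed by a source line. A `delete` action line has no source line. The whole body must end with a newline. Below is a minimal Python sketch of a serializer. `build_bulk_body` and its `(action, metadata, source)` tuple format are hypothetical conventions of this example.

```python
import json

# Actions whose action/metadata line is followed by a source line.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a Bulk API NDJSON body.

    `source` must be None for "delete" and present for every other action.
    """
    lines = []
    for action, meta, source in operations:
        lines.append(json.dumps({action: meta}))
        if action in ACTIONS_WITH_SOURCE:
            if source is None:
                raise ValueError(f"{action} requires a source line")
            lines.append(json.dumps(source))
        elif action == "delete":
            if source is not None:
                raise ValueError("delete must not have a source line")
        else:
            raise ValueError(f"unknown bulk action: {action}")
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Send the result with `Content-Type: application/x-ndjson`.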

3.2 Bulk Indexing Example

bash
POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "iPhone 15", "price": 999, "brand": "Apple"}
{"index": {"_index": "products", "_id": "2"}}
{"name": "MacBook Pro", "price": 1999, "brand": "Apple"}
{"index": {"_index": "products", "_id": "3"}}
{"name": "iPad Pro", "price": 799, "brand": "Apple"}

3.3 Bulk Operations Against a Specific Index

bash
POST /products/_bulk
{"index": {"_id": "1"}}
{"name": "iPhone 15", "price": 999}
{"index": {"_id": "2"}}
{"name": "MacBook Pro", "price": 1999}
{"create": {"_id": "3"}}
{"name": "iPad Pro", "price": 799}

3.4 Bulk Response

json
{
  "took": 30,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "products",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "status": 201
      }
    },
    {
      "index": {
        "_index": "products",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "status": 201
      }
    }
  ]
}
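A bulk request can return HTTP 200 even when some of its actions failed. Callers should therefore check the top-level `errors` flag first, then look for an `error` object in each item. The sketch below assumes the response has already been parsed into a Python dict. The helper name `failed_bulk_items` is hypothetical.

```python
def failed_bulk_items(response):
    """Return (action, _id, status, error_type) for every failed item in a bulk response."""
    if not response.get("errors"):
        return []  # nothing failed, so there is no need to scan the items
    failures = []
    for item in response["items"]:
        # Each item is a one-key dict such as {"index": {...}} or {"create": {...}}.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((action, result.get("_id"), result.get("status"),
                             result["error"].get("type")))
    return failures
```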

3.5 Bulk Action Types

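| Action | Description |
|---|---|
| index | Create the document, or replace it entirely if it already exists |
| create | Create only; fails with 409 if the document already exists |
| update | Partially update an existing document (the source line holds `doc` or `script`) |
| delete | Delete a document; has no source line |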

Example that mixes action types (note that `delete` has no source line):

bash
POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "iPhone 15", "price": 999}
{"update": {"_index": "products", "_id": "2"}}
{"doc": {"price": 899}}
{"delete": {"_index": "products", "_id": "3"}}

4. Bulk Performance Tuning

4.1 Batch Size Recommendations

text
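Batch size guidelines (starting points; benchmark on your own cluster)
├── Payload size
│   └── 5-15 MB per request
├── Document count
│   └── 1,000-5,000 documents per request
└── Concurrent requests
    └── Tune to the cluster's capacity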
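You can enforce both limits while building requests. The sketch below groups `(id, document)` pairs into bulk bodies and closes a batch when adding the next document would exceed either the document-count limit or the encoded byte-size limit. `chunk_bulk_actions` is a hypothetical helper. Its default limits are starting points drawn from the guidelines above, not values prescribed by Elasticsearch.

```python
import json


def chunk_bulk_actions(docs, index, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bulk bodies that respect both a document and a byte limit.

    A single document larger than max_bytes is still emitted, alone in its own batch.
    """
    batch, size = [], 0
    for doc_id, doc in docs:
        pair = (json.dumps({"index": {"_index": index, "_id": doc_id}}) + "\n"
                + json.dumps(doc) + "\n")
        pair_bytes = len(pair.encode("utf-8"))
        # Flush the current batch before it would exceed either limit.
        if batch and (len(batch) >= max_docs or size + pair_bytes > max_bytes):
            yield "".join(batch)
            batch, size = [], 0
        batch.append(pair)
        size += pair_bytes
    if batch:
        yield "".join(batch)
```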

4.2 Index Settings Tuning

bash
PUT /products/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}

Restore the settings after the bulk load completes:

bash
PUT /products/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}

POST /products/_refresh

4.3 Preprocessing with an Ingest Pipeline

bash
PUT /_ingest/pipeline/product_pipeline
{
  "processors": [
    {
      "set": {
        "field": "processed_at",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "uppercase": {
        "field": "brand"
      }
    }
  ]
}

POST /products/_doc/1?pipeline=product_pipeline
{
  "name": "iPhone 15",
  "brand": "apple"
}

5. Routing Control

5.1 How Routing Works

text
Routing formula (simplified)
shard_num = hash(routing_value) % num_primary_shards

By default, routing_value = _id
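The Python sketch below applies the simplified formula. It shows why the same routing value always lands on the same shard, and why the primary shard count is fixed when the index is created. One assumption matters here: the sketch uses `zlib.crc32` as a stand-in hash. Elasticsearch actually uses Murmur3 combined with a routing-shards factor, so the shard numbers below will not match a real cluster. `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Simplified routing: hash(routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in for Elasticsearch's Murmur3-based hash.
    """
    h = zlib.crc32(str(routing_value).encode("utf-8"))
    return h % num_primary_shards
```

If you change `num_primary_shards` in this formula, most routing values map to different shards, and existing documents could no longer be found where the formula says they are. This is why Elasticsearch changes the shard count only through split, shrink, or reindex operations.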

5.2 Custom Routing

bash
PUT /products/_doc/1?routing=apple
{
  "name": "iPhone 15",
  "brand": "Apple"
}

5.3 Required Routing

bash
PUT /products
{
  "mappings": {
    "_routing": {
      "required": true
    }
  }
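}

With `_routing.required` set to `true`, index, get, update, and delete requests that omit a routing value are rejected with a `routing_missing_exception`.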

5.4 Routing Best Practices

text
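Routing strategies
├── Route by user
│   └── All of a user's data on the same shard
├── Route by time
│   └── All of a day's data on the same shard
└── Route by business key
    └── All data for one business unit on the same shard

Because custom routing concentrates data, one very large user or a busy day can produce an oversized or hot shard. Prefer routing keys that have many values of roughly equal size.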

6. Versioning

6.1 Internal Versioning

bash
PUT /products/_doc/1?version=2
{
  "name": "iPhone 15",
  "price": 899
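}

Note: Elasticsearch 7.0 and later reject this request, because internal versioning can no longer be used for optimistic concurrency control. Use `if_seq_no` and `if_primary_term` instead (see 6.4).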

6.2 External Versioning

bash
PUT /products/_doc/1?version=5&version_type=external
{
  "name": "iPhone 15",
  "price": 899
}

6.3 Version Types

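| Type | Description |
|---|---|
| internal | Version managed by Elasticsearch (default) |
| external | Externally supplied version; the write is accepted only if it is greater than the stored version |
| external_gte | Externally supplied version; the write is accepted if it is greater than or equal to the stored version |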
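The acceptance rules for the two external types differ only in the comparison operator. Below is a minimal sketch. `external_version_accepts` is a hypothetical helper, and `None` stands for a document that does not exist yet.

```python
def external_version_accepts(current_version, new_version, version_type="external"):
    """Return True if a write carrying `new_version` would be accepted."""
    if current_version is None:
        return True  # the document does not exist yet, so any version is accepted
    if version_type == "external":
        return new_version > current_version
    if version_type == "external_gte":
        return new_version >= current_version
    raise ValueError(f"unsupported version_type: {version_type}")
```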

6.4 if_seq_no and if_primary_term

bash
PUT /products/_doc/1?if_seq_no=0&if_primary_term=1
{
  "name": "iPhone 15",
  "price": 899
}

7. Index Lifecycle

7.1 Document TTL

bash
PUT /logs
{
  "mappings": {
    "_ttl": {
      "enabled": true,
      "default": "7d"
    }
  }
}

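Note: `_ttl` was deprecated in Elasticsearch 2.0 and removed in 5.0, so current versions reject this mapping. Use ILM instead.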

7.2 Using ILM

bash
PUT /_ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
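}

The policy only takes effect once it is attached to indices, typically through the `index.lifecycle.name` setting in an index template. The `rollover` action also requires writing through a rollover alias or a data stream.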

8. Handling Indexing Conflicts

8.1 Retry Strategy

bash
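POST /products/_update/1?retry_on_conflict=3
{
  "doc": {
    "name": "iPhone 15",
    "price": 899
  }
}

`retry_on_conflict` is a parameter of the Update API (and of `update` actions in bulk requests), not of the index API. When a version conflict occurs, Elasticsearch re-reads the document and reapplies the partial update, up to the given number of times.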

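8.2 Optimistic Concurrency Control

First read the document to get its current `_seq_no` and `_primary_term`, then make the write conditional on those values. If another write happened in between, the request fails with 409.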

bash
GET /products/_doc/1

PUT /products/_doc/1?if_seq_no=0&if_primary_term=1
{
  "name": "iPhone 15",
  "price": 899
}
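Application code usually wraps this in a read-modify-write loop that re-reads the document and retries on 409. The sketch below models that loop in memory. `ToyStore`, `ConflictError`, and `update_with_occ` are hypothetical stand-ins for a real client and for the 409 response. The store only mimics the fact that every write bumps `_seq_no`.

```python
class ConflictError(Exception):
    """Stands in for HTTP 409 version_conflict_engine_exception."""


class ToyStore:
    """Toy model of one shard's _seq_no/_primary_term bookkeeping (not real Elasticsearch)."""

    def __init__(self):
        self.seq_no = -1
        self.primary_term = 1
        self.docs = {}  # _id -> (source, _seq_no, _primary_term)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None:
            _, cur_seq, cur_term = self.docs[doc_id]
            if (cur_seq, cur_term) != (if_seq_no, if_primary_term):
                raise ConflictError(doc_id)
        self.seq_no += 1  # every write on the shard gets a new sequence number
        self.docs[doc_id] = (source, self.seq_no, self.primary_term)


def update_with_occ(store, doc_id, change, max_retries=3):
    """Read-modify-write with up to max_retries retries after the first attempt."""
    for _ in range(max_retries + 1):
        source, seq_no, term = store.get(doc_id)
        try:
            store.put(doc_id, change(dict(source)), if_seq_no=seq_no, if_primary_term=term)
            return True
        except ConflictError:
            continue  # another writer got there first: re-read and try again
    return False
```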

9. Indexing Monitoring

9.1 Index Statistics

bash
GET /products/_stats

9.2 Monitoring Indexing Throughput

bash
GET /_nodes/stats/indices/indexing
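The indexing counters in these responses are cumulative, so throughput has to be computed from the difference between two samples. The sketch below assumes two `GET /products/_stats` responses, parsed into dicts and taken `interval_seconds` apart. It reads `_all.primaries.indexing` so that replica writes are not counted. The helper name `indexing_rate` is hypothetical.

```python
def indexing_rate(sample_a, sample_b, interval_seconds):
    """Return (docs per second, average ms per doc) between two _stats samples."""
    a = sample_a["_all"]["primaries"]["indexing"]
    b = sample_b["_all"]["primaries"]["indexing"]
    docs = b["index_total"] - a["index_total"]
    millis = b["index_time_in_millis"] - a["index_time_in_millis"]
    rate = docs / interval_seconds
    # index_time_in_millis is summed across indexing threads, so this is approximate.
    avg_ms = millis / docs if docs else 0.0
    return rate, avg_ms
```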

9.3 Indexing Slow Log

bash
PUT /products/_settings
{
  "index.indexing.slowlog.threshold.index.warn": "10s",
  "index.indexing.slowlog.threshold.index.info": "5s",
  "index.indexing.slowlog.threshold.index.debug": "2s"
}

10. Indexing Error Handling

10.1 Common Errors

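| Error | Cause | Resolution |
|---|---|---|
| version_conflict_engine_exception | Version conflict (e.g. `_create` on an existing ID, or stale `if_seq_no`/`if_primary_term`) | Re-read the document and retry with the current values |
| mapper_parsing_exception | Field value incompatible with its mapping | Check the mapping and the document's field types |
| illegal_argument_exception | Invalid request parameter or setting | Check the request parameters |
| index_not_found_exception | Index does not exist (e.g. when automatic index creation is disabled) | Create the index first |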

10.2 Example Error Response

json
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "object mapping for [price] tried to parse field [price] as object, but found a concrete value"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "object mapping for [price] tried to parse field [price] as object, but found a concrete value"
  },
  "status": 400
}

11. Indexing Best Practices

11.1 Bulk Indexing Workflow

text
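Bulk indexing workflow
├── 1. Preparation
│   ├── Disable refresh
│   └── Set replicas to 0
├── 2. Indexing
│   ├── Use a reasonable batch size
│   ├── Send requests concurrently from multiple threads
│   └── Monitor progress
└── 3. Completion
    ├── Restore refresh
    ├── Restore replicas
    └── Run a refresh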
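The three phases can be expressed as an ordered list of requests. The sketch below only builds the plan as `(method, path, body)` tuples and leaves sending, and any concurrency in phase 2, to your client. `bulk_load_plan` and its restore defaults (`1s`, one replica) are assumptions. Restore whatever values your index had before the load.

```python
import json


def bulk_load_plan(index, bodies, replicas_after=1, refresh_after="1s"):
    """Return the ordered (method, path, body) requests for a three-phase bulk load."""
    # Phase 1: disable refresh and replicas for the duration of the load.
    plan = [("PUT", f"/{index}/_settings",
             json.dumps({"index": {"refresh_interval": "-1", "number_of_replicas": 0}}))]
    # Phase 2: one _bulk request per prepared NDJSON body.
    plan += [("POST", f"/{index}/_bulk", body) for body in bodies]
    # Phase 3: restore the settings, then make everything searchable.
    plan.append(("PUT", f"/{index}/_settings",
                 json.dumps({"index": {"refresh_interval": refresh_after,
                                       "number_of_replicas": replicas_after}})))
    plan.append(("POST", f"/{index}/_refresh", None))
    return plan
```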

11.2 Indexing Performance Tips

text
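Performance tips
├── Batch size
│   └── 5-15 MB
├── Concurrent requests
│   └── Scale with the number of nodes
├── Refresh interval
│   └── Disable during bulk loads
├── Replica count
│   └── Set to 0 during bulk loads
└── Hardware
    └── Use SSD storage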

12. Summary

This chapter covered Elasticsearch document indexing:

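  1. Single-document indexing supports both explicit and auto-generated IDs.
  2. The Bulk API performs many operations efficiently in a single request.
  3. Routing controls which shard a document is stored on.
  4. Versioning and sequence numbers make concurrent writes safe.
  5. Bulk indexing benefits from tuned settings (refresh, replicas, batch size).
  6. Sound error handling and monitoring keep indexing healthy.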

Next, we will look at document update operations.

Last updated: 2026-03-27