Elasticsearch Document Indexing
1. Single-Document Indexing
1.1 Indexing with an Explicit ID
bash
PUT /products/_doc/1
{
"name": "iPhone 15",
"price": 999,
"brand": "Apple",
"category": "Electronics",
"in_stock": true
}
Response:
json
{
"_index": "products",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
1.2 Auto-Generated ID
bash
POST /products/_doc
{
"name": "iPhone 15",
"price": 999,
"brand": "Apple"
}
Response (abridged):
json
{
"_index": "products",
"_id": "W0tpsmIBdLvYHbW0xQqC",
"_version": 1,
"result": "created"
}
1.3 Using _create to Prevent Overwrites
bash
PUT /products/_create/1
{
"name": "iPhone 15",
"price": 999
}
If a document with that ID already exists, the request fails with HTTP 409:
json
{
"error": {
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, document already exists"
},
"status": 409
}
1.4 Using the op_type Parameter
bash
PUT /products/_doc/1?op_type=create
{
"name": "iPhone 15",
"price": 999
}
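The four request shapes in this section differ only in HTTP method and path. A minimal sketch that makes the choice explicit follows; the helper name index_request_line and its create_only flag are hypothetical, introduced here for illustration, and the function only builds the request line without contacting a cluster.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def index_request_line(index: str, doc_id: Optional[str] = None,
                       create_only: bool = False) -> Tuple[str, str]:
    """Return (HTTP method, path) for a single-document index request.

    Hypothetical helper: it mirrors the request forms shown in sections
    1.1 to 1.4 and does not send anything.
    """
    if doc_id is None:
        # No ID supplied: POST lets Elasticsearch generate one (section 1.2).
        return "POST", f"/{quote(index)}/_doc"
    encoded_id = quote(doc_id, safe="")  # escape "/" and other reserved characters
    if create_only:
        # Same effect as PUT /<index>/_doc/<id>?op_type=create (sections 1.3, 1.4).
        return "PUT", f"/{quote(index)}/_create/{encoded_id}"
    # Plain index operation: creates the document or overwrites it (section 1.1).
    return "PUT", f"/{quote(index)}/_doc/{encoded_id}"


print(index_request_line("products", "1"))
print(index_request_line("products"))
print(index_request_line("products", "1", create_only=True))
```

The _create path and op_type=create are interchangeable, so a client normally needs only one of them.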
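2. Indexing Parameters
2.1 Common Parameters
| Parameter | Description | Default |
|---|---|---|
| op_type | Operation type (create or index) | index |
| refresh | Whether to refresh so the change becomes searchable (true, false, or wait_for) | false |
| routing | Value used to route the document to a shard | Document ID |
| timeout | Maximum time the request waits, for example for the primary shard to become available | 1m |
| version | Explicit version number for concurrency control | - |
| version_type | Version type (internal, external, or external_gte) | internal |
| pipeline | Ingest pipeline that preprocesses the document | - |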
2.2 The refresh Parameter
bash
PUT /products/_doc/1?refresh=true
{
"name": "iPhone 15"
}
PUT /products/_doc/1?refresh=wait_for
{
"name": "iPhone 15"
}
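| Value | Description |
|---|---|
| true | Refresh immediately, so the change is searchable as soon as the request returns |
| false | Do not trigger a refresh (default); the change becomes searchable after the next scheduled refresh |
| wait_for | Do not force a refresh; hold the response until a refresh has made the change searchable |
2.3 The routing Parameter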
bash
PUT /products/_doc/1?routing=apple
{
"name": "iPhone 15",
"brand": "Apple"
}
Reads must pass the same routing value:
bash
GET /products/_doc/1?routing=apple
2.4 The timeout Parameter
bash
PUT /products/_doc/1?timeout=5s
{
"name": "iPhone 15"
}
3. Bulk Indexing
3.1 Bulk API Format
text
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
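The format above is newline-delimited JSON: one action line, then a source line for every operation except delete, with each line (including the last) terminated by \n. A minimal sketch of a serializer for that format follows; build_bulk_body and the (operation, metadata, source) tuple shape are assumptions made for this example, not part of any Elasticsearch client.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple

# (operation, action metadata, source or None); a shape chosen for this sketch.
Action = Tuple[str, Dict[str, Any], Optional[Dict[str, Any]]]

VALID_OPS = {"index", "create", "update", "delete"}


def build_bulk_body(actions: Iterable[Action]) -> str:
    """Serialize actions into a Bulk API request body (NDJSON)."""
    lines = []
    for op, meta, source in actions:
        if op not in VALID_OPS:
            raise ValueError(f"unknown bulk operation: {op!r}")
        # json.dumps without indent never emits a newline, so each
        # object stays on a single line as the format requires.
        lines.append(json.dumps({op: meta}))
        if op == "delete":
            if source is not None:
                raise ValueError("delete takes no source line")
        else:
            if source is None:
                raise ValueError(f"{op} requires a source line")
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("index", {"_index": "products", "_id": "1"}, {"name": "iPhone 15", "price": 999}),
    ("update", {"_index": "products", "_id": "2"}, {"doc": {"price": 899}}),
    ("delete", {"_index": "products", "_id": "3"}, None),
])
print(body, end="")
```

The resulting string would be sent as the body of POST /_bulk with the Content-Type header set to application/x-ndjson.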
3.2 Bulk Indexing Example
bash
POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "iPhone 15", "price": 999, "brand": "Apple"}
{"index": {"_index": "products", "_id": "2"}}
{"name": "MacBook Pro", "price": 1999, "brand": "Apple"}
{"index": {"_index": "products", "_id": "3"}}
{"name": "iPad Pro", "price": 799, "brand": "Apple"}
3.3 Bulk Operations Against a Single Index
bash
POST /products/_bulk
{"index": {"_id": "1"}}
{"name": "iPhone 15", "price": 999}
{"index": {"_id": "2"}}
{"name": "MacBook Pro", "price": 1999}
{"create": {"_id": "3"}}
{"name": "iPad Pro", "price": 799}
3.4 Bulk Response
json
{
"took": 30,
"errors": false,
"items": [
{
"index": {
"_index": "products",
"_id": "1",
"_version": 1,
"result": "created",
"status": 201
}
},
{
"index": {
"_index": "products",
"_id": "2",
"_version": 1,
"result": "created",
"status": 201
}
}
]
}
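A bulk request can return HTTP 200 even when individual operations fail, so clients usually check the top-level errors flag and then inspect each entry in items, which appear in the same order as the request's actions. A small sketch of that check follows; failed_items is a hypothetical helper name, and the sample response is invented for illustration.

```python
from typing import Any, Dict, List


def failed_items(bulk_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the per-item failures from a Bulk API response."""
    if not bulk_response.get("errors"):
        return []  # fast path: every operation succeeded
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a one-key object keyed by the operation type.
        op, result = next(iter(item.items()))
        if result.get("status", 0) >= 300:
            failures.append({
                "position": position,  # index of the action in the request
                "op": op,
                "_id": result.get("_id"),
                "status": result["status"],
                "error": result.get("error"),
            })
    return failures


# Invented sample: the second operation hits a version conflict.
sample = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "products", "_id": "1", "status": 201}},
        {"create": {"_index": "products", "_id": "3", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
    ],
}
for failure in failed_items(sample):
    print(failure["position"], failure["op"], failure["status"])
```

Only the failed positions need to be retried or reported; resending the whole batch would repeat the operations that already succeeded.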
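3.5 Bulk Operation Types
| Operation | Description |
|---|---|
| index | Create the document, or replace it if it already exists |
| create | Create only; fails if the document already exists |
| update | Update an existing document |
| delete | Delete a document (takes no source line) |
Mixed-operation example: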
bash
POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "iPhone 15", "price": 999}
{"update": {"_index": "products", "_id": "2"}}
{"doc": {"price": 899}}
{"delete": {"_index": "products", "_id": "3"}}
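4. Bulk Performance Tuning
4.1 Recommended Batch Sizes
text
Batch size guidelines
├── Request size
│   └── 5-15 MB per batch
├── Document count
│   └── 1,000-5,000 documents per batch
└── Concurrent requests
    └── Tune to the cluster's capacity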
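One way to apply both limits at once is to close a batch as soon as either the document count or the byte budget would be exceeded. The sketch below assumes documents are plain dicts; chunk_documents and its default limits are illustrative choices, and the size estimate counts only the source lines, not the action lines.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_documents(docs: Iterable[Dict[str, Any]],
                    max_docs: int = 1000,
                    max_bytes: int = 5 * 1024 * 1024) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches bounded by document count and approximate body size."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in docs:
        # Size of the serialized source line plus its trailing newline.
        size = len(json.dumps(doc).encode("utf-8")) + 1
        # Close the current batch if adding this document would break a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        # A single oversized document still gets a batch of its own.
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = ({"name": f"product-{i}", "price": i} for i in range(2500))
print([len(b) for b in chunk_documents(docs, max_docs=1000)])
```

The right limits depend on document size and cluster capacity, so the defaults here are starting points to measure against rather than recommendations.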
4.2 Index Settings for Bulk Loads
bash
PUT /products/_settings
{
"index": {
"refresh_interval": "-1",
"number_of_replicas": 0
}
}
Restore the settings once the bulk import finishes:
bash
PUT /products/_settings
{
"index": {
"refresh_interval": "1s",
"number_of_replicas": 1
}
}
POST /products/_refresh
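If the import fails halfway, the index is left with refresh disabled and no replicas unless the restore step still runs. A context manager is one way to guarantee that. In this sketch put_settings is a hypothetical callable standing in for whatever client sends PUT /<index>/_settings; a recording fake is used so the example runs without a cluster.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical transport: put_settings(index, body) sends PUT /<index>/_settings.
PutSettings = Callable[[str, Dict[str, Any]], None]


@contextmanager
def bulk_load_settings(put_settings: PutSettings, index: str,
                       refresh_interval: str = "1s",
                       replicas: int = 1) -> Iterator[None]:
    """Disable refresh and replicas for a bulk load, then restore them."""
    put_settings(index, {"index": {"refresh_interval": "-1",
                                   "number_of_replicas": 0}})
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so the index is never
        # left with refresh disabled and zero replicas.
        put_settings(index, {"index": {"refresh_interval": refresh_interval,
                                       "number_of_replicas": replicas}})


# Recording fake in place of a real client.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_put_settings(index: str, body: Dict[str, Any]) -> None:
    calls.append((index, body))


with bulk_load_settings(fake_put_settings, "products"):
    pass  # send the bulk requests here

for index, body in calls:
    print(index, body)
```

The restore values are passed in rather than hard-coded because the index may not have used the defaults before the import. The explicit POST /products/_refresh from the example above would follow the with block.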
4.3 Preprocessing with an Ingest Pipeline
bash
PUT /_ingest/pipeline/product_pipeline
{
"processors": [
{
"set": {
"field": "processed_at",
"value": "{{_ingest.timestamp}}"
}
},
{
"uppercase": {
"field": "brand"
}
}
]
}
POST /products/_doc/1?pipeline=product_pipeline
{
"name": "iPhone 15",
"brand": "apple"
}
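To make the effect of product_pipeline concrete, the sketch below emulates its two processors locally in Python. It is an approximation for illustration only: the real pipeline runs inside Elasticsearch, the exact timestamp format it writes may differ, and apply_product_pipeline is a name invented for this example.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def apply_product_pipeline(doc: Dict[str, Any],
                           now: Optional[datetime] = None) -> Dict[str, Any]:
    """Local emulation of the set and uppercase processors shown above."""
    now = now or datetime.now(timezone.utc)
    out = dict(doc)  # leave the caller's document untouched
    # "set" processor: writes the ingest timestamp into processed_at.
    out["processed_at"] = now.isoformat()
    # "uppercase" processor: by default it fails when the field is missing.
    if "brand" not in out:
        raise KeyError("field [brand] not present in document")
    out["brand"] = str(out["brand"]).upper()
    return out


print(apply_product_pipeline({"name": "iPhone 15", "brand": "apple"}))
```

Because processors run in order and the uppercase step fails on a missing field, documents without brand would be rejected unless the processor is configured with ignore_missing.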
5. Routing Control
5.1 How Routing Works
text
Routing formula (simplified)
shard_num = hash(routing_value) % num_primary_shards
By default, routing_value = _id
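The simplified formula can be tried out locally. The sketch below uses CRC32 from the standard library as a stand-in hash; Elasticsearch itself uses a Murmur3 hash, so the shard numbers printed here will not match a real cluster. shard_for is a hypothetical name used only in this example.

```python
import zlib


def shard_for(routing_value: str, num_primary_shards: int) -> int:
    """Pick a shard with the simplified formula hash(routing) % shards.

    Assumption: CRC32 stands in for the Murmur3 hash Elasticsearch uses,
    so results illustrate the behaviour but not the real placement.
    """
    if num_primary_shards < 1:
        raise ValueError("num_primary_shards must be at least 1")
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# The same routing value always maps to the same shard.
print(shard_for("apple", 5), shard_for("apple", 5))

# Changing the shard count changes the result of the modulo for most values.
ids = [str(i) for i in range(1000)]
moved = sum(shard_for(i, 5) != shard_for(i, 6) for i in ids)
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

The second half of the demo shows why the number of primary shards is fixed at index creation: with a different divisor, most existing documents would no longer be found on the shard the formula points to.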
5.2 Custom Routing
bash
PUT /products/_doc/1?routing=apple
{
"name": "iPhone 15",
"brand": "Apple"
}
5.3 Required Routing
bash
PUT /products
{
"mappings": {
"_routing": {
"required": true
}
}
}
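5.4 Routing Best Practices
text
Routing strategies
├── Route by user
│   └── All of a user's data lands on the same shard
├── Route by time
│   └── All data for the same day lands on the same shard
└── Route by business domain
    └── All data for the same business domain lands on the same shard
6. Version Control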
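6.1 Internal Versioning
Note: Elasticsearch 7.0 and later reject the version parameter on index requests that use the default internal version type; use if_seq_no and if_primary_term (section 6.4) instead. The request below applies to older releases only.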
bash
PUT /products/_doc/1?version=2
{
"name": "iPhone 15",
"price": 899
}
6.2 External Versioning
bash
PUT /products/_doc/1?version=5&version_type=external
{
"name": "iPhone 15",
"price": 899
}
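6.3 Version Types
| Type | Description |
|---|---|
| internal | Version number maintained by Elasticsearch (default) |
| external | Externally supplied version; the write succeeds only if it is greater than the stored version |
| external_gte | Externally supplied version; the write succeeds if it is greater than or equal to the stored version |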
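The difference between external and external_gte is a single comparison. The sketch below models that rule in isolation; accepts_write is a hypothetical function written for this explanation and is not how Elasticsearch is implemented internally.

```python
from typing import Optional


def accepts_write(stored_version: Optional[int], incoming_version: int,
                  version_type: str) -> bool:
    """Model the acceptance rule for externally versioned writes.

    stored_version is None when the document does not exist yet.
    """
    if stored_version is None:
        return True  # nothing to conflict with
    if version_type == "external":
        # Strictly greater: replaying the same version is rejected.
        return incoming_version > stored_version
    if version_type == "external_gte":
        # Greater or equal: the same version may be written again.
        return incoming_version >= stored_version
    raise ValueError(f"unsupported version_type: {version_type!r}")


for version_type in ("external", "external_gte"):
    for incoming in (4, 5, 6):
        print(version_type, incoming, accepts_write(5, incoming, version_type))
```

A rejected write corresponds to the 409 version_conflict_engine_exception response shown in section 1.3.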
6.4 if_seq_no and if_primary_term
bash
PUT /products/_doc/1?if_seq_no=0&if_primary_term=1
{
"name": "iPhone 15",
"price": 899
}
7. Index Lifecycle
7.1 Document TTL
bash
PUT /logs
{
"mappings": {
"_ttl": {
"enabled": true,
"default": "7d"
}
}
}
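Note: the _ttl mapping field was deprecated in Elasticsearch 2.0 and removed in 5.0, so current versions reject the mapping above. Use ILM instead.
7.2 Using ILM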
bash
PUT /_ilm/policy/logs_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "50gb",
"max_age": "7d"
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}
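8. Handling Indexing Conflicts
8.1 Retry Strategy
retry_on_conflict is a parameter of the Update API, not of the index API, so the retry example uses a partial update:
bash
POST /products/_update/1?retry_on_conflict=3
{
"doc": {
"name": "iPhone 15",
"price": 899
}
}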
8.2 Optimistic Concurrency Control
bash
GET /products/_doc/1
PUT /products/_doc/1?if_seq_no=0&if_primary_term=1
{
"name": "iPhone 15",
"price": 899
}
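The two requests above form a read-then-conditional-write pair: the _seq_no and _primary_term returned by the GET become the if_seq_no and if_primary_term of the PUT. A minimal sketch of that hand-off follows; conditional_put_path is a hypothetical helper, and the sample GET response is trimmed to the fields it needs.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def conditional_put_path(get_response: Dict[str, Any]) -> str:
    """Build the PUT path that only succeeds if the document is unchanged."""
    params = urlencode({
        "if_seq_no": get_response["_seq_no"],
        "if_primary_term": get_response["_primary_term"],
    })
    return f"/{get_response['_index']}/_doc/{get_response['_id']}?{params}"


# Trimmed sample of what GET /products/_doc/1 returns.
current = {
    "_index": "products",
    "_id": "1",
    "_seq_no": 0,
    "_primary_term": 1,
    "_source": {"name": "iPhone 15", "price": 999},
}
print("PUT", conditional_put_path(current))
```

If another writer changes the document between the GET and the PUT, the PUT returns 409; the usual response is to read the document again and rebuild the request from the fresh values.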
9. Index Monitoring
9.1 Index Statistics
bash
GET /products/_stats
9.2 Monitoring Indexing Throughput
bash
GET /_nodes/stats/indices/indexing
9.3 Indexing Slow Log
bash
PUT /products/_settings
{
"index.indexing.slowlog.threshold.index.warn": "10s",
"index.indexing.slowlog.threshold.index.info": "5s",
"index.indexing.slowlog.threshold.index.debug": "2s"
}
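10. Handling Indexing Errors
10.1 Common Errors
| Error | Cause | Fix |
|---|---|---|
| version_conflict_engine_exception | Version conflict: the document changed, or already exists on a create | Re-read the document and retry with its current _seq_no and _primary_term, or with the correct external version |
| mapper_parsing_exception | A field value does not match the mapped type | Check the mapping and the document's field values |
| illegal_argument_exception | Invalid request argument | Check the request parameters |
| index_not_found_exception | The index does not exist | Create the index first |
10.2 Example Error Response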
json
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "object mapping for [price] tried to parse field [price] as object, but found a concrete value"
}
],
"type": "mapper_parsing_exception",
"reason": "object mapping for [price] tried to parse field [price] as object, but found a concrete value"
},
"status": 400
}
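For logging, the nested error object is usually reduced to one line built from the first root_cause entry. The sketch below shows one way to do that; summarize_error is a hypothetical helper and assumes the response has the object-shaped error shown above.

```python
from typing import Any, Dict


def summarize_error(response: Dict[str, Any]) -> str:
    """Condense an Elasticsearch error response into a single log line."""
    error = response.get("error", {})
    # Prefer the first root cause; fall back to the top-level error.
    causes = error.get("root_cause") or [error]
    first = causes[0]
    return f"HTTP {response.get('status')}: {first.get('type')}: {first.get('reason')}"


sample_error = {
    "error": {
        "root_cause": [
            {"type": "mapper_parsing_exception",
             "reason": "object mapping for [price] tried to parse field "
                       "[price] as object, but found a concrete value"}
        ],
        "type": "mapper_parsing_exception",
        "reason": "object mapping for [price] tried to parse field "
                  "[price] as object, but found a concrete value",
    },
    "status": 400,
}
print(summarize_error(sample_error))
```

In this particular error the index maps price as an object while the document supplies a plain number, so the fix is either to change the mapping or to send the value in the mapped shape.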
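11. Indexing Best Practices
11.1 Bulk Indexing Workflow
text
Bulk indexing workflow
├── 1. Prepare
│   ├── Disable refresh
│   └── Set replicas to 0
├── 2. Index
│   ├── Use a sensible batch size
│   ├── Send requests from multiple threads
│   └── Monitor progress
└── 3. Finish
    ├── Restore the refresh interval
    ├── Restore replicas
    └── Run a refresh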
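11.2 Indexing Performance Tips
text
Performance recommendations
├── Batch size
│   └── 5-15 MB
├── Concurrent requests
│   └── Scale with the number of nodes
├── Refresh interval
│   └── Disable during bulk loads
├── Replica count
│   └── Set to 0 during bulk loads
└── Hardware
    └── SSD storage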
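12. Summary
This chapter covered document indexing in Elasticsearch:
- Single-document indexing works with either an explicit ID or an auto-generated one
- The Bulk API handles batch operations efficiently
- routing controls how documents are distributed across shards
- Version control makes concurrent writes safe
- Bulk indexing benefits from tuned index settings
- Error handling and monitoring keep indexing reliable
Next, we will look at document update operations.
Last updated: 2026-03-27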