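Solr Core Concepts

1. Core

1.1 What Is a Core

A Core is Solr's basic runtime unit, comparable to a "database" in a relational database system. Each Core contains:

Index data (a Lucene index)
Configuration files (solrconfig.xml and a schema file: schema.xml or managed-schema)
A data directory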

text

Solr Server
    ├── Core 1 (product index)
    │   ├── conf/
    │   └── data/
    ├── Core 2 (user index)
    │   ├── conf/
    │   └── data/
    └── Core 3 (log index)
        ├── conf/
        └── data/

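1.2 Managing Cores

Creating a Core

bash

# Create from the command line
bin/solr create_core -c mycore

# Specify a configset
bin/solr create_core -c mycore -d _default

# Specify a configuration directory
bin/solr create_core -c mycore -d /path/to/config

Deleting a Core

bash

bin/solr delete -c mycore

Checking Core Status

bash

# From the command line
bin/solr status

# Through the Core Admin API
curl "http://localhost:8983/solr/admin/cores?action=STATUS"

1.3 core.properties

Each Core directory contains a core.properties file, which Solr uses to discover the Core and read its basic settings: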

properties

name=mycore
config=solrconfig.xml
schema=managed-schema
dataDir=data
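
Since core.properties is a plain key=value file, its contents are easy to inspect programmatically. The following is a minimal sketch, not part of Solr: the function name parse_core_properties is hypothetical, and it ignores Java-properties escapes and line continuations.

```python
def parse_core_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments (sketch only)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Same content as the core.properties example above.
SAMPLE = """\
name=mycore
config=solrconfig.xml
schema=managed-schema
dataDir=data
"""

core = parse_core_properties(SAMPLE)
print(core["name"], core["dataDir"])
```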

2. Document

2.1 What Is a Document

A Document is the basic unit of indexing in Solr, comparable to a "row" in a relational database. Each Document is made up of one or more Fields.

json

{
  "id": "book-001",
  "title": "A Practical Guide to Solr",
  "author": "Zhang San",
  "price": 99.00,
  "category": ["Technology", "Search"],
  "publish_date": "2026-03-27T00:00:00Z"
}

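2.2 Document Characteristics

Each Document should have a unique identifier (the uniqueKey field); adding a Document whose key already exists replaces the earlier one
Documents in the same Core do not all need the same set of Fields
A Document can contain multi-valued Fields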
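
These characteristics can be pictured with a small in-memory model. The sketch below is not Solr code: TinyIndex and its methods are hypothetical names, and it assumes that `id` is the uniqueKey. It shows a second add with the same key replacing the first, and documents carrying different sets of fields.

```python
class TinyIndex:
    """Toy stand-in for a Core: stores documents keyed by the uniqueKey field."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        if self.unique_key not in doc:
            raise ValueError(f"document is missing {self.unique_key!r}")
        # Same key: the new document replaces the old one.
        self.docs[doc[self.unique_key]] = doc


index = TinyIndex()
index.add({"id": "book-001", "title": "Draft title"})
index.add({"id": "book-001", "title": "Final title", "category": ["tech", "search"]})
index.add({"id": "book-002", "author": "Zhang San"})  # a different set of fields

print(len(index.docs))                  # 2
print(index.docs["book-001"]["title"])  # Final title
```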

2.3 Document Operations

Adding a Document

bash

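# A document becomes searchable only after a commit; for quick tests,
# append ?commit=true to the URL.
curl -X POST "http://localhost:8983/solr/mycore/update/json/docs" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "book-001",
    "title": "A Practical Guide to Solr",
    "author": "Zhang San"
  }'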

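Adding Documents in Bulk

To index several Documents in one request, send a JSON array as the request body:

json

[
  {"id": "book-001", "title": "A Practical Guide to Solr"},
  {"id": "book-002", "title": "Elasticsearch: The Definitive Guide"},
  {"id": "book-003", "title": "Lucene in Action"}
]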
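
The same bulk request can be prepared from a script. This is a minimal sketch using only the Python standard library; it assumes the host and core name from the curl example above and that the array is posted to the /update handler with a JSON content type. The request is built but not sent, so the snippet runs without a Solr server.

```python
import json
import urllib.request

docs = [
    {"id": "book-001", "title": "A Practical Guide to Solr"},
    {"id": "book-002", "title": "Elasticsearch: The Definitive Guide"},
    {"id": "book-003", "title": "Lucene in Action"},
]

# Assumed endpoint: same host and core as the curl example above.
url = "http://localhost:8983/solr/mycore/update?commit=true"

request = urllib.request.Request(
    url,
    data=json.dumps(docs).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send the request to a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)

print(request.get_method(), request.full_url)
```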

3. Field

3.1 What Is a Field

A Field is a component of a Document, comparable to a "column" in a relational database. Each Field has a name, a value, and a type.

xml

<field name="title" type="text_general" indexed="true" stored="true"/>

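3.2 Field Attributes

Attribute	Description	Default
name	Field name	Required
type	Field type	Required
indexed	Whether the field is indexed (searchable)	true
stored	Whether the original value is stored and can be returned	true
required	Whether every Document must supply a value	false
multiValued	Whether the field can hold multiple values	false
docValues	Whether DocValues are built for the field	false
omitNorms	Whether norms (length normalization data) are omitted	false for text fields; true for primitive types such as string

A fieldType can override these defaults, in which case fields of that type inherit the fieldType's setting.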

3.3 Field Definitions by Type

xml

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="price" type="pdouble" indexed="true" stored="true"/>
<field name="timestamp" type="pdate" indexed="true" stored="true"/>
<field name="popularity" type="pint" indexed="true" stored="true"/>

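3.4 Dynamic Fields

A dynamic field applies one definition to every field whose name matches a wildcard pattern, so Documents can use fields that are not declared one by one in the Schema.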

xml

<dynamicField name="*_i" type="pint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="pdate" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>

Usage example:

json

{
  "id": "product-001",
  "name_s": "iPhone 15",
  "price_i": 6999,
  "in_stock_b": true,
  "created_dt": "2026-03-27T10:00:00Z"
}
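
To make the matching concrete, here is a minimal sketch of how a field name could be resolved against the patterns above. It is an illustration, not Solr's implementation: resolve_type is a hypothetical helper, only suffix patterns of the form `*_x` are handled, and trying longer patterns first is an assumption of this sketch.

```python
# Patterns copied from the dynamicField definitions above.
DYNAMIC_FIELDS = {
    "*_i": "pint",
    "*_s": "string",
    "*_t": "text_general",
    "*_dt": "pdate",
    "*_b": "boolean",
}
EXPLICIT_FIELDS = {"id": "string"}  # assumed explicitly declared field


def resolve_type(field_name):
    """Return the field type for a name, or None if nothing matches."""
    # Explicitly declared fields take precedence over dynamic patterns.
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    # Longer (more specific) patterns are tried first.
    for pattern in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        suffix = pattern[1:]  # drop the leading "*"
        if field_name.endswith(suffix):
            return DYNAMIC_FIELDS[pattern]
    return None


for name in ("id", "name_s", "price_i", "in_stock_b", "created_dt", "unknown"):
    print(name, "->", resolve_type(name))
```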

3.5 Copy Fields

xml

<copyField source="title" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="author" dest="text"/>

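Copy fields copy the values of several source fields into one destination field, so that a single field can be searched instead of many. Because the destination here receives values from several sources, it must be defined as multiValued.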
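
The effect of the three rules can be sketched in a few lines. This is a simplified model rather than Solr code: apply_copy_fields and as_list are hypothetical helpers, and the sample document is made up for illustration. As in Solr, the raw source values are copied before any analysis takes place.

```python
# (source, dest) pairs matching the copyField rules above.
COPY_FIELDS = [("title", "text"), ("content", "text"), ("author", "text")]


def as_list(value):
    return value if isinstance(value, list) else [value]


def apply_copy_fields(doc, copy_fields=COPY_FIELDS):
    """Return a copy of doc with each source value appended to its dest field."""
    result = dict(doc)
    for source, dest in copy_fields:
        if source not in doc:
            continue  # nothing to copy for this rule
        result[dest] = as_list(result.get(dest, [])) + as_list(doc[source])
    return result


doc = {"id": "book-001", "title": "A Practical Guide to Solr", "author": "Zhang San"}
indexed = apply_copy_fields(doc)
print(indexed["text"])  # ['A Practical Guide to Solr', 'Zhang San']
```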

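4. Schema

4.1 What Is a Schema

The Schema defines the structure of Documents: field definitions, field types, analyzers, and so on.

4.2 Schema Management Modes

Managed Schema (recommended)

Modified dynamically through the API
Changes are saved automatically to the managed-schema file
Supports the Schema API

Classic Schema

The schema.xml file is edited by hand
Changes take effect only after the Core is reloaded or Solr is restarted

4.3 Schema API

Adding a field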

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-field": {
      "name": "author",
      "type": "text_general",
      "indexed": true,
      "stored": true
    }
  }'

Adding a dynamic field

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-dynamic-field": {
      "name": "*_txt",
      "type": "text_general",
      "indexed": true,
      "stored": true
    }
  }'

Adding a copy field

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-copy-field": {
      "source": "title",
      "dest": "text"
    }
  }'
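
The Schema API also accepts several commands in one POST body, so the three requests above could be combined. The sketch below only builds and prints that JSON body; field_def is a hypothetical helper, and the body would still be sent to the same /schema endpoint used by the curl examples.

```python
import json


def field_def(name, field_type, **attrs):
    """Build the JSON object for an add-field or add-dynamic-field command."""
    return {"name": name, "type": field_type, **attrs}


payload = {
    "add-field": field_def("author", "text_general", indexed=True, stored=True),
    "add-dynamic-field": field_def("*_txt", "text_general", indexed=True, stored=True),
    "add-copy-field": {"source": "title", "dest": "text"},
}

body = json.dumps(payload, indent=2)
print(body)
```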

5. FieldType

5.1 Common Field Types

Type	Description	Example
string	String (not tokenized)	"hello world"
text_general	General text (tokenized)	"hello world" → ["hello", "world"]
pint	Integer	123
plong	Long integer	1234567890
pdouble	Double-precision floating point	3.14159
pdate	Date	"2026-03-27T10:00:00Z"
boolean	Boolean	true/false
binary	Binary data	Base64-encoded

5.2 Field Type Definition

xml

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

6. Analyzer

6.1 Analyzer Components

An Analyzer consists of a Tokenizer followed by zero or more Filters:

text

Input text
    ↓
Tokenizer
    ↓
Token Filters
    ↓
Token stream

6.2 Analysis Example

text

"Hello World!"
    ↓
StandardTokenizer
["Hello", "World"]
    ↓
LowerCaseFilter
["hello", "world"]
    ↓
StopFilter
["hello", "world"]
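
The same chain can be imitated in plain Python to see what each stage contributes. This is a rough sketch, not Lucene's behavior: the regex only approximates StandardTokenizer, the stop list is an assumed sample, and all function names are hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "and", "is", "the"}  # assumed sample stop list


def tokenize(text):
    """Rough stand-in for StandardTokenizer: keep runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    return [t.lower() for t in tokens]


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text):
    """Run the tokenizer, then each filter in order."""
    tokens = tokenize(text)
    for token_filter in (lowercase, remove_stop_words):
        tokens = token_filter(tokens)
    return tokens


print(analyze("Hello World!"))             # ['hello', 'world']
print(analyze("Solr is a search engine"))  # ['solr', 'search', 'engine']
```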

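6.3 Index-Time and Query-Time Analyzers

A field type can define separate analyzers for indexing and for querying. In the following definition, synonyms are expanded only at query time: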

xml

<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>
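
A small sketch shows why query-time expansion is enough for a match. The synonym entry is an assumption (it mimics a synonyms.txt line such as "tv, television"), whitespace splitting stands in for the real tokenizer, and the function names are hypothetical.

```python
# Assumed synonym entry, standing in for a line in synonyms.txt.
SYNONYMS = {"tv": ["tv", "television"]}


def index_analyze(text):
    """Index-time chain: tokenize on whitespace, then lowercase."""
    return [t.lower() for t in text.split()]


def query_analyze(text):
    """Query-time chain: same steps, plus synonym expansion."""
    expanded = []
    for token in index_analyze(text):
        expanded.extend(SYNONYMS.get(token, [token]))
    return expanded


indexed_terms = set(index_analyze("Smart Television"))
query_terms = query_analyze("TV")

print(query_terms)                                   # ['tv', 'television']
print(any(t in indexed_terms for t in query_terms))  # True
```

The indexed terms stay untouched, so the synonym list can change without reindexing; only the query side sees the expansion.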

7. Index

7.1 Inverted Index

Solr uses an inverted index, which maps each term to the Documents that contain it, to make search fast:

text

Documents:
- Doc1: "Solr is a search engine"
- Doc2: "Elasticsearch is also a search engine"
- Doc3: "Solr and Elasticsearch are similar"

Inverted index (common words such as "is", "a", and "and" omitted):
┌───────────────┬─────────────────┐
│     Term      │  Document IDs   │
├───────────────┼─────────────────┤
│ solr          │ [1, 3]          │
│ search        │ [1, 2]          │
│ engine        │ [1, 2]          │
│ elasticsearch │ [2, 3]          │
│ similar       │ [3]             │
└───────────────┴─────────────────┘
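
The table can be reproduced with a short script. This is a teaching sketch, not how Lucene stores its index: whitespace splitting replaces real analysis, and the stop-word set is an assumption chosen to match the terms left out of the table.

```python
from collections import defaultdict

STOP_WORDS = {"is", "a", "also", "and", "are"}  # assumed, to match the table above

DOCS = {
    1: "Solr is a search engine",
    2: "Elasticsearch is also a search engine",
    3: "Solr and Elasticsearch are similar",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        terms = {t.lower() for t in docs[doc_id].split()} - STOP_WORDS
        for term in terms:
            index[term].append(doc_id)
    return dict(index)


inverted = build_inverted_index(DOCS)
print(inverted["solr"])           # [1, 3]
print(inverted["elasticsearch"])  # [2, 3]

# An AND query is the intersection of two postings lists.
print(sorted(set(inverted["solr"]) & set(inverted["search"])))  # [1]
```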

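7.2 DocValues

DocValues are a column-oriented storage structure, built at index time, that supports sorting, faceting and other aggregations, and grouping: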

xml

<field name="price" type="pdouble" indexed="true" stored="true" docValues="true"/>
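
The difference between row-oriented stored fields and column-oriented DocValues can be sketched with ordinary Python containers. The data and variable names below are made up for illustration and say nothing about Lucene's actual on-disk format.

```python
# Row-oriented view (like stored fields): one record per document.
stored = {
    "p1": {"name": "phone", "price": 699.0},
    "p2": {"name": "cable", "price": 9.5},
    "p3": {"name": "laptop", "price": 1299.0},
}

# Column-oriented view (like DocValues): one array per field,
# addressed by document position.
doc_ids = list(stored)  # position -> document id
price_column = [stored[d]["price"] for d in doc_ids]

# Sorting and aggregation read only the column, not whole documents.
order = sorted(range(len(doc_ids)), key=price_column.__getitem__)
print([doc_ids[i] for i in order])  # ['p2', 'p1', 'p3']
print(max(price_column))            # 1299.0
```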

8. Request Handler

8.1 Search Request Handler

xml

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
  </lst>
</requestHandler>
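
The values under "defaults" apply only when the request does not set the same parameter. A minimal sketch of that merge, with effective_params as a hypothetical helper and single-valued parameters assumed:

```python
# Defaults copied from the /select handler configuration above.
DEFAULTS = {"echoParams": "explicit", "rows": "10", "df": "text"}


def effective_params(request_params, defaults=DEFAULTS):
    """Request parameters override the handler's defaults."""
    return {**defaults, **request_params}


params = effective_params({"q": "solr", "rows": "5"})
print(params["rows"])  # 5  (taken from the request)
print(params["df"])    # text  (taken from the defaults)
```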

8.2 Update Request Handler

xml

<requestHandler name="/update" class="solr.UpdateRequestHandler"/>
<requestHandler name="/update/json" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="stream.contentType">application/json</str>
  </lst>
</requestHandler>

8.3 Common Request Handlers

Path	Function
/select	Search queries
/update	Document updates
/get	Real-time get
/browse	Browse UI
/analysis/field	Field analysis
/admin/ping	Health check

9. Search Component

9.1 Search Component List

xml

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>
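
Most components in the chain do work only when a request switches them on with their own parameters. The sketch below builds such a query URL with the standard library; the host and core come from the earlier curl examples, and the field names (title, category, price) are assumed from the sample document in section 2.1.

```python
from urllib.parse import urlencode

params = [
    ("q", "title:solr"),          # query component
    ("facet", "true"),            # facet component
    ("facet.field", "category"),
    ("hl", "true"),               # highlight component
    ("hl.fl", "title"),
    ("stats", "true"),            # stats component
    ("stats.field", "price"),
]

url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
print(url)
```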

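9.2 Common Components

Component	Description
query	Basic query
facet	Faceting
mlt	More Like This (similar documents)
highlight	Highlighting
stats	Statistics
spellcheck	Spell checking
suggest	Suggestions (autocomplete)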

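10. Collection

10.1 What Is a Collection

In SolrCloud mode, a Collection is the logical unit of a distributed index. It is divided into one or more Shards, and each Shard has one or more Replicas, one of which acts as the Leader.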

text

Collection: products
├── Shard 1
│   ├── Replica 1 (Leader)
│   └── Replica 2
├── Shard 2
│   ├── Replica 1 (Leader)
│   └── Replica 2
└── Shard 3
    ├── Replica 1 (Leader)
    └── Replica 2
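
Each Document lives in exactly one Shard, chosen from its id. The sketch below illustrates the idea with a deliberately simplified rule (CRC32 of the id modulo the number of shards). This is an assumption made for illustration and is not Solr's router, which hashes ids into per-shard hash ranges; pick_shard is a hypothetical name.

```python
import zlib

NUM_SHARDS = 3  # matches the three shards in the diagram above


def pick_shard(doc_id, num_shards=NUM_SHARDS):
    """Map a document id to a shard number in 1..num_shards (simplified rule)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards + 1


for doc_id in ("book-001", "book-002", "book-003"):
    print(doc_id, "-> Shard", pick_shard(doc_id))
```

Because the shard depends only on the id, the same Document is always routed to the same Shard, and that Shard's Leader forwards the update to its Replicas.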

10.2 Collection Operations

bash

# Create a Collection
bin/solr create_collection -c mycollection -shards 3 -replicationFactor 2

# Delete a Collection
bin/solr delete -c mycollection

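11. Summary

Core concepts mapped to database concepts:

Solr concept	Database analogy
Core/Collection	Database
Document	Row
Field	Column
Schema	Table Schema
Index	Index
Query	SQL Query

Study suggestions:

Understand how Cores and Documents relate
Learn the principles of Schema design
Understand how analyzers work
Get familiar with the common request handlers

Next, let's learn Solr's basic syntax!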