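Schema Design

1. Schema Overview

1.1 What Is a Schema

A schema defines the structure of an index, including:

Field definitions (Field)
Field types (FieldType)
Dynamic fields (Dynamic Field)
Copy fields (Copy Field)
Unique key (Unique Key)

1.2 Ways to Manage the Schema

Managed Schema (recommended)

Modified dynamically through the API
Changes are saved automatically to the managed-schema file
Supports the Schema API

Classic Schema

Edit the schema.xml file by hand
Requires a core reload for changes to take effect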

1.3 Schema File Location

text

mycore/
└── conf/
    ├── managed-schema    # managed schema
    └── schema.xml        # classic schema (optional)

2. Field Definitions

2.1 Basic Field Definitions

xml

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="true" stored="true"/>
<field name="price" type="pdouble" indexed="true" stored="true"/>
<field name="timestamp" type="pdate" indexed="true" stored="true"/>

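2.2 Field Properties in Detail

Property	Type	Default	Description
name	string	required	Field name
type	string	required	Field type
indexed	boolean	true	Whether the field is indexed (searchable)
stored	boolean	true	Whether the original value is stored and can be returned
required	boolean	false	Whether every document must supply a value
multiValued	boolean	false	Whether a document may hold multiple values
docValues	boolean	false	Whether to build DocValues (column-oriented storage)
omitNorms	boolean	depends on type	Whether to omit norms (length normalization); true for non-analyzed types such as string and numeric fields
omitTermFreqAndPositions	boolean	depends on type	Whether to omit term frequencies and positions; true for non-text types
omitPositions	boolean	false	Whether to omit positions
termVectors	boolean	false	Whether to store term vectors
termPositions	boolean	false	Whether term vectors include positions
termOffsets	boolean	false	Whether term vectors include offsets
termPayloads	boolean	false	Whether term vectors include payloads
large	boolean	false	Whether to lazy-load large stored values (requires stored="true" and multiValued="false")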

2.3 Choosing Field Properties

indexed vs stored

Scenario	indexed	stored
Searched and displayed	true	true
Searched but not displayed	true	false
Displayed but not searched	false	true
Sort field	true	false

docValues

xml

<!-- Enable docValues on fields used for sorting, aggregation (faceting), or grouping -->
<field name="price" type="pdouble" indexed="true" stored="true" docValues="true"/>
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
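The scenarios above can be captured in a small lookup so that field definitions stay consistent. The following Python sketch is illustrative only: the use-case names and the `field_definition` helper are hypothetical, and the sort entry adds `docValues` as the XML example suggests.

```python
import json

# Hypothetical use-case names; the flags follow the table and XML example above.
FLAGS_BY_USE_CASE = {
    "search_and_display": {"indexed": True, "stored": True},
    "search_only": {"indexed": True, "stored": False},
    "display_only": {"indexed": False, "stored": True},
    "sort_or_facet": {"indexed": True, "stored": False, "docValues": True},
}


def field_definition(name, field_type, use_case):
    """Return a field definition dict for the given use case."""
    definition = {"name": name, "type": field_type}
    definition.update(FLAGS_BY_USE_CASE[use_case])
    return definition


if __name__ == "__main__":
    print(json.dumps(field_definition("price", "pdouble", "sort_or_facet"), indent=2))
```

The resulting dict has the same shape as the body of an `add-field` command in the Schema API.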

2.4 Adding Fields with the Schema API

bash

# Add a single field
curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-field": {
      "name": "author",
      "type": "text_general",
      "indexed": true,
      "stored": true
    }
  }'

# Add several fields in one request
curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-field": [
      {"name": "author", "type": "text_general"},
      {"name": "publisher", "type": "string"},
      {"name": "publish_date", "type": "pdate"}
    ]
  }'
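The same request can be assembled from a script. This is a minimal Python sketch using only the standard library; the `build_add_field_request` helper is a hypothetical name, and the host and core are the ones assumed by the curl commands above. It builds the request without sending it, so it runs without a Solr instance.

```python
import json
import urllib.request


def build_add_field_request(base_url, core, fields):
    """Build (but do not send) a Schema API request that adds one or more fields."""
    payload = {"add-field": fields}
    return urllib.request.Request(
        url=f"{base_url}/solr/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_field_request(
    "http://localhost:8983",
    "mycore",
    [
        {"name": "author", "type": "text_general"},
        {"name": "publisher", "type": "string"},
    ],
)
print(request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running Solr instance: urllib.request.urlopen(request)
```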

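3. Dynamic Fields

3.1 What Are Dynamic Fields

A dynamic field applies a field type to any field whose name matches a pattern, so such fields need no explicit declaration: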

xml

<dynamicField name="*_i" type="pint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="pdate" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_f" type="pfloat" indexed="true" stored="true"/>
<dynamicField name="*_d" type="pdouble" indexed="true" stored="true"/>

3.2 Usage Example

json

{
  "id": "product-001",
  "name_s": "iPhone 15",
  "price_d": 6999.00,
  "stock_i": 100,
  "in_stock_b": true,
  "created_dt": "2026-03-27T10:00:00Z",
  "description_t": "The latest iPhone"
}
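To see how the field names in this document map to types, the matching can be imitated in a few lines of Python. This is a simplified sketch, not Solr's implementation: it assumes explicitly declared fields take precedence and that longer patterns are tried before shorter ones; `resolve_type` and `matches` are hypothetical helper names.

```python
# Patterns copied from the dynamicField declarations in section 3.1.
DYNAMIC_FIELDS = {
    "*_i": "pint",
    "*_s": "string",
    "*_t": "text_general",
    "*_dt": "pdate",
    "*_b": "boolean",
    "*_f": "pfloat",
    "*_d": "pdouble",
}


def matches(pattern, field_name):
    """A pattern carries a single '*' at its start or its end."""
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return pattern == field_name


def resolve_type(field_name, explicit_fields=None):
    """Explicit fields win; otherwise try dynamic patterns, longest first (assumed)."""
    explicit_fields = explicit_fields or {}
    if field_name in explicit_fields:
        return explicit_fields[field_name]
    for pattern in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        if matches(pattern, field_name):
            return DYNAMIC_FIELDS[pattern]
    return None  # no declaration matches this field name


doc = {"id": "product-001", "name_s": "iPhone 15", "price_d": 6999.00,
       "stock_i": 100, "created_dt": "2026-03-27T10:00:00Z"}
for name in doc:
    print(name, "->", resolve_type(name, explicit_fields={"id": "string"}))
```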

3.3 Adding Dynamic Fields with the Schema API

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-dynamic-field": {
      "name": "*_txt",
      "type": "text_general",
      "indexed": true,
      "stored": true,
      "multiValued": true
    }
  }'

4. Copy Fields

4.1 What Are Copy Fields

A copy field copies the values of one or more source fields into a destination field at index time:

xml

<copyField source="title" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="author" dest="text"/>

4.2 Use Cases

Unified search

xml

<!-- Define the destination field -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copy several fields into it -->
<copyField source="title" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="tags" dest="text"/>
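The effect of these rules on a single document can be imitated in Python. This sketch only models the value copying (Solr copies the raw values before analysis); `apply_copy_fields` and `as_list` are hypothetical helpers. It also shows why the destination field is declared multiValued: it receives one value per source value.

```python
# (source, dest) pairs mirroring the copyField rules above.
COPY_FIELDS = [("title", "text"), ("content", "text"), ("tags", "text")]


def as_list(value):
    """Wrap a single value in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]


def apply_copy_fields(doc, copy_fields=COPY_FIELDS):
    """Return a copy of doc with source values appended to each destination."""
    result = dict(doc)
    for source, dest in copy_fields:
        if source in doc:
            existing = as_list(result[dest]) if dest in result else []
            result[dest] = existing + as_list(doc[source])
    return result


indexed = apply_copy_fields(
    {"id": "1", "title": "Solr in Action", "tags": ["search", "lucene"]}
)
print(indexed["text"])
```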

Spell checking

xml

<field name="spellcheck" type="text_spell" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="spellcheck"/>
<copyField source="content" dest="spellcheck"/>

4.3 Adding Copy Fields with the Schema API

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-copy-field": {
      "source": "title",
      "dest": ["text", "spellcheck"]
    }
  }'

5. Field Types

5.1 Common Field Types

String type

xml

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

Numeric types

xml

<fieldType name="pint" class="solr.IntPointField" docValues="true"/>
<fieldType name="plong" class="solr.LongPointField" docValues="true"/>
<fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
<fieldType name="pdouble" class="solr.DoublePointField" docValues="true"/>

Date type

xml

<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

Boolean type

xml

<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

Text type

xml

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

5.2 Chinese Text Types

IK tokenizer (third-party plugin)

xml

<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
  </analyzer>
</fieldType>

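SmartChinese tokenizer

xml

<!-- Requires the lucene-analyzers-smartcn jar (analysis-extras) on the classpath -->
<fieldType name="text_smartcn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>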

5.3 Custom Field Types

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-field-type": {
      "name": "text_custom",
      "class": "solr.TextField",
      "analyzer": {
        "tokenizer": {
          "class": "solr.StandardTokenizerFactory"
        },
        "filters": [
          {"class": "solr.LowerCaseFilterFactory"},
          {"class": "solr.StopFilterFactory", "words": "stopwords.txt"}
        ]
      }
    }
  }'

6. Analyzer Configuration

6.1 Analyzer Components

text

Input text
    ↓
Tokenizer
    ↓
Token Filter(s)
    ↓
Token stream
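The pipeline above can be imitated in Python to make the data flow concrete. This is a rough stand-in rather than Lucene's behavior: the regex tokenizer, the filter functions, and the short stop word set are all assumptions made for illustration.

```python
import re

# Illustrative stop word set; a real analyzer reads it from a file.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "but", "by", "for"}


def tokenize(text):
    """Rough stand-in for a tokenizer: split on non-word characters."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Token filter: drop tokens that are stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, filters=(lowercase_filter, stop_filter)):
    """Run the tokenizer, then each filter in order, and return the token stream."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("A Guide for Solr and Lucene"))  # prints ['guide', 'solr', 'lucene']
```

Filter order matters in this sketch: the stop filter compares lowercase words, so it runs after the lowercase filter.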

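6.2 Common Tokenizers

Tokenizer	Description
StandardTokenizer	Splits on word boundaries following Unicode text segmentation rules
WhitespaceTokenizer	Splits on whitespace only
KeywordTokenizer	Emits the whole input as a single token (no tokenization)
LetterTokenizer	Splits on non-letter characters
PatternTokenizer	Splits using a regular expression
PathHierarchyTokenizer	Emits each prefix of a path hierarchy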

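6.3 Common Filters

Filter	Description
LowerCaseFilter	Converts tokens to lowercase
StopFilter	Removes stop words
SynonymGraphFilter	Expands synonyms
PorterStemFilter	Reduces English words to their stems
RemoveDuplicatesTokenFilter	Removes duplicate tokens at the same position
ASCIIFoldingFilter	Folds non-ASCII characters to their ASCII equivalents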

6.4 Complete Analyzer Example

xml

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

6.5 Synonym Configuration

synonyms.txt

text

# Synonym definitions
iphone,苹果手机
ipad,苹果平板
mbp,macbook pro

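Using synonyms

xml

<fieldType name="text_synonyms" class="solr.TextField">
  <!-- Index time: no synonym expansion -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: SynonymGraphFilter needs no FlattenGraphFilter here -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>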
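To illustrate what the `expand` and `ignoreCase` options mean for comma-separated synonym lines, here is a small Python sketch. It handles only the equivalence format used in the synonyms.txt example and skips explicit `=>` mappings; `parse_synonyms` is a hypothetical helper, not part of Solr.

```python
def parse_synonyms(text, expand=True):
    """Map each term to its synonyms, lowercasing terms (like ignoreCase="true").

    With expand=True every term maps to the full group; with expand=False
    every term is reduced to the first term of its group.
    """
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            continue  # explicit mappings are out of scope for this sketch
        terms = [term.strip().lower() for term in line.split(",") if term.strip()]
        for term in terms:
            mapping[term] = terms if expand else [terms[0]]
    return mapping


SAMPLE = """\
# Synonym definitions
iphone,苹果手机
ipad,苹果平板
mbp,macbook pro
"""

print(parse_synonyms(SAMPLE)["mbp"])
print(parse_synonyms(SAMPLE, expand=False)["macbook pro"])
```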

6.6 Stop Word Configuration

stopwords.txt

text

# Stop word list
a
an
and
are
as
at
be
but
by
for

7. Unique Key

7.1 Defining the Unique Key

xml

<uniqueKey>id</uniqueKey>

7.2 Unique Key Requirements

Must be single-valued and should be a stored field
The string type is recommended
The value must not be empty

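7.3 Auto-Generating IDs

xml

<!-- managed-schema: the unique key field -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- solrconfig.xml: fill in a UUID when a document arrives without an id.
     Select the chain with update.chain=uuid, or mark it default="true". -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>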
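IDs can also be assigned on the client before documents are sent. The following Python sketch shows that alternative under the assumption that the unique key field is named `id`; `ensure_id` is a hypothetical helper and does not reproduce the server-side processor.

```python
import uuid


def ensure_id(doc, field_name="id"):
    """Return doc unchanged if it has an id, else a copy with a random UUID."""
    if doc.get(field_name):
        return doc
    return {**doc, field_name: str(uuid.uuid4())}


docs = [{"id": "product-001", "title": "Existing"}, {"title": "New document"}]
for doc in map(ensure_id, docs):
    print(doc["id"], doc["title"])
```

Generating the ID on the client lets the caller know the key before indexing, which can simplify retries; the server-side chain keeps clients simpler.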

8. Schema Design Best Practices

8.1 Field Naming Conventions

xml

<!-- Recommended -->
<field name="product_id" type="string"/>
<field name="product_name" type="text_general"/>
<field name="product_price" type="pdouble"/>

<!-- Not recommended -->
<field name="id" type="string"/>
<field name="name" type="text_general"/>
<field name="price" type="pdouble"/>

8.2 Choosing Field Types

Data	Recommended field type
ID	string
Title	text_general
Body text	text_general
Category	string
Tags	string (multiValued)
Price	pdouble
Quantity	pint
Date	pdate
Boolean	boolean

8.3 Performance Tuning

Store fewer fields

xml

<!-- Searched but not displayed -->
<field name="search_text" type="text_general" indexed="true" stored="false"/>

<!-- Displayed but not searched -->
<field name="description" type="string" indexed="false" stored="true"/>

Use DocValues

xml

<!-- Fields used for sorting and aggregation -->
<field name="price" type="pdouble" indexed="true" stored="true" docValues="true"/>
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>

Use multiValued judiciously

xml

<!-- Multi-valued field -->
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>

9. Complete Schema API Examples

9.1 Creating a Complete Schema

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-field-type": {
      "name": "text_cn",
      "class": "solr.TextField",
      "analyzer": {
        "tokenizer": {"class": "solr.StandardTokenizerFactory"},
        "filters": [
          {"class": "solr.LowerCaseFilterFactory"}
        ]
      }
    },
    "add-field": [
      {"name": "id", "type": "string", "indexed": true, "stored": true, "required": true},
      {"name": "title", "type": "text_cn", "indexed": true, "stored": true},
      {"name": "content", "type": "text_cn", "indexed": true, "stored": true},
      {"name": "author", "type": "string", "indexed": true, "stored": true},
      {"name": "price", "type": "pdouble", "indexed": true, "stored": true, "docValues": true},
      {"name": "category", "type": "string", "indexed": true, "stored": true, "docValues": true},
      {"name": "tags", "type": "string", "indexed": true, "stored": true, "multiValued": true},
      {"name": "publish_date", "type": "pdate", "indexed": true, "stored": true},
      {"name": "text", "type": "text_cn", "indexed": true, "stored": false, "multiValued": true}
    ],
    "add-copy-field": [
      {"source": "title", "dest": "text"},
      {"source": "content", "dest": "text"},
      {"source": "author", "dest": "text"}
    ]
  }'

9.2 Viewing the Schema

bash

curl "http://localhost:8983/solr/mycore/schema?wt=json"
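The JSON returned by this call can be summarized with a short script. The Python sketch below parses a trimmed, hypothetical response body of the same general shape (a `schema` object holding `fields`, `dynamicFields`, `copyFields`, and `uniqueKey`); `summarize_schema` is an illustrative helper.

```python
import json

# Trimmed, hypothetical response body in the shape the Schema API returns.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 2},
  "schema": {
    "name": "example",
    "uniqueKey": "id",
    "fields": [
      {"name": "id", "type": "string", "indexed": true, "stored": true},
      {"name": "title", "type": "text_general", "indexed": true, "stored": true}
    ],
    "dynamicFields": [
      {"name": "*_s", "type": "string", "indexed": true, "stored": true}
    ],
    "copyFields": [
      {"source": "title", "dest": "text"}
    ]
  }
}
"""


def summarize_schema(response_text):
    """Reduce a schema response to names, types, and copy rules."""
    schema = json.loads(response_text)["schema"]
    return {
        "uniqueKey": schema.get("uniqueKey"),
        "fields": {f["name"]: f["type"] for f in schema.get("fields", [])},
        "dynamicFields": {f["name"]: f["type"] for f in schema.get("dynamicFields", [])},
        "copyFields": [(c["source"], c["dest"]) for c in schema.get("copyFields", [])],
    }


summary = summarize_schema(SAMPLE_RESPONSE)
print(json.dumps(summary, indent=2))
```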

9.3 Deleting a Field

bash

curl -X POST "http://localhost:8983/solr/mycore/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "delete-field": {"name": "old_field"}
  }'

10. Schema Version Management

10.1 Exporting the Schema

bash

curl "http://localhost:8983/solr/mycore/schema?wt=json" > schema_backup.json

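10.2 Version Control

bash

# Copy the schema into a version-controlled directory
# (in Solr 9 and later the file is named managed-schema.xml)
cp server/solr/mycore/conf/managed-schema /path/to/repo/

# Track it with Git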
git add managed-schema
git commit -m "Update schema"
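When reviewing a schema change, it can help to compare two exported schemas field by field. The Python sketch below assumes each input is the `schema` object from a JSON export such as schema_backup.json; `diff_fields` and `fields_by_name` are hypothetical helpers, and the two sample schemas are made up for illustration.

```python
def fields_by_name(schema):
    """Index the field definitions of a schema object by field name."""
    return {field["name"]: field for field in schema.get("fields", [])}


def diff_fields(old_schema, new_schema):
    """Report fields that were added, removed, or changed between two exports."""
    old, new = fields_by_name(old_schema), fields_by_name(new_schema)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


old = {"fields": [
    {"name": "id", "type": "string"},
    {"name": "old_field", "type": "string"},
    {"name": "price", "type": "pdouble"},
]}
new = {"fields": [
    {"name": "id", "type": "string"},
    {"name": "price", "type": "pdouble", "docValues": True},
    {"name": "author", "type": "string"},
]}
print(diff_fields(old, new))
```

With real exports, each argument would come from `json.load(f)["schema"]` on a saved file.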

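11. Summary

Key points of schema design:

Point	Description
Field definitions	Set indexed, stored, and docValues deliberately
Dynamic fields	Use dynamic fields to keep the schema small
Copy fields	Consolidate searchable text into one field
Field types	Choose a type that fits the data
Analyzers	Configure suitable tokenizers and filters

Best practices:

Manage the schema through the Schema API
Set field properties deliberately
Use DocValues to speed up sorting and aggregation
Use dynamic fields for flexible extension
Back up the schema regularly

Next, we will look at document operations.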