Full-Text Indexes

1. Full-Text Index Overview

1.1 What Is a Full-Text Index

A full-text index is an index type built for searching text content:

text
Full-text index features:
├── Tokenized (term-based) search
├── Fuzzy matching
├── Relevance ranking
└── Built on the Lucene engine
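
To make tokenized search concrete, the following Python sketch builds a tiny inverted index, the general kind of data structure that engines such as Lucene rely on. It is an illustration only: the function names (tokenize, build_inverted_index, search) and the sample documents are invented for this example, and Lucene's real analyzers and index format are far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents keyed by id.
docs = {
    1: "OrientDB is a graph database",
    2: "Lucene powers full-text search",
    3: "A graph database stores vertices and edges",
}
index = build_inverted_index(docs)
print(sorted(search(index, "graph database")))  # [1, 3]
```

Because the lookup works on terms rather than on whole strings, a query finds a document no matter where the words appear in it.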

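1.2 Advantages of Full-Text Indexes

| Advantage | Description |
| --- | --- |
| Efficient search | Quickly locates text content |
| Tokenization | Handles both Chinese and English text, given a suitable analyzer |
| Relevance ranking | Orders results by how well they match |
| Flexible queries | Supports several query syntaxes |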

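1.3 The Lucene Engine

The OrientDB full-text indexes described here are built on Apache Lucene:

text
Lucene features:
├── High-performance full-text search
├── Multiple analyzers
├── Rich query syntax
└── Multi-language support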

2. Creating a Full-Text Index

2.1 Basic Syntax

sql
CREATE INDEX <index-name> 
ON <class> (<property>) 
FULLTEXT 
ENGINE LUCENE

2.2 Creating a Single-Field Full-Text Index

sql
CREATE INDEX idx_article_title ON Article (title) FULLTEXT ENGINE LUCENE
CREATE INDEX idx_product_name ON Product (name) FULLTEXT ENGINE LUCENE

2.3 Creating a Multi-Field Full-Text Index

sql
CREATE INDEX idx_article_content ON Article (title, content) FULLTEXT ENGINE LUCENE
CREATE INDEX idx_product_search ON Product (name, description) FULLTEXT ENGINE LUCENE

2.4 Specifying an Analyzer

sql
CREATE INDEX idx_article_title_cn ON Article (title) 
FULLTEXT ENGINE LUCENE 
METADATA {
    "analyzer": "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"
}
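
When index definitions are generated from application code, the statement can be assembled from its parts. The Python sketch below is one possible helper, not part of any OrientDB client library: the function name create_fulltext_index_sql and its parameters are assumptions made for this example. It does not validate or quote identifiers, so it is only suitable for trusted, hard-coded names.

```python
import json


def create_fulltext_index_sql(index_name, class_name, properties, metadata=None):
    """Build a CREATE INDEX ... FULLTEXT ENGINE LUCENE statement."""
    if not properties:
        raise ValueError("at least one property is required")
    sql = "CREATE INDEX {} ON {} ({}) FULLTEXT ENGINE LUCENE".format(
        index_name, class_name, ", ".join(properties)
    )
    if metadata:
        # The METADATA clause takes a JSON object.
        sql += " METADATA " + json.dumps(metadata)
    return sql


print(create_fulltext_index_sql("idx_article_title", "Article", ["title"]))
print(create_fulltext_index_sql(
    "idx_article_title_cn",
    "Article",
    ["title"],
    {"analyzer": "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"},
))
```

The generated strings have the same shape as the statements in sections 2.2 and 2.4.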

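2.5 Common Analyzers

| Analyzer | Description |
| --- | --- |
| StandardAnalyzer | Standard analyzer (default) |
| SmartChineseAnalyzer | Smart Chinese analyzer |
| EnglishAnalyzer | English analyzer |
| WhitespaceAnalyzer | Splits text on whitespace |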

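3. Full-Text Search

3.1 Basic Search

The LUCENE operator queries the Lucene index. CONTAINSTEXT is a separate OrientDB text operator associated with the native full-text index rather than the Lucene engine, so the second statement may not use an index created with ENGINE LUCENE.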

sql
SELECT FROM Article WHERE title LUCENE 'database'
SELECT FROM Article WHERE title CONTAINSTEXT 'database'

3.2 Multi-Term Search

sql
SELECT FROM Article WHERE title LUCENE 'database system'
SELECT FROM Article WHERE content LUCENE 'orientdb graph database'

3.3 Phrase Search

sql
SELECT FROM Article WHERE title LUCENE '"graph database"'
SELECT FROM Article WHERE content LUCENE '"distributed system"'

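3.4 Wildcard Search

Lucene's query parser rejects a leading wildcard such as *base by default. In OrientDB this is usually enabled through the index metadata option "allowLeadingWildcard": true.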

sql
SELECT FROM Article WHERE title LUCENE 'data*'
SELECT FROM Article WHERE title LUCENE '*base'
SELECT FROM Article WHERE title LUCENE 'dat?base'
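
The semantics of the two wildcards can be tried out without a database. The Python sketch below uses the standard-library fnmatch module as a stand-in for Lucene's term matching: * matches any run of characters and ? matches exactly one. The helper name matching_terms and the term list are hypothetical, and fnmatch also gives special meaning to square brackets, which Lucene wildcards do not.

```python
from fnmatch import fnmatchcase


def matching_terms(pattern, terms):
    """Return the terms matched by a pattern that uses * and ? wildcards."""
    return [t for t in terms if fnmatchcase(t, pattern)]


# Hypothetical indexed terms (wildcards apply to single terms, not whole titles).
terms = ["data", "database", "datebase", "dataset", "base"]

print(matching_terms("data*", terms))     # ['data', 'database', 'dataset']
print(matching_terms("*base", terms))     # ['database', 'datebase', 'base']
print(matching_terms("dat?base", terms))  # ['database', 'datebase']
```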

3.5 Fuzzy Search

sql
SELECT FROM Article WHERE title LUCENE 'database~'
SELECT FROM Article WHERE title LUCENE 'database~0.8'
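
Lucene's fuzzy operator (~) matches terms that lie within a small edit distance of the query term. The Python sketch below illustrates the idea with plain Levenshtein distance, which is a simplification: Lucene itself uses a Damerau-Levenshtein variant and allows at most 2 edits. The function names and the term list are invented for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(query, terms, max_edits=2):
    """Return the terms within max_edits of the query term."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


terms = ["databse", "datebase", "dataset", "graph"]
print(fuzzy_matches("database", terms))  # ['databse', 'datebase']
```

A misspelling such as "databse" is one edit away from "database" and is matched, while "dataset" is three edits away and is not.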

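3.6 Range Search

A range query only works on a property that is covered by a Lucene index, so the second statement assumes that price has been added to one.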

sql
SELECT FROM Article WHERE title LUCENE '[a TO d]'
SELECT FROM Product WHERE price LUCENE '[100 TO 500]'

4. Advanced Search Syntax

4.1 Boolean Operators

sql
SELECT FROM Article WHERE title LUCENE 'database AND graph'
SELECT FROM Article WHERE title LUCENE 'database OR nosql'
SELECT FROM Article WHERE title LUCENE 'database NOT sql'
SELECT FROM Article WHERE title LUCENE '+database -sql'
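
When the required and excluded terms come from user input, the +term / -term form is easy to generate. The following Python sketch shows one way to do so; boolean_query is a hypothetical helper, and it assumes the terms contain no Lucene special characters.

```python
def boolean_query(required=(), optional=(), excluded=()):
    """Combine terms using Lucene's +term (must) and -term (must not) prefixes."""
    parts = ["+" + term for term in required]
    parts += list(optional)
    parts += ["-" + term for term in excluded]
    return " ".join(parts)


print(boolean_query(required=["database"], excluded=["sql"]))
# +database -sql
print(boolean_query(required=["database"], optional=["graph", "document"]))
# +database graph document
```

The first result is the query string used in the last statement above.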

4.2 Specifying Fields

sql
SELECT FROM Article WHERE title LUCENE 'title:database'
SELECT FROM Article WHERE content LUCENE 'content:orientdb'

4.3 Boosting Terms

sql
SELECT FROM Article WHERE title LUCENE 'database^2 graph'
SELECT FROM Article WHERE [title, content] LUCENE 'title:database^3 content:graph'

4.4 Grouping

sql
SELECT FROM Article WHERE title LUCENE '(database OR nosql) AND graph'
SELECT FROM Article WHERE title LUCENE 'database AND (graph OR document)'

4.5 Escaping Special Characters

sql
SELECT FROM Article WHERE title LUCENE '\+database'
SELECT FROM Article WHERE title LUCENE 'C\+\+'
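
Escaping by hand is error-prone when search text comes from users. The Python sketch below prefixes each character that has special meaning in Lucene's classic query syntax with a backslash. The function name escape_lucene is an assumption made for this example; it only covers Lucene escaping, so the result still has to be passed to OrientDB safely (for instance as a query parameter rather than by string concatenation).

```python
import re

# Characters with special meaning in Lucene's classic query syntax:
# + - ! ( ) { } [ ] ^ " ~ * ? : \ / | &
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/|&])')


def escape_lucene(text):
    """Prefix every Lucene special character with a backslash."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


print(escape_lucene("C++"))        # C\+\+
print(escape_lucene("+database"))  # \+database
```

The two printed strings match the query terms used in the statements above.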

5. Multi-Field Search

5.1 Creating a Multi-Field Index

sql
CREATE INDEX idx_article_full ON Article (title, content, tags) 
FULLTEXT ENGINE LUCENE

5.2 Searching Multiple Fields

sql
SELECT FROM Article WHERE [title, content] LUCENE 'database'
SELECT FROM Article WHERE [title, content, tags] LUCENE 'graph database'

5.3 Field Weights

sql
SELECT FROM Article WHERE [title, content] LUCENE 'title:database^2 content:database'
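
A per-field weighted query string like the one above can be generated from a mapping of field names to boosts. This Python sketch is a hypothetical helper (boosted_field_query is not an OrientDB or Lucene API) and assumes the term needs no escaping.

```python
def boosted_field_query(term, boosts):
    """Build one 'field:term^boost' clause per field.

    A boost of 1 is Lucene's default, so it is left out of the clause.
    """
    clauses = []
    for field, boost in boosts.items():
        clause = "{}:{}".format(field, term)
        if boost != 1:
            clause += "^{}".format(boost)
        clauses.append(clause)
    return " ".join(clauses)


print(boosted_field_query("database", {"title": 2, "content": 1}))
# title:database^2 content:database
```

Keeping the boosts in one mapping makes it easier to tune how strongly a title match outranks a body match.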

6. Chinese Full-Text Search

6.1 Creating a Chinese Index

sql
CREATE INDEX idx_article_cn ON Article (title, content) 
FULLTEXT ENGINE LUCENE 
METADATA {
    "analyzer": "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"
}

6.2 Searching Chinese Text

sql
SELECT FROM Article WHERE title LUCENE '数据库'
SELECT FROM Article WHERE content LUCENE '图数据库 技术'
SELECT FROM Article WHERE [title, content] LUCENE '分布式系统'

6.3 Chinese Phrase Search

sql
SELECT FROM Article WHERE title LUCENE '"图数据库"'
SELECT FROM Article WHERE content LUCENE '"分布式系统架构"'

7. Relevance Ranking

7.1 Default Ordering

Full-text search results are ordered by relevance by default:

sql
SELECT FROM Article WHERE title LUCENE 'database'

7.2 Retrieving the Relevance Score

sql
SELECT 
    title,
    $score AS relevance
FROM Article 
WHERE title LUCENE 'database'
ORDER BY $score DESC

7.3 Custom Ordering

sql
SELECT FROM Article 
WHERE title LUCENE 'database'
ORDER BY createdAt DESC

8. Index Configuration

8.1 Configuring the Analyzer

sql
CREATE INDEX idx_article_custom ON Article (content) 
FULLTEXT ENGINE LUCENE 
METADATA {
    "analyzer": "org.apache.lucene.analysis.standard.StandardAnalyzer",
    "stopwords": "a, an, the, is, are",
    "minTermFreq": 1,
    "minDocFreq": 1
}
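
The effect of a stop-word list can be pictured as a filter applied during analysis. The Python sketch below reuses the stop words from the "stopwords" setting shown above; the analyze function is a simplified stand-in written for this example and is not how Lucene's StandardAnalyzer is implemented.

```python
import re

# Same stop words as the "stopwords" setting in the index definition above.
STOPWORDS = {"a", "an", "the", "is", "are"}


def analyze(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into terms, and drop stop words."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in terms if t not in stopwords]


print(analyze("The index is a Lucene index"))
# ['index', 'lucene', 'index']
```

Terms removed at this stage never reach the index, so a search for a stop word on its own cannot match anything.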

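8.2 Configuration Options

| Option | Description | Default |
| --- | --- | --- |
| analyzer | Analyzer class | StandardAnalyzer |
| stopwords | Stop words | Default stop-word list |
| minTermFreq | Minimum term frequency | 1 |
| minDocFreq | Minimum document frequency | 1 |
| maxTermFreq | Maximum term frequency | Unlimited |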

8.3 Viewing the Index Configuration

sql
SELECT name, metadata FROM (SELECT expand(indexes) FROM metadata:indexmanager) WHERE name = 'idx_article_title'

9. Index Maintenance

9.1 Rebuilding a Full-Text Index

sql
REBUILD INDEX idx_article_title
REBUILD INDEX idx_article_content

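9.2 Optimizing an Index

Note: ALTER INDEX does not appear among OrientDB's documented SQL commands, so verify this statement against your server version. REBUILD INDEX (section 9.1) is the documented maintenance command.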

sql
ALTER INDEX idx_article_title METADATA {"refresh": true}

9.3 Dropping a Full-Text Index

sql
DROP INDEX idx_article_title

10. Practical Examples

10.1 Article Search

sql
CREATE CLASS Article EXTENDS V
CREATE PROPERTY Article.title STRING
CREATE PROPERTY Article.content STRING
CREATE PROPERTY Article.author STRING
CREATE PROPERTY Article.tags EMBEDDEDLIST STRING
CREATE PROPERTY Article.createdAt DATETIME

CREATE INDEX idx_article_search ON Article (title, content) 
FULLTEXT ENGINE LUCENE

SELECT 
    title,
    author,
    $score AS relevance
FROM Article
WHERE [title, content] LUCENE 'graph database'
ORDER BY $score DESC
LIMIT 10

10.2 Product Search

sql
CREATE CLASS Product EXTENDS V
CREATE PROPERTY Product.name STRING
CREATE PROPERTY Product.description STRING
CREATE PROPERTY Product.category STRING
CREATE PROPERTY Product.price DECIMAL

CREATE INDEX idx_product_search ON Product (name, description) 
FULLTEXT ENGINE LUCENE

SELECT 
    name,
    category,
    price,
    $score AS relevance
FROM Product
WHERE [name, description] LUCENE 'laptop gaming'
ORDER BY $score DESC

10.3 Log Search

sql
CREATE CLASS Log EXTENDS V
CREATE PROPERTY Log.level STRING
CREATE PROPERTY Log.message STRING
CREATE PROPERTY Log.source STRING
CREATE PROPERTY Log.timestamp DATETIME

CREATE INDEX idx_log_message ON Log (message) 
FULLTEXT ENGINE LUCENE

SELECT 
    level,
    message,
    source,
    timestamp
FROM Log
WHERE message LUCENE 'error AND timeout'
ORDER BY timestamp DESC
LIMIT 100

10.4 Knowledge Base Search

sql
CREATE CLASS KnowledgeBase EXTENDS V
CREATE PROPERTY KnowledgeBase.title STRING
CREATE PROPERTY KnowledgeBase.content STRING
CREATE PROPERTY KnowledgeBase.category STRING
CREATE PROPERTY KnowledgeBase.keywords EMBEDDEDLIST STRING

CREATE INDEX idx_kb_search ON KnowledgeBase (title, content) 
FULLTEXT ENGINE LUCENE 
METADATA {
    "analyzer": "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"
}

SELECT 
    title,
    category,
    $score AS relevance
FROM KnowledgeBase
WHERE [title, content] LUCENE '数据库 优化'
ORDER BY $score DESC

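11. Performance Optimization

11.1 Index Design

text
Full-text index design recommendations:
├── Choose an analyzer that fits the language of the content
├── Avoid indexing too many fields
├── Configure stop words sensibly
└── Optimize indexes periodically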

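11.2 Query Optimization

text
Query optimization recommendations:
├── Use exact phrase searches
├── Limit the number of results returned
├── Use paginated queries
└── Avoid overly broad searches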
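
Limiting and paginating results both come down to adding SKIP and LIMIT clauses to the query. The Python sketch below shows one way to compute them from a page number; paged_query is a hypothetical helper that assumes the base query does not already end with SKIP or LIMIT.

```python
def paged_query(base_query, page, page_size=10):
    """Append SKIP/LIMIT clauses for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    skip = (page - 1) * page_size
    return "{} SKIP {} LIMIT {}".format(base_query, skip, page_size)


base = "SELECT FROM Article WHERE title LUCENE 'database'"
print(paged_query(base, page=1))
# SELECT FROM Article WHERE title LUCENE 'database' SKIP 0 LIMIT 10
print(paged_query(base, page=3, page_size=20))
# SELECT FROM Article WHERE title LUCENE 'database' SKIP 40 LIMIT 20
```

Each page then fetches at most page_size records, regardless of how many documents match.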

11.3 Monitoring Indexes

sql
SELECT 
    name,
    size,
    keySize,
    valueSize
FROM (SELECT expand(indexes) FROM metadata:indexmanager)
WHERE type = 'FULLTEXT'

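12. Frequently Asked Questions

12.1 Chinese Search Returns No Results

text
Problem: a Chinese search returns an empty result set
Solution: create the index with a Chinese analyzer, using its fully qualified class name
CREATE INDEX ... METADATA {"analyzer": "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"}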

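12.2 Search Results Are Inaccurate

text
Problem: search results have low relevance
Solutions:
1. Adjust the analyzer configuration
2. Use phrase searches
3. Adjust term and field boosts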

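12.3 Index Updates Are Delayed

text
Problem: newly inserted data does not appear in search results
Solutions:
1. Wait for the index to refresh
2. Refresh the index manually
ALTER INDEX ... METADATA {"refresh": true}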

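13. Summary

Key points about full-text indexes:

| Operation | Syntax | Description |
| --- | --- | --- |
| Create an index | CREATE INDEX ... FULLTEXT | Creates a full-text index |
| Basic search | LUCENE 'keyword' | Keyword search |
| Phrase search | LUCENE '"phrase"' | Exact phrase |
| Boolean search | AND/OR/NOT | Combines conditions |
| Chinese search | SmartChineseAnalyzer | Chinese analyzer |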

Next, let's look at transaction handling!

Last updated: 2026-03-27