Indexing and Performance #
This chapter takes a deep dive into Qdrant's indexing mechanisms and performance optimization strategies.
Indexing Overview #
text
Qdrant index architecture:
├── Vector indexes: HNSW, quantized vectors, sparse vectors
├── Payload indexes: Keyword, Integer, Float, Geo, Text
└── Other indexes: ID index, segment index
HNSW Index #
How HNSW Works #
text
HNSW (Hierarchical Navigable Small World):
Layer 2 (sparse)
○────────────────────────○
│                        │
Layer 1 (intermediate)
○───○─────────○────○───○
│   │         │    │   │
Layer 0 (dense)
○───○───○───○───○───○───○───○───○
Search process:
1. Start from the entry point in the top layer
2. Greedily move toward the nearest neighbor
3. Descend layer by layer, refining the candidates
4. Return the final results from the bottom layer
(a simplified sketch of this descent is shown below)
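To make the descent concrete, here is a minimal, self-contained sketch of layered greedy search over a toy graph. This is plain Python for illustration only, not Qdrant's actual implementation; the `layers` structure and the `euclidean` helper are assumptions made for the example.
python
import math

def euclidean(a, b):
    # Toy distance function for the sketch
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hnsw_greedy_search(layers, vectors, query, entry_point):
    """layers: adjacency dicts from the sparsest layer down to layer 0.
    vectors: mapping of node id -> vector. Returns the closest node found."""
    current = entry_point
    for graph in layers:
        improved = True
        while improved:
            improved = False
            for neighbor in graph.get(current, []):
                # Greedy step: move if the neighbor is closer to the query
                if euclidean(vectors[neighbor], query) < euclidean(vectors[current], query):
                    current = neighbor
                    improved = True
        # The best node on this layer becomes the entry point for the next layer
    return current

# Tiny toy example: 1-D vectors, a sparse top layer and a dense bottom layer
vectors = {0: [0.0], 1: [2.0], 2: [4.0], 3: [6.0], 4: [8.0]}
layers = [
    {0: [4], 4: [0]},
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]},
]
print(hnsw_greedy_search(layers, vectors, query=[5.0], entry_point=0))  # -> 3
The real HNSW search keeps a candidate list of size ef instead of a single node; that list size is what the hnsw_ef search parameter described below controls.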
HNSW Parameter Configuration #
python
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, HnswConfigDiff
)
client = QdrantClient(":memory:")
client.create_collection(
collection_name="hnsw_demo",
vectors_config=VectorParams(
size=384,
distance=Distance.COSINE
),
hnsw_config=HnswConfigDiff(
m=16,
ef_construct=100,
full_scan_threshold=10000,
max_indexing_threads=2,
on_disk=False,
payload_m=8
)
)
Parameter Reference #
| Parameter | Description | Default | Recommended range |
|---|---|---|---|
| m | Number of links per node | 16 | 8-64 |
| ef_construct | Search breadth during index construction | 100 | 50-400 |
| full_scan_threshold | Threshold (in KB) below which a full scan is used instead of HNSW | 10000 | depends on data size |
| max_indexing_threads | Number of indexing threads | auto | number of CPU cores |
| on_disk | Store the index on disk | False | True for large datasets |
| payload_m | Number of links in the payload-aware graph | m/2 | m/2 to m |
Parameter Tuning Guide #
text
Scenario-based parameter suggestions (see the helper sketch after this list):
High accuracy (recall > 98%):
├── m: 32-64
├── ef_construct: 200-400
└── High memory usage, slow index build
Balanced (recall ~95%):
├── m: 16
├── ef_construct: 100
└── Recommended defaults
High throughput (recall ~90%):
├── m: 8-12
├── ef_construct: 50-80
└── Low memory usage, fast index build
Memory constrained:
├── m: 8
├── ef_construct: 50
├── on_disk: True
└── Combine with quantization
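These scenarios can be expressed directly as HnswConfigDiff objects. The sketch below is only illustrative: the preset names and the specific values are assumptions picked from the ranges above, not an official Qdrant API.
python
from qdrant_client.models import Distance, VectorParams, HnswConfigDiff

# Hypothetical presets mirroring the scenarios above
HNSW_PRESETS = {
    "high_recall": HnswConfigDiff(m=48, ef_construct=300),
    "balanced": HnswConfigDiff(m=16, ef_construct=100),
    "high_throughput": HnswConfigDiff(m=10, ef_construct=64),
    "low_memory": HnswConfigDiff(m=8, ef_construct=50, on_disk=True),
}

client.create_collection(
    collection_name="hnsw_tuned",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HNSW_PRESETS["balanced"]
)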
Search-Time Parameters #
python
from qdrant_client.models import SearchParams
results = client.search(
collection_name="hnsw_demo",
query_vector=[0.1] * 384,
search_params=SearchParams(
hnsw_ef=128,
exact=False
),
limit=10
)
| Parameter | Description |
|---|---|
| hnsw_ef | Search-time ef value; larger values are more accurate but slower |
| exact | Whether to use exact (brute-force) search |
Exact Search #
python
results = client.search(
collection_name="hnsw_demo",
query_vector=[0.1] * 384,
search_params=SearchParams(
exact=True
),
limit=10
)
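Exact search also provides a convenient ground truth for measuring how much recall the HNSW index loses. The helper below is a minimal sketch that assumes the collection already contains data; it simply compares the IDs returned by exact and approximate searches.
python
def estimate_recall(collection_name, query_vectors, limit=10, hnsw_ef=128):
    # Fraction of the exact top-k results that the approximate search also returns
    hits, total = 0, 0
    for qv in query_vectors:
        exact_ids = {p.id for p in client.search(
            collection_name=collection_name,
            query_vector=qv,
            search_params=SearchParams(exact=True),
            limit=limit
        )}
        approx_ids = {p.id for p in client.search(
            collection_name=collection_name,
            query_vector=qv,
            search_params=SearchParams(hnsw_ef=hnsw_ef, exact=False),
            limit=limit
        )}
        hits += len(exact_ids & approx_ids)
        total += len(exact_ids)
    return hits / total if total else 0.0

print(f"Estimated recall@10: {estimate_recall('hnsw_demo', [[0.1] * 384] * 20):.2%}")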
Vector Quantization #
Quantization drastically reduces memory usage, trading a small amount of accuracy for storage efficiency.
Scalar Quantization #
python
from qdrant_client.models import (
ScalarQuantization,
ScalarQuantizationConfig,
ScalarType
)
client.create_collection(
collection_name="scalar_quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8,
quantile=0.99,
always_ram=True
)
)
)
text
Effect of scalar quantization:
Original vectors: float32 (4 bytes per dimension)
Quantized: int8 (1 byte per dimension)
Memory savings: 75%
Accuracy loss: 1-3%
Product Quantization #
python
from qdrant_client.models import (
    ProductQuantization,
    ProductQuantizationConfig,
    CompressionRatio
)
client.create_collection(
    collection_name="pq_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ProductQuantization(
        product=ProductQuantizationConfig(
            compression=CompressionRatio.X16,
            always_ram=True
        )
    )
)
text
Effect of product quantization:
Compression ratio x16:
├── Memory savings: 93.75%
├── Accuracy loss: 5-10%
└── Best for very large datasets
Compression ratio x8:
├── Memory savings: 87.5%
├── Accuracy loss: 3-5%
└── Recommended balanced choice
Compression ratio x4:
├── Memory savings: 75%
├── Accuracy loss: 1-3%
└── For high-accuracy scenarios
Binary Quantization #
python
from qdrant_client.models import (
BinaryQuantization,
BinaryQuantizationConfig
)
client.create_collection(
collection_name="binary_quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=BinaryQuantization(
binary=BinaryQuantizationConfig(
always_ram=True
)
)
)
text
Effect of binary quantization:
Memory savings: 96.875% (32x)
Accuracy loss: 10-20%
Typical use: coarse pre-filtering followed by rescoring
Quantized Search with Rescoring #
python
from qdrant_client.models import QuantizationSearchParams
results = client.search(
    collection_name="scalar_quantized",
    query_vector=[0.1] * 384,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(
            ignore=False,        # use the quantized index
            rescore=True,        # re-rank candidates with the original vectors
            oversampling=2.0     # fetch 2x candidates before rescoring
        )
    ),
    limit=10
)
| Parameter | Description |
|---|---|
| ignore | Skip the quantized index and search the original vectors only |
| rescore | Re-rank the candidates using the original vectors |
| oversampling | Oversampling factor for the candidate set |
Quantization Comparison #
| Quantization | Memory savings | Accuracy loss | Speed | Best for |
|---|---|---|---|---|
| None | 0% | 0% | Fast | Small datasets |
| Scalar | 75% | 1-3% | Fast | General purpose |
| Product | 87-94% | 3-10% | Medium | Large datasets |
| Binary | 97% | 10-20% | Fastest | Pre-filtering |
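As a rough guide, the table above can be turned into a small helper that picks a quantization config by profile. The profile names below are assumptions made for this sketch, not Qdrant terminology.
python
from qdrant_client.models import (
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
    ProductQuantization, ProductQuantizationConfig, CompressionRatio,
    BinaryQuantization, BinaryQuantizationConfig
)

def pick_quantization(profile):
    # Hypothetical mapping from the comparison table to a quantization config
    if profile == "general":
        return ScalarQuantization(scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8, quantile=0.99, always_ram=True))
    if profile == "large_scale":
        return ProductQuantization(product=ProductQuantizationConfig(
            compression=CompressionRatio.X16, always_ram=True))
    if profile == "prefilter":
        return BinaryQuantization(binary=BinaryQuantizationConfig(always_ram=True))
    return None  # small datasets: skip quantization entirely

client.create_collection(
    collection_name="quantized_by_profile",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=pick_quantization("general")
)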
Memory Management #
Memory Estimation #
python
def estimate_memory(points_count, vector_size, quantization=None):
    bytes_per_float = 4  # float32
    if quantization == "scalar":
        bytes_per_dim = 1          # int8
    elif quantization == "binary":
        bytes_per_dim = 1 / 8      # 1 bit per dimension
    elif quantization == "pq16":
        bytes_per_dim = bytes_per_float / 16  # product quantization, x16 compression
    else:
        bytes_per_dim = bytes_per_float
    vector_memory = points_count * vector_size * bytes_per_dim
    hnsw_overhead = 1.3  # rough multiplier for HNSW graph links and bookkeeping
    total_memory = vector_memory * hnsw_overhead
    return {
        "vector_memory_mb": vector_memory / (1024 * 1024),
        "total_memory_mb": total_memory / (1024 * 1024),
        "total_memory_gb": total_memory / (1024 * 1024 * 1024)
    }
estimate = estimate_memory(1_000_000, 384, "scalar")
print(f"Estimated memory: {estimate['total_memory_gb']:.2f} GB")
On-Disk Storage Configuration #
python
client.create_collection(
collection_name="disk_storage",
vectors_config=VectorParams(
size=384,
distance=Distance.COSINE,
on_disk=True
),
hnsw_config=HnswConfigDiff(
on_disk=True
),
on_disk_payload=True
)
Memory Mapping (mmap) #
python
from qdrant_client.models import OptimizersConfigDiff
client.create_collection(
    collection_name="mmap_config",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    optimizers_config=OptimizersConfigDiff(
        memmap_threshold=50000  # segments above 50 MB (value in KB) are memory-mapped
    )
)
Performance Optimization Strategies #
Batch Operation Optimization #
python
from concurrent.futures import ThreadPoolExecutor

def batch_insert_optimized(collection_name, points, batch_size=100, workers=4):
    def insert_batch(batch_points):
        client.upsert(
            collection_name=collection_name,
            points=batch_points,
            wait=False  # do not block until the points are persisted and indexed
        )
    batches = [
        points[i:i + batch_size]
        for i in range(0, len(points), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(insert_batch, batches))
    print(f"Finished batch-inserting {len(points)} points")
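For example, the helper can be driven with randomly generated vectors; the collection name and payload below are placeholders for the sketch.
python
import numpy as np
from qdrant_client.models import PointStruct

rng = np.random.default_rng(42)
points = [
    PointStruct(
        id=i,
        vector=rng.random(384).tolist(),
        payload={"category": "demo"}
    )
    for i in range(10_000)
]
batch_insert_optimized("hnsw_demo", points, batch_size=200, workers=4)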
Search Performance Optimization #
python
from qdrant_client.models import SearchParams

def optimized_search(collection_name, query_vector, limit=10):
    results = client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        search_params=SearchParams(
            hnsw_ef=max(64, limit * 4)  # scale the search breadth with the result size
        ),
        limit=limit,
        with_vectors=False,                  # skip returning raw vectors
        with_payload=["title", "category"]   # return only the payload fields you need
    )
    return results
Parallel Search #
python
from concurrent.futures import ThreadPoolExecutor
def parallel_search(collection_name, query_vectors, limit=5, workers=4):
    def search_one(query_vector):
        return client.search(
            collection_name=collection_name,
            query_vector=query_vector,
            limit=limit
        )
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(search_one, query_vectors))
    return results
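A quick usage sketch; the query vectors are randomly generated here purely to exercise the function against the hnsw_demo collection created earlier.
python
import numpy as np

rng = np.random.default_rng(0)
query_vectors = [rng.random(384).tolist() for _ in range(32)]
batch_results = parallel_search("hnsw_demo", query_vectors, limit=5, workers=4)
print(f"Ran {len(batch_results)} searches in parallel")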
Index Optimization #
python
client.update_collection(
    collection_name="hnsw_demo",
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=10000,      # build the HNSW index once a segment exceeds 10 MB (value in KB)
        max_optimization_threads=4
    )
)
Performance Monitoring #
Query Latency Monitoring #
python
import time
import statistics

def measure_search_latency(collection_name, query_vectors, iterations=100):
    latencies = []
    for query_vector in query_vectors[:iterations]:
        start = time.perf_counter()
        client.search(
            collection_name=collection_name,
            query_vector=query_vector,
            limit=10
        )
        latency = (time.perf_counter() - start) * 1000  # milliseconds
        latencies.append(latency)
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": latencies[min(int(len(latencies) * 0.95), len(latencies) - 1)],
        "p99_ms": latencies[min(int(len(latencies) * 0.99), len(latencies) - 1)]
    }
stats = measure_search_latency("hnsw_demo", [[0.1] * 384] * 100)
print(f"P99 latency: {stats['p99_ms']:.2f}ms")
Throughput Testing #
python
import time
from concurrent.futures import ThreadPoolExecutor

def throughput_test(collection_name, query_vectors, duration_sec=10, workers=8):
    start_time = time.time()

    def search_worker():
        # Each worker keeps its own counter to avoid a shared-state race
        count = 0
        while time.time() - start_time < duration_sec:
            for qv in query_vectors:
                client.search(
                    collection_name=collection_name,
                    query_vector=qv,
                    limit=10
                )
                count += 1
                if time.time() - start_time >= duration_sec:
                    break
        return count

    with ThreadPoolExecutor(max_workers=workers) as executor:
        total_queries = sum(executor.map(lambda _: search_worker(), range(workers)))

    qps = total_queries / duration_sec
    return {"qps": qps, "total_queries": total_queries}
result = throughput_test("hnsw_demo", [[0.1] * 384] * 10)
print(f"QPS: {result['qps']:.0f}")
Performance Benchmarks #
Typical Performance Metrics #
text
Reference benchmarks:
Dataset: 1 million vectors, 384 dimensions
No quantization:
├── Memory: ~2 GB
├── P99 latency: < 10ms
└── QPS: > 10,000
Scalar quantization:
├── Memory: ~0.5 GB
├── P99 latency: < 8ms
└── QPS: > 15,000
Product quantization (x16):
├── Memory: ~0.15 GB
├── P99 latency: < 15ms
└── QPS: > 8,000
Hardware Recommendations #
text
Suggested hardware configurations:
Small scale (< 1M vectors):
├── CPU: 4 cores
├── RAM: 8 GB
└── Storage: 50 GB SSD
Medium scale (1M-10M vectors):
├── CPU: 8 cores
├── RAM: 32 GB
└── Storage: 200 GB SSD
Large scale (> 10M vectors):
├── CPU: 16+ cores
├── RAM: 64+ GB
└── Storage: 1 TB+ NVMe SSD
Summary #
This chapter covered indexing and performance optimization:
- HNSW index internals and parameter tuning
- Vector quantization techniques
- Memory management strategies
- Performance optimization techniques
- Performance monitoring methods
Next Steps #
With performance optimization covered, continue with Distributed Deployment to learn how to build a highly available cluster!
Last updated: 2026-04-04