Indexing and Performance #

This chapter takes a deep dive into Qdrant's indexing mechanisms and performance optimization strategies.

Index Overview #

text
Qdrant index architecture:

┌─────────────────────────────────────────────────────────────┐
│                        Index layers                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Vector indexes       Payload indexes      Other indexes    │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐  │
│  │ HNSW        │      │  Keyword    │      │ ID index    │  │
│  │ Quantized   │      │  Integer    │      │ Segment     │  │
│  │ Sparse      │      │  Float      │      │ index       │  │
│  └─────────────┘      │  Geo        │      │             │  │
│                       │  Text       │      │             │  │
│                       └─────────────┘      └─────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

HNSW Index #

How HNSW Works #

text
HNSW (Hierarchical Navigable Small World):

Layer 2 (sparse)
    ○────────────────────────○
         │                  │
Layer 1 (intermediate)
    ○───○─────────○────○───○
     │   │         │    │   │
Layer 0 (dense)
○───○───○───○───○───○───○───○───○

Search process:
1. Start from the entry point in the top layer
2. Greedily move toward the nearest neighbor
3. Descend layer by layer, refining the candidate
4. Return the final results from the bottom layer
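The greedy descent described above can be sketched in a few lines of plain Python. This is an illustrative single-layer walk (real HNSW maintains multiple layers and an `ef`-sized candidate queue), not Qdrant's implementation:

```python
import math

def greedy_search(graph, coords, entry, query):
    """Greedy nearest-neighbor walk over one HNSW-style layer.

    graph:  node -> list of neighbor nodes
    coords: node -> point (2-D here for readability)
    """
    current = entry
    while True:
        # Jump to the neighbor closest to the query if it improves on
        # the current node; otherwise we are at a local minimum.
        best = min(graph[current], key=lambda n: math.dist(coords[n], query))
        if math.dist(coords[best], query) < math.dist(coords[current], query):
            current = best
        else:
            return current

# Tiny 4-node graph; node 3 is closest to the query point
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
coords = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
print(greedy_search(graph, coords, entry=0, query=(0.9, 0.9)))  # → 3
```

The hierarchy of layers exists to make this walk start close to the target: upper, sparser layers cover long distances in few hops before the dense bottom layer finishes the job.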

HNSW Parameter Configuration #

python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff
)

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="hnsw_demo",
    vectors_config=VectorParams(
        size=384,
        distance=Distance.COSINE
    ),
    hnsw_config=HnswConfigDiff(
        m=16,
        ef_construct=100,
        full_scan_threshold=10000,
        max_indexing_threads=2,
        on_disk=False,
        payload_m=8
    )
)

Parameter Details #

| Parameter | Description | Default | Recommended range |
|---|---|---|---|
| m | Number of graph links per node | 16 | 8-64 |
| ef_construct | Candidate list size while building the index | 100 | 50-400 |
| full_scan_threshold | Size (KB) below which brute-force scan is used | 10000 | depends on data size |
| max_indexing_threads | Number of indexing threads | auto | number of CPU cores |
| on_disk | Store the index on disk | False | True for large datasets |
| payload_m | Links per node in payload-aware graphs | m/2 | m/2 to m |

Parameter Tuning Guide #

text
Suggested parameters by scenario:

High accuracy (recall > 98%):
├── m: 32-64
├── ef_construct: 200-400
└── High memory use, slow builds

Balanced (recall ~95%):
├── m: 16
├── ef_construct: 100
└── Recommended defaults

High throughput (recall ~90%):
├── m: 8-12
├── ef_construct: 50-80
└── Low memory use, fast builds

Memory-constrained:
├── m: 8
├── ef_construct: 50
├── on_disk: True
└── Combine with quantization
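The scenarios above can be restated as a small lookup helper. The preset names and helper function are a hypothetical convenience, not part of the Qdrant API; the values are mid-range picks from the table, and the dicts can be unpacked into `HnswConfigDiff(**preset)`:

```python
# Presets restating the tuning table above; values are mid-range picks.
HNSW_PRESETS = {
    "high_accuracy":   {"m": 48, "ef_construct": 300},
    "balanced":        {"m": 16, "ef_construct": 100},
    "high_throughput": {"m": 12, "ef_construct": 64},
    "low_memory":      {"m": 8,  "ef_construct": 50, "on_disk": True},
}

def hnsw_preset(scenario):
    """Return a copy of the preset so callers can tweak it safely."""
    if scenario not in HNSW_PRESETS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return dict(HNSW_PRESETS[scenario])

print(hnsw_preset("balanced"))  # → {'m': 16, 'ef_construct': 100}
```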

Search-Time Parameters #

python
from qdrant_client.models import SearchParams

results = client.search(
    collection_name="hnsw_demo",
    query_vector=[0.1] * 384,
    search_params=SearchParams(
        hnsw_ef=128,
        exact=False
    ),
    limit=10
)

| Parameter | Description |
|---|---|
| hnsw_ef | ef value at search time; larger values are more accurate but slower |
| exact | Use exact (brute-force) search instead of the HNSW index |

Exact Search #

python
results = client.search(
    collection_name="hnsw_demo",
    query_vector=[0.1] * 384,
    search_params=SearchParams(
        exact=True
    ),
    limit=10
)

Vector Quantization #

Quantization can significantly reduce memory usage, trading a small loss of accuracy for storage efficiency.

Scalar Quantization #

python
from qdrant_client.models import (
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType
)

client.create_collection(
    collection_name="scalar_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,
            always_ram=True
        )
    )
)

text
Effect of scalar quantization:

Original vectors: float32 (4 bytes per dimension)
Quantized: int8 (1 byte per dimension)

Memory savings: 75%
Accuracy loss: 1-3%
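A round-trip through int8 quantization makes the 4x size reduction and the bounded error concrete. This is a generic affine scheme for illustration; Qdrant's actual implementation additionally clips outliers at the configured `quantile`:

```python
def quantize_int8(vector):
    """Map each float onto an int8 code in [-127, 127] (affine scheme)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 254 or 1.0  # guard against constant vectors
    codes = [round((x - lo) / scale) - 127 for x in vector]
    return codes, lo, scale

def dequantize_int8(codes, lo, scale):
    return [(c + 127) * scale + lo for c in codes]

vec = [0.12, -0.53, 0.98, 0.0, -1.0, 0.47]
codes, lo, scale = quantize_int8(vec)
restored = dequantize_int8(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(vec, restored))
# Reconstruction error is bounded by half a quantization step
print(f"max error: {max_err:.4f} (step: {scale:.4f})")
```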

Product Quantization #

python
from qdrant_client.models import (
    ProductQuantization,
    ProductQuantizationConfig,
    CompressionRatio
)

client.create_collection(
    collection_name="pq_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ProductQuantization(
        product=ProductQuantizationConfig(
            compression=CompressionRatio.X16,  # 16x compression
            always_ram=True
        )
    )
)

text
Effect of product quantization:

Compression ratio 16:
├── Memory savings: 93.75%
├── Accuracy loss: 5-10%
└── Suited to very large datasets

Compression ratio 8:
├── Memory savings: 87.5%
├── Accuracy loss: 3-5%
└── Recommended balanced choice

Compression ratio 4:
├── Memory savings: 75%
├── Accuracy loss: 1-3%
└── High-accuracy scenarios
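The savings figures above follow directly from the compression ratio; a quick arithmetic check for 384-dimensional float32 vectors:

```python
def pq_memory(dim, ratio, bytes_per_float=4):
    """Per-vector bytes before/after PQ at a given compression ratio."""
    raw = dim * bytes_per_float
    compressed = raw / ratio
    savings = 1 - compressed / raw
    return raw, compressed, savings

for ratio in (4, 8, 16):
    raw, comp, sav = pq_memory(384, ratio)
    # Savings: 75.00%, 87.50%, 93.75% -- matching the table above
    print(f"x{ratio}: {raw} B -> {comp:.0f} B per vector ({sav:.2%} saved)")
```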

Binary Quantization #

python
from qdrant_client.models import (
    BinaryQuantization,
    BinaryQuantizationConfig
)

client.create_collection(
    collection_name="binary_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=BinaryQuantization(
        binary=BinaryQuantizationConfig(
            always_ram=True
        )
    )
)

text
Effect of binary quantization:

Memory savings: 96.875% (32x)
Accuracy loss: 10-20%
Best used for: coarse filtering + rescoring
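The idea in miniature: keep one bit per dimension (the sign) and compare with Hamming distance, which is why it works best as a coarse first pass. A sketch, not Qdrant's internal code:

```python
def binarize(vector):
    """Keep only the sign of each dimension, packed into an int."""
    bits = 0
    for x in vector:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits = binary-quantized distance."""
    return bin(a ^ b).count("1")

q  = binarize([0.9, -0.2, 0.4, -0.7])
d1 = binarize([0.8, -0.1, 0.5, -0.6])   # points the same way as q
d2 = binarize([-0.9, 0.3, -0.2, 0.7])   # points the opposite way
print(hamming(q, d1), hamming(q, d2))   # → 0 4
```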

Rescoring with Quantization #

python
from qdrant_client.models import SearchParams, QuantizationSearchParams

results = client.search(
    collection_name="scalar_quantized",
    query_vector=[0.1] * 384,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(
            ignore=False,        # use the quantized index
            rescore=True,        # re-rank candidates with original vectors
            oversampling=2.0     # fetch 2x candidates before rescoring
        )
    ),
    limit=10
)

| Parameter | Description |
|---|---|
| ignore | Skip the quantized index and search original vectors directly |
| rescore | Re-rank candidates using the original (unquantized) vectors |
| oversampling | How many extra candidates to fetch before rescoring |
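What `oversampling` plus `rescore` buy can be shown with toy scores: fetch `limit * oversampling` candidates by the cheap (quantized) score, then re-rank that shortlist with the exact score. The scoring functions here are stand-ins, not Qdrant APIs:

```python
def search_with_rescore(approx_score, exact_score, ids, limit=3, oversampling=2.0):
    """Shortlist by approximate score, then re-rank with the exact score."""
    k = int(limit * oversampling)
    shortlist = sorted(ids, key=approx_score, reverse=True)[:k]
    return sorted(shortlist, key=exact_score, reverse=True)[:limit]

# Toy scores: the approximate score badly underrates id 4
exact  = {1: 0.95, 2: 0.90, 3: 0.80, 4: 0.85, 5: 0.10, 6: 0.05}
approx = {1: 0.94, 2: 0.91, 3: 0.82, 4: 0.60, 5: 0.12, 6: 0.04}

ids = list(exact)
print(search_with_rescore(approx.get, exact.get, ids, oversampling=1.0))  # → [1, 2, 3]
print(search_with_rescore(approx.get, exact.get, ids, oversampling=2.0))  # → [1, 2, 4]
```

Without oversampling, the quantization error drops id 4 from the shortlist entirely; with a 2x shortlist, rescoring recovers the correct top-3.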

Quantization Comparison #

| Quantization | Memory savings | Accuracy loss | Speed | Best for |
|---|---|---|---|---|
| None | 0% | 0% | — | Small datasets |
| Scalar | 75% | 1-3% | — | General use |
| Product | 87-94% | 3-10% | — | Very large datasets |
| Binary | 97% | 10-20% | Extremely fast | Coarse filtering + rescoring |

Memory Management #

Memory Estimation #

python
def estimate_memory(points_count, vector_size, quantization=None):
    bytes_per_float = 4  # float32

    if quantization == "scalar":
        bytes_per_dim = 1        # int8 codes
    elif quantization == "binary":
        bytes_per_dim = 1 / 8    # one bit per dimension
    elif quantization == "pq16":
        bytes_per_dim = bytes_per_float / 16
    else:
        bytes_per_dim = bytes_per_float

    vector_memory = points_count * vector_size * bytes_per_dim

    # Rough allowance for the HNSW graph links (~30%)
    hnsw_overhead = 1.3

    total_memory = vector_memory * hnsw_overhead

    return {
        "vector_memory_mb": vector_memory / (1024 * 1024),
        "total_memory_mb": total_memory / (1024 * 1024),
        "total_memory_gb": total_memory / (1024 * 1024 * 1024)
    }

estimate = estimate_memory(1_000_000, 384, "scalar")
print(f"Estimated memory: {estimate['total_memory_gb']:.2f} GB")

On-Disk Storage Configuration #

python
client.create_collection(
    collection_name="disk_storage",
    vectors_config=VectorParams(
        size=384,
        distance=Distance.COSINE,
        on_disk=True
    ),
    hnsw_config=HnswConfigDiff(
        on_disk=True
    ),
    on_disk_payload=True
)

Memory-Mapped Storage (mmap) #

python
from qdrant_client.models import OptimizersConfigDiff

client.create_collection(
    collection_name="mmap_config",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    optimizers_config=OptimizersConfigDiff(
        memmap_threshold=50000  # in KB; larger segments are stored via mmap
    )
)

Performance Optimization Strategies #

Batch Operation Optimization #

python
from concurrent.futures import ThreadPoolExecutor

def batch_insert_optimized(collection_name, points, batch_size=100, workers=4):
    def insert_batch(batch_points):
        client.upsert(
            collection_name=collection_name,
            points=batch_points,
            wait=False  # don't wait for indexing; improves insert throughput
        )

    batches = [
        points[i:i + batch_size]
        for i in range(0, len(points), batch_size)
    ]

    # Insert batches concurrently from several threads
    with ThreadPoolExecutor(max_workers=workers) as executor:
        list(executor.map(insert_batch, batches))

    print(f"Inserted {len(points)} points in {len(batches)} batches")

Search Performance Optimization #

python
from qdrant_client.models import SearchParams

def optimized_search(collection_name, query_vector, limit=10):
    results = client.search(
        collection_name=collection_name,
        query_vector=query_vector,
        search_params=SearchParams(
            hnsw_ef=max(64, limit * 4)
        ),
        limit=limit,
        with_vectors=False,
        with_payload=["title", "category"]
    )
    return results

Parallel Search #

python
from concurrent.futures import ThreadPoolExecutor

def parallel_search(collection_name, query_vectors, limit=5, workers=4):
    def search_one(query_vector):
        return client.search(
            collection_name=collection_name,
            query_vector=query_vector,
            limit=limit
        )
    
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(search_one, query_vectors))
    
    return results

Index Optimization #

python
client.update_collection(
    collection_name="hnsw_demo",
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=10000,  # in KB; build HNSW once a segment exceeds this
        max_optimization_threads=4
    )
)

Performance Monitoring #

Query Latency Monitoring #

python
import time
import statistics

def measure_search_latency(collection_name, query_vectors, iterations=100):
    latencies = []

    for query_vector in query_vectors[:iterations]:
        start = time.perf_counter()  # monotonic clock, better suited to timing

        client.search(
            collection_name=collection_name,
            query_vector=query_vector,
            limit=10
        )

        latencies.append((time.perf_counter() - start) * 1000)  # ms

    ordered = sorted(latencies)
    return {
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "p95_ms": ordered[int(len(ordered) * 0.95)],
        "p99_ms": ordered[min(int(len(ordered) * 0.99), len(ordered) - 1)]
    }

stats = measure_search_latency("hnsw_demo", [[0.1] * 384] * 100)
print(f"P99 latency: {stats['p99_ms']:.2f}ms")

Throughput Testing #

python
import time
from concurrent.futures import ThreadPoolExecutor

def throughput_test(collection_name, query_vectors, duration_sec=10, workers=8):
    start_time = time.time()

    def search_worker(_):
        # Each worker keeps its own counter; summing afterwards avoids the
        # data race of incrementing a shared variable across threads
        count = 0
        while time.time() - start_time < duration_sec:
            for qv in query_vectors:
                client.search(
                    collection_name=collection_name,
                    query_vector=qv,
                    limit=10
                )
                count += 1
                if time.time() - start_time >= duration_sec:
                    break
        return count

    with ThreadPoolExecutor(max_workers=workers) as executor:
        query_count = sum(executor.map(search_worker, range(workers)))

    qps = query_count / duration_sec
    return {"qps": qps, "total_queries": query_count}

result = throughput_test("hnsw_demo", [[0.1] * 384] * 10)
print(f"QPS: {result['qps']:.0f}")

Performance Benchmarks #

Typical Performance Numbers #

text
Benchmark reference:

Dataset: 1 million vectors, 384 dimensions

No quantization:
├── Memory: ~2 GB
├── P99 latency: < 10ms
└── QPS: > 10,000

Scalar quantization:
├── Memory: ~0.5 GB
├── P99 latency: < 8ms
└── QPS: > 15,000

Product quantization (16x):
├── Memory: ~0.15 GB
├── P99 latency: < 15ms
└── QPS: > 8,000
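The memory figures above are consistent with simple sizing arithmetic: bytes per dimension times a rough 1.3x HNSW overhead, the same assumption used by the `estimate_memory` helper earlier (quantized setups may additionally keep original vectors on disk):

```python
def approx_ram_gb(points, dim, bytes_per_dim, overhead=1.3):
    """Rough in-RAM footprint: vectors plus ~30% HNSW graph overhead."""
    return points * dim * bytes_per_dim * overhead / 1024**3

n = 1_000_000
print(f"float32: {approx_ram_gb(n, 384, 4):.2f} GB")       # ≈ 1.86 -> '~2 GB'
print(f"int8:    {approx_ram_gb(n, 384, 1):.2f} GB")       # ≈ 0.46 -> '~0.5 GB'
print(f"PQ x16:  {approx_ram_gb(n, 384, 4 / 16):.2f} GB")  # ≈ 0.12 -> '~0.15 GB'
```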

Hardware Recommendations #

text
Suggested hardware configurations:

Small scale (< 1M vectors):
├── CPU: 4 cores
├── RAM: 8 GB
└── Storage: 50 GB SSD

Medium scale (1M-10M vectors):
├── CPU: 8 cores
├── RAM: 32 GB
└── Storage: 200 GB SSD

Large scale (> 10M vectors):
├── CPU: 16+ cores
├── RAM: 64+ GB
└── Storage: 1 TB+ NVMe SSD

Summary #

This chapter covered indexing and performance optimization:

  • HNSW index internals and parameter tuning
  • Vector quantization techniques
  • Memory management strategies
  • Performance optimization techniques
  • Performance monitoring methods

Next Steps #

With performance optimization covered, continue with Distributed Deployment to learn how to build a highly available cluster!

Last updated: 2026-04-04