Performance Optimization #

1. Performance Optimization Overview #

text
Performance optimization dimensions:

┌─────────────────────────────────────────┐
│         Optimization Dimensions         │
├─────────────────────────────────────────┤
│                                         │
│  ┌─────────────────────────────────┐    │
│  │  Index optimization             │    │
│  │  - Index type selection         │    │
│  │  - Parameter tuning             │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │  Query optimization             │    │
│  │  - Search parameters            │    │
│  │  - Batch processing             │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │  Configuration optimization     │    │
│  │  - Memory settings              │    │
│  │  - Concurrency settings         │    │
│  └─────────────────────────────────┘    │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │  Architecture optimization      │    │
│  │  - Resource allocation          │    │
│  │  - Scaling strategy             │    │
│  └─────────────────────────────────┘    │
│                                         │
└─────────────────────────────────────────┘

2. Index Optimization #

2.1 Index Type Selection #

text
Index selection guide:

Data volume         Recommended index    Notes
──────────────────────────────────────────────────
< 100K              FLAT                 Highest accuracy
100K - 1M           IVF_FLAT             Balanced performance
1M - 10M            HNSW / IVF_PQ        High performance
> 10M               DISKANN              Large scale
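The guide above can be sketched as a small helper that maps a collection size to a candidate index type. The thresholds mirror the table; treat them as starting points rather than hard rules, and the function name is illustrative:

```python
def recommend_index_type(n_vectors):
    """Pick a candidate index type from the collection size (thresholds from the table above)."""
    if n_vectors < 100_000:
        return "FLAT"          # brute-force scan, exact results
    elif n_vectors < 1_000_000:
        return "IVF_FLAT"      # balanced speed and recall
    elif n_vectors < 10_000_000:
        return "HNSW"          # high performance (consider IVF_PQ if memory is tight)
    else:
        return "DISKANN"       # disk-based index for very large collections

print(recommend_index_type(500_000))  # IVF_FLAT
```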

2.2 IVF Index Parameter Tuning #

python
def tune_ivf_params(n_vectors, dim):
    # Rule of thumb: nlist ≈ 4 * sqrt(n). The heuristic depends only on
    # the number of vectors; dim is kept for interface symmetry.
    nlist = int(4 * (n_vectors ** 0.5))

    # Cap nlist at Milvus's upper limit of 65536
    nlist = min(nlist, 65536)

    return {
        "metric_type": "L2",
        "index_type": "IVF_FLAT",
        "params": {"nlist": nlist}
    }

index_params = tune_ivf_params(1000000, 768)
collection.create_index("embedding", index_params)

2.3 HNSW Parameter Tuning #

python
def tune_hnsw_params(dim, precision_requirement="medium"):
    # Higher M and efConstruction improve recall at the cost of build
    # time and memory; scale M down for lower-dimensional data.
    if precision_requirement == "high":
        M = min(64, dim // 2)
        efConstruction = 500
    elif precision_requirement == "medium":
        M = min(32, dim // 4)
        efConstruction = 200
    else:  # "low"
        M = min(16, dim // 8)
        efConstruction = 100

    return {
        "metric_type": "L2",
        "index_type": "HNSW",
        "params": {
            "M": M,
            "efConstruction": efConstruction
        }
    }

index_params = tune_hnsw_params(768, "high")
collection.create_index("embedding", index_params)

2.4 Search Parameter Optimization #

python
def optimize_search_params(index_type, nlist=None):
    # Starting points only: raise nprobe/ef for better recall,
    # lower them for lower latency.
    if index_type == "IVF_FLAT":
        return {
            "metric_type": "L2",
            "params": {"nprobe": max(1, nlist // 8)}
        }
    elif index_type == "HNSW":
        return {
            "metric_type": "L2",
            "params": {"ef": 64}
        }
    elif index_type == "IVF_PQ":
        # PQ quantization loses precision, so probe more clusters to compensate
        return {
            "metric_type": "L2",
            "params": {"nprobe": max(1, nlist // 4)}
        }
    return {"metric_type": "L2"}

search_params = optimize_search_params("IVF_FLAT", nlist=4096)

3. Query Optimization #

3.1 Batch Search #

python
import numpy as np

def batch_search(collection, query_vectors, batch_size=100):
    # Send queries in batches instead of one at a time: each search()
    # call amortizes its round-trip cost over batch_size vectors.
    all_results = []

    for i in range(0, len(query_vectors), batch_size):
        batch = query_vectors[i:i+batch_size]
        results = collection.search(
            data=batch,
            anns_field="embedding",
            param={"metric_type": "L2", "params": {"nprobe": 16}},
            limit=10
        )
        all_results.extend(results)

    return all_results

query_vectors = np.random.rand(1000, 768).tolist()
results = batch_search(collection, query_vectors)

3.2 Parallel Search #

python
from concurrent.futures import ThreadPoolExecutor

def parallel_search(collection, query_vectors, workers=4):
    # Issue batches concurrently from multiple threads; useful when a
    # single client thread cannot saturate the server.
    def search_batch(batch):
        return collection.search(
            data=batch,
            anns_field="embedding",
            param={"metric_type": "L2", "params": {"nprobe": 16}},
            limit=10
        )

    # Guard against a zero batch size when there are fewer queries than workers
    batch_size = max(1, len(query_vectors) // workers)
    batches = [
        query_vectors[i:i+batch_size]
        for i in range(0, len(query_vectors), batch_size)
    ]

    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(search_batch, batches))

    return results

3.3 Filter Optimization #

python
# Attribute filtering: apply a boolean expression during the vector search
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=10,
    expr='category == "electronics"'
)

# Partition pruning: restrict the search to specific partitions, which is
# usually cheaper than filtering across the whole collection
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=10,
    partition_names=["electronics_partition"]
)

3.4 Output Field Optimization #

python
# Request only the scalar fields you need; returning large fields
# (especially vectors) inflates response size and latency
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=10,
    output_fields=["id", "title"]
)

# Fastest option: omit output_fields entirely and work with the
# returned IDs and distances
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=10
)

4. Configuration Optimization #

4.1 Memory Configuration #

yaml
queryNode:
  resources:
    requests:
      memory: "16Gi"
    limits:
      memory: "32Gi"
  config:
    cacheSize: 32

dataNode:
  resources:
    requests:
      memory: "8Gi"
    limits:
      memory: "16Gi"

4.2 Concurrency Configuration #

yaml
proxy:
  config:
    maxFieldLength: 65535
    maxShardNum: 256
    maxTaskNum: 1024

queryNode:
  config:
    scheduler:
      receiveChanSize: 1024
      unsolvedQueueSize: 1024

4.3 Segment Configuration #

yaml
dataCoord:
  config:
    segment:
      maxSize: 512
      diskSegmentMaxSize: 2048
      sealProportion: 0.12

queryNode:
  config:
    segcore:
      chunkRows: 1024
      smallIndex:
        nlist: 128
        nprobe: 16

5. Resource Configuration Optimization #

5.1 CPU Configuration #

yaml
queryNode:
  resources:
    requests:
      cpu: "4"
    limits:
      cpu: "8"

indexNode:
  resources:
    requests:
      cpu: "4"
    limits:
      cpu: "8"

5.2 Replica Configuration #

yaml
queryNode:
  replicas: 5

proxy:
  replicas: 3

dataNode:
  replicas: 3

5.3 Storage Configuration #

yaml
minio:
  persistence:
    storageClass: "fast-ssd"
    size: 500Gi

pulsar:
  bookkeeper:
    volumes:
      journal:
        storageClass: "fast-ssd"
        size: 100Gi

6. Performance Monitoring #

6.1 Key Metrics #

text
Key monitoring metrics:

┌─────────────────────────────────────────┐
│             Latency metrics             │
├─────────────────────────────────────────┤
│  - Search latency (P50/P95/P99)         │
│  - Insert latency                       │
│  - Query latency                        │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│           Throughput metrics            │
├─────────────────────────────────────────┤
│  - QPS (queries per second)             │
│  - Insert TPS                           │
│  - Network throughput                   │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│            Resource metrics             │
├─────────────────────────────────────────┤
│  - CPU utilization                      │
│  - Memory utilization                   │
│  - Disk I/O                             │
│  - Network I/O                          │
└─────────────────────────────────────────┘

6.2 Performance Testing #

python
import time
import numpy as np

def benchmark_search(collection, num_queries=1000, dim=768):
    query_vectors = np.random.rand(num_queries, dim).tolist()

    search_params = {
        "metric_type": "L2",
        "params": {"nprobe": 16}
    }

    # Time each single-vector search individually
    latencies = []

    for query in query_vectors:
        start = time.time()
        collection.search(
            data=[query],
            anns_field="embedding",
            param=search_params,
            limit=10
        )
        latencies.append(time.time() - start)

    # Convert seconds to milliseconds
    latencies = np.array(latencies) * 1000

    return {
        "p50": np.percentile(latencies, 50),
        "p95": np.percentile(latencies, 95),
        "p99": np.percentile(latencies, 99),
        "avg": np.mean(latencies),
        # Sequential QPS: queries divided by total wall time in seconds
        "qps": num_queries / (latencies.sum() / 1000)
    }

results = benchmark_search(collection)
print(f"P50 latency: {results['p50']:.2f}ms")
print(f"P95 latency: {results['p95']:.2f}ms")
print(f"P99 latency: {results['p99']:.2f}ms")
print(f"QPS: {results['qps']:.2f}")

7. Performance Tuning Case Studies #

7.1 High-QPS Scenario #

yaml
proxy:
  replicas: 5
  resources:
    requests:
      cpu: "2"
      memory: "4Gi"

queryNode:
  replicas: 10
  resources:
    requests:
      cpu: "4"
      memory: "16Gi"
  config:
    cacheSize: 64

queryCoord:
  config:
    autoHandoff: true
    autoBalance: true

7.2 Large Data Volume Scenario #

yaml
dataNode:
  replicas: 5
  config:
    segment:
      maxSize: 1024

indexNode:
  replicas: 5
  resources:
    requests:
      cpu: "8"
      memory: "32Gi"

queryNode:
  config:
    segcore:
      smallIndex:
        enable: true

7.3 Low-Latency Scenario #

yaml
queryNode:
  config:
    cacheSize: 128
    segcore:
      chunkRows: 4096

proxy:
  config:
    timeTickInterval: 100

8. Best Practices #

8.1 Performance Optimization Checklist #

text
Performance optimization checklist:

□ Choose an appropriate index type
□ Tune index parameters
□ Optimize search parameters
□ Use batch operations
□ Allocate sufficient memory
□ Configure a reasonable number of replicas
□ Enable monitoring and alerting
□ Run performance tests regularly

8.2 Troubleshooting Common Issues #

text
Common performance issues:

Issue                 Cause                     Fix
────────────────────────────────────────────────────────────
High search latency   Poorly tuned index        Adjust nprobe/ef
Out of memory         Cache too small           Increase cacheSize
Low QPS               Too few replicas          Add queryNode replicas
Slow inserts          Batch size too small      Increase batch_size
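
The table above can be encoded as a small triage helper that turns a benchmark report (such as the dict returned by benchmark_search in section 6.2) into tuning suggestions. The p99 and QPS targets here are illustrative assumptions, not Milvus defaults, and `diagnose` is a hypothetical name:

```python
def diagnose(report, p99_target_ms=50.0, qps_target=100.0):
    """Compare a benchmark report against illustrative targets and suggest fixes."""
    suggestions = []
    # High tail latency usually points at index/search parameter tuning
    if report.get("p99", 0.0) > p99_target_ms:
        suggestions.append("High search latency: adjust nprobe/ef")
    # Low throughput usually points at insufficient queryNode replicas
    if report.get("qps", float("inf")) < qps_target:
        suggestions.append("Low QPS: add queryNode replicas")
    return suggestions

print(diagnose({"p99": 120.0, "qps": 40.0}))
```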

9. Summary #

Performance optimization cheat sheet:

Dimension                  Method
Index optimization         Choose the right index type and parameters
Query optimization         Batch and parallel search
Configuration optimization Memory, concurrency, and segment settings
Resource optimization      CPU, memory, and replica allocation
Monitoring                 Alerting on key metrics

Congratulations on completing your Milvus learning journey!

Last updated: 2026-04-04