Model Serving #
Overview #
Model serving deploys a trained machine learning model as an accessible API service so that it can handle real inference requests. Kubeflow supports several model serving options.
Supported Serving Frameworks #
text
┌─────────────────────────────────────────────────────────────┐
│                  Model Serving Frameworks                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  KServe (recommended):                                      │
│  ├── Kubernetes-native model serving                        │
│  ├── Supports many model frameworks                         │
│  ├── Autoscaling                                            │
│  └── Advanced inference features                            │
│                                                             │
│  Seldon Core:                                               │
│  ├── MLOps platform                                         │
│  ├── Flexible inference graphs                              │
│  ├── Explainer support                                      │
│  └── A/B testing                                            │
│                                                             │
│  TFServing:                                                 │
│  ├── Official TensorFlow serving                            │
│  ├── High-performance inference                             │
│  └── Hot model reloading                                    │
│                                                             │
│  Triton Inference Server:                                   │
│  ├── NVIDIA inference server                                │
│  ├── Multi-framework support                                │
│  └── GPU optimization                                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
KServe #
KServe Overview #
KServe (formerly KFServing) is a Kubernetes-native model serving solution that provides high-performance, scalable model inference.
text
KServe core features:
├── Multi-framework support
│   ├── TensorFlow
│   ├── PyTorch
│   ├── scikit-learn
│   ├── XGBoost
│   ├── ONNX
│   └── Custom models
│
├── Autoscaling
│   ├── Knative Serving
│   ├── Request-driven autoscaling
│   └── Scale to zero
│
├── Advanced features
│   ├── Canary rollouts
│   ├── A/B testing
│   ├── Model explanation
│   └── Transformers
│
└── Protocol support
    ├── V1 (REST)
    ├── V2 (REST/gRPC)
    └── Custom protocols
Installing KServe #
bash
# Install KServe
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.11.0/kserve.yaml

# Install the KServe cluster runtimes
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.11.0/kserve-cluster-resources.yaml

# Verify the installation
kubectl get pods -n kserve
Deploying an InferenceService #
TensorFlow Model #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: tensorflow-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    tensorflow:
      storageUri: gs://kfserving-examples/models/tensorflow/iris
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
PyTorch Model #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: pytorch-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/pytorch/iris
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
scikit-learn Model #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    sklearn:
      storageUri: gs://kfserving-examples/models/sklearn/iris
      protocolVersion: v2
XGBoost Model #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: xgboost-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    xgboost:
      storageUri: gs://kfserving-examples/models/xgboost/iris
      protocolVersion: v2
ONNX Model #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: onnx-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    onnx:
      storageUri: gs://kfserving-examples/models/onnx/iris
Custom Model #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: custom-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    containers:
    - name: custom-container
      image: my-registry/custom-model:latest
      ports:
      - containerPort: 8080
        protocol: TCP
      env:
      - name: MODEL_PATH
        value: /models/model.pkl
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
GPU Support #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: gpu-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    tensorflow:
      storageUri: gs://my-bucket/models/tensorflow
      resources:
        requests:
          cpu: "2"
          memory: "8Gi"
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
Inference Protocols #
V1 Protocol (REST) #
python
import requests

url = "http://sklearn-model.kubeflow-user-example-com.example.com/v1/models/sklearn-model:predict"

data = {
    "instances": [
        [5.1, 3.5, 1.4, 0.2],
        [6.2, 2.9, 4.3, 1.3]
    ]
}

response = requests.post(url, json=data)
predictions = response.json()
print(predictions)
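The V1 request/response envelope above can be captured in two tiny helpers (a sketch; the function names are illustrative, not part of any KServe SDK):

```python
def build_v1_payload(rows):
    """Wrap feature rows in the V1 protocol request envelope."""
    return {"instances": [list(r) for r in rows]}

def extract_v1_predictions(response_json):
    """V1 responses carry results under the top-level 'predictions' key."""
    return response_json["predictions"]

payload = build_v1_payload([[5.1, 3.5, 1.4, 0.2]])
```

This keeps the protocol details in one place if you call several V1 endpoints.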
V2 Protocol (REST) #
python
import requests

url = "http://sklearn-model.kubeflow-user-example-com.example.com/v2/models/sklearn-model/infer"

data = {
    "id": "request-1",
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],
            "datatype": "FP32",
            # V2 tensor data is a flat array in row-major order
            "data": [5.1, 3.5, 1.4, 0.2, 6.2, 2.9, 4.3, 1.3]
        }
    ]
}

response = requests.post(url, json=data)
result = response.json()
print(result)
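In the V2 (Open Inference) protocol, tensor data travels as a flat row-major array paired with a shape. Two sketch helpers for that bookkeeping (names are illustrative):

```python
def to_v2_input(name, batch, datatype="FP32"):
    """Pack a 2-D batch into a V2 input tensor (flat, row-major data)."""
    rows, cols = len(batch), len(batch[0])
    flat = [x for row in batch for x in row]
    return {"name": name, "shape": [rows, cols], "datatype": datatype, "data": flat}

def reshape_v2_output(output):
    """Rebuild nested rows from a V2 output tensor's flat data and shape."""
    rows, cols = output["shape"]
    data = output["data"]
    return [data[i * cols:(i + 1) * cols] for i in range(rows)]
```

`to_v2_input("input-0", batch)` produces exactly the `inputs[0]` object used above.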
gRPC Call #
python
import grpc

# Generated Open Inference Protocol (V2) stubs shipped with the kserve package
from kserve.protocol.grpc import grpc_predict_v2_pb2 as pb
from kserve.protocol.grpc import grpc_predict_v2_pb2_grpc

channel = grpc.insecure_channel('sklearn-model.kubeflow-user-example-com.example.com:9000')
stub = grpc_predict_v2_pb2_grpc.GRPCInferenceServiceStub(channel)

request = pb.ModelInferRequest(
    model_name='sklearn-model',
    inputs=[
        pb.ModelInferRequest.InferInputTensor(
            name='input-0',
            datatype='FP32',
            shape=[1, 4],
            contents=pb.InferTensorContents(fp32_contents=[5.1, 3.5, 1.4, 0.2]),
        )
    ],
)

response = stub.ModelInfer(request)
print(response)
Advanced Features #
Canary Rollouts #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: canary-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    # 20% of traffic goes to this new revision; the previously
    # deployed revision keeps serving the remaining 80%
    canaryTrafficPercent: 20
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/sklearn/v2
A/B Testing #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ab-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    sklearn:
      storageUri: gs://my-bucket/models/sklearn/v1
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
---
# Splits traffic 70/30 between two InferenceServices
# (a second service named ab-model-v2 is assumed to exist)
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: ab-test-graph
  namespace: kubeflow-user-example-com
spec:
  nodes:
    root:
      routerType: Splitter
      steps:
      - serviceName: ab-model
        weight: 70
      - serviceName: ab-model-v2
        weight: 30
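The Splitter's weighted routing can be simulated locally to sanity-check a 70/30 split (a sketch of the semantics, not KServe's implementation):

```python
import random

def pick_route(routes, rng=random):
    """Pick a serviceName with probability proportional to its weight."""
    names = [r["serviceName"] for r in routes]
    weights = [r["weight"] for r in routes]
    return rng.choices(names, weights=weights, k=1)[0]

routes = [
    {"serviceName": "ab-model", "weight": 70},
    {"serviceName": "ab-model-v2", "weight": 30},
]
rng = random.Random(42)
hits = sum(pick_route(routes, rng) == "ab-model" for _ in range(10_000))
print(hits)  # close to 7,000 of 10,000 requests
```

With weights 70/30 roughly seven in ten requests land on `ab-model`, which is exactly what the graph above configures.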
Transformer #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: transformer-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    sklearn:
      storageUri: gs://my-bucket/models/sklearn/iris
  transformer:
    containers:
    - name: transformer
      image: my-registry/transformer:latest
      command:
      - python
      - -m
      - transformer
      - --model_name
      - transformer-model
      - --protocol
      - v2
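A transformer is typically a small Python service (usually a `kserve.Model` subclass with `preprocess`/`postprocess` hooks) sitting between the client and the predictor. The data-munging core of such a service might look like this framework-free sketch (field names and the label mapping are illustrative):

```python
def preprocess(payload):
    """Convert a client-friendly payload into predictor instances."""
    # e.g. {"flowers": [{"sepal_len": 5.1, ...}]} -> {"instances": [[...]]}
    keys = ("sepal_len", "sepal_wid", "petal_len", "petal_wid")
    rows = [[flower[k] for k in keys] for flower in payload["flowers"]]
    return {"instances": rows}

def postprocess(prediction):
    """Map raw class indices back to human-readable labels."""
    labels = ["setosa", "versicolor", "virginica"]
    return {"species": [labels[i] for i in prediction["predictions"]]}
```

In a real transformer these two functions become the `preprocess` and `postprocess` methods, and KServe handles the call to the predictor in between.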
Model Explanation #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: explainable-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    sklearn:
      storageUri: gs://my-bucket/models/sklearn/iris
  explainer:
    alibi:
      type: AnchorTabular
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
Seldon Core #
Seldon Core Overview #
Seldon Core is an open-source MLOps platform that provides flexible model deployment and management capabilities.
text
Seldon Core features:
├── Inference graphs
│   ├── Model composition
│   ├── Data preprocessing
│   └── Postprocessing
│
├── Model explanation
│   ├── Alibi Explain
│   ├── Multiple explanation methods
│   └── Visualization
│
├── Monitoring
│   ├── Prometheus metrics
│   ├── Grafana dashboards
│   └── Request logging
│
└── Advanced deployment
    ├── A/B testing
    ├── Canary rollouts
    └── Multi-armed bandits
Installing Seldon Core #
bash
# Install Seldon Core
kubectl create namespace seldon-system
helm install seldon-core seldon-core-operator \
  --repo https://storage.googleapis.com/seldon-charts \
  --namespace seldon-system \
  --set usageMetrics.enabled=true

# Install Istio (used for ingress)
kubectl apply -f https://raw.githubusercontent.com/SeldonIO/seldon-core/master/examples/notebooks/seldon_core_istio.yaml
Deploying a SeldonDeployment #
yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn-model
  namespace: kubeflow-user-example-com
spec:
  predictors:
  - name: default
    replicas: 1
    graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
      envSecretRefName: seldon-rclone-secret
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
Inference Graphs #
yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: inference-graph
  namespace: kubeflow-user-example-com
spec:
  predictors:
  - name: default
    graph:
      # preprocess and combiner are custom components; their container
      # images would be supplied via componentSpecs
      name: preprocess
      type: TRANSFORMER
      modelUri: gs://my-bucket/preprocess
      children:
      - name: combiner
        type: COMBINER
        children:
        - name: model-a
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/model-a
        - name: model-b
          implementation: SKLEARN_SERVER
          modelUri: gs://my-bucket/model-b
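A combiner node merges its children's outputs; a common choice is to average class probabilities across the child models. A minimal sketch of just that math (Seldon's Python combiners wrap this in an `aggregate` method):

```python
def average_probabilities(outputs):
    """Element-wise mean over each child model's probability rows."""
    n = len(outputs)
    return [
        [sum(vals) / n for vals in zip(*rows)]
        for rows in zip(*outputs)  # align batch rows across models
    ]

model_a = [[0.8, 0.2]]  # per-class probabilities from model-a
model_b = [[0.6, 0.4]]  # per-class probabilities from model-b
combined = average_probabilities([model_a, model_b])  # ≈ [[0.7, 0.3]]
```

Averaging is only one option; weighted means or majority voting drop into the same shape.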
Model Storage #
Storage Options #
text
Supported storage backends:
├── S3 / MinIO
├── Google Cloud Storage (GCS)
├── Azure Blob Storage
├── PVC (Persistent Volume Claim)
├── HTTP/HTTPS
└── Local storage (testing only)
S3 Storage #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: s3-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    # ServiceAccount bound to a Secret holding the S3 credentials
    serviceAccountName: s3-sa
    sklearn:
      storageUri: s3://my-bucket/models/sklearn/iris
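For `s3://` URIs the storage initializer needs credentials. The usual pattern is a Secret annotated for KServe plus a ServiceAccount that references it (a sketch; the endpoint, names, and placeholder keys are assumptions to fill in):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-secret
  namespace: kubeflow-user-example-com
  annotations:
    serving.kserve.io/s3-endpoint: s3.amazonaws.com
    serving.kserve.io/s3-usehttps: "1"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <access-key>
  AWS_SECRET_ACCESS_KEY: <secret-key>
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-sa
  namespace: kubeflow-user-example-com
secrets:
- name: aws-secret
```

The InferenceService then picks the credentials up through its `serviceAccountName`.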
PVC Storage #
yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
  namespace: kubeflow-user-example-com
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: pvc-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    sklearn:
      storageUri: pvc://model-pvc/models/sklearn
Autoscaling #
Knative Configuration #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: autoscale-model
  namespace: kubeflow-user-example-com
  annotations:
    # Target concurrent requests per replica
    autoscaling.knative.dev/target: "10"
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 10
    sklearn:
      storageUri: gs://my-bucket/models/sklearn/iris
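In its stable mode, Knative's autoscaler sizes the deployment roughly as the observed concurrency divided by the per-replica target, clamped between the min and max scale. A sketch of that calculation (simplified; the real KPA also has panic-mode and windowed averaging):

```python
import math

def desired_replicas(concurrency, target, min_scale, max_scale):
    """Approximate the Knative KPA sizing rule in stable mode."""
    want = math.ceil(concurrency / target)
    return max(min_scale, min(max_scale, want))

# 45 in-flight requests with target=10 -> 5 replicas (within [1, 10])
print(desired_replicas(45, 10, 1, 10))
```

With the annotations above, 45 concurrent requests would run about 5 replicas, and idle traffic never drops below `minReplicas: 1`.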
HPA Configuration #
yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-hpa
  namespace: kubeflow-user-example-com
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sklearn-model-predictor
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
Monitoring and Logging #
Prometheus Metrics #
yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kserve-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: kserve
  endpoints:
  - port: metrics
    interval: 30s
Request Logging #
yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: logged-model
  namespace: kubeflow-user-example-com
spec:
  predictor:
    logger:
      mode: all
      url: http://log-server/logging
    sklearn:
      storageUri: gs://my-bucket/models/sklearn/iris
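The logger POSTs each request and response to `url` as a CloudEvent over HTTP. For local experimentation, a compatible log sink can be sketched with only the standard library (the `ce-*` header names follow the CloudEvents HTTP binding; the event type string is an example):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class LogSink(BaseHTTPRequestHandler):
    """Collects logged inference events in memory."""
    events = []

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        LogSink.events.append({
            "type": self.headers.get("ce-type"),  # e.g. org.kubeflow.serving.inference.request
            "body": json.loads(body or b"{}"),
        })
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep stderr quiet

def start_sink(port=0):
    """Start the sink on a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), LogSink)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point `logger.url` at this sink (reachable from the cluster) to inspect exactly what the model receives and returns.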
Managing Services #
Checking Service Status #
bash
# List InferenceServices
kubectl get inferenceservices -n kubeflow-user-example-com

# Show service details
kubectl describe inferenceservice sklearn-model -n kubeflow-user-example-com

# Check pod status
kubectl get pods -n kubeflow-user-example-com -l serving.kserve.io/inferenceservice=sklearn-model

# View logs
kubectl logs -n kubeflow-user-example-com -l serving.kserve.io/inferenceservice=sklearn-model
Updating a Service #
bash
# Point the service at a new model version
kubectl patch inferenceservice sklearn-model -n kubeflow-user-example-com --type merge -p '
{
  "spec": {
    "predictor": {
      "sklearn": {
        "storageUri": "gs://my-bucket/models/sklearn/v2"
      }
    }
  }
}'
Deleting a Service #
bash
# Delete an InferenceService
kubectl delete inferenceservice sklearn-model -n kubeflow-user-example-com
Best Practices #
Deployment Strategy #
text
1. Version management
   ├── Use semantic versioning
   ├── Keep historical versions
   └── Enable fast rollback

2. Release strategy
   ├── Canary rollouts
   ├── A/B testing
   └── Blue-green deployment

3. Resource configuration
   ├── Set sensible resource requests
   ├── Configure autoscaling
   └── Monitor resource usage
Performance Optimization #
text
1. Model optimization
   ├── Quantization
   ├── Pruning
   └── Knowledge distillation

2. Inference optimization
   ├── Batching
   ├── Caching
   └── Asynchronous processing

3. Resource optimization
   ├── GPU acceleration
   ├── Multi-model serving
   └── Resource sharing
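Batching amortizes per-request overhead by grouping requests before they reach the model. A framework-free sketch of size-triggered micro-batching (real servers such as Triton also flush on a timeout; all names here are illustrative):

```python
class MicroBatcher:
    """Buffers single requests and flushes them to the model in batches."""

    def __init__(self, predict_batch, max_batch_size=4):
        self.predict_batch = predict_batch  # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.pending = []

    def submit(self, item):
        """Queue one request; returns batch results when a flush is triggered."""
        self.pending.append(item)
        if len(self.pending) >= self.max_batch_size:
            return self.flush()
        return None

    def flush(self):
        """Run the model once over everything queued so far."""
        batch, self.pending = self.pending, []
        return self.predict_batch(batch)

# Toy model: doubles each input, one batched call instead of three single calls
batcher = MicroBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=3)
```

A production batcher would also flush on a deadline so sparse traffic is not stuck waiting for a full batch.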
Next Steps #
Now that you have model serving under your belt, continue to Training Overview to learn how Kubeflow manages training jobs!
Last updated: 2026-04-05