Health Checks

1. Health Check Overview

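Health checks are how Kubernetes monitors container state: the kubelet probes each container periodically and acts on the result, so that failed containers are restarted and traffic only reaches Pods that can serve it.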

1.1 Probe Types

text
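Probe types
    │
    ├── Liveness Probe
    │   └── Checks whether the container is still alive
    │
    ├── Readiness Probe
    │   └── Checks whether the container is ready to serve traffic
    │
    └── Startup Probe
        └── Checks whether the container has finished starting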

1.2 What Each Probe Does

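| Probe | On failure | Use case |
| --- | --- | --- |
| Liveness | Container is restarted | Detecting deadlocks and hung services |
| Readiness | Pod is removed from Service endpoints | Detecting whether the service can accept traffic |
| Startup | Container is restarted; liveness and readiness probes are held off until it succeeds | Protecting slow-starting applications |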

2. Liveness Probes

2.1 HTTP Probe

yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: app
    image: nginx
    livenessProbe:
      httpGet:
        path: /health   # assumes the app serves this path; the stock nginx image returns 404 here
        port: 80
        httpHeaders:
        - name: Custom-Header
          value: value
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
      successThreshold: 1
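
Two details of HTTP probes are worth seeing in a concrete form: the `port` field can refer to a named container port, and the kubelet treats any response status from 200 up to (but not including) 400 as success. The following is a minimal sketch rather than part of the original example. The Pod name is hypothetical, and the probe targets `/` because the stock nginx image serves that path without extra configuration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http-named-port   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - name: http                   # named port, referenced by the probe below
      containerPort: 80
    livenessProbe:
      httpGet:
        path: /                    # served by stock nginx; /health would need extra config
        port: http                 # resolves to containerPort 80
      periodSeconds: 10
```

Referring to the port by name keeps the probe correct if the port number changes later.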

2.2 TCP Probe

yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
spec:
  containers:
  - name: app
    image: nginx
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10

2.3 Command (exec) Probe

yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "touch /tmp/health && sleep 3600"]
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/health
      initialDelaySeconds: 5
      periodSeconds: 5
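
To watch a liveness failure actually happen, a variant along the lines of the example in the Kubernetes documentation deletes the file after a while. This is a sketch: the Pod name and the 30-second delay are illustrative choices, not values from the original.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-fail   # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    # Healthy for the first 30 s; then the file is removed and `cat` starts failing.
    command: ["sh", "-c", "touch /tmp/health && sleep 30 && rm -f /tmp/health && sleep 3600"]
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/health
      initialDelaySeconds: 5
      periodSeconds: 5
      # failureThreshold defaults to 3, so a restart follows three consecutive failures
```

After the file disappears, `kubectl describe pod` should show events for the failed liveness probe, and the Pod's restart count should start to climb.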

3. Readiness Probes

3.1 HTTP Probe

yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
spec:
  containers:
  - name: app
    image: nginx
    readinessProbe:
      httpGet:
        path: /ready   # assumes the app serves this path; the stock nginx image returns 404 here
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3

3.2 TCP Probe

yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-tcp
spec:
  containers:
  - name: app
    image: nginx
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5

3.3 Command (exec) Probe

yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 10 && touch /tmp/ready && sleep 3600"]
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 5
      periodSeconds: 5

4. Startup Probes

4.1 Basic Configuration

yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe
spec:
  containers:
  - name: app
    image: nginx   # placeholder; use an image that actually serves /health and /ready
    startupProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 30
    livenessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 5

4.2 Slow-Starting Applications

yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-startup
spec:
  containers:
  - name: app
    image: myapp
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10
      failureThreshold: 60
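
The startup window these settings allow can be estimated from two fields. The calculation below is a rough estimate that ignores per-attempt timeouts and scheduling jitter, with `initialDelaySeconds` at its default of 0 for the `slow-startup` Pod:

```latex
\begin{align*}
T_{\text{startup,max}} &\approx \texttt{initialDelaySeconds} + \texttt{failureThreshold} \times \texttt{periodSeconds} \\
\text{slow-startup (4.2):}\quad T_{\text{startup,max}} &\approx 0 + 60 \times 10\,\text{s} = 600\,\text{s} \;(10\ \text{min}) \\
\text{startup-probe (4.1):}\quad T_{\text{startup,max}} &\approx 0 + 30 \times 10\,\text{s} = 300\,\text{s} \;(5\ \text{min})
\end{align*}
```

If the probe has not succeeded by then, the container is killed and restarted according to the Pod's `restartPolicy`. Once it succeeds, the liveness and readiness probes take over.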

5. Probe Parameters

5.1 Parameter Reference

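| Parameter | Description | Default |
| --- | --- | --- |
| initialDelaySeconds | Seconds to wait after the container starts before the first probe | 0 |
| periodSeconds | Seconds between probe attempts | 10 |
| timeoutSeconds | Seconds after which a probe attempt times out | 1 |
| failureThreshold | Consecutive failures before the probe counts as failed | 3 |
| successThreshold | Consecutive successes before the probe counts as passing again (must be 1 for liveness and startup probes) | 1 |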

5.2 Example Configuration

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  successThreshold: 1
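
As a rough guide to what these values mean in practice, the time between a container hanging and the kubelet deciding to restart it is bounded approximately by the failure threshold times the period, plus one timeout. This is an estimate for the settings shown, not an exact guarantee:

```latex
\begin{align*}
T_{\text{first probe}} &\approx \texttt{initialDelaySeconds} = 30\,\text{s} \\
T_{\text{detect}} &\lesssim \texttt{failureThreshold} \times \texttt{periodSeconds} + \texttt{timeoutSeconds} \\
&= 3 \times 10\,\text{s} + 5\,\text{s} = 35\,\text{s}
\end{align*}
```

Lowering `periodSeconds` or `failureThreshold` shortens detection, at the cost of more spurious restarts when the application is merely slow under load.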

6. Complete Configuration Examples

6.1 Web Application

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
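
Readiness only has a visible effect once something routes traffic to the Pods. A minimal Service sketch for this Deployment might look as follows. The Service name is an assumption, while the selector and ports mirror the manifest above. Pods whose readiness probe is failing are left out of this Service's endpoints until the probe passes again.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app        # hypothetical name, chosen to match the Deployment
spec:
  selector:
    app: web           # same label as the Deployment's Pod template
  ports:
  - port: 80           # port exposed by the Service
    targetPort: 80     # containerPort of the nginx container
```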

6.2 Database Application

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpassword
        ports:
        - containerPort: 3306
        livenessProbe:
          exec:
            command:
            - mysqladmin
            - ping
            - -h
            - localhost
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
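        readinessProbe:
          exec:
            command:
            # Run through a shell so $MYSQL_ROOT_PASSWORD is expanded; without
            # credentials the query is rejected and the Pod never becomes ready.
            - sh
            - -c
            - mysql -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD" -e 'SELECT 1'
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1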

6.3 Microservice Application

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapi:latest
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          periodSeconds: 10
          failureThreshold: 30
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          periodSeconds: 5
          timeoutSeconds: 3

7. Debugging Probes

7.1 Inspecting Probe Status

bash
# Show Pod events (probe failures appear as Unhealthy events)
kubectl describe pod <pod-name>

# Show the probe configuration as applied
kubectl get pod <pod-name> -o yaml

# Show container logs
kubectl logs <pod-name>

7.2 Testing Probe Endpoints

bash
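# Test from inside the container (requires curl in the image)
kubectl exec -it <pod-name> -- curl localhost:80/health

# Test through a port-forward (run curl from a second terminal)
kubectl port-forward <pod-name> 8080:80
curl http://localhost:8080/health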

8. Best Practices

8.1 Probe Design Principles

text
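Probe design principles
    │
    ├── Lightweight
    │   └── Responds quickly
    │
    ├── Idempotent
    │   └── Repeated calls have no side effects
    │
    ├── Independent
    │   └── Does not depend on external services
    │
    └── Sensible thresholds
        └── Tuned to the application's behavior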

8.2 Recommended Settings

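| Scenario | Liveness | Readiness | Startup |
| --- | --- | --- | --- |
| Fast-starting application | initialDelaySeconds: 10 | initialDelaySeconds: 5 | Not needed |
| Slow-starting application | initialDelaySeconds: 0 | initialDelaySeconds: 0 | failureThreshold: 60 |
| Database | exec check | exec check | Long startup window |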
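
For database workloads, the long wait at startup can be expressed as a startup probe placed in front of the exec checks. The fragment below is a sketch that assumes a MySQL container like the one in section 6.2. The threshold values are illustrative rather than tuned recommendations, and using `127.0.0.1` instead of `localhost` is a deliberate assumption so that the check goes over TCP.

```yaml
# Container-level fragment; sits alongside livenessProbe and readinessProbe.
startupProbe:
  exec:
    command:
    - mysqladmin
    - ping
    - -h
    - 127.0.0.1            # TCP, so the check passes only once the server accepts network connections
  periodSeconds: 10
  failureThreshold: 30     # allows roughly 30 x 10 s = 5 min for initialization
  timeoutSeconds: 5
```

With this in place, the liveness probe does not need a large `initialDelaySeconds`, because it only starts once the startup probe has succeeded.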

8.3 Common Pitfalls

yaml
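# Problem: the probe check depends on an external service
# Fix: have the probe check only the container's own state

# Problem: the probe timeout is too short
# Fix: set the timeout according to the endpoint's actual response time

# Problem: the initial delay is too short
# Fix: use a startup probe, or increase initialDelaySeconds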
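
The three fixes above can be combined in one container spec. The sketch below is illustrative only: the `/livez` and `/readyz` paths, the port, and the numeric values are assumptions, not taken from a real application.

```yaml
# Container-level fragment (hypothetical endpoints and values).
startupProbe:              # covers slow startup instead of a large initialDelaySeconds
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:             # checks only the process itself, with no downstream calls
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3        # set above the endpoint's observed worst-case latency
readinessProbe:            # may reflect whether the app can currently serve requests
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
```

Keeping liveness separate from readiness means a failing dependency takes the Pod out of rotation without triggering a restart that would not fix anything.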

9. Troubleshooting

9.1 Common Issues

bash
# Show why the Pod's container last restarted
kubectl describe pod <pod-name> | grep -A 5 "Last State"

# List events for the Pod, including probe failures
kubectl get events --field-selector involvedObject.name=<pod-name>

# Print the exit code of the previous container instance
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

9.2 Diagnosing Problems

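| Symptom | Cause | Fix |
| --- | --- | --- |
| Container restarts repeatedly | Liveness probe failing | Check the probe endpoint |
| Service unavailable | Readiness probe failing | Check the services it depends on |
| Startup times out | Startup probe failing | Increase failureThreshold |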

10. Summary

10.1 Key Points

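| Probe | Purpose | On failure |
| --- | --- | --- |
| Liveness | Detects whether the container is alive | Container is restarted |
| Readiness | Detects whether the container is ready | Pod is removed from Service endpoints |
| Startup | Detects whether startup has finished | Container is restarted; other probes are held off until it succeeds |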

10.2 Next Steps

With health checks covered, the next topic is Helm package management: how to package and deploy applications.

Last updated: 2026-03-28