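Health Checks
1. Overview of Health Checks
Health checks are the mechanism Kubernetes uses to monitor container state: the kubelet probes each container periodically and restarts it, or withholds traffic from it, when a probe fails.
1.1 Probe Types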
text
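Probe types
│
├── Liveness Probe
│   └── Checks whether the container is still alive
│
├── Readiness Probe
│   └── Checks whether the container is ready to serve traffic
│
└── Startup Probe
    └── Checks whether the application has finished starting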
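1.2 What Each Probe Does
| Probe | On failure | Typical use |
|---|---|---|
| Liveness | The container is killed and restarted according to its restartPolicy | Detecting deadlocks and hung services |
| Readiness | The Pod is removed from Service endpoints | Holding back traffic until the service is ready |
| Startup | The container is killed and restarted; until this probe succeeds, liveness and readiness probes are not run | Protecting slow-starting applications |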
2. Liveness Probes
2.1 HTTP Probe
yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: app
    image: nginx
    livenessProbe:
      httpGet:
        path: /health          # the app must serve this path; stock nginx returns 404 here
        port: 80
        httpHeaders:
        - name: Custom-Header
          value: value
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
      successThreshold: 1      # must be 1 for liveness probes
2.2 TCP Probe
yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
spec:
  containers:
  - name: app
    image: nginx
    livenessProbe:
      tcpSocket:
        port: 80               # succeeds when a TCP connection to this port can be opened
      initialDelaySeconds: 10
      periodSeconds: 10
2.3 Exec (Command) Probe
yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "touch /tmp/health && sleep 3600"]
    livenessProbe:
      exec:
        command:               # succeeds when the command exits with status 0
        - cat
        - /tmp/health
      initialDelaySeconds: 5
      periodSeconds: 5
3. Readiness Probes
3.1 HTTP Probe
yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
spec:
  containers:
  - name: app
    image: nginx
    readinessProbe:
      httpGet:
        path: /ready           # the app must serve this path; stock nginx returns 404 here
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
3.2 TCP Probe
yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-tcp
spec:
  containers:
  - name: app
    image: nginx
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
3.3 Exec (Command) Probe
yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: app
    image: busybox
    # /tmp/ready appears after 10 s, so the Pod only becomes Ready after that
    command: ["sh", "-c", "sleep 10 && touch /tmp/ready && sleep 3600"]
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/ready
      initialDelaySeconds: 5
      periodSeconds: 5
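4. Startup Probes
4.1 Basic Configuration
yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe
spec:
  containers:
  - name: app
    image: nginx               # the app must serve /health and /ready; stock nginx does not
    startupProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 30     # up to 30 x 10 s = 300 s to finish starting
    livenessProbe:             # runs only after the startup probe has succeeded
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 10
    readinessProbe:            # runs only after the startup probe has succeeded
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 0
      periodSeconds: 5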
4.2 Slow-Starting Applications
yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-startup
spec:
  containers:
  - name: app
    image: myapp
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10
      failureThreshold: 60     # up to 60 x 10 s = 600 s to finish starting
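The startup window granted by a startup probe is roughly `failureThreshold` multiplied by `periodSeconds`. The following is a minimal sketch of that arithmetic, not part of any Kubernetes tooling; the function names and the `safety_factor` margin of 1.5 are assumptions chosen for illustration.

```python
import math


def startup_budget_seconds(period_seconds: int, failure_threshold: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a container has to start before the startup probe gives up."""
    if period_seconds < 1 or failure_threshold < 1:
        raise ValueError("periodSeconds and failureThreshold must be at least 1")
    return initial_delay_seconds + period_seconds * failure_threshold


def threshold_for(expected_startup_seconds: float, period_seconds: int,
                  safety_factor: float = 1.5) -> int:
    """Smallest failureThreshold that covers the expected startup time plus a margin.

    safety_factor is a hypothetical margin for this sketch, not a Kubernetes setting.
    """
    return max(1, math.ceil(expected_startup_seconds * safety_factor / period_seconds))


# The slow-startup example above: periodSeconds=10, failureThreshold=60
print(startup_budget_seconds(10, 60))   # 600
# An app that usually needs about 240 s to start, probed every 10 s
print(threshold_for(240, 10))           # 36
```

Working backwards from a measured startup time, as `threshold_for` does, tends to be easier to justify than picking a large `failureThreshold` by guesswork.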
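5. Probe Parameters
5.1 Parameter Reference
| Parameter | Description | Default |
|---|---|---|
| initialDelaySeconds | Seconds to wait after the container starts before the first probe | 0 |
| periodSeconds | Seconds between probe attempts | 10 |
| timeoutSeconds | Seconds after which a probe attempt times out | 1 |
| failureThreshold | Consecutive failures before the probe is considered failed | 3 |
| successThreshold | Consecutive successes before a failed probe is considered successful again (must be 1 for liveness and startup probes) | 1 |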
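To make the two thresholds concrete, here is a simplified model of how consecutive results flip a probe's verdict. It is a sketch for intuition only, not the kubelet's implementation; the `ProbeState` class and its initial `healthy=True` state are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Toy model: the verdict flips only after enough consecutive results."""
    failure_threshold: int = 3
    success_threshold: int = 1
    healthy: bool = True          # assumed starting verdict for this sketch
    _ok_streak: int = 0
    _fail_streak: int = 0

    def observe(self, ok: bool) -> bool:
        """Record one probe result and return the current verdict."""
        if ok:
            self._ok_streak += 1
            self._fail_streak = 0
            if not self.healthy and self._ok_streak >= self.success_threshold:
                self.healthy = True
        else:
            self._fail_streak += 1
            self._ok_streak = 0
            if self.healthy and self._fail_streak >= self.failure_threshold:
                self.healthy = False
        return self.healthy


probe = ProbeState(failure_threshold=3, success_threshold=2)
results = [True, False, False, True, False, False, False, True, True]
verdicts = [probe.observe(r) for r in results]
print(verdicts)
# [True, True, True, True, True, True, False, False, True]
```

Two failures followed by a success do not trip the probe, because the failure streak resets; only the third consecutive failure does, and recovery then needs two consecutive successes.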
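5.2 Parameter Configuration
yaml
livenessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 30    # wait 30 s after container start before the first probe
  periodSeconds: 10          # probe every 10 s
  timeoutSeconds: 5          # an attempt fails if there is no response within 5 s
  failureThreshold: 3        # restart after 3 consecutive failures
  successThreshold: 1        # must be 1 for liveness probes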
6. Complete Configuration Examples
6.1 Web Application
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
6.2 Database Application
yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: rootpassword    # demo only; load this from a Secret in real deployments
        ports:
        - containerPort: 3306
        livenessProbe:
          exec:
            command:             # exits 0 whenever the server is running, even without credentials
            - mysqladmin
            - ping
            - -h
            - localhost
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:             # a real query needs credentials, so run it through a shell
            - sh
            - -c
            - mysql -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
6.3 Microservice Application
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapi:latest
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          periodSeconds: 10
          failureThreshold: 30   # up to 30 x 10 s = 300 s to finish starting
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          periodSeconds: 5
          timeoutSeconds: 3
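7. Debugging Probes
7.1 Inspecting Probe Status
bash
# Show Pod events, including probe failures
kubectl describe pod <pod-name>
# Show the probe configuration as applied
kubectl get pod <pod-name> -o yaml
# Show container logs
kubectl logs <pod-name>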
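7.2 Testing Probe Endpoints
bash
# Call the endpoint from inside the container (requires curl in the image)
kubectl exec -it <pod-name> -- curl localhost:80/health
# Or forward a local port to the Pod; this command blocks, so run curl in a second terminal
kubectl port-forward <pod-name> 8080:80
curl http://localhost:8080/health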
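When testing through a port-forward it helps to apply the same pass/fail rule as an HTTP probe: any status from 200 up to but not including 400 counts as success, and everything else, including timeouts and refused connections, counts as failure. The function below is a small sketch of that rule; `http_probe` is a hypothetical helper written for this example, and it ignores details such as custom headers and redirects.

```python
import http.client


def http_probe(host: str, port: int, path: str = "/health",
               timeout: float = 1.0) -> bool:
    """Return True when GET <path> answers with a status in the range 200-399."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Refused connection, timeout, or malformed response: treat as a failed probe.
        return False
    finally:
        conn.close()
```

With the port-forward from the commands above still running, `http_probe("localhost", 8080, "/health")` would report whether the endpoint passes; the default `timeout` of 1.0 mirrors the default `timeoutSeconds`.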
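8. Best Practices
8.1 Probe Design Principles
text
Probe design principles
│
├── Lightweight
│   └── Responds quickly
│
├── Idempotent
│   └── Repeated calls have no side effects
│
├── Independent
│   └── Does not depend on external services
│
└── Sensible thresholds
    └── Tuned to the application's behavior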
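As an illustration of these principles, the sketch below serves `/health` and `/ready` from in-process state only: the handlers do no I/O beyond the response, have no side effects, and never call an external service. The names `HealthHandler`, `make_server`, and the `ready` flag are hypothetical and chosen for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set by the application once its own warm-up is complete (assumed convention).
ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness: the process can still answer requests.
            self._reply(200, b"ok")
        elif self.path == "/ready":
            # Readiness: reflects local state only, no calls to other services.
            if ready.is_set():
                self._reply(200, b"ready")
            else:
                self._reply(503, b"not ready")
        else:
            self._reply(404, b"not found")

    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the access log


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build the health server; the caller decides when to start serving."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

An application would typically call `make_server(8080).serve_forever()` in a background thread at startup and call `ready.set()` once it can accept traffic.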
8.2 Recommended Settings
| Scenario | Liveness | Readiness | Startup |
|---|---|---|---|
| Fast-starting application | initialDelaySeconds: 10 | initialDelaySeconds: 5 | Not needed |
| Slow-starting application | initialDelaySeconds: 0 | initialDelaySeconds: 0 | failureThreshold: 60 |
| Database | exec check | exec check | Long startup window |
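8.3 Common Pitfalls
yaml
# Problem: the probe depends on an external service
# Fix: have the probe check only the container's own state
# Problem: the probe timeout is too short
# Fix: set timeoutSeconds from the endpoint's observed response time
# Problem: the initial delay is too short
# Fix: use a startup probe or increase initialDelaySeconds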
9. Troubleshooting
9.1 Common Issues
bash
# Show why the container last restarted
kubectl describe pod <pod-name> | grep -A 5 "Last State"
# Show events for the Pod, including probe failures
kubectl get events --field-selector involvedObject.name=<pod-name>
# Show the exit code of the previous container instance
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
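When a namespace produces many events, it can be easier to filter the JSON form of the event list than to read it by eye. The sketch below assumes input in the shape produced by `kubectl get events -o json` and that probe failures are recorded with the reason `Unhealthy`; `probe_failures` is a hypothetical helper, and the sample data is hand-written for illustration rather than captured from a cluster.

```python
import json


def probe_failures(events_json: str, pod_name: str) -> list[str]:
    """Return the messages of 'Unhealthy' events that belong to the given Pod."""
    items = json.loads(events_json).get("items", [])
    return [
        event.get("message", "")
        for event in items
        if event.get("reason") == "Unhealthy"
        and event.get("involvedObject", {}).get("name") == pod_name
    ]


# Hand-written sample for this sketch, not real cluster output.
sample = json.dumps({"items": [
    {"reason": "Unhealthy", "involvedObject": {"name": "web-0"},
     "message": "Liveness probe failed"},
    {"reason": "Pulled", "involvedObject": {"name": "web-0"},
     "message": "Image pulled"},
    {"reason": "Unhealthy", "involvedObject": {"name": "api-1"},
     "message": "Readiness probe failed"},
]})

print(probe_failures(sample, "web-0"))   # ['Liveness probe failed']
```

The message text indicates which probe failed, which helps separate restart loops caused by liveness probes from traffic gaps caused by readiness probes.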
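9.2 Diagnosis
| Symptom | Cause | Resolution |
|---|---|---|
| Container restarts frequently | Liveness probe failing | Check the probe endpoint and its timeout |
| Service unavailable | Readiness probe failing | Check the readiness endpoint and the dependencies it reports on |
| Startup times out | Startup probe failing | Increase failureThreshold |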
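10. Summary
10.1 Key Points
| Probe | Purpose | On failure |
|---|---|---|
| Liveness | Detects whether the container is alive | The container is restarted |
| Readiness | Detects whether the container is ready for traffic | The Pod is removed from Service endpoints |
| Startup | Detects whether startup has finished | The container is restarted; other probes are held off until it succeeds |
10.2 Next Steps
With health checks covered, the next topic is Helm package management, which explains how to package and deploy applications.
Last updated: 2026-03-28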