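Security Configuration #

Overview #

Security is a key requirement for enterprise Kubeflow deployments. This chapter covers authentication, authorization, network security, secret management, and audit logging in Kubeflow to help you build a secure machine learning platform.

What This Chapter Covers #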

text
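┌─────────────────────────────────────────────────────────────┐
│               Security Configuration Contents               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Authentication:                                            │
│  ├── Dex identity authentication                            │
│  ├── OIDC integration                                       │
│  ├── LDAP/AD integration                                    │
│  └── Static user configuration                              │
│                                                             │
│  Authorization:                                             │
│  ├── RBAC configuration                                     │
│  ├── Namespace isolation                                    │
│  ├── Role management                                        │
│  └── Permission control                                     │
│                                                             │
│  Network security:                                          │
│  ├── Network policies                                       │
│  ├── Istio security                                         │
│  ├── TLS configuration                                      │
│  └── Access control                                         │
│                                                             │
│  Secret management:                                         │
│  ├── Kubernetes Secrets                                     │
│  ├── Key rotation                                           │
│  ├── External secret management                             │
│  └── Audit logging                                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘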

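Authentication Configuration #

Dex Overview #

Dex is the identity service Kubeflow uses for authentication. It supports several identity providers through connectors.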

text
Authentication methods supported by Dex:
├── OIDC (OpenID Connect)
│   ├── Google
│   ├── Azure AD
│   ├── Okta
│   └── Keycloak
├── LDAP/Active Directory
├── GitHub
├── GitLab
└── Static users

Configuring Static Users #

yaml
apiVersion: v1
kind: Secret
metadata:
  name: dex-static-users
  namespace: dex
stringData:
  config.yaml: |
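    # The hash values below are truncated placeholders. Replace each one with a
    # complete bcrypt hash, and do not reuse the same password for several users.
    # Dex requires enablePasswordDB: true in its config when staticPasswords is set.
    staticPasswords: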
    - email: admin@example.com
      hash: $2a$10$2b2cU8CPhOTaGrs1HRQuA
      username: admin
      userID: "08a8684b-db88-4b73-90a9-3cd1661f5466"
    - email: user@example.com
      hash: $2a$10$2b2cU8CPhOTaGrs1HRQuA
      username: user
      userID: "08a8684b-db88-4b73-90a9-3cd1661f5467"
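
Dex static passwords are stored as bcrypt hashes. A complete bcrypt string is 60 characters long: a `$2a$`, `$2b$`, or `$2y$` prefix, a two-digit cost, another `$`, and 53 characters of salt and digest. The `hash` value in the sample above is shorter than that, so it has to be replaced before use. The following is a minimal sketch of a shape check using only the Python standard library; the function name `looks_like_bcrypt` is hypothetical, and the check says nothing about which password a hash belongs to.

```python
import re

# Shape of a bcrypt string in modular crypt format:
# "$2a$" / "$2b$" / "$2y$", a two-digit cost, "$",
# then 22 salt characters followed by 31 digest characters.
BCRYPT_PATTERN = re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}")


def looks_like_bcrypt(value: str) -> bool:
    """Return True if value has the shape of a complete bcrypt hash."""
    return BCRYPT_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # Synthetic, well-formed placeholder (not a hash of any real password).
    well_formed = "$2a$10$" + "A" * 53
    print(looks_like_bcrypt(well_formed))
    # The truncated value used in the sample manifest.
    print(looks_like_bcrypt("$2a$10$2b2cU8CPhOTaGrs1HRQuA"))
```

A check like this could run in CI before the manifest is applied, so that a truncated or plain-text value is caught early.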

Configuring OIDC #

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dex
  namespace: dex
data:
  config.yaml: |
    issuer: https://kubeflow.example.com/dex
    storage:
      type: kubernetes
      config:
        inCluster: true
    web:
      http: 0.0.0.0:5556
    connectors:
    - type: oidc
      name: Google
      id: google
      config:
        issuer: https://accounts.google.com
        clientID: your-client-id.apps.googleusercontent.com
        clientSecret: $GOOGLE_CLIENT_SECRET
        redirectURI: https://kubeflow.example.com/dex/callback
        hostedDomains:
        - example.com
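
In the configuration above, the connector's `redirectURI` is the Dex `issuer` followed by `/callback`, and Dex publishes its OIDC discovery document under the same issuer. A mismatch between the two is a common cause of failed logins. The sketch below derives both URLs from the issuer and checks a `redirectURI` against it; the helper names `dex_endpoints` and `redirect_matches_issuer` are hypothetical, not part of Dex.

```python
def dex_endpoints(issuer: str) -> dict:
    """Derive the URLs that Dex serves under a given issuer."""
    base = issuer.rstrip("/")
    return {
        "discovery": base + "/.well-known/openid-configuration",
        "callback": base + "/callback",
    }


def redirect_matches_issuer(issuer: str, redirect_uri: str) -> bool:
    """Check that a connector redirectURI points at this Dex instance."""
    return redirect_uri == dex_endpoints(issuer)["callback"]


if __name__ == "__main__":
    issuer = "https://kubeflow.example.com/dex"
    print(dex_endpoints(issuer)["discovery"])
    print(redirect_matches_issuer(issuer, "https://kubeflow.example.com/dex/callback"))
```

The same callback URL also has to be registered as an authorized redirect URI with the upstream provider (Google in this example).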

Configuring LDAP #

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dex
  namespace: dex
data:
  config.yaml: |
    connectors:
    - type: ldap
      name: ActiveDirectory
      id: ad
      config:
        host: ldap.example.com:636
        insecureNoSSL: false
        bindDN: cn=admin,dc=example,dc=com
        bindPW: admin-password  # example value only; do not keep real bind credentials in a ConfigMap
        usernamePrompt: Username
        userSearch:
          baseDN: dc=example,dc=com
          filter: "(objectClass=user)"
          username: sAMAccountName
          idAttr: DN
          emailAttr: mail
          nameAttr: displayName
        groupSearch:
          baseDN: dc=example,dc=com
          filter: "(objectClass=group)"
          userMatchers:
          - userAttr: DN
            groupAttr: member
          nameAttr: cn

Configuring GitHub #

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dex
  namespace: dex
data:
  config.yaml: |
    connectors:
    - type: github
      name: GitHub
      id: github
      config:
        clientID: your-github-client-id
        clientSecret: $GITHUB_CLIENT_SECRET
        redirectURI: https://kubeflow.example.com/dex/callback
        orgs:
        - name: your-org
          teams:
          - ml-team

RBAC Configuration #

Kubeflow Roles #

text
Built-in roles:
├── kubeflow-admin
│   └── Full administrative access
├── kubeflow-edit
│   └── Create and modify resources
├── kubeflow-view
│   └── Read-only access
└── Custom roles
    └── Configured as needed

Creating a Custom Role #

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ml-developer
  namespace: kubeflow-user-alice
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["kubeflow.org"]
  resources: ["notebooks", "tfjobs", "pytorchjobs", "experiments", "trials"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["serving.kserve.io"]
  resources: ["inferenceservices"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ml-developer-binding
  namespace: kubeflow-user-alice
subjects:
- kind: User
  name: alice@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ml-developer
  apiGroup: rbac.authorization.k8s.io
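
To reason about what a Role such as `ml-developer` actually permits, it helps to model how RBAC rules are matched: a request is allowed if at least one rule lists its API group, resource, and verb (or the wildcard `*`). The sketch below is a simplified model of that matching, not the Kubernetes authorizer; it ignores `resourceNames`, non-resource URLs, and aggregation, and the function name `is_allowed` is hypothetical.

```python
ALL_VERBS = ["get", "list", "watch", "create", "update", "patch", "delete"]

# Rules copied from the ml-developer Role above.
ML_DEVELOPER_RULES = [
    {"apiGroups": [""],
     "resources": ["pods", "pods/log", "services", "configmaps", "secrets"],
     "verbs": ALL_VERBS},
    {"apiGroups": ["kubeflow.org"],
     "resources": ["notebooks", "tfjobs", "pytorchjobs", "experiments", "trials"],
     "verbs": ALL_VERBS},
    {"apiGroups": ["serving.kserve.io"],
     "resources": ["inferenceservices"],
     "verbs": ALL_VERBS},
]


def _matches(allowed: list, value: str) -> bool:
    """A rule field matches if it lists the value or the wildcard."""
    return "*" in allowed or value in allowed


def is_allowed(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """Return True if any rule covers the group, resource, and verb."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


if __name__ == "__main__":
    print(is_allowed(ML_DEVELOPER_RULES, "kubeflow.org", "notebooks", "create"))
    print(is_allowed(ML_DEVELOPER_RULES, "", "secrets", "delete"))
    print(is_allowed(ML_DEVELOPER_RULES, "apps", "deployments", "get"))
```

Note that this Role lets the user delete Secrets in the namespace. Under a least-privilege policy, it may be worth narrowing the verbs on `secrets` to what the workload needs.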

Cluster-Level Permissions #

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubeflow-viewer
rules:
- apiGroups: ["kubeflow.org"]
  resources: ["profiles"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubeflow-viewer-binding
subjects:
- kind: Group
  name: viewers@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kubeflow-viewer
  apiGroup: rbac.authorization.k8s.io

Network Security #

Network Policies #

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: kubeflow-user-alice
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-istio-ingress
  namespace: kubeflow-user-alice
spec:
  podSelector: {}
  ingress:
  - from:
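    - namespaceSelector:
        matchLabels:
          # Label set automatically on every namespace (Kubernetes 1.21+);
          # a plain "name" label exists only if it was added manually.
          kubernetes.io/metadata.name: istio-system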
    ports:
    - protocol: TCP
      port: 8888
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: kubeflow-user-alice
spec:
  podSelector: {}
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
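
A `namespaceSelector` with `matchLabels` selects only namespaces that carry every listed label, and an empty selector (`{}`) selects all namespaces. Recent Kubernetes versions (1.21 and later) set the label `kubernetes.io/metadata.name` on each namespace automatically, whereas a plain `name` label is present only if someone added it. The sketch below models this matching; the function name `selector_matches` and the namespace labels are assumptions for illustration.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """Return True if every key/value in match_labels is present in labels.

    An empty match_labels dict matches any label set, mirroring an
    empty selector in a NetworkPolicy.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    # Assumed labels on the istio-system namespace (only the automatic one).
    istio_ns_labels = {"kubernetes.io/metadata.name": "istio-system"}

    print(selector_matches({"kubernetes.io/metadata.name": "istio-system"}, istio_ns_labels))
    print(selector_matches({"name": "istio-system"}, istio_ns_labels))
    print(selector_matches({}, istio_ns_labels))
```

If a policy selects on a custom label, listing the namespace labels with `kubectl get namespace --show-labels` shows whether that label is actually present.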

Istio Security Configuration #

yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: kubeflow-user-alice
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: kubeflow-user-alice
spec:
  {}
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-istio-ingress
  namespace: kubeflow-user-alice
spec:
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"]

TLS Configuration #

yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kubeflow-tls
  namespace: istio-system
spec:
  secretName: kubeflow-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - kubeflow.example.com

Secret Management #

Secret Configuration #

yaml
apiVersion: v1
kind: Secret
metadata:
  name: ml-credentials
  namespace: kubeflow-user-alice
type: Opaque
stringData:
  aws-access-key: AKIAIOSFODNN7EXAMPLE
  aws-secret-key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  mlflow-tracking-uri: https://mlflow.example.com
  mlflow-username: mlflow-user
  mlflow-password: mlflow-password
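
Values written under `stringData` are stored by Kubernetes in the Secret's `data` field as base64 text. Base64 is an encoding, not encryption, so anyone who can read the Secret can recover the original values. The sketch below reproduces that conversion with the Python standard library; the function names are hypothetical.

```python
import base64


def to_secret_data(string_data: dict) -> dict:
    """Encode plain-text values the way they appear under a Secret's data field."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


def from_secret_data(data: dict) -> dict:
    """Decode base64 values from a Secret's data field back to plain text."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in data.items()
    }


if __name__ == "__main__":
    encoded = to_secret_data({"mlflow-username": "mlflow-user"})
    print(encoded)
    print(from_secret_data(encoded))
```

This is why the checklist later in this chapter pairs Secrets with etcd encryption and restricted RBAC access: the Secret object alone does not protect the value.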

Using Secrets in a Pod #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: secure-notebook
  namespace: kubeflow-user-alice
spec:
  template:
    spec:
      containers:
      - name: notebook
        image: python:3.9
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: ml-credentials
              key: aws-access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: ml-credentials
              key: aws-secret-key
        volumeMounts:
        - name: credentials
          mountPath: /etc/credentials
          readOnly: true
      volumes:
      - name: credentials
        secret:
          secretName: ml-credentials

External Secret Management #

yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: kubeflow-user-alice
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ml-credentials
  namespace: kubeflow-user-alice
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: ml-credentials
    creationPolicy: Owner
  data:
  - secretKey: aws-access-key
    remoteRef:
      key: ml-credentials
      property: access-key
  - secretKey: aws-secret-key
    remoteRef:
      key: ml-credentials
      property: secret-key
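
Each entry under `data` in the ExternalSecret maps one field of the remote secret to one key of the generated Kubernetes Secret: `remoteRef.key` names the remote secret, `remoteRef.property` selects a field inside its JSON value, and `secretKey` is the key written to the target. The sketch below models that mapping in plain Python. The remote payload is a made-up example and `resolve` is a hypothetical helper, not the External Secrets Operator's code.

```python
import json

# Hypothetical JSON value stored in AWS Secrets Manager under "ml-credentials".
REMOTE_SECRETS = {
    "ml-credentials": json.dumps(
        {"access-key": "AKIAIOSFODNN7EXAMPLE", "secret-key": "example-secret-value"}
    )
}

# The data section of the ExternalSecret above.
DATA_MAPPINGS = [
    {"secretKey": "aws-access-key",
     "remoteRef": {"key": "ml-credentials", "property": "access-key"}},
    {"secretKey": "aws-secret-key",
     "remoteRef": {"key": "ml-credentials", "property": "secret-key"}},
]


def resolve(mappings: list, remote: dict) -> dict:
    """Build the target Secret's key/value pairs from the remote store."""
    result = {}
    for mapping in mappings:
        ref = mapping["remoteRef"]
        payload = json.loads(remote[ref["key"]])  # remote value parsed as JSON
        result[mapping["secretKey"]] = payload[ref["property"]]
    return result


if __name__ == "__main__":
    print(resolve(DATA_MAPPINGS, REMOTE_SECRETS))
```

With `refreshInterval: 1h`, the operator repeats this lookup periodically, so a value rotated in the external store reaches the cluster without editing any manifest.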

Audit Logging #

Enabling Audit Logging #

yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
- level: RequestResponse
  resources:
  - group: "kubeflow.org"
    resources: ["*"]
- level: Metadata
  omitStages:
  - RequestReceived
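
Audit policy rules are evaluated in order, and the first rule that matches a request decides its audit level. In the policy above, Secrets are therefore logged at `Metadata` level (without request or response bodies), `kubeflow.org` resources at `RequestResponse`, and everything else falls through to the final catch-all rule. The sketch below is a simplified model of that first-match behavior; it only looks at API group and resource, ignores users, verbs, namespaces, and `omitStages`, and the function name `audit_level` is hypothetical.

```python
# The three rules of the policy above, in order.
POLICY_RULES = [
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets"]}]},
    {"level": "RequestResponse",
     "resources": [{"group": "kubeflow.org", "resources": ["*"]}]},
    {"level": "Metadata"},  # catch-all: no resource selector
]


def audit_level(rules: list, group: str, resource: str) -> str:
    """Return the level of the first rule that matches the request."""
    for rule in rules:
        selectors = rule.get("resources")
        if not selectors:
            return rule["level"]  # a rule without selectors matches everything
        for selector in selectors:
            names = selector["resources"]
            if selector["group"] == group and ("*" in names or resource in names):
                return rule["level"]
    return "None"  # no rule matched: the request is not logged


if __name__ == "__main__":
    print(audit_level(POLICY_RULES, "", "secrets"))
    print(audit_level(POLICY_RULES, "kubeflow.org", "notebooks"))
    print(audit_level(POLICY_RULES, "apps", "deployments"))
```

Because order matters, the Secrets rule has to stay above any broader rule that would log request bodies, otherwise secret values could end up in the audit log.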

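Analyzing Audit Logs #

bash
# View audit events. Assumes the API server pod is named kube-apiserver-master
# and writes audit events to stdout (--audit-log-path=-); adjust for your cluster.
kubectl logs -n kube-system kube-apiserver-master | grep audit

# Show operations performed by a specific user
kubectl logs -n kube-system kube-apiserver-master | grep '"username":"alice@example.com"'

# Show operations on sensitive resources
kubectl logs -n kube-system kube-apiserver-master | grep -E '"resource":"(secrets|configmaps)"'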
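
Text matching with `grep` is fragile because audit events are JSON objects, one per line, with the user under `user.username` and the target under `objectRef.resource`. Parsing the lines gives more reliable filters. The sketch below assumes the log has already been saved or piped as JSON lines; the sample events are made up and `filter_events` is a hypothetical helper.

```python
import json


def filter_events(lines, username=None, resources=None):
    """Return audit events matching an optional user and resource set."""
    matched = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines that are not JSON audit events
        if not isinstance(event, dict):
            continue
        user = (event.get("user") or {}).get("username")
        resource = (event.get("objectRef") or {}).get("resource")
        if username is not None and user != username:
            continue
        if resources is not None and resource not in resources:
            continue
        matched.append(event)
    return matched


# Made-up sample lines in the shape of audit.k8s.io/v1 events.
SAMPLE_LINES = [
    '{"kind":"Event","stage":"ResponseComplete","verb":"get",'
    '"user":{"username":"alice@example.com"},'
    '"objectRef":{"resource":"secrets","namespace":"kubeflow-user-alice","name":"ml-credentials"}}',
    '{"kind":"Event","stage":"ResponseComplete","verb":"list",'
    '"user":{"username":"bob@example.com"},'
    '"objectRef":{"resource":"pods","namespace":"kubeflow-user-bob"}}',
    "a plain log line that is not JSON",
]

if __name__ == "__main__":
    for event in filter_events(SAMPLE_LINES, username="alice@example.com"):
        print(event["verb"], event["objectRef"]["resource"])
```

The same function can be fed from `sys.stdin` to process the output of the `kubectl logs` commands above.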

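Security Best Practices #

Access Control #

text
1. Principle of least privilege
   ├── Grant only the permissions that are needed
   ├── Review permissions regularly
   └── Revoke unneeded permissions promptly

2. Authentication
   ├── Enforce a strong password policy
   ├── Enable multi-factor authentication
   └── Rotate credentials regularly

3. Session management
   ├── Set reasonable session timeouts
   ├── Limit concurrent sessions
   └── Log sign-in events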

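Data Security #

text
1. Data encryption
   ├── Enable etcd encryption at rest
   ├── Use TLS for data in transit
   └── Store sensitive data in Secrets

2. Key management
   ├── Rotate keys regularly
   ├── Use an external secret manager
   └── Restrict access to keys

3. Data isolation
   ├── Namespace isolation
   ├── Network policy isolation
   └── Storage isolation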

Container Security #

yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: app:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL

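Security Scanning #

bash
# Scan a container image for vulnerabilities
trivy image my-image:latest

# Audit workload configuration in a namespace
kubeaudit all -n kubeflow-user-alice

# Review the RBAC permissions of a user
kubectl auth can-i --list -n kubeflow-user-alice --as=alice@example.com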

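Security Checklist #

text
Authentication:
□ Configure strong identity authentication
□ Enable multi-factor authentication
□ Configure session timeouts
□ Disable static users (production)

Authorization:
□ Configure least-privilege RBAC
□ Enable namespace isolation
□ Review permissions regularly
□ Record permission changes

Network security:
□ Configure network policies
□ Enable TLS
□ Configure Istio security
□ Restrict external access

Secret management:
□ Store sensitive information in Secrets
□ Rotate keys regularly
□ Enable etcd encryption
□ Configure audit logging

Container security:
□ Run as a non-root user
□ Use a read-only root filesystem
□ Disable privileged mode
□ Run security scans regularly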

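Next Steps #

Now that you have covered security configuration, continue with Monitoring and Logging to learn how Kubeflow handles monitoring and log management.

Last updated: 2026-04-05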