Notebooks #

Overview #

Kubeflow Notebooks runs interactive development environments on Kubernetes and supports several IDEs, including JupyterLab, VS Code Server, and RStudio.

Core features #

text
┌─────────────────────────────────────────────────────────────┐
│                   Notebooks core features                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Multi-IDE support:                                         │
│  ├── JupyterLab (default)                                   │
│  ├── VS Code Server                                         │
│  ├── RStudio                                                │
│  └── Custom IDE                                             │
│                                                             │
│  Resource management:                                       │
│  ├── CPU/memory configuration                               │
│  ├── GPU support                                            │
│  ├── Persistent storage                                     │
│  └── Resource limits                                        │
│                                                             │
│  Multi-user support:                                        │
│  ├── Namespace isolation                                    │
│  ├── Dedicated workspaces                                   │
│  ├── Access control                                         │
│  └── Resource quotas                                        │
│                                                             │
│  Customization:                                             │
│  ├── Custom images                                          │
│  ├── Preinstalled dependencies                              │
│  ├── Environment variables                                  │
│  └── Startup scripts                                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Notebook architecture #

Architecture diagram #

text
┌─────────────────────────────────────────────────────────────┐
│                    Notebook architecture                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  User access layer                                          │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Browser → Istio Gateway → Notebook Service           │  │
│  └───────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          ▼                                  │
│  Notebook Controller                                        │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Notebook Controller → manages the Notebook CRD       │  │
│  └───────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          ▼                                  │
│  Notebook Pod                                               │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  ┌─────────────────────┐  ┌─────────────────────┐     │  │
│  │  │  IDE container      │  │  Sidecar container  │     │  │
│  │  │  (Jupyter)          │  │  (proxy)            │     │  │
│  │  └─────────────────────┘  └─────────────────────┘     │  │
│  └───────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          ▼                                  │
│  Storage layer                                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  PVC (persistent storage)                             │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Notebook CRD #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: my-notebook
  namespace: kubeflow-user-example-com
  labels:
    app: my-notebook
spec:
  template:
    spec:
      serviceAccountName: default-editor
      containers:
      - name: notebook
        image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow:v1.8.0
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "4"
            memory: "8Gi"
        volumeMounts:
        - name: data
          mountPath: /home/jovyan
        env:
        - name: NOTEBOOK_ARGS
          value: "--NotebookApp.default_url=/lab"
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-notebook-pvc
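
Before applying a manifest like the one above, it can help to confirm that the Notebook CRD is actually installed in the cluster. The sketch below assumes the CRD is registered as `notebooks.kubeflow.org` (the plural resource name plus the `kubeflow.org` group used in the manifest). The function names `notebook_crd_installed` and `notebook_crd_versions` are made up for illustration.

```shell
# Hypothetical helper: succeeds if the Notebook CRD is registered in the cluster.
# Assumes the CRD name is notebooks.kubeflow.org (plural name + API group).
notebook_crd_installed() {
  kubectl get crd notebooks.kubeflow.org >/dev/null 2>&1
}

# Hypothetical helper: print the API versions defined by the CRD, one per line.
notebook_crd_versions() {
  kubectl get crd notebooks.kubeflow.org \
    -o jsonpath='{range .spec.versions[*]}{.name}{"\n"}{end}'
}
```

For example, `notebook_crd_installed && notebook_crd_versions` shows whether `v1` (the version used in `apiVersion: kubeflow.org/v1`) is among the versions the cluster defines.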

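Create a Notebook #

Create from the Dashboard #

text
1. Log in to the Kubeflow Dashboard

2. Click "Notebooks" in the left navigation bar

3. Click the "New Notebook" button

4. Configure the Notebook:
   ├── Name: enter a name
   ├── Namespace: select a namespace
   ├── Image: select an image type
   │   ├── TensorFlow
   │   ├── PyTorch
   │   ├── Base Python
   │   └── Custom
   ├── CPU: set the number of CPU cores
   ├── Memory: set the memory size
   ├── GPU: set the number of GPUs
   ├── Workspace Volume: configure storage
   │   ├── Size: storage size
   │   ├── Access Mode: access mode
   │   └── Storage Class: storage class
   └── Affinity/Tolerations: scheduling settings

5. Click "LAUNCH" to create the Notebook

6. Wait for the status to change to "Running"

7. Click "CONNECT" to open it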

Create from YAML #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: ml-notebook
  namespace: kubeflow-user-example-com
  labels:
    app: ml-notebook
spec:
  template:
    spec:
      serviceAccountName: default-editor
      containers:
      - name: notebook
        image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow:v1.8.0
        resources:
          requests:
            cpu: "4"
            memory: "8Gi"
            nvidia.com/gpu: "1"
          limits:
            cpu: "8"
            memory: "16Gi"
            nvidia.com/gpu: "1"
        volumeMounts:
        - name: workspace
          mountPath: /home/jovyan
        - name: data
          mountPath: /data
        env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "compute,utility"
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: workspace-pvc
      - name: data
        persistentVolumeClaim:
          claimName: data-pvc
bash
# Apply the manifest
kubectl apply -f notebook.yaml

# Check the Notebook status
kubectl get notebooks -n kubeflow-user-example-com

# Check the Pod status
kubectl get pods -n kubeflow-user-example-com -l app=ml-notebook
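
After applying the manifest, a script usually needs to block until the Notebook is usable. The following is a minimal sketch, assuming the controller names the Pod `<notebook-name>-0` (the naming the log commands later in this guide rely on). The function name `wait_for_notebook` and the 300-second default timeout are illustrative choices.

```shell
# Hypothetical helper: block until the Notebook's Pod reports Ready.
# Usage: wait_for_notebook <name> <namespace> [timeout]
wait_for_notebook() {
  local name="$1" namespace="$2" timeout="${3:-300s}"
  # Assumes the Pod is named <name>-0, as in the log commands in this guide.
  kubectl wait --for=condition=Ready "pod/${name}-0" \
    -n "${namespace}" --timeout="${timeout}"
}
```

A call such as `wait_for_notebook ml-notebook kubeflow-user-example-com` returns non-zero on timeout. Note that `kubectl wait` fails immediately if the Pod does not exist yet, so a short retry loop may be needed right after `kubectl apply`.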

Choosing an image #

Official images #

text
TensorFlow images:
├── jupyter-tensorflow - TensorFlow + JupyterLab
├── jupyter-tensorflow-cuda - TensorFlow + GPU support
└── jupyter-tensorflow-full - full TensorFlow environment

PyTorch images:
├── jupyter-pytorch - PyTorch + JupyterLab
├── jupyter-pytorch-cuda - PyTorch + GPU support
└── jupyter-pytorch-full - full PyTorch environment

Base images:
├── jupyter-scipy - scientific computing environment
├── jupyter-datascience - data science environment
└── jupyter-minimal - minimal environment

Image addresses #

yaml
# The registry path and tag depend on the Kubeflow release; confirm the image exists before use

# TensorFlow image
image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow:v1.8.0

# PyTorch image
image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-pytorch:v1.8.0

# VS Code Server image
image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/codeserver-python:v1.8.0

# RStudio image
image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/rstudio-tidyverse:v1.8.0

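Custom image #

dockerfile
# Dockerfile
FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
RUN pip install --no-cache-dir \
    jupyterlab \
    numpy \
    pandas \
    scikit-learn \
    tensorflow \
    torch

# Kubeflow expects the container to run as user jovyan (UID 1000) with home /home/jovyan
RUN useradd --create-home --uid 1000 --shell /bin/bash jovyan
USER jovyan
WORKDIR /home/jovyan

EXPOSE 8888

# Start JupyterLab under the URL prefix that Kubeflow passes in NB_PREFIX.
# Token and password are disabled here on the assumption that the Kubeflow gateway handles authentication.
CMD ["sh", "-c", "jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --ServerApp.base_url=${NB_PREFIX} --ServerApp.token='' --ServerApp.password='' --ServerApp.allow_origin='*'"]
bash
# Build the image, tagged with the registry it will be pushed to
docker build -t my-registry/my-notebook:latest .

# Push to the image registry
docker push my-registry/my-notebook:latest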

Resource configuration #

CPU and memory #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: cpu-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: notebook
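        # Placeholder image; a real Notebook image must serve its IDE over HTTP on port 8888
        image: python:3.9
        resources:
          requests:
            cpu: "2"        # Request 2 CPU cores
            memory: "4Gi"   # Request 4 GiB of memory
          limits:
            cpu: "4"        # At most 4 CPU cores
            memory: "8Gi"   # At most 8 GiB of memory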

GPU configuration #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: gpu-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: notebook
        image: tensorflow/tensorflow:latest-gpu
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            nvidia.com/gpu: "1"  # Request 1 GPU
          limits:
            cpu: "8"
            memory: "32Gi"
            nvidia.com/gpu: "1"
        env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "compute,utility"

Multi-GPU configuration #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: multi-gpu-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: notebook
        image: pytorch/pytorch:latest
        resources:
          limits:
            nvidia.com/gpu: "4"  # Request 4 GPUs
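
Once a GPU Notebook is running, it is worth checking that the container actually sees the number of devices it requested. The sketch below assumes `nvidia-smi` is available inside the container (typically provided by the NVIDIA container runtime), that the container is named `notebook`, and that the Pod is named `<notebook-name>-0`. The function name `notebook_gpu_count` is hypothetical.

```shell
# Hypothetical helper: count the GPUs visible inside a Notebook container.
# Usage: notebook_gpu_count <name> <namespace> [container]
notebook_gpu_count() {
  local name="$1" namespace="$2" container="${3:-notebook}"
  # nvidia-smi -L prints one "GPU <index>: ..." line per visible device.
  kubectl exec -n "${namespace}" "${name}-0" -c "${container}" -- nvidia-smi -L \
    | grep -c '^GPU '
}
```

For the manifest above, `notebook_gpu_count multi-gpu-notebook kubeflow-user-example-com` would be expected to print the same number as the `nvidia.com/gpu` limit.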

Storage configuration #

Persistent storage #

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: notebook-pvc
  namespace: kubeflow-user-example-com
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: standard
---
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: storage-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: notebook
        image: python:3.9
        volumeMounts:
        - name: workspace
          mountPath: /home/jovyan
        - name: datasets
          mountPath: /datasets
          readOnly: true
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: notebook-pvc
      - name: datasets
        persistentVolumeClaim:
          claimName: shared-datasets-pvc
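
The Notebook above mounts two claims, but only `notebook-pvc` is defined in the manifest; `shared-datasets-pvc` has to exist already. A quick way to check a claim before launching is to read its phase. This is a minimal sketch, and the function name `pvc_is_bound` is made up for illustration.

```shell
# Hypothetical helper: succeeds only if the named PVC has phase "Bound".
# Usage: pvc_is_bound <claim> <namespace>
pvc_is_bound() {
  local claim="$1" namespace="$2" phase
  phase="$(kubectl get pvc "${claim}" -n "${namespace}" -o jsonpath='{.status.phase}')"
  [ "${phase}" = "Bound" ]
}
```

For example, `pvc_is_bound shared-datasets-pvc kubeflow-user-example-com || echo "claim not ready"`. With storage classes that use `WaitForFirstConsumer` binding, a claim stays `Pending` until a Pod that uses it is scheduled, so `Pending` is not always an error.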

Configure ConfigMap and Secret #

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: notebook-config
  namespace: kubeflow-user-example-com
data:
  config.yaml: |
    database:
      host: mysql-service
      port: 3306
---
apiVersion: v1
kind: Secret
metadata:
  name: notebook-secrets
  namespace: kubeflow-user-example-com
type: Opaque
stringData:
  db-password: "my-secret-password"
  api-key: "my-api-key"
---
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: config-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: notebook
        image: python:3.9
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
        - name: secrets
          mountPath: /etc/secrets
          readOnly: true
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: notebook-secrets
              key: db-password
      volumes:
      - name: config
        configMap:
          name: notebook-config
      - name: secrets
        secret:
          name: notebook-secrets
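
The manifest above keeps the secret values in plain text under `stringData`, which is easy to commit to version control by accident. One alternative is to create the Secret from the command line. The sketch below assumes the values are supplied in the environment variables `DB_PASSWORD` and `API_KEY`; those variable names and the function name `create_notebook_secret` are illustrative.

```shell
# Hypothetical helper: create the notebook-secrets Secret from environment
# variables so the values never land in a YAML file.
# Usage: create_notebook_secret <namespace>
create_notebook_secret() {
  local namespace="$1"
  # Abort with a message if either variable is unset or empty.
  : "${DB_PASSWORD:?DB_PASSWORD must be set}"
  : "${API_KEY:?API_KEY must be set}"
  kubectl create secret generic notebook-secrets \
    -n "${namespace}" \
    --from-literal=db-password="${DB_PASSWORD}" \
    --from-literal=api-key="${API_KEY}"
}
```

The keys `db-password` and `api-key` match the ones in the manifest, so the `secretKeyRef` and the volume mount keep working unchanged.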

Network configuration #

Service configuration #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: network-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: notebook
        image: python:3.9
        ports:
        - containerPort: 8888
          name: notebook
          protocol: TCP
        - containerPort: 6006
          name: tensorboard
          protocol: TCP
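
Declaring a `containerPort` documents the port but does not by itself make TensorBoard reachable from a workstation. For ad hoc access, one option is `kubectl port-forward`. This is a sketch that assumes the Pod is named `<notebook-name>-0` and that TensorBoard listens on 6006 inside the container; `forward_tensorboard` is a hypothetical name.

```shell
# Hypothetical helper: forward a local port to TensorBoard in the Notebook Pod.
# Usage: forward_tensorboard <name> <namespace> [local_port]
forward_tensorboard() {
  local name="$1" namespace="$2" local_port="${3:-6006}"
  # Runs in the foreground until interrupted with Ctrl+C.
  kubectl port-forward -n "${namespace}" "pod/${name}-0" "${local_port}:6006"
}
```

After `forward_tensorboard network-notebook kubeflow-user-example-com`, TensorBoard would be available at `http://localhost:6006` for as long as the command keeps running.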

Network policy #

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: notebook-network-policy
  namespace: kubeflow-user-example-com
spec:
  podSelector:
    matchLabels:
      app: my-notebook
  policyTypes:
  - Ingress
  - Egress
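  ingress:
  - from:
    # Assumes the istio-system namespace carries the label name=istio-system;
    # recent Kubernetes versions also set kubernetes.io/metadata.name automatically
    - namespaceSelector:
        matchLabels:
          name: istio-system
    ports:
    - port: 8888
      protocol: TCP
  egress:
  # Only DNS over UDP is allowed out; all other outbound traffic from the Pod is blocked
  - to:
    - namespaceSelector: {}
    ports:
    - port: 53
      protocol: UDP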

VS Code Server #

Create a VS Code Notebook #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: vscode-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: vscode
        image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/codeserver-python:v1.8.0
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "4"
            memory: "8Gi"
        volumeMounts:
        - name: workspace
          mountPath: /home/coder
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: vscode-pvc

VS Code configuration #

yaml
# Custom VS Code settings
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: vscode-custom
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: vscode
        image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/codeserver-python:v1.8.0
        env:
        - name: VSCODE_ARGS
          value: "--disable-telemetry"
        volumeMounts:
        - name: vscode-settings
          mountPath: /home/coder/.local/share/code-server/User
      volumes:
      - name: vscode-settings
        configMap:
          name: vscode-settings

RStudio #

Create an RStudio Notebook #

yaml
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: rstudio-notebook
  namespace: kubeflow-user-example-com
spec:
  template:
    spec:
      containers:
      - name: rstudio
        image: public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/rstudio-tidyverse:v1.8.0
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "4"
            memory: "8Gi"
        volumeMounts:
        - name: workspace
          mountPath: /home/rstudio
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: rstudio-pvc

Manage Notebooks #

Check status #

bash
# List all Notebooks
kubectl get notebooks -n kubeflow-user-example-com

# Show Notebook details
kubectl describe notebook my-notebook -n kubeflow-user-example-com

# Check the Pod status
kubectl get pods -n kubeflow-user-example-com -l app=my-notebook

# View logs
kubectl logs -n kubeflow-user-example-com -l app=my-notebook -c notebook

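Stop and start #

bash
# Restart the Notebook by deleting its Pod
kubectl delete pod -n kubeflow-user-example-com -l app=my-notebook

# Deleting the Pod only restarts the Notebook: the Notebook Controller recreates it automatically

# Delete the Notebook completely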
kubectl delete notebook my-notebook -n kubeflow-user-example-com
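
Because the controller recreates a deleted Pod, pausing a Notebook without deleting it needs a different mechanism. The sketch below assumes your Notebook Controller honors the `kubeflow-resource-stopped` annotation, which is the mechanism behind the dashboard's stop button in upstream Kubeflow; verify this for your release before relying on it. The function names `stop_notebook` and `start_notebook` are made up for illustration.

```shell
# Assumption: the controller scales the Notebook to zero replicas while the
# kubeflow-resource-stopped annotation is present on the Notebook resource.

# Hypothetical helper: stop a Notebook by setting the annotation to a timestamp.
stop_notebook() {
  local name="$1" namespace="$2"
  kubectl annotate notebook "${name}" -n "${namespace}" --overwrite \
    kubeflow-resource-stopped="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Hypothetical helper: start it again by removing the annotation.
start_notebook() {
  local name="$1" namespace="$2"
  # A trailing dash after the key tells kubectl to remove the annotation.
  kubectl annotate notebook "${name}" -n "${namespace}" kubeflow-resource-stopped-
}
```

Stopping this way keeps the Notebook resource and its PVCs, so `start_notebook my-notebook kubeflow-user-example-com` brings back the same workspace.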

Scale resources #

bash
# Edit the Notebook configuration
kubectl edit notebook my-notebook -n kubeflow-user-example-com

# After the resource settings change, the Pod is recreated automatically

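Best practices #

Resource management #

text
1. Size resources sensibly
   ├── Set CPU/memory according to the workload
   ├── Stop Notebooks when they are not in use
   └── Set resource limits to prevent resource exhaustion

2. Storage management
   ├── Use persistent storage to keep data
   ├── Clean up unneeded files regularly
   └── Choose an appropriate storage size

3. Image management
   ├── Use prebuilt images
   ├── Or build custom images with dependencies preinstalled
   └── Update image versions regularly

Security practices #

text
1. Access control
   ├── Isolate users with namespaces
   ├── Configure RBAC permissions
   └── Do not share accounts

2. Data security
   ├── Store sensitive information in Secrets
   ├── Do not hard-code keys in code
   └── Rotate passwords regularly

3. Network security
   ├── Configure network policies
   ├── Restrict external access
   └── Use HTTPS

Development efficiency #

text
1. Environment setup
   ├── Preinstall common dependencies
   ├── Configure environment variables
   └── Use startup scripts

2. Code management
   ├── Use Git for version control
   ├── Commit code regularly
   └── Manage work with branches

3. Collaboration
   ├── Share datasets
   ├── Use shared storage
   └── Record experiment results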

Troubleshooting #

Common issues #

bash
# Notebook fails to start
kubectl describe notebook my-notebook -n kubeflow-user-example-com
kubectl get events -n kubeflow-user-example-com --sort-by='.lastTimestamp'

# Insufficient resources
kubectl describe nodes | grep -A 5 "Allocated resources"

# Image pull failures
kubectl describe pod -n kubeflow-user-example-com -l app=my-notebook

# Storage problems
kubectl get pvc -n kubeflow-user-example-com
kubectl describe pvc my-notebook-pvc -n kubeflow-user-example-com

Viewing logs #

bash
# View the Notebook container logs
kubectl logs -n kubeflow-user-example-com my-notebook-0 -c notebook

# View the sidecar container logs
kubectl logs -n kubeflow-user-example-com my-notebook-0 -c istio-proxy

# Follow the logs in real time
kubectl logs -f -n kubeflow-user-example-com my-notebook-0 -c notebook
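
When a Notebook container keeps crashing and restarting, the current logs are often empty and the useful output belongs to the previous container instance. The sketch below wraps `kubectl logs --previous` for that case, assuming the same Pod naming (`<notebook-name>-0`) and container name (`notebook`) used in the commands above. The function name `previous_notebook_logs` and the default of 100 lines are illustrative.

```shell
# Hypothetical helper: show the last lines logged by the previous (crashed)
# instance of the notebook container.
# Usage: previous_notebook_logs <name> <namespace> [lines]
previous_notebook_logs() {
  local name="$1" namespace="$2" lines="${3:-100}"
  kubectl logs -n "${namespace}" "${name}-0" -c notebook --previous --tail="${lines}"
}
```

For example, `previous_notebook_logs my-notebook kubeflow-user-example-com 50` prints the final 50 lines written before the last restart. The command returns an error if the container has never restarted.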

Next steps #

Now that you know how to use Notebooks, continue with Katib hyperparameter tuning to learn how to optimize model parameters automatically.

Last updated: 2026-04-05