PyTorch Installation and Configuration #

Before You Install #

Before installing PyTorch, it helps to understand a few key concepts:

CPU vs. GPU Builds #

text

┌─────────────────────────────────────────────────────────────┐
│                   PyTorch Build Selection                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  CPU build:                                                 │
│  ✅ No GPU hardware required                                 │
│  ✅ Simple to install                                        │
│  ❌ Slow training                                            │
│  Best for: learning, small experiments                      │
│                                                             │
│  GPU build:                                                 │
│  ✅ Often 10-100x faster training                            │
│  ✅ Handles large models                                     │
│  ❌ Requires an NVIDIA GPU                                   │
│  ❌ More involved setup                                      │
│  Best for: production, large-scale training                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

CUDA Versions #

text

┌─────────────────────────────────────────────────────────────┐
│                     CUDA Version Notes                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  CUDA is NVIDIA's GPU computing platform                    │
│                                                             │
│  Minimum Linux driver per CUDA version:                     │
│                                                             │
│  ┌───────────────┬───────────┬────────────────────┐         │
│  │ GPU driver    │ CUDA      │ Suggested use      │         │
│  ├───────────────┼───────────┼────────────────────┤         │
│  │ ≥ 525.60.13   │ 12.1      │ Latest             │         │
│  │ ≥ 520.61.05   │ 11.8      │ Stable             │         │
│  │ ≥ 515.43.04   │ 11.7      │ Broad compatibility│         │
│  └───────────────┴───────────┴────────────────────┘         │
│                                                             │
│  Check your driver version: nvidia-smi                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
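
The driver-to-CUDA mapping can be expressed as a small lookup. The following is a minimal sketch, assuming the minimum Linux driver versions that NVIDIA publishes for each toolkit release (hard-coded below; check the CUDA release notes for your platform). The names `MIN_DRIVER`, `parse_version`, and `max_supported_cuda` are illustrative and not part of PyTorch or the NVIDIA tooling.

```python
# Minimum Linux driver per CUDA version. These values are assumptions taken
# from NVIDIA's release notes; verify them for your platform.
MIN_DRIVER = {
    "12.1": (525, 60, 13),
    "11.8": (520, 61, 5),
    "11.7": (515, 43, 4),
}


def parse_version(text):
    """Turn a dotted version such as '535.104.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def max_supported_cuda(driver_version):
    """Return the newest CUDA version in MIN_DRIVER the driver can run, or None."""
    driver = parse_version(driver_version)
    supported = [
        cuda for cuda, minimum in MIN_DRIVER.items() if driver >= minimum
    ]
    # Compare CUDA versions numerically, not as strings
    return max(supported, key=parse_version, default=None)


print(max_supported_cuda("535.104.05"))  # 12.1
print(max_supported_cuda("520.61.05"))   # 11.8
print(max_supported_cuda("470.82.01"))   # None
```

Pass in the driver version that `nvidia-smi` reports; a result of `None` suggests updating the driver or using the CPU build.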

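System Requirements #

Hardware Requirements #

Component	Minimum	Recommended
CPU	Dual-core	Quad-core or better
Memory	8 GB	16 GB+
GPU	-	NVIDIA GTX 1060+
Disk	10 GB	SSD 50 GB+

Software Requirements #

Software	Required version
Python	3.8 - 3.11
pip	Latest
CUDA (GPU)	11.7 / 11.8 / 12.1
cuDNN (GPU)	Matching the CUDA version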
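
The Python range in the table can be checked from the interpreter itself before installing anything. This is a small sketch that assumes the 3.8 to 3.11 range listed above; the helper name `python_supported` is made up for illustration.

```python
import sys

# Supported range from the requirements table (inclusive on both ends)
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_supported(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) <= MAX_PYTHON


current = ".".join(str(n) for n in sys.version_info[:3])
status = "supported" if python_supported() else "outside the listed range"
print(f"Python {current}: {status}")
```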

Installation Methods #

Method 1: Install with pip (recommended) #

CPU Build #

bash

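# On Linux the default PyPI wheel bundles CUDA; the cpu index selects a CPU-only build
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu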

GPU Build (CUDA 11.8) #

bash

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

GPU Build (CUDA 12.1) #

bash

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
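
The pip commands in this section differ only in the suffix of the index URL. As a rough sketch, the choice can be wrapped in a helper that builds the command for a given compute platform. The function name `pip_command` and the constants are hypothetical; the `cpu` suffix selects the CPU-only wheel index.

```python
import shlex

# Wheel index used by the pip commands in this section
BASE_URL = "https://download.pytorch.org/whl"
PACKAGES = ["torch", "torchvision", "torchaudio"]


def pip_command(platform):
    """Build the pip install command for 'cpu', 'cu118', or 'cu121'."""
    if platform not in {"cpu", "cu118", "cu121"}:
        raise ValueError(f"unknown compute platform: {platform!r}")
    args = ["pip", "install", *PACKAGES, "--index-url", f"{BASE_URL}/{platform}"]
    return shlex.join(args)


print(pip_command("cu118"))
```

The helper only builds the string; run the printed command in a shell to perform the install.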

Method 2: Install with Conda #

Install Anaconda or Miniconda #

bash

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Create a Virtual Environment #

bash

conda create -n pytorch python=3.10
conda activate pytorch

Install PyTorch #

bash

# CPU-only build
conda install pytorch torchvision torchaudio cpuonly -c pytorch

bash

# GPU build (CUDA 11.8)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

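Method 3: Build from Source #

bash

# --recursive also fetches the third-party submodules the build depends on
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
python setup.py install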

Verify the Installation #

Basic Check #

python

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
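
When the same check has to run on machines where PyTorch may not be installed yet, a guarded variant can be useful. The following is a sketch only: `describe_install` is a hypothetical helper that collects the same fields as the check above and reports a missing install instead of raising `ImportError`.

```python
import importlib.util


def describe_install():
    """Summarize the local PyTorch install without failing if it is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch  # imported lazily so the function works without PyTorch

    info = {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["cuda_version"] = torch.version.cuda
        info["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return info


for key, value in describe_install().items():
    print(f"{key}: {value}")
```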

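Test GPU Acceleration #

python

import time

import torch

# Two large square matrices; 100 multiplications give a measurable workload
x = torch.randn(5000, 5000)
y = torch.randn(5000, 5000)

start = time.perf_counter()
for _ in range(100):
    z = torch.matmul(x, y)
print(f"CPU time: {time.perf_counter() - start:.2f}s")

if torch.cuda.is_available():
    x_gpu = x.cuda()
    y_gpu = y.cuda()

    # Warm-up call so one-time CUDA initialization is not counted
    torch.matmul(x_gpu, y_gpu)

    # CUDA kernels run asynchronously, so synchronize before reading the clock
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        z = torch.matmul(x_gpu, y_gpu)
    torch.cuda.synchronize()
    print(f"GPU time: {time.perf_counter() - start:.2f}s")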
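
The timing pattern used here (synchronize, read the clock, run, synchronize again) can be factored into a reusable helper. This is a minimal sketch, not a replacement for a proper benchmarking tool; `benchmark` and its parameters are names chosen for illustration. For GPU work, `torch.cuda.synchronize` would be passed as `sync`.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Return the median wall-clock seconds of fn() over `repeats` runs.

    `sync` is an optional callable run before each clock read; for GPU work
    pass torch.cuda.synchronize so queued kernels finish before timing stops.
    """
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time setup costs
    timings = []
    for _ in range(repeats):
        if sync:
            sync()
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# CPU-only usage example that needs nothing beyond the standard library
elapsed = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"median: {elapsed * 1e3:.2f} ms")
```

Reporting the median rather than the total makes a single slow run (for example, one interrupted by another process) less likely to distort the result.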

Development Environment Setup #

VS Code Setup #

text

┌─────────────────────────────────────────────────────────────┐
│               Recommended VS Code Extensions                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Must-have extensions:                                      │
│                                                             │
│  1. Python                                                  │
│     - Python language support                               │
│     - Code completion, debugging                            │
│                                                             │
│  2. Pylance                                                 │
│     - Advanced type checking                                │
│     - Smart completion                                      │
│                                                             │
│  3. Jupyter                                                 │
│     - Notebook support                                      │
│     - Interactive development                               │
│                                                             │
│  Optional extensions:                                       │
│                                                             │
│  - Python Debugger                                          │
│  - GitLens                                                  │
│  - Remote - SSH                                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

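Jupyter Notebook Setup #

bash

# Install Jupyter and expose every conda environment as a notebook kernel
conda install jupyter
conda install nb_conda_kernels

# Start the notebook server
jupyter notebook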

python

%load_ext autoreload
%autoreload 2
%matplotlib inline

import torch
import torchvision
import matplotlib.pyplot as plt

PyCharm Setup #

text

┌─────────────────────────────────────────────────────────────┐
│                  PyCharm Setup Essentials                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Interpreter                                             │
│     Settings → Project → Python Interpreter                 │
│     Select the Conda environment                            │
│                                                             │
│  2. Scientific Mode                                         │
│     View → Scientific Mode                                  │
│     Variable viewer and inline plots                        │
│                                                             │
│  3. Remote development                                      │
│     Configure an SSH interpreter                            │
│     Run code on the server                                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Troubleshooting #

Problem 1: CUDA Version Mismatch #

text

Error message:
RuntimeError: The NVIDIA driver on your system is too old
or
RuntimeError: Found no NVIDIA driver on your system

Fix:
1. Check the GPU driver version
   nvidia-smi

2. Install a PyTorch build that matches your CUDA version
   pip install torch --index-url https://download.pytorch.org/whl/cu118

3. Verify that CUDA is available
   python -c "import torch; print(torch.cuda.is_available())"
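
Step 1 can also be scripted, which helps when diagnosing several machines. The sketch below assumes `nvidia-smi` is on `PATH` when a driver is installed and uses its `--query-gpu=driver_version` option; the function name `driver_version` is illustrative.

```python
import shutil
import subprocess


def driver_version():
    """Return the NVIDIA driver version string, or None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None  # no driver tools on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    # One line per GPU; every GPU shares the same driver
    return lines[0].strip() if lines else None


version = driver_version()
print(version or "nvidia-smi not found or no driver loaded")
```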

Problem 2: Slow pip Downloads #

bash

# Use the Tsinghua University PyPI mirror as the default index
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

pip install torch torchvision torchaudio

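Problem 3: Out of Memory #

python

import os

# Allocator settings should be in place before CUDA is initialized,
# so set the variable before importing torch
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'

import torch

# Return cached, unused GPU memory to the driver (live tensors are not freed)
torch.cuda.empty_cache()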
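
Before tuning the allocator, it often helps to estimate how much memory the tensors themselves need. The following back-of-the-envelope sketch uses only the standard library; `tensor_mib` and the `DTYPE_BYTES` table are illustrative names, and the estimate covers raw storage only, not gradients, optimizer state, or allocator overhead.

```python
import math

# Bytes per element for common dtypes
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def tensor_mib(shape, dtype="float32"):
    """Return the storage a dense tensor of this shape needs, in MiB."""
    return math.prod(shape) * DTYPE_BYTES[dtype] / 2**20


# A 5000 x 5000 float32 matrix, as used in the GPU acceleration test
print(f"{tensor_mib((5000, 5000)):.1f} MiB")             # 95.4 MiB
# The same matrix in half precision takes half the space
print(f"{tensor_mib((5000, 5000), 'float16'):.1f} MiB")  # 47.7 MiB
```

If the estimate is close to the GPU's capacity, reducing the batch size or using a lower-precision dtype is usually more effective than clearing the cache.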

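Problem 4: Version Conflicts #

bash

# Remove the existing packages
pip uninstall torch torchvision torchaudio

# Clear cached wheels so pip downloads fresh copies
pip cache purge

# Reinstall
pip install torch torchvision torchaudio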

Docker Environment #

Using the Official Image #

bash

docker pull pytorch/pytorch:latest

# --gpus all requires the NVIDIA Container Toolkit on the host
docker run --gpus all -it pytorch/pytorch:latest bash

Custom Dockerfile #

dockerfile

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

WORKDIR /workspace

bash

docker build -t my-pytorch .
docker run --gpus all -v "$(pwd)":/workspace -it my-pytorch bash

Cloud Platforms #

Google Colab #

python

import torch

print(torch.__version__)
print(torch.cuda.is_available())

!nvidia-smi

AWS #

bash

# The AMI ID and key name below are placeholders; substitute your own values
aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type p3.2xlarge \
    --key-name my-key

Alibaba Cloud PAI #

text

┌─────────────────────────────────────────────────────────────┐
│                   Alibaba Cloud PAI Setup                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Create a PAI instance                                   │
│     - Choose a GPU instance type                            │
│     - Choose a PyTorch image                                │
│                                                             │
│  2. Connect to the instance                                 │
│     - SSH connection                                        │
│     - JupyterLab interface                                  │
│                                                             │
│  3. Run training                                            │
│     - Upload your code                                      │
│     - Run the training script                               │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Suggested Project Structure #

text

my-pytorch-project/
├── data/
│   ├── raw/
│   └── processed/
├── models/
│   ├── __init__.py
│   └── resnet.py
├── utils/
│   ├── __init__.py
│   └── data.py
├── configs/
│   └── config.yaml
├── train.py
├── test.py
├── requirements.txt
└── README.md
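
The layout above can be created in one step. This is a small sketch using `pathlib`; the `scaffold` function is a hypothetical helper, and the files it creates are empty placeholders. The demo writes into a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

DIRECTORIES = ["data/raw", "data/processed", "models", "utils", "configs"]
FILES = [
    "models/__init__.py",
    "models/resnet.py",
    "utils/__init__.py",
    "utils/data.py",
    "configs/config.yaml",
    "train.py",
    "test.py",
    "requirements.txt",
    "README.md",
]


def scaffold(root):
    """Create the directory tree and empty placeholder files under `root`."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        (root / file).touch(exist_ok=True)  # existing file contents are kept
    return root


# Demo in a throwaway directory; pass a real path to create the project
with tempfile.TemporaryDirectory() as tmp:
    project = scaffold(Path(tmp) / "my-pytorch-project")
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project))
```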

requirements.txt #

text

torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
tqdm>=4.65.0
tensorboard>=2.12.0
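
To see whether the current environment already satisfies these pins, the installed versions can be compared against them. The sketch below is deliberately simple: it assumes every line has the form `name>=version`, compares only the leading numeric part of each version, and uses made-up helper names (`version_tuple`, `at_least`, `check_requirements`). For anything beyond a quick check, `pip check` or the `packaging` library is a better fit.

```python
import re
from importlib import metadata

REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*>=\s*([0-9.]+)\s*$")


def version_tuple(text):
    """Keep the leading numeric part: '2.1.0+cu118' becomes (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(n) for n in match.group().split(".")) if match else ()


def at_least(installed, minimum):
    """Compare two versions after padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(lines):
    """Map each 'name>=version' line to 'ok', 'too old', or 'missing'."""
    report = {}
    for line in lines:
        parsed = REQUIREMENT.match(line)
        if not parsed:
            continue  # this sketch handles only simple '>=' pins
        name, minimum = parsed.groups()
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else "too old"
    return report


print(check_requirements(["torch>=2.0.0", "numpy>=1.24.0", "tqdm>=4.65.0"]))
```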

Next Steps #

PyTorch is now installed and configured. Next, move on to tensor basics and start your deep learning journey!