Installation and Configuration #

System Requirements #

Hardware Requirements #

text

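Minimum configuration (CPU inference)
├── CPU: modern x86_64 processor
├── RAM: 4 GB or more
├── Storage: 10 GB free space
└── Suitable for: tiny and base models

Recommended configuration (GPU inference)
├── GPU: NVIDIA GPU (compute capability 3.5+)
├── VRAM: 8 GB or more (the large model needs about 10 GB)
├── RAM: 16 GB or more
└── Storage: 20 GB free space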

Software Requirements #

Software	Version	Notes
Python	3.8 - 3.11	3.10 recommended
PyTorch	1.10+	2.0+ recommended
CUDA	11.0+	Required for GPU acceleration
ffmpeg	4.0+	Audio processing
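
The requirements in the table can be checked from Python before installing anything else. The following is a minimal sketch: the version bounds are copied from the table above, and the helper names `python_supported` and `ffmpeg_on_path` are illustrative, not part of Whisper.

```python
import shutil
import sys

# Supported range copied from the requirements table; adjust it if your
# Whisper release documents a different range.
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_supported(version=None):
    """Return True if the (major, minor) version is inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def ffmpeg_on_path():
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} supported: {python_supported()}")
    print(f"ffmpeg found on PATH: {ffmpeg_on_path()}")
```

This only confirms that an `ffmpeg` executable exists; it does not check the ffmpeg version.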

Installation Methods #

Method 1: Install with pip (recommended) #

bash

pip install openai-whisper

Method 2: Install from source #

bash

git clone https://github.com/openai/whisper.git
cd whisper
pip install -e .

Method 3: Install with conda #

bash

conda install -c conda-forge openai-whisper

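Installing Dependencies #

Installing ffmpeg #

Whisper depends on ffmpeg for audio processing, so the ffmpeg executable must be available on your PATH.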

macOS:

bash

brew install ffmpeg

Ubuntu/Debian:

bash

sudo apt update
sudo apt install ffmpeg

Windows:

bash

winget install ffmpeg

Alternatively, download it from the official site: https://ffmpeg.org/download.html

Installing PyTorch #

CPU version:

bash

pip install torch

GPU version (CUDA 11.8):

bash

pip install torch --index-url https://download.pytorch.org/whl/cu118

GPU version (CUDA 12.1):

bash

pip install torch --index-url https://download.pytorch.org/whl/cu121

GPU Configuration #

Checking Whether CUDA Is Available #

python

import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")

if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

Installing the CUDA Toolkit #

Download: https://developer.nvidia.com/cuda-downloads

Verify after installation:

bash

nvcc --version
nvidia-smi

GPU Memory Requirements #

Model	VRAM required	Recommended GPU
tiny	~1 GB	Any CUDA-capable NVIDIA GPU
base	~1 GB	GTX 1050
small	~2 GB	GTX 1650
medium	~5 GB	RTX 2060
large	~10 GB	RTX 3080
large-v3	~10 GB	RTX 3080
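
The table can be turned into a small lookup that suggests the largest model likely to fit on a given GPU. This is a rough sketch: the VRAM figures are the approximate values from the table, and both the `pick_model` helper and its `headroom_gb` safety margin are assumptions made for illustration.

```python
# Approximate VRAM needs in GB, copied from the table above (smallest first).
VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_model(available_gb, headroom_gb=1.0):
    """Return the largest model whose estimated need plus headroom fits.

    Returns None when even the smallest model does not fit.
    """
    best = None
    for name, need_gb in VRAM_GB.items():  # dicts keep insertion order
        if need_gb + headroom_gb <= available_gb:
            best = name
    return best


if __name__ == "__main__":
    for vram in (2, 8, 12):
        print(f"{vram} GB -> {pick_model(vram)}")
```

On a real machine, `available_gb` could come from `torch.cuda.get_device_properties(0).total_memory / 1024**3`, as in the CUDA check earlier in this section.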

Verifying the Installation #

Basic Check #

python

import whisper

print(f"Whisper version: {whisper.__version__}")

model = whisper.load_model("base")
print("Model loaded successfully!")

Full Test #

python

import whisper

model = whisper.load_model("base")
result = model.transcribe("test.mp3")
print(result["text"])
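
Besides `result["text"]`, the dictionary returned by `transcribe` also carries a `"segments"` list whose entries have `start`, `end`, and `text` fields. The sketch below prints them with timestamps; `format_timestamp` and `format_segments` are hypothetical helpers, and the sample dictionary is made-up data standing in for a real result.

```python
def format_timestamp(seconds):
    """Format a time in seconds as MM:SS.mmm."""
    millis = round(seconds * 1000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(result):
    """Return one '[start -> end] text' line per transcription segment."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines


if __name__ == "__main__":
    # Made-up stand-in for the dictionary returned by model.transcribe(...)
    sample = {
        "text": " Hello world.",
        "segments": [{"start": 0.0, "end": 2.5, "text": " Hello world."}],
    }
    print("\n".join(format_segments(sample)))
```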

Virtual Environment Setup #

Using venv #

bash

python -m venv whisper-env
source whisper-env/bin/activate  # Linux/macOS
whisper-env\Scripts\activate     # Windows

pip install openai-whisper
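
To confirm that the environment is actually active before installing packages, one option is to compare the interpreter prefixes. This is a small sketch with a hypothetical helper name; it detects environments created by `venv` but not conda environments, which do not change `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print(f"Running inside a virtual environment: {in_virtualenv()}")
    print(f"Interpreter prefix: {sys.prefix}")
```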

Using conda #

bash

conda create -n whisper python=3.10
conda activate whisper
pip install openai-whisper

Using Poetry #

bash

poetry new whisper-project
cd whisper-project
poetry add openai-whisper

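Downloading Models #

Automatic Download #

On first use, the model is downloaded automatically to the cache directory (~/.cache/whisper by default):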

python

import whisper

model = whisper.load_model("base")

Manual Download #

bash

mkdir -p ~/.cache/whisper

# Checkpoint URLs: save the file you need into ~/.cache/whisper, keeping its file name
# tiny:    https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt
# base:    https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt
# small:   https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt
# medium:  https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt
# large (large-v3):  https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
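
A manually downloaded checkpoint can be checked for corruption before use. In the URLs above, the long hexadecimal path segment appears to be the SHA-256 digest of the file, so the sketch below compares it with the digest of the local copy. The helper names are hypothetical, and the digest-in-URL convention should be treated as an assumption about these particular links.

```python
import hashlib


def expected_sha256(url):
    """Return the second-to-last path segment, assumed to be the SHA-256 digest."""
    return url.rstrip("/").split("/")[-2]


def file_sha256(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_ok(path, url):
    """Return True if the local file matches the digest embedded in the URL."""
    return file_sha256(path) == expected_sha256(url)


if __name__ == "__main__":
    tiny_url = (
        "https://openaipublic.azureedge.net/main/whisper/models/"
        "65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt"
    )
    print(f"Expected digest for tiny.pt: {expected_sha256(tiny_url)}")
```

If `checkpoint_ok` returns False, the download was probably interrupted and the file should be fetched again.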

Specifying the Download Directory #

python

import whisper

model = whisper.load_model("base", download_root="/path/to/models")
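
When `download_root` is not given, checkpoints end up in the cache directory mentioned earlier. The sketch below locates that directory and lists the `.pt` files already present. Both helpers are hypothetical, and honoring `XDG_CACHE_HOME` is an assumption about how the default location is resolved.

```python
import os
from pathlib import Path


def default_whisper_cache(env=None):
    """Return the assumed default cache directory for Whisper checkpoints."""
    env = os.environ if env is None else env
    # Assumption: XDG_CACHE_HOME takes precedence over ~/.cache when it is set.
    base = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return Path(base) / "whisper"


def cached_checkpoints(root):
    """Return the sorted names of .pt files in root, or [] if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.pt"))


if __name__ == "__main__":
    cache_dir = default_whisper_cache()
    print(f"Cache directory: {cache_dir}")
    print(f"Cached checkpoints: {cached_checkpoints(cache_dir)}")
```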

Common Problems #

Problem 1: ffmpeg not found #

text

Error message: FileNotFoundError: ffmpeg not found

Solution: install ffmpeg and make sure it is on your PATH:

bash

brew install ffmpeg          # macOS
sudo apt install ffmpeg      # Ubuntu
winget install ffmpeg        # Windows

Problem 2: CUDA out of memory #

text

Error message: CUDA out of memory

Solution: load a smaller model:

python

import whisper

model = whisper.load_model("small")

Or run on the CPU:

python

import whisper

model = whisper.load_model("base", device="cpu")
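
Both workarounds can be combined into a loader that steps down through model sizes until one fits. This is a minimal sketch, not part of Whisper: `load_with_fallback` is a hypothetical helper that takes the loading function as an argument, and it recognizes out-of-memory failures by their message, relying on PyTorch raising them as `RuntimeError` (or a subclass).

```python
def load_with_fallback(load_fn, sizes=("large", "medium", "small", "base", "tiny")):
    """Try each model size in order and return (size, model) for the first that loads.

    Out-of-memory errors move on to the next size; any other error is re-raised.
    """
    last_error = None
    for size in sizes:
        try:
            return size, load_fn(size)
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise
            last_error = exc
    raise RuntimeError("no model size fit in memory") from last_error


# Example usage, assuming whisper is installed:
#   import whisper
#   size, model = load_with_fallback(lambda s: whisper.load_model(s, device="cuda"))
#   print(f"Loaded the {size} model")
```

Passing the loader in as a function keeps the retry logic independent of Whisper, which also makes it easy to test with a stub.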

Problem 3: Model download fails #

text

Error message: ConnectionError / Timeout

Solutions:

Use a proxy
Download the model files manually
Use a mirror site

python

import os

# Set these before calling whisper.load_model(); replace proxy:port with your proxy address
os.environ['HTTP_PROXY'] = 'http://proxy:port'
os.environ['HTTPS_PROXY'] = 'http://proxy:port'

Problem 4: Unsupported audio format #

text

Error message: Unsupported audio format

Solution: convert the file to a common format with ffmpeg:

bash

ffmpeg -i input.m4a output.mp3
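
The same conversion can be scripted from Python when many files need converting. The sketch below builds the ffmpeg command as a list and runs it with `subprocess`. The helper names are hypothetical, and the choice of 16 kHz mono output is an assumption based on the sample rate Whisper works with internally.

```python
import subprocess


def build_convert_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts src to mono audio at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", str(src),          # input file
        "-ac", "1",              # one audio channel (mono)
        "-ar", str(sample_rate), # output sample rate in Hz
        str(dst),
    ]


def convert(src, dst, sample_rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_convert_cmd(src, dst, sample_rate), check=True)


# Example usage, assuming ffmpeg is installed and input.m4a exists:
#   convert("input.m4a", "output.wav")
```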

Problem 5: Permission error #

text

Error message: Permission denied

Solution: give your user read and write access to the cache directory:

bash

chmod -R u+rwX ~/.cache/whisper

Configuration Files #

Creating a Configuration Class #

python

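import torch
import whisper

class WhisperConfig:
    def __init__(self):
        self.model_size = "base"
        # Use the GPU when CUDA is available, otherwise fall back to the CPU
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.language = None  # None lets Whisper detect the language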
        self.task = "transcribe"
        self.temperature = 0.0
        self.compression_ratio_threshold = 2.4
        self.logprob_threshold = -1.0
        self.no_speech_threshold = 0.6

config = WhisperConfig()
model = whisper.load_model(config.model_size, device=config.device)

Using a YAML Configuration #

yaml

whisper:
  model_size: base
  device: cuda
  language: zh
  task: transcribe
  temperature: 0.0
  compression_ratio_threshold: 2.4
  logprob_threshold: -1.0
  no_speech_threshold: 0.6

python

import yaml  # provided by the PyYAML package
import whisper

with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

model = whisper.load_model(
    config["whisper"]["model_size"],
    device=config["whisper"]["device"]
)
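
The loader above only uses `model_size` and `device`; the remaining keys are options for `transcribe`. One way to wire them through is to split the loaded configuration into two groups of keyword arguments. This is a sketch: `split_config` and the two key lists are hypothetical, and the dictionary in the example stands in for the parsed `config.yaml`.

```python
# Keys consumed when loading the model vs. when transcribing (illustrative split).
LOAD_KEYS = ("device",)
TRANSCRIBE_KEYS = (
    "language",
    "task",
    "temperature",
    "compression_ratio_threshold",
    "logprob_threshold",
    "no_speech_threshold",
)


def split_config(config):
    """Return (model_size, load_kwargs, transcribe_kwargs) from a parsed config."""
    section = config["whisper"]
    load_kwargs = {k: section[k] for k in LOAD_KEYS if k in section}
    # Skip options that are missing or set to null so library defaults apply.
    transcribe_kwargs = {
        k: section[k] for k in TRANSCRIBE_KEYS if section.get(k) is not None
    }
    return section["model_size"], load_kwargs, transcribe_kwargs


if __name__ == "__main__":
    # Stand-in for yaml.safe_load(open("config.yaml"))
    config = {"whisper": {"model_size": "base", "device": "cuda", "language": "zh",
                          "task": "transcribe", "temperature": 0.0}}
    model_size, load_kwargs, transcribe_kwargs = split_config(config)
    print(model_size, load_kwargs, transcribe_kwargs)
    # With whisper installed, the pieces would be used as:
    #   model = whisper.load_model(model_size, **load_kwargs)
    #   result = model.transcribe("test.mp3", **transcribe_kwargs)
```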

Next Steps #

Once your environment is set up, continue to the Quick Start guide and run your first transcription task!