Pretrained Models #

Model Overview #

Coqui TTS ships a large catalog of pretrained models covering many languages and use cases.

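Model Naming Convention #

text

Model name format:
tts_models/<language>/<dataset>/<model architecture>

Example:
tts_models/en/ljspeech/vits
├── tts_models: model type (vocoders use vocoder_models)
├── en: English
├── ljspeech: the LJSpeech dataset
└── vits: the VITS model architecture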
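
The naming format above can be split mechanically into its four parts. The sketch below assumes every name has exactly four slash-separated segments; `ModelName` and `parse_model_name` are hypothetical helpers written for illustration and are not part of the Coqui TTS API.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    model_type: str    # "tts_models" or "vocoder_models"
    language: str      # e.g. "en"
    dataset: str       # e.g. "ljspeech"
    architecture: str  # e.g. "vits"


def parse_model_name(name: str) -> ModelName:
    """Split '<type>/<language>/<dataset>/<architecture>' into named fields."""
    parts = name.split("/")
    # Reject names with missing or empty segments instead of guessing.
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty segments, got: {name!r}")
    return ModelName(*parts)


parsed = parse_model_name("tts_models/en/ljspeech/vits")
print(parsed.language, parsed.dataset, parsed.architecture)
```

Rejecting malformed names early makes typos such as a missing slash visible before a download is attempted.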

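Model Types #

text

┌─────────────────────────────────────────────────────────────┐
│                    Model type categories                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  By language:                                               │
│  ├── Single-language models: optimized for one language     │
│  ├── Multilingual models: support several languages         │
│  └── Language-specific models: built for one language       │
│                                                             │
│  By speaker:                                                │
│  ├── Single-speaker models: one voice                       │
│  └── Multi-speaker models: several selectable voices        │
│                                                             │
│  By capability:                                             │
│  ├── Standard synthesis models: text to speech              │
│  ├── Voice-cloning models: support voice cloning            │
│  └── Emotion models: support emotion control                │
│                                                             │
└─────────────────────────────────────────────────────────────┘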

Listing Available Models #

Using the CLI #

bash

# List all models
tts --list_models

# Example output:
# Name format: type/language/dataset/model
# 1. tts_models/multilingual/multi-dataset/xtts_v2
# 2. tts_models/en/ljspeech/vits
# 3. tts_models/en/ljspeech/tacotron2-DDC
# ...

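Using Python #

python

from TTS.utils.manage import ModelManager

# Get all model names. ModelManager.list_models() returns a list of names;
# in some releases TTS().list_models() returns a manager object instead.
models = ModelManager().list_models()
for model in models:
    print(model)

# Filter by language
english_models = [m for m in models if m.startswith("tts_models/en/")]
print(f"Number of English models: {len(english_models)}")

# Filter multilingual models
multilingual = [m for m in models if "multilingual" in m]
print(f"Multilingual models: {multilingual}")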
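
The same filtering idea extends to grouping the whole catalog by language. This is a minimal sketch that works on a plain list of name strings, so it runs without the library installed; `group_by_language` is a hypothetical helper, and the sample list only reuses names that appear in this guide.

```python
from collections import defaultdict


def group_by_language(model_names):
    """Map language segment -> list of TTS model names, skipping vocoders."""
    groups = defaultdict(list)
    for name in model_names:
        parts = name.split("/")
        # Only well-formed TTS names are grouped; anything else is ignored.
        if len(parts) == 4 and parts[0] == "tts_models":
            groups[parts[1]].append(name)
    return dict(groups)


sample = [
    "tts_models/multilingual/multi-dataset/xtts_v2",
    "tts_models/en/ljspeech/vits",
    "tts_models/en/ljspeech/tacotron2-DDC",
    "vocoder_models/en/ljspeech/hifigan_v2",
]

by_lang = group_by_language(sample)
for lang, names in sorted(by_lang.items()):
    print(f"{lang}: {len(names)} model(s)")
```

Passing the list returned by the model listing call in place of `sample` should give a per-language overview of the installed catalog.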

Models by Language #

English Models #

python

english_models = {
    "High-quality single speaker": "tts_models/en/ljspeech/vits",
    "Classic model": "tts_models/en/ljspeech/tacotron2-DDC",
    "Fast inference": "tts_models/en/ljspeech/fast_pitch",
    "Multi-speaker": "tts_models/en/vctk/vits",
    "Female voice": "tts_models/en/jenny/jenny",
    "Expressive prosody (Capacitron)": "tts_models/en/blizzard2013/capacitron-t2-c150_v2",
}

Chinese Models #

python

chinese_models = {
    "Mandarin Chinese": "tts_models/zh-CN/baker/tacotron2-DDC-GST",
    "Multilingual (includes Chinese)": "tts_models/multilingual/multi-dataset/xtts_v2",
}

Models for Other Languages #

python

other_models = {
    "Japanese": "tts_models/ja/kokoro/tacotron2-DDC",
    "German": "tts_models/de/thorsten/tacotron2-DDC",
    "French": "tts_models/fr/css10/vits",
    "Spanish": "tts_models/es/css10/vits",
    "Korean": "tts_models/multilingual/multi-dataset/xtts_v2",  # XTTS v2 supports Korean
    "Italian": "tts_models/it/mai_female/vits",  # confirm with `tts --list_models`
}
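
Dictionaries like these can back a simple lookup that falls back to the multilingual model when no dedicated model is listed. The sketch below is an assumption about how one might wire this up, not a Coqui TTS feature; `pick_model` and the two constants are hypothetical names, and the fallback only helps for languages XTTS v2 actually supports.

```python
# Dedicated single-language models, keyed by language code.
LANGUAGE_MODELS = {
    "ja": "tts_models/ja/kokoro/tacotron2-DDC",
    "de": "tts_models/de/thorsten/tacotron2-DDC",
    "fr": "tts_models/fr/css10/vits",
    "es": "tts_models/es/css10/vits",
}

# Used when no dedicated model is listed for the requested language.
MULTILINGUAL_FALLBACK = "tts_models/multilingual/multi-dataset/xtts_v2"


def pick_model(language_code: str) -> str:
    """Return a dedicated model for the language, else the multilingual one."""
    return LANGUAGE_MODELS.get(language_code.lower(), MULTILINGUAL_FALLBACK)


print(pick_model("de"))
print(pick_model("ko"))
```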

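Vocoder Models #

A vocoder converts a mel spectrogram into an audio waveform.

Listing Vocoder Models #

python

from TTS.utils.manage import ModelManager

# ModelManager.list_models() returns a list of model names
models = ModelManager().list_models()
vocoder_models = [m for m in models if m.startswith("vocoder_models/")]
for model in vocoder_models[:10]:
    print(model)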

Using a Custom Vocoder #

python

from TTS.api import TTS

# Use the model's default vocoder
tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")

# Specify a vocoder explicitly
tts = TTS(
    model_name="tts_models/en/ljspeech/tacotron2-DDC",
    vocoder_name="vocoder_models/en/ljspeech/hifigan_v2"
)

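Downloading and Managing Models #

Automatic Download #

python

from TTS.api import TTS

# The model is downloaded automatically on first use
tts = TTS("tts_models/en/ljspeech/vits")
# Models are stored under ~/.local/share/tts/ (the default location on Linux)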

Manual Download #

python

from TTS.utils.manage import ModelManager

manager = ModelManager()

# Download the model
model_path, config_path, model_item = manager.download_model(
    "tts_models/en/ljspeech/vits"
)

print(f"Model path: {model_path}")
print(f"Config path: {config_path}")

Offline Use #

python

from TTS.api import TTS

# Point to local model and config files
tts = TTS(
    model_path="/path/to/model.pth",
    config_path="/path/to/config.json"
)

Managing the Model Cache #

bash

# Show where models are stored
ls ~/.local/share/tts/

# Show the cache size
du -sh ~/.local/share/tts/

# Remove a specific model
rm -rf ~/.local/share/tts/tts_models--en--ljspeech--vits
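
The directory name in the last command is the model name with each `/` replaced by `--`. A rough Python equivalent of the size check follows, assuming the cache lives at the Linux location used above; `cache_dir_name` and `dir_size_bytes` are hypothetical helpers, not library functions.

```python
from pathlib import Path


def cache_dir_name(model_name: str) -> str:
    """Directory name used for a model in the cache ('/' becomes '--')."""
    return model_name.replace("/", "--")


def dir_size_bytes(path: Path) -> int:
    """Total size of all regular files under path; 0 if it does not exist."""
    if not path.exists():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


# Cache root as used in this guide; other platforms may use a different path.
cache_root = Path.home() / ".local" / "share" / "tts"
model_dir = cache_root / cache_dir_name("tts_models/en/ljspeech/vits")
print(f"{model_dir}: {dir_size_bytes(model_dir) / 1e6:.1f} MB")
```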

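Model Configuration #

Inspecting a Model's Configuration #

python

from TTS.api import TTS

tts = TTS("tts_models/en/ljspeech/vits")

# Inspect model information
print(f"Sample rate: {tts.synthesizer.output_sample_rate}")
print(f"Speakers: {tts.speakers}")    # None for a single-speaker model
print(f"Languages: {tts.languages}")  # None for a single-language model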

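Custom Configuration #

python

from TTS.config import load_config
from TTS.tts.models import setup_model

# Load the configuration
config = load_config("/path/to/config.json")

# Modify the configuration
config.audio.sample_rate = 22050
config.batch_size = 16  # training options are top-level fields of the config

# Build a model from the configuration
model = setup_model(config)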

Model Performance Comparison #

Quality Comparison #

text

┌─────────────────────────────────────────────────────────────┐
│                 Model quality scores (1-10)                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Model         Naturalness  Clarity  Prosody  Overall       │
│  ─────────────────────────────────────────────────────────  │
│  XTTS v2       9.0          9.0      8.5      8.8           │
│  VITS          8.5          8.5      8.0      8.3           │
│  Tacotron 2    8.0          8.0      7.5      7.8           │
│  FastSpeech2   7.5          8.0      7.0      7.5           │
│  Glow-TTS      7.5          7.5      7.0      7.3           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
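
The overall column is consistent with an unweighted mean of the three criteria, rounded to one decimal place. That is an inference from the numbers rather than something the table states, and the short check below only reproduces the arithmetic.

```python
# (naturalness, clarity, prosody) per model, copied from the table above.
scores = {
    "XTTS v2": (9.0, 9.0, 8.5),
    "VITS": (8.5, 8.5, 8.0),
    "Tacotron 2": (8.0, 8.0, 7.5),
    "FastSpeech2": (7.5, 8.0, 7.0),
    "Glow-TTS": (7.5, 7.5, 7.0),
}


def overall(values):
    """Unweighted mean of the criteria, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


for model, values in scores.items():
    print(f"{model}: {overall(values)}")
```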

Speed Comparison #

python

import time
from TTS.api import TTS

models = [
    "tts_models/en/ljspeech/vits",
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/en/ljspeech/fast_pitch",
]

text = "This is a performance test sentence for TTS models."

for model_name in models:
    tts = TTS(model_name)
    
    start = time.time()
    tts.tts_to_file(text=text, file_path="test.wav")
    elapsed = time.time() - start
    
    print(f"{model_name}: {elapsed:.2f}s")
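
A single timed run is noisy, and raw seconds are hard to compare across sentences of different length. One common refinement is to take the best of several runs and to report the real-time factor (synthesis time divided by audio duration). The sketch below uses a stand-in workload so it runs without downloading a model; `time_call` and `real_time_factor` are hypothetical helpers.

```python
import time


def time_call(fn, repeats=3):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def real_time_factor(elapsed_s, num_samples, sample_rate):
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    return elapsed_s / (num_samples / sample_rate)


# Stand-in workload; replace the lambda with a call that synthesizes speech.
elapsed = time_call(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.4f}s")

# 0.5 s of compute for 44100 samples at 22050 Hz (2 s of audio).
print(real_time_factor(0.5, 44100, 22050))
```

With a loaded model, the number of samples would come from the synthesized waveform and the sample rate from the model's output sample rate.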

Model Selection Guide #

Choosing by Use Case #

text

┌─────────────────────────────────────────────────────────────┐
│                    Choosing by use case                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Audiobook production:                                      │
│  └── XTTS v2 or VITS (high quality)                         │
│                                                             │
│  Real-time conversation:                                    │
│  └── FastSpeech2 or VITS (low latency)                      │
│                                                             │
│  Voice cloning:                                             │
│  └── XTTS v2 (best cloning results)                         │
│                                                             │
│  Multilingual applications:                                 │
│  └── XTTS v2 (supports 17 languages)                        │
│                                                             │
│  Resource-constrained environments:                         │
│  └── VITS (small model, fast)                               │
│                                                             │
│  Research and learning:                                     │
│  └── Tacotron 2 (classic architecture)                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Choosing by Resources #

text

GPU memory requirements:
├── XTTS v2: 4-8 GB
├── VITS: 1-2 GB
├── Tacotron 2: 2-4 GB
└── FastSpeech2: 1-2 GB

CPU usability:
├── XTTS v2: usable (slow)
├── VITS: usable (moderate)
├── Tacotron 2: usable (slow)
└── FastSpeech2: usable (fast)
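
The GPU memory ranges above can be turned into a quick filter. This sketch takes the conservative view that a model fits only if the upper end of its range is within the available memory; `VRAM_GB` and `models_that_fit` are hypothetical names, and the figures are the rough estimates from this guide rather than measured values.

```python
# (low, high) GPU memory estimates in GB, copied from the list above.
VRAM_GB = {
    "XTTS v2": (4, 8),
    "VITS": (1, 2),
    "Tacotron 2": (2, 4),
    "FastSpeech2": (1, 2),
}


def models_that_fit(available_gb, table=VRAM_GB):
    """Models whose upper memory estimate is within available_gb."""
    return [name for name, (_low, high) in table.items() if high <= available_gb]


print(models_that_fit(4))
print(models_that_fit(8))
```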

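Next Steps #

Now that you know the pretrained models, continue with Speech Synthesis to dig into the details of text processing and audio generation.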
