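Conversational AI #

Overview #

Conversational AI is ElevenLabs' real-time, two-way voice conversation feature. It responds with very low latency (under one second end to end), which makes it suitable for AI customer service, virtual assistants, game characters, and similar applications.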

text

┌─────────────────────────────────────────────────────────────┐
│                    Conversational AI Architecture           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   User speech ──> ASR ──> AI processing ──> TTS ──> Output  │
│                                                             │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   │
│   │ Mic     │ → │ ASR     │ → │ LLM     │ → │ TTS     │   │
│   └─────────┘   └─────────┘   └─────────┘   └─────────┘   │
│                                                             │
│   Latency: < 1 second (end to end)                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
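The four-stage flow in the diagram can be sketched as a chain of functions. This is only an illustration of the data flow: `fake_asr`, `fake_llm`, `fake_tts`, and `run_pipeline` are hypothetical names, and the stubs do no real speech processing.

```python
from typing import Callable, List


# Hypothetical stand-ins for the three processing stages. Real stages would
# call speech-recognition, language-model, and speech-synthesis services.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")        # pretend the audio is its own transcript


def fake_llm(transcript: str) -> str:
    return f"You said: {transcript}"    # pretend reply generation


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")         # pretend the reply text is audio


def run_pipeline(audio: bytes, stages: List[Callable]) -> bytes:
    """Pass the input through each stage in order (ASR -> LLM -> TTS)."""
    result = audio
    for stage in stages:
        result = stage(result)
    return result


reply_audio = run_pipeline(b"hello", [fake_asr, fake_llm, fake_tts])
print(reply_audio)
```

In the hosted service these stages run server-side; the client only sends microphone audio and receives synthesized audio.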

Core Features #

Ultra-Low Latency #

text

┌─────────────────────────────────────────────────────────────┐
│                    Latency Profile                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  End-to-end latency: < 1 second                             │
│                                                             │
│  Components:                                                │
│  ├── Speech recognition (ASR): ~200 ms                      │
│  ├── AI processing (LLM): ~300 ms                           │
│  ├── Speech synthesis (TTS): ~300 ms                        │
│  └── Network transfer: ~100 ms                              │
│                                                             │
│  Optimization options:                                      │
│  ├── Use a Turbo model                                      │
│  ├── Use streaming                                          │
│  └── Preload voices                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
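As a quick check of the figures above, the per-stage estimates can be summed and compared against the one-second target. The numbers are the approximate values from the table, not measurements, and the helper names are illustrative.

```python
# Approximate per-stage latencies from the table above, in milliseconds.
LATENCY_BUDGET_MS = {"asr": 200, "llm": 300, "tts": 300, "network": 100}
TARGET_MS = 1000  # the "< 1 second" end-to-end goal


def total_latency_ms(budget: dict) -> int:
    """Sum the stage latencies, assuming the stages run one after another."""
    return sum(budget.values())


def headroom_ms(budget: dict, target_ms: int = TARGET_MS) -> int:
    """Remaining slack before the target is exceeded (negative if over)."""
    return target_ms - total_latency_ms(budget)


total = total_latency_ms(LATENCY_BUDGET_MS)
slack = headroom_ms(LATENCY_BUDGET_MS)
print(f"total={total} ms, headroom={slack} ms")
```

The estimates add up to about 900 ms, which leaves roughly 100 ms of slack. Streaming overlaps the stages, so real end-to-end latency can be lower than a plain sum suggests.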

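Two-Way Communication #

text

Features:
├── Real-time voice input
├── Real-time voice output
├── Interruption support
├── Emotionally responsive replies
└── Multi-turn conversation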
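Interruption support usually means the client must stop playing queued agent audio as soon as the user starts speaking. The sketch below shows one way to do that on the client side, assuming audio arrives in chunks. `PlaybackQueue` is a hypothetical helper, not part of any ElevenLabs SDK.

```python
import queue
from typing import Optional


class PlaybackQueue:
    """Holds agent audio chunks that are waiting to be played."""

    def __init__(self) -> None:
        self._chunks: "queue.Queue[bytes]" = queue.Queue()

    def enqueue(self, chunk: bytes) -> None:
        self._chunks.put(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        try:
            return self._chunks.get_nowait()
        except queue.Empty:
            return None

    def interrupt(self) -> int:
        """Drop all pending chunks and return how many were discarded."""
        dropped = 0
        while True:
            try:
                self._chunks.get_nowait()
                dropped += 1
            except queue.Empty:
                return dropped


playback = PlaybackQueue()
for part in (b"chunk-1", b"chunk-2", b"chunk-3"):
    playback.enqueue(part)

first = playback.next_chunk()   # the player consumes one chunk
dropped = playback.interrupt()  # the user starts talking: discard the rest
print(first, dropped)
```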

WebSocket Connection #

Connection Endpoint #

text

wss://api.elevenlabs.io/v1/convai/conversation

Connection Example #

python

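import base64
import json
import threading
import time

import websocket  # provided by the websocket-client package


class ConversationAI:
    """Minimal client that streams audio to and from a conversational agent."""

    def __init__(self, api_key, agent_config):
        self.api_key = api_key
        self.agent_config = agent_config
        self.ws = None

    def on_message(self, ws, message):
        # Each server message is a JSON event tagged with a "type" field.
        event = json.loads(message)
        event_type = event.get("type")

        if event_type == "audio":
            # Agent speech arrives as base64-encoded audio.
            audio_data = base64.b64decode(event["audio"])
            self.play_audio(audio_data)

        elif event_type == "transcript":
            print(f"User: {event['transcript']}")

        elif event_type == "agent_response":
            print(f"Agent: {event['agent_response']}")

    def on_error(self, ws, error):
        print(f"Error: {error}")

    def on_close(self, ws, close_status_code, close_msg):
        print("Connection closed")

    def on_open(self, ws):
        # Send the agent configuration as the first message after connecting.
        config = {
            "agent": self.agent_config
        }
        ws.send(json.dumps(config))

    def connect(self):
        self.ws = websocket.WebSocketApp(
            "wss://api.elevenlabs.io/v1/convai/conversation",
            on_open=self.on_open,
            on_message=self.on_message,
            on_error=self.on_error,
            on_close=self.on_close,
            header={"xi-api-key": self.api_key}
        )

        # Run the socket loop in a background thread so the caller stays
        # free to capture and send audio.
        thread = threading.Thread(target=self.ws.run_forever, daemon=True)
        thread.start()

    def send_audio(self, audio_data):
        # Only send once the socket is actually connected.
        if self.ws and self.ws.sock and self.ws.sock.connected:
            message = json.dumps({
                "type": "audio",
                "audio": base64.b64encode(audio_data).decode()
            })
            self.ws.send(message)

    def play_audio(self, audio_data):
        # Placeholder: hand the decoded audio to your playback library.
        pass

# Usage example
agent_config = {
    "prompt": {
        "text": "You are a helpful assistant."
    },
    "first_message": "Hello! How can I help you today?",
    "language": "en",
    "voice": {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb"
    }
}

conv_ai = ConversationAI("your_api_key", agent_config)
conv_ai.connect()

# The socket thread is a daemon, so keep the main thread alive while the
# conversation runs (replace this loop with your audio capture loop).
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    conv_ai.ws.close()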
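The `play_audio` method in the connection example is left empty. One simple way to fill it in while testing is to collect the chunks and save them as a WAV file. The sketch below assumes the agent audio is 16-bit mono PCM at 16 kHz, which this page does not confirm, so check the actual output format first. `AudioCollector` is a hypothetical helper.

```python
import io
import wave


class AudioCollector:
    """Collects raw PCM chunks and packages them as WAV data.

    Assumption: chunks are 16-bit mono PCM at `sample_rate` Hz.
    """

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate
        self._chunks = []

    def play_audio(self, audio_data: bytes) -> None:
        # Same signature as ConversationAI.play_audio, so it can stand in for it.
        self._chunks.append(audio_data)

    def to_wav_bytes(self) -> bytes:
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)                 # mono
            wav.setsampwidth(2)                 # 2 bytes = 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"".join(self._chunks))
        return buffer.getvalue()


collector = AudioCollector()
collector.play_audio(b"\x00\x00" * 160)  # 160 silent samples (10 ms at 16 kHz)
collector.play_audio(b"\x00\x00" * 160)
wav_bytes = collector.to_wav_bytes()
print(len(wav_bytes))
```

For live playback, replace the collector with calls into an audio output library. The buffering idea stays the same.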

Agent Configuration #

Basic Configuration #

python

agent_config = {
    "prompt": {
        "text": "You are a helpful customer service assistant.",
        "temperature": 0.7,
        "max_tokens": 150
    },
    "first_message": "Hello! Welcome to our service. How can I assist you?",
    "language": "en"
}

Voice Configuration #

python

agent_config = {
    "prompt": {
        "text": "You are a friendly assistant."
    },
    "voice": {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
            "style": 0.3
        }
    }
}

Full Configuration Options #

text

┌─────────────────────────────────────────────────────────────┐
│                    Configuration Options                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  prompt:                                                    │
│  ├── text - instructions for agent behavior                 │
│  ├── temperature - creativity (0.0-1.0)                     │
│  └── max_tokens - maximum response length                   │
│                                                             │
│  first_message:                                             │
│  └── opening message text                                   │
│                                                             │
│  language:                                                  │
│  └── conversation language code                             │
│                                                             │
│  voice:                                                     │
│  ├── voice_id - voice ID                                    │
│  ├── model_id - model ID                                    │
│  └── voice_settings - voice settings                        │
│                                                             │
│  asr:                                                       │
│  └── speech recognition settings                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
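A small builder can assemble these options and reject out-of-range values before anything is sent. This is a sketch: `build_agent_config` is a hypothetical helper, and the only range it enforces is the 0.0-1.0 temperature range listed above.

```python
def build_agent_config(prompt_text, first_message=None, language="en",
                       temperature=None, max_tokens=None, voice_id=None):
    """Assemble an agent_config dict, rejecting obviously invalid values."""
    if not prompt_text or not prompt_text.strip():
        raise ValueError("prompt text must not be empty")

    prompt = {"text": prompt_text}
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        prompt["temperature"] = temperature
    if max_tokens is not None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        prompt["max_tokens"] = max_tokens

    config = {"prompt": prompt, "language": language}
    if first_message is not None:
        config["first_message"] = first_message
    if voice_id is not None:
        config["voice"] = {"voice_id": voice_id}
    return config


config = build_agent_config(
    "You are a helpful assistant.",
    first_message="Hello!",
    temperature=0.7,
    max_tokens=150,
    voice_id="JBFqnCBsd6RMkjVDRZzb",
)
print(config)
```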

Event Handling #

Input Events #

python

import base64
import json

# Send an audio chunk (base64-encoded so it fits in a JSON message)
def send_audio_event(ws, audio_data):
    event = {
        "type": "audio",
        "audio": base64.b64encode(audio_data).decode()
    }
    ws.send(json.dumps(event))

# Send a text message instead of audio
def send_text_event(ws, text):
    event = {
        "type": "text",
        "text": text
    }
    ws.send(json.dumps(event))

# Interrupt the response currently being spoken
def send_interrupt_event(ws):
    event = {
        "type": "interrupt"
    }
    ws.send(json.dumps(event))

Output Events #

python

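import base64
import json

def play_audio(audio):
    # Placeholder: hand the decoded audio to your playback library.
    pass

def handle_event(event_data):
    event = json.loads(event_data)
    event_type = event.get("type")
    
    if event_type == "audio":
        # Agent speech, base64-encoded
        audio = base64.b64decode(event["audio"])
        play_audio(audio)
        
    elif event_type == "transcript":
        # Recognized text of what the user said
        print(f"User said: {event['transcript']}")
        
    elif event_type == "agent_response":
        # Text of the agent's reply
        print(f"Agent response: {event['agent_response']}")
        
    elif event_type == "interruption":
        print("Response was interrupted")
        
    elif event_type == "error":
        # Use .get() so a missing field does not raise inside the handler
        print(f"Error: {event.get('message', 'unknown error')}")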
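As the number of event types grows, a dispatch table can be easier to extend than a long `if`/`elif` chain. The sketch below reuses the event names and fields from the handler above. `make_dispatcher` is a hypothetical helper.

```python
import base64
import json


def make_dispatcher(handlers):
    """Return a function that routes a raw JSON message by its "type" field."""
    def dispatch(raw_message: str) -> bool:
        event = json.loads(raw_message)
        handler = handlers.get(event.get("type"))
        if handler is None:
            return False          # unknown event types are ignored
        handler(event)
        return True
    return dispatch


received_audio = []
transcripts = []

dispatch = make_dispatcher({
    "audio": lambda e: received_audio.append(base64.b64decode(e["audio"])),
    "transcript": lambda e: transcripts.append(e["transcript"]),
})

# Simulated server messages, so the example runs without a connection.
audio_msg = json.dumps({"type": "audio", "audio": base64.b64encode(b"pcm").decode()})
handled_audio = dispatch(audio_msg)
handled_text = dispatch(json.dumps({"type": "transcript", "transcript": "hello"}))
handled_unknown = dispatch(json.dumps({"type": "something_else"}))
print(received_audio, transcripts, handled_unknown)
```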

Use Cases #

AI Customer Service #

python

customer_service_config = {
    "prompt": {
        "text": """You are a professional customer service representative.
        Be polite, helpful, and efficient.
        Answer questions about products and services.
        Escalate complex issues when necessary."""
    },
    "first_message": "Thank you for calling. How may I help you today?",
    "language": "en",
    "voice": {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb"
    }
}

Virtual Assistant #

python

assistant_config = {
    "prompt": {
        "text": """You are a personal assistant.
        Help with scheduling, reminders, and general questions.
        Be concise and helpful."""
    },
    "first_message": "Hi! I'm your assistant. What can I do for you?",
    "language": "en"
}

Game Character #

python

game_character_config = {
    "prompt": {
        "text": """You are a wise old wizard character in a fantasy game.
        Speak in a mysterious and magical way.
        Give hints and guidance to players."""
    },
    "first_message": "Greetings, young adventurer...",
    "language": "en",
    "voice": {
        # Placeholder: replace with the ID of a real voice from your account
        "voice_id": "wizard_voice_id",
        "voice_settings": {
            # Lower stability and higher style give a more expressive delivery
            "stability": 0.3,
            "style": 0.7
        }
    }
}
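The three scenario configs share most of their structure, so one option is to keep a base config and apply per-scenario overrides. The sketch below uses a recursive merge. `merge_config` and `BASE_CONFIG` are hypothetical names introduced for illustration.

```python
import copy


def merge_config(base: dict, overrides: dict) -> dict:
    """Return a new dict with `overrides` merged into `base`, key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested dicts
        else:
            merged[key] = copy.deepcopy(value)              # replace scalars
    return merged


BASE_CONFIG = {
    "prompt": {"text": "You are a helpful assistant."},
    "language": "en",
    "voice": {
        "voice_id": "JBFqnCBsd6RMkjVDRZzb",
        "voice_settings": {"stability": 0.5},
    },
}

wizard_config = merge_config(BASE_CONFIG, {
    "prompt": {"text": "You are a wise old wizard."},
    "voice": {"voice_settings": {"stability": 0.3, "style": 0.7}},
})
print(wizard_config)
```

The merge leaves `BASE_CONFIG` untouched, so several scenarios can be derived from it safely.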

Best Practices #

Latency Optimization #

text

┌─────────────────────────────────────────────────────────────┐
│                    Latency Optimization Tips                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Model selection:                                           │
│  └── Use eleven_turbo_v2_5 for the lowest latency           │
│                                                             │
│  Audio format:                                              │
│  ├── Use a low sample rate (16 kHz)                         │
│  └── Use PCM format                                         │
│                                                             │
│  Network optimization:                                      │
│  ├── Use a stable network connection                        │
│  ├── Choose the nearest server                              │
│  └── Enable WebSocket compression                           │
│                                                             │
│  Prompt optimization:                                       │
│  ├── Keep the prompt concise                                │
│  └── Limit max_tokens                                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
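To see why the sample rate matters, the sketch below computes the raw data rate of 16-bit mono PCM and splits a buffer into fixed-duration chunks for sending. The 100 ms chunk size is an assumption chosen for illustration, not a documented requirement.

```python
import base64


def pcm_bytes_per_second(sample_rate: int, sample_width: int = 2, channels: int = 1) -> int:
    """Raw PCM data rate in bytes per second."""
    return sample_rate * sample_width * channels


def chunk_pcm(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 100,
              sample_width: int = 2) -> list:
    """Split 16-bit mono PCM into chunks of `chunk_ms` milliseconds each."""
    chunk_size = sample_rate * sample_width * chunk_ms // 1000
    return [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)]


rate_16k = pcm_bytes_per_second(16000)   # 32,000 bytes/s
rate_44k = pcm_bytes_per_second(44100)   # 88,200 bytes/s

one_second = b"\x00" * rate_16k          # one second of silence at 16 kHz
chunks = chunk_pcm(one_second)
encoded_len = len(base64.b64encode(chunks[0]))
print(rate_16k, rate_44k, len(chunks), len(chunks[0]), encoded_len)
```

At 16 kHz the stream carries well under half the data of 44.1 kHz audio. Base64 encoding adds about a third on top, which is one reason smaller chunks and lower sample rates help.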

Error Handling #

python

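import time


class RobustConversationAI:
    def __init__(self, api_key, agent_config):
        self.api_key = api_key
        self.agent_config = agent_config
        self.reconnect_attempts = 0
        self.max_reconnect_attempts = 5

    def connect(self):
        # Placeholder: open the WebSocket as in the connection example and
        # call on_connected() from its on_open callback.
        raise NotImplementedError

    def on_connected(self):
        # A successful connection resets the retry counter.
        self.reconnect_attempts = 0

    def handle_error(self, error):
        if "rate_limit" in str(error):
            # Give the rate limit time to clear before retrying.
            time.sleep(5)
            self.reconnect()
        elif "connection" in str(error):
            self.reconnect()
        else:
            print(f"Fatal error: {error}")

    def reconnect(self):
        if self.reconnect_attempts < self.max_reconnect_attempts:
            self.reconnect_attempts += 1
            # Exponential backoff: wait 2, 4, 8, 16, then 32 seconds.
            time.sleep(2 ** self.reconnect_attempts)
            self.connect()
        else:
            print("Max reconnection attempts reached")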
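The reconnect logic above waits `2 ** attempt` seconds between tries. A common refinement is to cap the delay and add random jitter so that many clients do not reconnect at the same instant. The sketch below only computes the delay schedule. The cap and jitter values are illustrative assumptions.

```python
import random


def backoff_delays(max_attempts: int = 5, base: float = 2.0, cap: float = 30.0,
                   jitter: float = 0.0, rng: random.Random = None) -> list:
    """Return the wait (in seconds) before each reconnect attempt.

    delay = min(cap, base ** attempt), plus up to `jitter` * delay of
    random extra time.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(1, max_attempts + 1):
        delay = min(cap, base ** attempt)
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


plain = backoff_delays()                                   # no jitter
jittered = backoff_delays(jitter=0.5, rng=random.Random(42))
print(plain)
print(jittered)
```

Without jitter the schedule is 2, 4, 8, 16, and 30 seconds, with the last value capped. With `jitter=0.5` each wait falls between the plain delay and 1.5 times that delay.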

Limitations #

text

┌─────────────────────────────────────────────────────────────┐
│                    Feature Limits                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Plan requirement:                                          │
│  └── Requires the Pro plan or higher                        │
│                                                             │
│  Concurrency limits:                                        │
│  ├── Pro: 10 concurrent conversations                       │
│  └── Enterprise: custom                                     │
│                                                             │
│  Duration limits:                                           │
│  └── Varies by plan                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
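If your application can start many conversations at once, it may help to enforce the concurrency limit on the client side instead of waiting for the service to reject a connection. The sketch below uses a semaphore sized to the Pro figure from the table. `ConversationSlots` is a hypothetical helper, and the real limit should be taken from your plan.

```python
import threading


class ConversationSlots:
    """Tracks how many conversations are open against a fixed limit."""

    def __init__(self, limit: int = 10) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        """Reserve a slot without blocking; False means the limit is reached."""
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        """Free a slot when a conversation ends."""
        self._slots.release()


slots = ConversationSlots(limit=2)   # small limit to keep the demo short
first_ok = slots.try_acquire()
second_ok = slots.try_acquire()
third_ok = slots.try_acquire()       # over the limit
slots.release()                      # one conversation ends
retry_ok = slots.try_acquire()
print(first_ok, second_ok, third_ok, retry_ok)
```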

Next Steps #

WebSocket Real-Time Voice - detailed WebSocket documentation
API Reference - complete API documentation
Best Practices - development best practices