OpenAI Image Generation

What Is DALL·E?

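DALL·E is OpenAI's AI image-generation model: it creates images from text descriptions. It combines GPT-style language understanding with image synthesis, so a detailed written description can be turned into a matching picture.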

text

┌─────────────────────────────────────────────────────────────┐
│                    How DALL·E Works                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Input: text description                                   │
│   "A cat in a spacesuit on the Moon, with Earth behind it"  │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │                                                     │   │
│   │   Parse text ──> Compose concepts ──> Render image  │   │
│   │                                                     │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
│   Output: generated image                                   │
│   [high-quality AI-generated image]                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

DALL·E Version Comparison

text

┌─────────────────────────────────────────────────────────────┐
│                  DALL·E Version Comparison                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  DALL·E 3 (recommended)                                     │
│  ─────────────────────────────────────────────────────────  │
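│  ✅ Highest image quality                                   │
│  ✅ Better prompt understanding                             │
│  ✅ Better at rendering text in images                      │
│  ✅ Enhanced safety                                         │
│  ✅ Square, landscape, and portrait sizes                   │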
│                                                             │
│  DALL·E 2                                                   │
│  ─────────────────────────────────────────────────────────  │
│  ✅ Supports image editing                                  │
│  ✅ Supports image variations                               │
│  ✅ Lower cost                                              │
│  ⚠️ Lower quality than DALL·E 3                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Basic Usage

Generating an Image

python

from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute cat sitting on a windowsill watching the rain outside, oil painting style",
    size="1024x1024",
    quality="standard",
    n=1
)

image_url = response.data[0].url
print(f"Image URL: {image_url}")

Node.js Example

javascript

import OpenAI from 'openai';

const client = new OpenAI();

async function generateImage() {
  const response = await client.images.generate({
    model: 'dall-e-3',
    prompt: 'A cute cat sitting on a windowsill watching the rain outside, oil painting style',
    size: '1024x1024',
    quality: 'standard',
    n: 1
  });

  console.log(response.data[0].url);
}

generateImage();

cURL Example

bash

curl https://api.openai.com/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A cute cat",
    "size": "1024x1024",
    "n": 1
  }'

Parameters in Detail

model

text

┌─────────────────────────────────────────────────────────────┐
│                    model parameter                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Allowed values:                                            │
│  ─────────────────────────────────────────────────────────  │
│  dall-e-3    Newer model, best quality (recommended)        │
│  dall-e-2    Older model; supports edits and variations     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

prompt

text

┌─────────────────────────────────────────────────────────────┐
│                    prompt parameter                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Description: text description of the image                 │
│  Type: string                                               │
│  Max length: 4000 characters (DALL·E 3)                     │
│              1000 characters (DALL·E 2)                     │
│                                                             │
│  Prompt tips:                                               │
│  ─────────────────────────────────────────────────────────  │
│  1. Be specific and detailed                                │
│  2. Describe the style                                      │
│  3. Specify composition and viewpoint                       │
│  4. Describe lighting and mood                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
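Because the limit depends on the model, a prompt can be checked locally before the request is sent. A minimal sketch, assuming OpenAI's documented limits of 4000 characters for DALL·E 3 and 1000 for DALL·E 2; `check_prompt` and `PROMPT_LIMITS` are hypothetical names, not part of the OpenAI SDK.

```python
# Hypothetical table of prompt length limits, in characters
PROMPT_LIMITS = {"dall-e-2": 1000, "dall-e-3": 4000}


def check_prompt(model: str, prompt: str) -> str:
    """Return the prompt without surrounding whitespace, or raise ValueError."""
    if model not in PROMPT_LIMITS:
        raise ValueError(f"unknown model: {model}")
    text = prompt.strip()
    if not text:
        raise ValueError("prompt must not be empty")
    limit = PROMPT_LIMITS[model]
    if len(text) > limit:
        raise ValueError(f"prompt has {len(text)} characters; {model} allows at most {limit}")
    return text


print(check_prompt("dall-e-3", "  A cute cat sitting on a windowsill  "))
```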

size

text

┌─────────────────────────────────────────────────────────────┐
│                    size parameter                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Sizes supported by DALL·E 3:                               │
│  ─────────────────────────────────────────────────────────  │
│  1024x1024    Square                                        │
│  1792x1024    Landscape                                     │
│  1024x1792    Portrait                                      │
│                                                             │
│  Sizes supported by DALL·E 2:                               │
│  ─────────────────────────────────────────────────────────  │
│  256x256                                                     │
│  512x512                                                     │
│  1024x1024                                                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
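Sending a size the model does not support makes the request fail, so it can be worth validating the value first. The sketch below copies the two size lists above into a lookup table; `check_size` is a hypothetical helper that returns the parsed width and height when the size is allowed.

```python
# Hypothetical lookup table built from the sizes listed above
SUPPORTED_SIZES = {
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
}


def check_size(model: str, size: str) -> tuple[int, int]:
    """Return (width, height) if the model supports the size, else raise ValueError."""
    allowed = SUPPORTED_SIZES.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model}")
    if size not in allowed:
        raise ValueError(f"{model} does not support {size}; choose from {sorted(allowed)}")
    width, height = (int(part) for part in size.split("x"))
    return width, height


print(check_size("dall-e-3", "1792x1024"))  # (1792, 1024)
```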

quality

text

┌─────────────────────────────────────────────────────────────┐
│                    quality parameter                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Allowed values:                                            │
│  ─────────────────────────────────────────────────────────  │
│  standard    Standard quality (default)                     │
│  hd          HD quality (DALL·E 3 only)                     │
│                                                             │
│  Note:                                                      │
│  hd costs more per image but renders finer detail           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

n

text

┌─────────────────────────────────────────────────────────────┐
│                    n parameter                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Description: number of images to generate                  │
│  Type: integer                                              │
│                                                             │
│  DALL·E 3: n = 1 (one image per request)                    │
│  DALL·E 2: n = 1-10                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
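Since `n` is capped per request, generating several images means spreading them across calls. A minimal sketch, assuming the caps above (1 for DALL·E 3, 10 for DALL·E 2); `plan_requests` is a hypothetical helper whose output lists the `n` value to pass to each `client.images.generate` call.

```python
# Hypothetical per-request maximum for n, taken from the limits above
MAX_N = {"dall-e-3": 1, "dall-e-2": 10}


def plan_requests(model: str, total: int) -> list[int]:
    """Split `total` images into a list of n values, one per API call."""
    if total < 1:
        raise ValueError("total must be at least 1")
    per_call = MAX_N[model]
    full, rest = divmod(total, per_call)
    return [per_call] * full + ([rest] if rest else [])


print(plan_requests("dall-e-2", 23))  # [10, 10, 3]
print(plan_requests("dall-e-3", 3))   # [1, 1, 1]
```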

response_format

text

┌─────────────────────────────────────────────────────────────┐
│                  response_format parameter                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Allowed values:                                            │
│  ─────────────────────────────────────────────────────────  │
│  url       Temporary URL, valid for 60 minutes (default)    │
│  b64_json  Base64-encoded image data                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
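Because `url` links are temporary, code that stores responses may want to know when a link will stop working. A sketch, assuming the one-hour window is measured from the response's `created` field (a Unix timestamp in seconds); both helper names are hypothetical. If images need to be kept, downloading them right away or requesting `b64_json` is the safer choice.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: a returned URL stays valid for about 60 minutes after `created`
URL_LIFETIME = timedelta(minutes=60)


def url_expires_at(created: int) -> datetime:
    """Estimate when a URL expires from the response's `created` Unix timestamp (UTC)."""
    return datetime.fromtimestamp(created, tz=timezone.utc) + URL_LIFETIME


def url_probably_expired(created: int, now: Optional[datetime] = None) -> bool:
    """Return True once the estimated expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= url_expires_at(created)


print(url_expires_at(1_700_000_000).isoformat())  # 2023-11-14T23:13:20+00:00
```

With a real response this would be called as `url_probably_expired(response.created)`.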

style

text

┌─────────────────────────────────────────────────────────────┐
│                    style parameter                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  DALL·E 3 only                                              │
│  ─────────────────────────────────────────────────────────  │
│  vivid      Hyper-real, dramatic look (default)             │
│  natural    Natural, less hyper-real look                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
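`quality` and `style` interact with the model choice: `hd` and `style` only apply to DALL·E 3. A sketch of a hypothetical helper, `image_options`, that builds matching keyword arguments and rejects combinations DALL·E 2 does not accept. Leaving `quality` out entirely for DALL·E 2 is an assumption about the safest default, not a documented requirement.

```python
from typing import Optional


def image_options(model: str, quality: str = "standard",
                  style: Optional[str] = None) -> dict:
    """Build keyword arguments for images.generate, keeping only options the model accepts.

    Hypothetical helper: quality="hd" and the style parameter are DALL·E 3 only.
    """
    if quality not in ("standard", "hd"):
        raise ValueError(f"unknown quality: {quality}")
    if style not in (None, "vivid", "natural"):
        raise ValueError(f"unknown style: {style}")
    if model != "dall-e-3" and (quality == "hd" or style is not None):
        raise ValueError(f"{model} supports neither quality='hd' nor style")
    options = {"model": model}
    if model == "dall-e-3":
        options["quality"] = quality
        if style is not None:
            options["style"] = style
    return options


print(image_options("dall-e-3", quality="hd", style="natural"))
# {'model': 'dall-e-3', 'quality': 'hd', 'style': 'natural'}
print(image_options("dall-e-2"))
# {'model': 'dall-e-2'}
```

The result can be unpacked into a call, e.g. `client.images.generate(prompt=prompt, size="1024x1024", **image_options("dall-e-3", style="natural"))`.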

Prompt Tips

Basic Structure

text

┌─────────────────────────────────────────────────────────────┐
│                    Prompt structure                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Subject + action/state + setting + style + tech details    │
│                                                             │
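│  Example:                                                   │
│  "An orange cat (subject)                                   │
│   curled up asleep on a sofa (action/state)                 │
│   warm living room, sunlight through a window (setting)     │
│   oil painting, Impressionist (style)                       │
│   soft lighting, high detail (tech details)"                │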
│                                                             │
└─────────────────────────────────────────────────────────────┘
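The structure above maps naturally onto a small template function. A minimal sketch: `build_prompt` is a hypothetical helper that joins the parts in the order subject, action/state, setting, style, technical details, and skips empty ones so partial prompts still read cleanly.

```python
def build_prompt(subject: str, action: str = "", setting: str = "",
                 style: str = "", details: str = "") -> str:
    """Join the five prompt parts in order with commas, skipping any that are empty."""
    parts = [subject, action, setting, style, details]
    return ", ".join(part.strip() for part in parts if part.strip())


prompt = build_prompt(
    subject="an orange cat",
    action="curled up asleep on a sofa",
    setting="warm living room, sunlight through a window",
    style="oil painting, Impressionist",
    details="soft lighting, high detail",
)
print(prompt)
```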

Style Keywords

python

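# Note: DALL·E 3 is designed to decline requests for an image in the style of a
# living artist, so entries such as Miyazaki or Shinkai may be refused or rewritten
styles = {
    "Painting styles": [
        "oil painting", "watercolor", "pencil sketch", "traditional Chinese painting", "acrylic painting",
        "Impressionism", "Surrealism", "pop art", "minimalism"
    ],
    "Digital art": [
        "digital painting", "concept art", "cyberpunk", "pixel art",
        "3D render", "vector art", "flat design"
    ],
    "Photography styles": [
        "professional photography", "portrait photography", "landscape photography", "macro photography",
        "black-and-white photography", "vintage photography", "cinematic"
    ],
    "Artists and studios": [
        "Van Gogh style", "Monet style", "Picasso style", "Hayao Miyazaki style",
        "Makoto Shinkai style", "Studio Ghibli style"
    ]
}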

Prompt Examples

python

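from openai import OpenAI

client = OpenAI()

prompts = [
    "A cat in a spacesuit standing on the lunar surface, with the blue Earth rising in the background, sci-fi film style, highly detailed, cinematic lighting",

    "An ancient Chinese garden with pavilions and towers, a small bridge over flowing water, falling cherry blossoms, ink-wash painting style, a serene, evocative mood",

    "A futuristic city skyline with flickering neon lights and flying cars darting between buildings, cyberpunk style, night scene, high contrast",

    "A steaming cup of coffee on a wooden table next to a book, sunlight streaming in through the window, cozy atmosphere, product photography",

    "A cute Shiba Inu wearing sunglasses, sitting in a beach chair and sipping coconut water, cartoon style, bright colors"
]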

for prompt in prompts:
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        quality="standard",
        n=1
    )
    print(response.data[0].url)

Image Editing (DALL·E 2)

Editing an Image

python

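from openai import OpenAI

client = OpenAI()

# image: a square PNG under 4 MB
# mask: a PNG of the same size; its fully transparent pixels mark the area to redraw
with open("original.png", "rb") as image, open("mask.png", "rb") as mask:
    response = client.images.edit(
        model="dall-e-2",
        image=image,
        mask=mask,
        prompt="A white cat sitting there",
        n=1,
        size="1024x1024"
    )

print(response.data[0].url)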

How Editing Works

text

┌─────────────────────────────────────────────────────────────┐
│                    Image editing flow                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Original         Mask              Result                 │
│   ┌─────────┐      ┌─────────┐       ┌─────────┐           │
│   │ 🐱      │      │ ███     │       │ 🐱      │           │
│   │         │  +   │         │   =   │ 🐱      │           │
│   │         │      │         │       │         │           │
│   └─────────┘      └─────────┘       └─────────┘           │
│                                                             │
│   Fully transparent mask pixels (alpha = 0) are regenerated │
│   Opaque mask pixels keep the original image                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
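The mask has to be a PNG with the same dimensions as the source image, with fully transparent pixels wherever the model should repaint. Any image editor can produce one; as a dependency-free sketch, the function below writes such a PNG using only the standard library (`struct`, `zlib`). `make_mask` and the centred box are illustrative choices, not part of the OpenAI API.

```python
import struct
import zlib


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def make_mask(size: int, box: tuple[int, int, int, int]) -> bytes:
    """Build a size x size RGBA PNG that is opaque except for a transparent box.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    if not (0 <= left < right <= size and 0 <= top < bottom <= size):
        raise ValueError("box must lie inside the image")
    opaque = b"\x00\x00\x00\xff"  # alpha 255: keep the original pixel
    clear = b"\x00\x00\x00\x00"   # alpha 0: let the model repaint this pixel
    full_row = opaque * size
    hole_row = opaque * left + clear * (right - left) + opaque * (size - right)
    raw = bytearray()
    for y in range(size):
        raw += b"\x00"  # filter type 0 (None) at the start of each scanline
        raw += hole_row if top <= y < bottom else full_row
    header = struct.pack(">IIBBBBB", size, size, 8, 6, 0, 0, 0)  # 8-bit RGBA, no interlace
    return (b"\x89PNG\r\n\x1a\n"
            + _png_chunk(b"IHDR", header)
            + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _png_chunk(b"IEND", b""))


# Make the centre of a 1024x1024 image editable
with open("mask.png", "wb") as f:
    f.write(make_mask(1024, (256, 256, 768, 768)))
```

The resulting `mask.png` can be passed as `mask=` to `client.images.edit`, provided the source image is also 1024x1024.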

Image Variations (DALL·E 2)

Generating Variations

python

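from openai import OpenAI

client = OpenAI()

# The source image must be a square PNG under 4 MB
with open("original.png", "rb") as image:
    response = client.images.create_variation(
        model="dall-e-2",
        image=image,
        n=4,
        size="1024x1024"
    )

for i, data in enumerate(response.data):
    print(f"Variation {i+1}: {data.url}")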

Saving Images

Downloading and Saving

python

import requests
from openai import OpenAI

client = OpenAI()

def generate_and_save(prompt: str, filename: str):
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1
    )
    
    image_url = response.data[0].url
    
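    # The URL expires after about an hour, so download the image right away
    resp = requests.get(image_url, timeout=60)
    resp.raise_for_status()
    
    with open(filename, "wb") as f:
        f.write(resp.content)
    
    print(f"Image saved: {filename}")

generate_and_save(
    "A cute cat",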
    "cat.png"
)

Using Base64

python

import base64
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute cat",
    size="1024x1024",
    n=1,
    response_format="b64_json"
)

image_data = base64.b64decode(response.data[0].b64_json)

with open("cat.png", "wb") as f:
    f.write(image_data)

Response Structure

Full Response

python

response = client.images.generate(
    model="dall-e-3",
    prompt="A cute cat",
    size="1024x1024",
    n=1
)

print(f"Created at: {response.created}")
print(f"Number of images: {len(response.data)}")

for i, data in enumerate(response.data):
    print(f"\nImage {i+1}:")
    print(f"  URL: {data.url}")
    # The SDK always defines revised_prompt; it is None when the model did not rewrite the prompt
    if data.revised_prompt:
        print(f"  Revised prompt: {data.revised_prompt}")

revised_prompt

text

┌─────────────────────────────────────────────────────────────┐
│                  revised_prompt explained                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  DALL·E 3 automatically rewrites prompts to add detail:     │
│                                                             │
│  Original prompt:                                           │
│  "A cat"                                                    │
│                                                             │
│  Revised prompt:                                            │
│  "A fluffy orange cat with bright green eyes,               │
│   sitting on a soft cushion, gentle light, rich detail..."  │
│                                                             │
│  This helps produce more detailed images                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
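When calling the REST endpoint directly (for example with the cURL command earlier), the same information arrives as JSON: a `created` Unix timestamp and a `data` list whose items carry `url` or `b64_json`, plus `revised_prompt` for DALL·E 3. A sketch of reading it with the standard `json` module; the sample body is a placeholder rather than real API output, and `summarize` is a hypothetical helper.

```python
import json

# Placeholder body in the shape returned by POST /v1/images/generations;
# the URL and text are made up for illustration
raw_body = """
{
  "created": 1700000000,
  "data": [
    {
      "url": "https://example.com/image.png",
      "revised_prompt": "A fluffy orange cat with bright green eyes..."
    }
  ]
}
"""


def summarize(body: str) -> list[dict]:
    """Return the url and revised_prompt of each image entry (None where a field is absent)."""
    payload = json.loads(body)
    return [
        {"url": item.get("url"), "revised_prompt": item.get("revised_prompt")}
        for item in payload.get("data", [])
    ]


for image in summarize(raw_body):
    print(image["url"])
    print(image["revised_prompt"])
```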

Practical Examples

Batch Generation

python

from openai import OpenAI
import time

client = OpenAI()

def batch_generate(prompts: list):
    results = []
    
    for i, prompt in enumerate(prompts):
        print(f"Generating image {i+1}/{len(prompts)}...")
        
        try:
            response = client.images.generate(
                model="dall-e-3",
                prompt=prompt,
                size="1024x1024",
                n=1
            )
            results.append({
                "prompt": prompt,
                "url": response.data[0].url
            })
        except Exception as e:
            print(f"Error: {e}")
        
        time.sleep(1)
    
    return results

prompts = [
    "Cherry blossoms in spring",
    "A beach in summer",
    "Maple leaves in autumn",
    "A snowy landscape in winter"
]

results = batch_generate(prompts)
for r in results:
    print(f"{r['prompt']}: {r['url']}")
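The loop above prints errors and moves on, so a failed prompt is simply lost. One variation is to return failures separately so they can be retried later, and to accept the generating function as a parameter so the loop can be exercised without calling the API. A sketch under those assumptions: `batch_run` and `fake_generate` are hypothetical names, and in real use `generate` might be `lambda p: client.images.generate(model="dall-e-3", prompt=p, size="1024x1024", n=1).data[0].url`.

```python
import time
from typing import Callable


def batch_run(prompts: list[str], generate: Callable[[str], str],
              delay: float = 1.0) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Call generate(prompt) for each prompt, pausing `delay` seconds between calls.

    Returns (successes, failures) as lists of (prompt, url) and (prompt, error message).
    """
    successes: list[tuple[str, str]] = []
    failures: list[tuple[str, str]] = []
    for i, prompt in enumerate(prompts):
        if i:
            time.sleep(delay)  # simple fixed spacing between requests
        try:
            successes.append((prompt, generate(prompt)))
        except Exception as exc:  # record the failure and continue with the next prompt
            failures.append((prompt, str(exc)))
    return successes, failures


def fake_generate(prompt: str) -> str:
    """Offline stand-in for an API call; fails for one prompt to show error handling."""
    if "winter" in prompt:
        raise RuntimeError("simulated rate limit")
    return f"https://example.com/{len(prompt)}.png"


ok, failed = batch_run(
    ["Cherry blossoms in spring", "A snowy landscape in winter"],
    fake_generate,
    delay=0,
)
print(ok)      # [('Cherry blossoms in spring', 'https://example.com/25.png')]
print(failed)  # [('A snowy landscape in winter', 'simulated rate limit')]
```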

Web App Integration

python

from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

@app.route('/generate', methods=['POST'])
def generate():
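    # get_json(silent=True) returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    prompt = data.get('prompt')
    
    if not prompt:
        return jsonify({"error": "Missing prompt parameter"}), 400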
    
    try:
        response = client.images.generate(
            model="dall-e-3",
            prompt=prompt,
            size="1024x1024",
            n=1
        )
        
        return jsonify({
            "url": response.data[0].url,
            "revised_prompt": response.data[0].revised_prompt
        })
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)  # debug mode is for local development only

An Image Generator Class

python

from openai import OpenAI
import requests
from typing import Optional
from pathlib import Path

class ImageGenerator:
    def __init__(self, model: str = "dall-e-3"):
        self.client = OpenAI()
        self.model = model
    
    def generate(
        self,
        prompt: str,
        size: str = "1024x1024",
        quality: str = "standard",
        style: str = "vivid"
    ) -> dict:
        """Generate one image and return its URL and revised prompt."""
        response = self.client.images.generate(
            model=self.model,
            prompt=prompt,
            size=size,
            quality=quality,
            style=style,
            n=1
        )
        
        return {
            "url": response.data[0].url,
            "revised_prompt": response.data[0].revised_prompt
        }
    
    def generate_and_save(
        self,
        prompt: str,
        save_path: str,
        **kwargs
    ) -> str:
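        """Generate an image and save it to disk."""
        result = self.generate(prompt, **kwargs)

        # Download promptly: the returned URL is only valid for about an hour
        resp = requests.get(result["url"], timeout=60)
        resp.raise_for_status()

        Path(save_path).parent.mkdir(parents=True, exist_ok=True)
        with open(save_path, "wb") as f:
            f.write(resp.content)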
        
        return save_path
    
    def enhance_prompt(self, base_prompt: str, style: Optional[str] = None) -> str:
        """Append the keywords for a named style to the prompt."""
        enhancements = {
            "realistic": "photorealistic, highly detailed, professional photography",
            "artistic": "artistic, creative, expressive brushstrokes",
            "cinematic": "cinematic lighting, dramatic atmosphere, movie scene",
            "minimalist": "minimalist, clean lines, simple composition"
        }
        
        if style and style in enhancements:
            return f"{base_prompt}, {enhancements[style]}"
        
        return base_prompt

generator = ImageGenerator()

result = generator.generate_and_save(
    "A cute cat",
    "images/cat.png",
    quality="hd"
)
print(f"Image saved: {result}")

Limits and Caveats

Content Policy

text

┌─────────────────────────────────────────────────────────────┐
│                    Content restrictions                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
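│  Not allowed:                                               │
│  ❌ Violent or gory content                                 │
│  ❌ Sexual or adult content                                 │
│  ❌ Hateful or discriminatory content                       │
│  ❌ Politically sensitive content                           │
│  ❌ Likenesses of public figures                            │
│  ❌ Trademarked or copyrighted content                      │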
│                                                             │
│  Note:                                                      │
│  DALL·E 3 applies automatic content moderation              │
│  Requests that violate the policy are rejected              │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Usage Limits

text

┌─────────────────────────────────────────────────────────────┐
│                    Usage limits                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Rate limits:                                               │
│  - Requests per minute are capped                           │
│  - The cap depends on your account tier                     │
│                                                             │
│  Image size:                                                │
│  - DALL·E 3 supports only three fixed sizes                 │
│  - Custom resolutions are not available                     │
│                                                             │
│  Images per request:                                        │
│  - DALL·E 3 returns 1 image per call                        │
│  - For more images, make more calls                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Best Practices

1. Be Specific in Prompts

python

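bad_prompt = "A cat"

good_prompt = """
A fluffy orange cat with bright green eyes,
sitting on a windowsill, with a rainy city street outside,
soft indoor lighting, a cozy atmosphere,
realistic style, highly detailed
"""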

2. Specify Style and Quality

python

response = client.images.generate(
    model="dall-e-3",
    prompt=good_prompt,
    size="1024x1024",
    quality="hd",
    style="vivid"
)

3. Error Handling

python

import time

from openai import OpenAI, APIError, RateLimitError

client = OpenAI()

def safe_generate(prompt: str, retries: int = 3):
    for attempt in range(retries):
        try:
            response = client.images.generate(
                model="dall-e-3",
                prompt=prompt,
                size="1024x1024",
                n=1
            )
            return response.data[0].url
        except RateLimitError:
            # RateLimitError is a subclass of APIError, so it must be caught first
            print("Rate limited, waiting before retrying...")
            time.sleep(2 ** attempt)
        except APIError as e:
            print(f"API error: {e}")
            if "content_policy" in str(e).lower():
                # A policy rejection will not succeed on retry
                print("Request violates the content policy")
                return None
    return None
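`safe_generate` hard-codes the model and the retry policy. A more reusable pattern takes the request as a callable, retries only the exception types you name, and adds random jitter so parallel clients do not retry in lockstep. A sketch: `retry_with_backoff` and `flaky` are hypothetical names; with the OpenAI SDK you might pass `lambda: client.images.generate(...)` and `(RateLimitError,)`. The `sleep` parameter exists only so the example can run without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T],
                       retry_on: tuple[type[BaseException], ...],
                       max_attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on the given exception types with exponential backoff.

    Before retry k (k = 0, 1, ...) it waits base_delay * 2**k plus up to 50% random jitter.
    The last exception is re-raised once max_attempts calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky() -> str:
    """Offline stand-in for an API call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "https://example.com/cat.png"


waits: list[float] = []
print(retry_with_backoff(flaky, (ConnectionError,), sleep=waits.append))
print(len(waits))  # 2 waits before the third, successful attempt
```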

Next Steps

Now that you know how to generate images, the next section covers text embeddings and how to turn text into vectors.