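Core concepts #

Tensors #

A tensor is the most basic data structure in deep learning: a multi-dimensional array that generalizes scalars, vectors, and matrices to any number of dimensions.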

text

┌─────────────────────────────────────────────────────────────┐
│                      Tensor dimensions                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Scalar (0D tensor)                                         │
│  ┌───┐                                                      │
│  │ 5 │  shape: ()                                           │
│  └───┘                                                      │
│                                                             │
│  Vector (1D tensor)                                         │
│  ┌───┬───┬───┬───┐                                          │
│  │ 1 │ 2 │ 3 │ 4 │  shape: (4,)                             │
│  └───┴───┴───┴───┘                                          │
│                                                             │
│  Matrix (2D tensor)                                         │
│  ┌───┬───┬───┐                                              │
│  │ 1 │ 2 │ 3 │  shape: (3, 3)                               │
│  ├───┼───┼───┤                                              │
│  │ 4 │ 5 │ 6 │                                              │
│  ├───┼───┼───┤                                              │
│  │ 7 │ 8 │ 9 │                                              │
│  └───┴───┴───┘                                              │
│                                                             │
│  3D tensor (e.g. an image)                                  │
│  shape: (height, width, channels)                           │
│  e.g. (28, 28, 3): a 28x28 color image                      │
│                                                             │
│  4D tensor (e.g. a batch of images)                         │
│  shape: (batch, height, width, channels)                    │
│  e.g. (32, 28, 28, 3): 32 color images                      │
│                                                             │
│  5D tensor (e.g. a batch of videos)                         │
│  shape: (batch, frames, height, width, channels)            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
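
To make the ranks in the diagram concrete, here is a minimal sketch that builds one zero-filled array per rank and prints its `ndim` and `shape`. It uses NumPy as a stand-in for a Keras tensor; the `examples` dictionary and the video batch sizes (8 clips of 16 frames) are illustrative assumptions, not values from this tutorial.

```python
import numpy as np

# One zero-filled array per rank from the diagram; all sizes are illustrative.
examples = {
    "scalar": np.zeros(()),
    "vector": np.zeros((4,)),
    "matrix": np.zeros((3, 3)),
    "image": np.zeros((28, 28, 3)),
    "image_batch": np.zeros((32, 28, 28, 3)),
    "video_batch": np.zeros((8, 16, 28, 28, 3)),  # assumed: 8 clips of 16 frames
}

for name, arr in examples.items():
    # ndim is the number of axes; shape lists the size of each axis
    print(f"{name:12s} ndim={arr.ndim} shape={arr.shape}")
```

The number of entries in `shape` always equals `ndim`, which is why a scalar has the empty shape `()`.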

Tensor operations #

python

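import keras

# Build a 2x3 integer tensor from a nested Python list
x = keras.ops.array([[1, 2, 3], [4, 5, 6]])

print(f"Shape: {keras.ops.shape(x)}")
print(f"Dtype: {x.dtype}")
print(f"Rank: {keras.ops.ndim(x)}")

# Reshape 2x3 -> 3x2; the total number of elements must stay the same
y = keras.ops.reshape(x, (3, 2))
print(f"Reshaped: {y}")

# Swap the two axes: 2x3 -> 3x2
z = keras.ops.transpose(x)
print(f"Transposed: {z}")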

Common tensor operations #

python

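import keras

a = keras.ops.array([[1, 2], [3, 4]])
b = keras.ops.array([[5, 6], [7, 8]])

# Matrix product
print(keras.ops.matmul(a, b))

# Element-wise addition
print(keras.ops.add(a, b))

# Mean over all elements
print(keras.ops.mean(a))

# Sum along axis 0 (down each column)
print(keras.ops.sum(a, axis=0))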
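
The `axis` argument is easy to misread, so the sketch below shows how a reduction behaves along each axis. It uses NumPy rather than Keras, on the assumption that `keras.ops` reductions follow the same axis convention; the names `col_sums`, `row_sums`, and `total` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

col_sums = a.sum(axis=0)  # collapse the rows: one value per column
row_sums = a.sum(axis=1)  # collapse the columns: one value per row
total = a.sum()           # no axis given: reduce over every element

print(col_sums, row_sums, total)
```

The axis passed to a reduction is the one that disappears from the result's shape.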

Layers #

A layer is the basic building block of a neural network: it takes an input tensor and returns a transformed tensor.

text

┌─────────────────────────────────────────────────────────────┐
│                      How a layer works                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Input tensor ──► Layer ──► Output tensor                   │
│                                                             │
│  Core components of a layer:                                │
│  1. Weights: learnable parameters                           │
│  2. Bias: learnable offset                                  │
│  3. Computation: forward-pass logic                         │
│                                                             │
│  Dense layer example:                                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  output = activation(dot(input, kernel) + bias)       │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  Parameter count:                                           │
│  Input dimension: n, output dimension: m                    │
│  Weight parameters: n × m                                   │
│  Bias parameters: m                                         │
│  Total parameters: n × m + m                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
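
The Dense formula and the parameter count above can be checked by hand. The following is a minimal NumPy sketch, not Keras's actual implementation: `dense_forward` is a hypothetical helper, ReLU is assumed as the activation, and the sizes n = 32 and m = 64 are chosen only for illustration.

```python
import numpy as np

def dense_forward(x, kernel, bias):
    """Compute relu(x @ kernel + bias): the Dense formula with a ReLU activation."""
    return np.maximum(0.0, x @ kernel + bias)

n, m = 32, 64                      # input and output dimensions (illustrative)
rng = np.random.default_rng(0)
kernel = rng.normal(size=(n, m))   # n x m weights
bias = np.zeros(m)                 # m biases

x = rng.normal(size=(8, n))        # a batch of 8 input vectors
y = dense_forward(x, kernel, bias)

param_count = kernel.size + bias.size  # n * m + m
print(y.shape, param_count)
```

The batch dimension passes through unchanged, and only the last axis goes from n to m.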

The Dense layer in detail #

python

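import keras

# 64 units with ReLU; the weights are created once the input shape is known
dense = keras.layers.Dense(64, activation='relu')

# Declare the 32-feature input with keras.Input (preferred in Keras 3 over
# passing input_shape to the layer) so the model is built immediately
model = keras.Sequential([keras.Input(shape=(32,)), dense])

print(f"Kernel shape: {dense.kernel.shape}")        # 32 x 64
print(f"Bias shape: {dense.bias.shape}")            # 64
print(f"Total parameters: {dense.count_params()}")  # 32 * 64 + 64 = 2112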

Common layer types #

text

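┌─────────────────────────────────────────────────────────────┐
│                      Layer categories                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Core layers                                                │
│  ├── Dense: fully connected layer                           │
│  ├── Activation: applies an activation function             │
│  ├── Flatten: flattens the input                            │
│  └── Reshape: reshapes the input                            │
│                                                             │
│  Convolutional layers                                       │
│  ├── Conv1D: 1D convolution                                 │
│  ├── Conv2D: 2D convolution                                 │
│  ├── Conv3D: 3D convolution                                 │
│  ├── MaxPooling1D/2D/3D: max pooling                        │
│  └── AveragePooling1D/2D/3D: average pooling                │
│                                                             │
│  Recurrent layers                                           │
│  ├── LSTM: long short-term memory                           │
│  ├── GRU: gated recurrent unit                              │
│  └── SimpleRNN: simple recurrent network                    │
│                                                             │
│  Regularization and normalization layers                    │
│  ├── Dropout: randomly drops units                          │
│  ├── BatchNormalization: batch normalization                │
│  └── LayerNormalization: layer normalization                │
│                                                             │
│  Attention layers                                           │
│  ├── Attention: dot-product attention                       │
│  ├── MultiHeadAttention: multi-head attention               │
│  └── Transformer blocks: not a core Keras layer; see        │
│      KerasHub (TransformerEncoder/TransformerDecoder)       │
│                                                             │
└─────────────────────────────────────────────────────────────┘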

Models #

A model groups layers into a single object and defines the network's complete architecture.

Sequential model #

python

import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),                      # 784 input features
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.2),                      # drops 20% of units during training
    keras.layers.Dense(10, activation='softmax')    # probabilities over 10 classes
])

text

┌─────────────────────────────────────────────────────────────┐
│                      Sequential model                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Characteristic: a linear stack, one layer after another    │
│                                                             │
│  Input ──► [Dense] ──► [Dropout] ──► [Dense] ──► Output     │
│                                                             │
│  Strengths:                                                 │
│  ✅ Simple to use                                           │
│  ✅ Good fit for simple networks                            │
│                                                             │
│  Limitations:                                               │
│  ❌ No multiple inputs or outputs                           │
│  ❌ No layer sharing                                        │
│  ❌ No complex topologies                                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Functional API model #

python

import keras

# Each layer is called on a tensor and returns a new tensor
inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(10, activation='softmax')(x)

# The model is defined by its input and output tensors
model = keras.Model(inputs=inputs, outputs=outputs)

text

┌─────────────────────────────────────────────────────────────┐
│                       Functional API                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Characteristic: functional style; builds complex networks  │
│                                                             │
│  Supported features:                                        │
│  ✅ Multiple inputs and outputs                             │
│  ✅ Layer sharing                                           │
│  ✅ Residual connections                                    │
│  ✅ Arbitrarily complex topologies                          │
│                                                             │
│  Multi-input example:                                       │
│  ┌─────────┐                                                │
│  │ Input A │──┐                                             │
│  └─────────┘  │   ┌─────────┐   ┌─────────┐                 │
│               ├──►│  Merge  │──►│ Output  │                 │
│  ┌─────────┐  │   └─────────┘   └─────────┘                 │
│  │ Input B │──┘                                             │
│  └─────────┘                                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Multi-input, multi-output model #

python

import keras

# Two named inputs: a 100-feature title vector and a 500-feature body vector
title_input = keras.Input(shape=(100,), name='title')
body_input = keras.Input(shape=(500,), name='body')

# Encode each input with its own Dense layer
title_features = keras.layers.Dense(64, activation='relu')(title_input)
body_features = keras.layers.Dense(64, activation='relu')(body_input)

# Merge both feature vectors along the last axis
x = keras.layers.concatenate([title_features, body_features])

# Two heads: a priority score in (0, 1) and a distribution over 4 departments
priority_output = keras.layers.Dense(1, activation='sigmoid', name='priority')(x)
department_output = keras.layers.Dense(4, activation='softmax', name='department')(x)

model = keras.Model(
    inputs=[title_input, body_input],
    outputs=[priority_output, department_output]
)

Loss functions #

A loss function measures the gap between the model's predictions and the true values.

text

┌─────────────────────────────────────────────────────────────┐
│                  Choosing a loss function                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Regression                                                 │
│  ├── MSE (mean squared error): mean of squared errors       │
│  ├── MAE (mean absolute error): mean of absolute errors     │
│  └── Huber loss: quadratic near zero, linear for outliers   │
│                                                             │
│  Binary classification                                      │
│  └── Binary Crossentropy: binary cross-entropy              │
│                                                             │
│  Multi-class classification                                 │
│  ├── Categorical Crossentropy: one-hot labels               │
│  └── Sparse Categorical Crossentropy: integer labels        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
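
To show what each loss in the diagram actually computes, here is a minimal pure-Python sketch using the textbook definitions. The function names, the sample values, and the Huber threshold `delta=1.0` are illustrative assumptions; production implementations also clip probabilities to avoid `log(0)`, which this sketch omits.

```python
import math

def mse(y_true, y_pred):
    """Mean of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for errors up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        total += 0.5 * err ** 2 if err <= delta else delta * (err - 0.5 * delta)
    return total / len(y_true)

def binary_crossentropy(y_true, p_pred):
    """y_true holds 0/1 labels; p_pred holds predicted probabilities of class 1."""
    terms = (t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, p_pred))
    return -sum(terms) / len(y_true)

def sparse_categorical_crossentropy(labels, probs):
    """labels are integer class ids; each row of probs is a probability distribution."""
    return -sum(math.log(row[label]) for label, row in zip(labels, probs)) / len(labels)

y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 5.0]   # the last prediction is an outlier
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]

print(mse(y_true, y_pred), mae(y_true, y_pred), huber(y_true, y_pred))
print(binary_crossentropy([1, 0], [0.9, 0.2]))
print(sparse_categorical_crossentropy([0, 2], probs))
```

On this data the outlier dominates MSE because its error is squared, while Huber treats it linearly and stays closer to MAE.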

Using loss functions #

python

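import keras

# `model` is assumed to be an already-built keras.Model, such as the ones above.
# Each compile() call replaces the previous configuration, so pick the one
# that matches the task.

# Multi-class classification with integer labels
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Regression
model.compile(
    optimizer='adam',
    loss=keras.losses.MeanSquaredError(),
    metrics=['mae']
)

# Binary classification
model.compile(
    optimizer='adam',
    loss=keras.losses.BinaryCrossentropy(),
    metrics=['accuracy']
)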

Optimizers #

An optimizer determines how the model's parameters are updated to minimize the loss.

text

┌─────────────────────────────────────────────────────────────┐
│                    Optimizer comparison                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  SGD (stochastic gradient descent)                          │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ θ = θ - lr × gradient                                 │  │
│  └───────────────────────────────────────────────────────┘  │
│  Simple but slow to converge; learning rate needs tuning    │
│                                                             │
│  SGD + Momentum                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ v = momentum × v - lr × gradient                      │  │
│  │ θ = θ + v                                             │  │
│  └───────────────────────────────────────────────────────┘  │
│  Speeds up convergence and dampens oscillation              │
│                                                             │
│  Adam (recommended)                                         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Adaptive learning rates + momentum                    │  │
│  │ A reasonable default for most tasks                   │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
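
The two SGD update rules in the diagram can be traced on a toy problem. The sketch below minimizes f(θ) = θ², whose gradient is 2θ; the functions `grad`, `sgd`, and `sgd_momentum`, the starting point, and the hyperparameters are illustrative assumptions rather than anything Keras runs internally.

```python
def grad(theta):
    """Gradient of the toy loss f(theta) = theta ** 2."""
    return 2.0 * theta

def sgd(theta, lr, steps):
    for _ in range(steps):
        theta = theta - lr * grad(theta)      # θ = θ - lr × gradient
    return theta

def sgd_momentum(theta, lr, momentum, steps):
    v = 0.0                                   # velocity starts at zero
    for _ in range(steps):
        v = momentum * v - lr * grad(theta)   # v = momentum × v - lr × gradient
        theta = theta + v                     # θ = θ + v
    return theta

theta_sgd = sgd(theta=5.0, lr=0.01, steps=100)
theta_mom = sgd_momentum(theta=5.0, lr=0.01, momentum=0.9, steps=100)

print(f"plain SGD: {theta_sgd:.4f}, with momentum: {theta_mom:.4f}")
```

With this small learning rate, the momentum variant ends much closer to the minimum at 0 after the same number of steps. That ordering is specific to this toy setup and can reverse with a larger learning rate.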

Configuring an optimizer #

python

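import keras

# Adam with its default hyperparameters written out explicitly
optimizer = keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07
)

# `model` is assumed to be an already-built keras.Model
model.compile(optimizer=optimizer, loss='mse')

# Alternative: SGD with Nesterov momentum
optimizer = keras.optimizers.SGD(
    learning_rate=0.01,
    momentum=0.9,
    nesterov=True
)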

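Activation functions #

Activation functions introduce non-linearity, which lets the network learn complex patterns. Without them, a stack of Dense layers would collapse into a single linear transformation.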

text

┌─────────────────────────────────────────────────────────────┐
│                 Common activation functions                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ReLU (recommended for hidden layers)                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ f(x) = max(0, x)                                      │  │
│  │                                                       │  │
│  │      ▲  /                                             │  │
│  │      │ /                                              │  │
│  │      │/                                               │  │
│  │ ─────┼─────► x    (f(x) = 0 for x ≤ 0)                │  │
│  │      │                                                │  │
│  └───────────────────────────────────────────────────────┘  │
│  Pros: cheap to compute; mitigates vanishing gradients      │
│                                                             │
│  Sigmoid (binary classification output)                     │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ f(x) = 1 / (1 + e^(-x))                               │  │
│  │ Output range: (0, 1)                                  │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  Softmax (multi-class classification output)                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ f(x_i) = e^(x_i) / Σ_j e^(x_j)                        │  │
│  │ Outputs sum to 1: a probability distribution          │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  Tanh                                                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ f(x) = (e^x - e^(-x)) / (e^x + e^(-x))                │  │
│  │ Output range: (-1, 1)                                 │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
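
The four formulas above translate directly into code. This is a minimal sketch using only the standard library, written for scalars and plain lists; the function names are illustrative, and the max-subtraction in `softmax` is a common numerical-stability trick added here, not part of the formula in the diagram.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def softmax(xs):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp()
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))
print(tanh(1.0))
print(softmax([1.0, 2.0, 3.0]))
```

Sigmoid and tanh squash any input into a bounded range, while softmax normalizes a whole vector so its entries sum to 1.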

Using activation functions #

python

import keras

# Option 1: pass the activation to the layer, either by name or as a function
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(64, activation=keras.activations.relu),
    keras.layers.Dense(10, activation='softmax')
])

# Option 2: add the activation as a separate layer after a linear Dense layer
model = keras.Sequential([
    keras.layers.Dense(64),
    keras.layers.Activation('relu'),
    keras.layers.Dense(10),
    keras.layers.Activation('softmax')
])

Training process #

text

┌─────────────────────────────────────────────────────────────┐
│                        Training loop                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  for epoch in range(epochs):                                │
│      for batch in dataset:                                  │
│          │                                                  │
│          ▼                                                  │
│      ┌───────────────────────────────────────────────────┐  │
│      │ 1. Forward pass                                   │  │
│      │    preds = model(inputs)                          │  │
│      └───────────────────────────────────────────────────┘  │
│          │                                                  │
│          ▼                                                  │
│      ┌───────────────────────────────────────────────────┐  │
│      │ 2. Compute the loss                               │  │
│      │    loss = loss_fn(targets, preds)                 │  │
│      └───────────────────────────────────────────────────┘  │
│          │                                                  │
│          ▼                                                  │
│      ┌───────────────────────────────────────────────────┐  │
│      │ 3. Compute the gradients                          │  │
│      │    grads = tape.gradient(loss, weights)           │  │
│      └───────────────────────────────────────────────────┘  │
│          │                                                  │
│          ▼                                                  │
│      ┌───────────────────────────────────────────────────┐  │
│      │ 4. Update the weights                             │  │
│      │    optimizer.apply_gradients(zip(grads, weights)) │  │
│      └───────────────────────────────────────────────────┘  │
│          │                                                  │
│          ▼                                                  │
│      Repeat until convergence                               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
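
The four steps in the diagram do not depend on any framework. As a minimal sketch, the loop below fits a one-variable linear model with NumPy and hand-derived gradients, so every step is visible. The synthetic data (y = 3x + 1 plus noise), the variable names, and the hyperparameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic regression data: y = 3x + 1 plus a little noise (illustrative only)
x = rng.uniform(-1.0, 1.0, size=(256, 1))
y = 3.0 * x + 1.0 + 0.01 * rng.normal(size=(256, 1))

w, b = 0.0, 0.0
lr, batch_size, epochs = 0.1, 32, 50

for epoch in range(epochs):
    for start in range(0, len(x), batch_size):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]

        preds = w * xb + b                          # 1. forward pass
        loss = np.mean((preds - yb) ** 2)           # 2. compute the loss (MSE)
        grad_w = np.mean(2.0 * (preds - yb) * xb)   # 3. gradients, derived by hand
        grad_b = np.mean(2.0 * (preds - yb))
        w -= lr * grad_w                            # 4. update the weights (plain SGD)
        b -= lr * grad_b

print(f"w={w:.2f}, b={b:.2f}, last batch loss={loss:.5f}")
```

A framework replaces step 3 with automatic differentiation and step 4 with an optimizer object, but the structure of the loop stays the same.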

Custom training loop #

python

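import keras
import numpy as np
import tensorflow as tf  # GradientTape is a TensorFlow API, so this loop assumes the TensorFlow backend

# Placeholder random data so the loop runs end to end; swap in a real dataset
x_train = np.random.rand(256, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

optimizer = keras.optimizers.Adam()
# The model already applies softmax, so the loss receives probabilities
# (from_logits=False is the default)
loss_fn = keras.losses.SparseCategoricalCrossentropy()

train_acc_metric = keras.metrics.SparseCategoricalAccuracy()

for epoch in range(5):
    for step, (x_batch, y_batch) in enumerate(train_dataset):
        # Record the forward pass so gradients can be computed from it
        with tf.GradientTape() as tape:
            probs = model(x_batch, training=True)
            loss = loss_fn(y_batch, probs)

        grads = tape.gradient(loss, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        train_acc_metric.update_state(y_batch, probs)

    train_acc = float(train_acc_metric.result())
    print(f'Epoch {epoch}, Accuracy: {train_acc:.4f}')
    train_acc_metric.reset_state()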

Next steps #

You have now covered the core concepts of Keras. Next, move on to the Sequential model for a closer look at how models are built.