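Activation Functions

What Is an Activation Function?

Activation functions introduce nonlinearity into a neural network, which lets it learn complex patterns.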

text

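┌─────────────────────────────────────────────────────────────┐
│                    Role of Activation Functions             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Without activation functions:                              │
│  stacked linear layers = one linear layer                   │
│  cannot learn complex patterns                              │
│                                                             │
│  With activation functions:                                 │
│  each layer adds nonlinearity                               │
│  can approximate highly complex functions                   │
│                                                             │
│  input ──► linear transform ──► activation ──► output       │
│                                                             │
└─────────────────────────────────────────────────────────────┘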
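
The collapse of stacked linear layers can be checked numerically. The following is a minimal plain-Python sketch, independent of Keras; the input `x`, the weights `w1` and `w2`, and the helpers `matmul` and `relu` are made up for illustration, and biases are omitted for brevity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    """Apply max(0, v) to every entry of a matrix."""
    return [[max(0.0, v) for v in row] for row in m]

x = [[1.0, -2.0]]                # one input row (hypothetical values)
w1 = [[1.0, -1.0], [2.0, 0.5]]   # hypothetical first-layer weights
w2 = [[1.0], [-1.0]]             # hypothetical second-layer weights

# Two linear layers applied one after the other
stacked = matmul(matmul(x, w1), w2)
# A single linear layer whose weight matrix is w1 @ w2
collapsed = matmul(x, matmul(w1, w2))
# The same two layers with a ReLU in between
with_relu = matmul(relu(matmul(x, w1)), w2)

print(stacked, collapsed, with_relu)
```

`stacked` and `collapsed` agree because (x·W1)·W2 = x·(W1·W2), while `with_relu` differs: the nonlinearity between the layers is what prevents the collapse.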

ReLU (Rectified Linear Unit)

How It Works

text

┌─────────────────────────────────────────────────────────────┐
│                    ReLU Function                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Formula: f(x) = max(0, x)                                  │
│                                                             │
│            ▲                                                │
│            │   /                                            │
│            │  /                                             │
│            │ /                                              │
│            │/                                               │
│  ──────────┼────────► x                                     │
│  (0 for x ≤ 0, slope 1 for x > 0)                           │
│                                                             │
│  Properties:                                                │
│  ├── cheap to compute                                       │
│  ├── mitigates vanishing gradients                          │
│  ├── sparse activations (negative inputs map to 0)          │
│  └── can suffer from the "dying ReLU" problem               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
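
To make the "dying ReLU" property concrete, here is a small plain-Python sketch of the function and its derivative. The helper names `relu` and `relu_grad` are illustrative, and the derivative at exactly x = 0 is set to 0 by convention.

```python
def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU; the value at x == 0 is taken to be 0 by convention."""
    return 1.0 if x > 0 else 0.0

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, relu(x), relu_grad(x))
```

A unit whose input stays negative for every training example has zero gradient everywhere, so gradient descent stops updating its weights; that is the failure mode the box refers to.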

Usage

python

import keras

# Option 1: pass the activation by name to the layer
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

# Option 2: add the activation as a separate layer
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64),
    keras.layers.Activation('relu'),
    keras.layers.Dense(10, activation='softmax')
])

# Option 3: pass the activation function object
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation=keras.activations.relu),
    keras.layers.Dense(10, activation='softmax')
])

Sigmoid

How It Works

text

┌─────────────────────────────────────────────────────────────┐
│                    Sigmoid Function                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Formula: f(x) = 1 / (1 + e^(-x))                           │
│                                                             │
│  Shape: smooth S-curve, rising from 0 to 1                  │
│  f(0) = 0.5; f(x) → 0 as x → -∞; f(x) → 1 as x → +∞         │
│                                                             │
│  Output range: (0, 1)                                       │
│                                                             │
│  Properties:                                                │
│  ├── output can be read as a probability                    │
│  ├── suited to binary-classification outputs                │
│  ├── prone to vanishing gradients                           │
│  └── output is not zero-centered                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
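
The vanishing-gradient property follows from the derivative f'(x) = f(x)(1 - f(x)), which never exceeds 0.25. The sketch below, in plain Python rather than Keras, illustrates this; the helper names and the 10-layer figure are illustrative choices.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0 and shrinks quickly as |x| grows
for x in (0.0, 2.0, 5.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x))

# Upper bound on the product of sigmoid derivatives across 10 layers
bound = 0.25 ** 10
print(bound)
```

Because each sigmoid layer multiplies the backpropagated signal by at most 0.25, the activation's contribution to the gradient shrinks geometrically with depth.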

Usage

python

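import keras

# Binary classifier: a single sigmoid unit outputs P(class = 1)
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

# binary_crossentropy is the matching loss for a single sigmoid output
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)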

Softmax

How It Works

text

┌─────────────────────────────────────────────────────────────┐
│                    Softmax Function                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Formula: f(x_i) = e^(x_i) / Σ_j e^(x_j)                    │
│                                                             │
│  Input: [2.0, 1.0, 0.1]                                     │
│                                                             │
│  Computation:                                               │
│  e^2.0 ≈ 7.39                                               │
│  e^1.0 ≈ 2.72                                               │
│  e^0.1 ≈ 1.11                                               │
│  sum ≈ 11.21                                                │
│                                                             │
│  Output: [0.66, 0.24, 0.10]                                 │
│                                                             │
│  Properties:                                                │
│  ├── outputs sum to 1                                       │
│  ├── can be read as a probability distribution              │
│  ├── suited to multi-class outputs                          │
│  └── typically paired with cross-entropy loss               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
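
The worked example in the box can be reproduced with a few lines of plain Python. This sketch subtracts the maximum logit before exponentiating, a common stability trick that leaves the result unchanged; the function name `softmax` here is a local helper, not the Keras implementation.

```python
import math

def softmax(logits):
    """Softmax with the maximum subtracted first for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 2) for p in probs])  # [0.66, 0.24, 0.1]
print(sum(probs))
```

Subtracting the maximum matters in practice: without it, a logit such as 1000.0 would overflow `math.exp`, whereas `softmax([1000.0, 1000.0])` here returns `[0.5, 0.5]`.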

Usage

python

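import keras

# 10-class classifier: softmax turns the 10 outputs into class probabilities
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

# categorical_crossentropy expects one-hot labels;
# use sparse_categorical_crossentropy for integer class labels
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)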

Tanh

How It Works

text

┌─────────────────────────────────────────────────────────────┐
│                    Tanh Function                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Formula: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))            │
│                                                             │
│  Shape: smooth S-curve, rising from -1 to 1                 │
│  f(0) = 0; f(x) → -1 as x → -∞; f(x) → 1 as x → +∞          │
│                                                             │
│  Output range: (-1, 1)                                      │
│                                                             │
│  Properties:                                                │
│  ├── zero-centered output                                   │
│  ├── often converges faster than Sigmoid                    │
│  ├── still suffers from vanishing gradients                 │
│  └── common in RNN hidden layers                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
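
Tanh is a rescaled and shifted sigmoid: tanh(x) = 2·sigmoid(2x) - 1. The plain-Python sketch below checks that identity and evaluates the derivative 1 - tanh(x)^2; the helper names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh expressed through the sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(x, math.tanh(x), tanh_via_sigmoid(x), tanh_grad(x))
```

The derivative reaches 1 at x = 0, compared with 0.25 for the sigmoid, which is one plausible reason tanh often trains faster. It still decays toward 0 for large |x|, so the vanishing-gradient issue remains.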

Usage

python

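import keras

# tanh is already the default activation of LSTM; it is spelled out here for clarity
model = keras.Sequential([
    keras.Input(shape=(100, 32)),  # 100 time steps, 32 features per step
    keras.layers.LSTM(64, activation='tanh'),
    keras.layers.Dense(10, activation='softmax')
])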

LeakyReLU

How It Works

text

┌─────────────────────────────────────────────────────────────┐
│                    LeakyReLU Function                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Formula: f(x) = x if x > 0 else α*x                        │
│                                                             │
│            ▲                                                │
│            │   /                                            │
│            │  /                                             │
│            │ /                                              │
│            │/                                               │
│  ──────────┼────────► x                                     │
│  ______----│                                                │
│  (slope α for x ≤ 0, slope 1 for x > 0)                     │
│                                                             │
│  Addresses the "dying ReLU" problem:                        │
│  negative inputs keep a small, nonzero gradient             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
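
A plain-Python sketch of the formula above and its derivative shows why the gradient never dies. The helper names and the default `alpha=0.1` are illustrative choices, not Keras defaults.

```python
def leaky_relu(x, alpha=0.1):
    """f(x) = x for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.1):
    """Derivative: 1 for x > 0, alpha otherwise (never exactly zero)."""
    return 1.0 if x > 0 else alpha

for x in (-5.0, -2.0, 0.5, 3.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

Unlike plain ReLU, a unit that only sees negative inputs still receives a gradient of `alpha`, so its weights can keep moving.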

Usage

python

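import keras

# LeakyReLU is applied as its own layer after a Dense layer with no activation.
# Keras 3 names the slope argument `negative_slope`; older versions call it `alpha`.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64),
    keras.layers.LeakyReLU(negative_slope=0.1),
    keras.layers.Dense(32),
    keras.layers.LeakyReLU(negative_slope=0.1),
    keras.layers.Dense(10, activation='softmax')
])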

PReLU

PReLU has the same form as LeakyReLU, but its negative slope is a learnable parameter rather than a fixed constant.

python

import keras

# PReLU is a layer with its own trainable slope weights
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64),
    keras.layers.PReLU(),
    keras.layers.Dense(10, activation='softmax')
])
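
To illustrate what "learnable slope" means, the toy sketch below fits the slope by gradient descent in plain Python. Everything here is hypothetical: the data, the target slope of 0.25, the learning rate, and the helper names. It is not how Keras implements the layer internally.

```python
def prelu(x, alpha):
    """f(x) = x for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x

def prelu_grad_alpha(x):
    """Derivative of prelu with respect to alpha: x for x <= 0, else 0."""
    return 0.0 if x > 0 else x

# Toy data: targets follow a PReLU with slope 0.25 on negative inputs
xs = [-2.0, -1.0, -0.5, 1.0]
targets = [x if x > 0 else 0.25 * x for x in xs]

alpha = 0.0
learning_rate = 0.1
for _ in range(200):
    # Gradient of the mean squared error with respect to alpha
    grad = sum(2.0 * (prelu(x, alpha) - t) * prelu_grad_alpha(x)
               for x, t in zip(xs, targets)) / len(xs)
    alpha -= learning_rate * grad

print(alpha)
```

Only the negative inputs contribute to the slope's gradient, so `alpha` converges toward the slope that best explains the negative region.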

ELU

ELU saturates exponentially in the negative region: f(x) = x for x > 0 and α(e^x - 1) for x ≤ 0.

python

import keras

# ELU is available by name
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='elu'),
    keras.layers.Dense(10, activation='softmax')
])
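
As a sketch of the negative-side behavior, the standard ELU definition can be written in plain Python as follows. The helper name `elu` and the default `alpha=1.0` are choices made for this example.

```python
import math

def elu(x, alpha=1.0):
    """f(x) = x for x > 0, alpha * (e^x - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)

for x in (-10.0, -2.0, -0.5, 0.0, 2.0):
    print(x, elu(x))
```

For large negative inputs the output levels off just above `-alpha`, while positive inputs pass through unchanged.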

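SELU

SELU is a self-normalizing activation function: a scaled variant of ELU whose constants are chosen so that activations tend toward zero mean and unit variance.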

python

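import keras

# SELU is meant to be combined with lecun_normal initialization and AlphaDropout
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    keras.layers.AlphaDropout(0.1),
    keras.layers.Dense(32, activation='selu', kernel_initializer='lecun_normal'),
    keras.layers.AlphaDropout(0.1),
    keras.layers.Dense(10, activation='softmax')
])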
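
The self-normalizing property can be checked empirically. The sketch below applies SELU, with its published constants, to standard-normal samples in plain Python and measures the mean and variance of the result. The sample size and random seed are arbitrary choices for this example.

```python
import math
import random

# Published SELU constants
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    """scale * x for x > 0, scale * alpha * (e^x - 1) otherwise."""
    return SCALE * (x if x > 0 else ALPHA * math.expm1(x))

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [selu(rng.gauss(0.0, 1.0)) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

With zero-mean, unit-variance inputs the outputs stay close to zero mean and unit variance, which is the fixed point the constants are designed around.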

Swish

Swish is a self-gated activation function proposed by Google: f(x) = x · sigmoid(x).

python

import keras

# Swish is available by name
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='swish'),
    keras.layers.Dense(32, activation='swish'),
    keras.layers.Dense(10, activation='softmax')
])
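
"Self-gated" means the input is multiplied by a gate computed from the input itself. A plain-Python sketch of the common definition x · sigmoid(x), with illustrative helper names, looks like this:

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x):
    """The input x scaled by its own sigmoid gate."""
    return x * sigmoid(x)

for x in (-5.0, -1.0, 0.0, 1.0, 10.0):
    print(x, swish(x))
```

Unlike ReLU, the function is smooth and non-monotonic: it dips slightly below zero for moderate negative inputs and approaches x for large positive ones.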

GELU

GELU, f(x) = x · Φ(x) with Φ the standard normal CDF, is widely used in Transformers.

python

import keras

# GELU is available by name
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='gelu'),
    keras.layers.Dense(10, activation='softmax')
])
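
For reference, GELU is usually defined as x · Φ(x), and many implementations use a tanh-based approximation instead. The plain-Python sketch below compares the two; the function names are local helpers, and which variant a given framework uses by default is not assumed here.

```python
import math

def gelu(x):
    """Exact form: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Commonly used tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Largest gap between the two forms on a grid over [-4, 4]
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(gelu(x) - gelu_tanh(x)) for x in grid)
print(gelu(1.0), gelu_tanh(1.0), max_gap)
```

The two forms agree closely over the typical activation range, so the choice between them is mostly about speed and matching a reference implementation.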

Softplus

Softplus, f(x) = ln(1 + e^x), is a smooth version of ReLU.

python

import keras

# Softplus is available by name
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation='softplus'),
    keras.layers.Dense(10, activation='softmax')
])
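
The relationship to ReLU is easy to see numerically. The plain-Python sketch below uses the rearrangement max(x, 0) + ln(1 + e^(-|x|)), which equals ln(1 + e^x) but avoids overflow for large inputs; the helper name is illustrative.

```python
import math

def softplus(x):
    """ln(1 + e^x), rearranged so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

for x in (-20.0, -1.0, 0.0, 1.0, 5.0, 1000.0):
    print(x, softplus(x), max(0.0, x))
```

Softplus stays strictly positive and tracks ReLU closely away from zero, with the largest gap, ln 2, at x = 0.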

Activation Function Selection Guide

text

┌─────────────────────────────────────────────────────────────┐
│                    Choosing an Activation Function          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Hidden layers:                                             │
│  ├── default: ReLU                                          │
│  ├── deep networks: LeakyReLU / PReLU                       │
│  ├── self-normalizing networks: SELU                        │
│  └── Transformers: GELU / Swish                             │
│                                                             │
│  Output layer:                                              │
│  ├── binary classification: Sigmoid                         │
│  ├── multi-class classification: Softmax                    │
│  ├── regression: linear (no activation)                     │
│  └── regression (positive outputs): ReLU / Softplus         │
│                                                             │
│  RNN:                                                       │
│  └── hidden state: Tanh                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Custom Activation Functions

python

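import keras
from keras import ops

# A custom activation is any callable that takes a tensor and returns a tensor
def my_activation(x):
    return ops.sin(x)

# Pass the function itself instead of a string name
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation=my_activation),
    keras.layers.Dense(10, activation='softmax')
])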

Next Steps

Now that you have covered activation functions, the next topic is loss functions, which measure a model's prediction error!