Sentry Alert Configuration #

What Are Alerts? #

Alerts are one of Sentry's core features: they notify the team as soon as a problem occurs.

text
┌─────────────────────────────────────────────────────────────┐
│                    The value of alerts                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Fast response                                           │
│     - Notify as soon as a problem occurs                    │
│     - Shorten time to detection                             │
│                                                             │
│  2. Less noise                                              │
│     - Group identical problems automatically                │
│     - Configurable alert frequency                          │
│                                                             │
│  3. Multi-channel notifications                             │
│     - Email                                                 │
│     - Slack                                                 │
│     - PagerDuty                                             │
│     - Custom webhook                                        │
│                                                             │
│  4. Tiered alerting                                         │
│     - Handle each severity differently                      │
│     - Escalate when needed                                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Alert Rules #

Rule Types #

text
┌─────────────────────────────────────────────────────────────┐
│                    Alert rule types                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Issue alerts                                            │
│     - A new issue appears                                   │
│     - An issue occurs more often                            │
│     - An issue changes state                                │
│                                                             │
│  2. Error frequency alerts                                  │
│     - Error count exceeds a threshold                       │
│     - Error rate rises                                      │
│                                                             │
│  3. Performance alerts                                      │
│     - Response time is too long                             │
│     - Crash Free Rate drops                                 │
│                                                             │
│  4. Release alerts                                          │
│     - Problems in a new release                             │
│     - Failed deployment                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Creating an Alert Rule #

In the Sentry console:

  1. Go to Settings → Projects → select a project → Alerts
  2. Click New Alert Rule
  3. Choose the rule type
  4. Configure the conditions and actions

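Issue Alert Rules #

The YAML snippets in this guide are an illustrative notation for alert settings, not a configuration file that Sentry reads.

yaml
# Issue alert rule example
name: "High Priority Issues"
conditions:
  # Condition 1: the issue level is error or fatal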
  - type: level
    match: is
    values: [error, fatal]
  
  # Condition 2: the issue is seen for the first time
  - type: first_seen
    match: is
    value: true
  
  # Condition 3: at least 10 users are affected
  - type: users_affected
    match: greater_or_equal
    value: 10

actions:
  - type: email
    targets: ["dev-team@example.com"]
  - type: slack
    channel: "#alerts"
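
To make the rule above concrete, here is a minimal Python sketch of how its three conditions could be evaluated. It assumes the conditions are combined with AND, which the rule does not state explicitly, and `IssueSnapshot` is a hypothetical record invented for this example, not a Sentry API type.

```python
from dataclasses import dataclass


@dataclass
class IssueSnapshot:
    """Hypothetical summary of an issue (not part of any Sentry SDK)."""
    level: str
    first_seen: bool
    users_affected: int


def matches_high_priority(issue: IssueSnapshot) -> bool:
    """Return True only when all three rule conditions hold (assumed AND)."""
    return (
        issue.level in {"error", "fatal"}   # condition 1
        and issue.first_seen                # condition 2
        and issue.users_affected >= 10      # condition 3
    )
```

For instance, a fatal issue seen for the first time with 10 affected users matches, while the same issue with 9 affected users does not.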

Frequency Alert Rules #

yaml
# Frequency alert rule example
name: "Error Rate Spike"
conditions:
  # At least 100 errors within 10 minutes
  - type: event_frequency
    comparison: greater_or_equal
    value: 100
    interval: 10m

actions:
  - type: email
    targets: ["oncall@example.com"]
  - type: pagerduty
    service_key: "${PAGERDUTY_SERVICE_KEY}"
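
The threshold check in this rule can be pictured as a sliding window over event timestamps. The sketch below is one possible way to do that in Python; the class name `EventFrequencyWindow` is hypothetical, timestamps are assumed to arrive in non-decreasing order, and Sentry's own evaluation may work differently.

```python
from collections import deque


class EventFrequencyWindow:
    """Counts events inside a sliding time window (illustrative only)."""

    def __init__(self, threshold: int = 100, window_seconds: float = 600.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event; return True once the window holds >= threshold events."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

The defaults mirror the rule above: 100 events within 600 seconds (10 minutes).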

Performance Alert Rules #

yaml
# Performance alert rule example
name: "Slow API Response"
conditions:
  # P95 response time exceeds 1 second
  - type: transaction_duration
    comparison: greater_than
    value: 1000  # milliseconds
    percentile: 95

actions:
  - type: slack
    channel: "#performance"
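
As a rough illustration of what a P95 threshold means, the following sketch computes a nearest-rank percentile over a list of durations and compares it with the limit. The function names are hypothetical, and nearest-rank is just one common percentile definition; Sentry may compute percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_slow(durations_ms, threshold_ms=1000, pct=95):
    """Return True when the chosen percentile exceeds the threshold."""
    return percentile(durations_ms, pct) > threshold_ms
```

With 20 samples, a single 5-second outlier does not push P95 over the limit, but two such outliers do.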

Notification Channels #

Email Notifications #

yaml
# Email configuration
actions:
  - type: email
    targets:
      - "dev-team@example.com"
      - "oncall@example.com"
    
    # Email rate limit
    frequency: 5m  # send at most once every 5 minutes

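Slack Integration #

Installing the Slack Integration #

  1. Go to Settings → Integrations
  2. Find Slack and click Add Integration
  3. Authorize Sentry to access the Slack workspace
  4. Choose the channel that should receive notifications

Configuring Slack Alerts #

yaml
# Slack alert configuration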
actions:
  - type: slack
    channel: "#alerts"
    
    # Custom message format
    blocks:
      - type: header
        text:
          type: plain_text
          text: "🚨 {{ issue.title }}"
      - type: section
        fields:
          - type: mrkdwn
            text: "*Environment:*\n{{ environment }}"
          - type: mrkdwn
            text: "*Release:*\n{{ release }}"
      - type: actions
        elements:
          - type: button
            text:
              type: plain_text
              text: "View Issue"
            url: "{{ issue.url }}"

PagerDuty Integration #

yaml
# PagerDuty configuration
actions:
  - type: pagerduty
    service_key: "${PAGERDUTY_SERVICE_KEY}"
    
    # Severity mapping
    severity_mapping:
      fatal: critical
      error: error
      warning: warning

Webhook Integration #

yaml
# Webhook configuration
actions:
  - type: webhook
    url: "https://api.example.com/sentry-webhook"
    method: POST
    headers:
      Authorization: "Bearer ${WEBHOOK_TOKEN}"
      Content-Type: "application/json"
    body: |
      {
        "event_id": "{{ event_id }}",
        "issue_id": "{{ issue_id }}",
        "title": "{{ issue.title }}",
        "level": "{{ level }}",
        "environment": "{{ environment }}",
        "url": "{{ issue.url }}"
      }

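Webhook Receiver Examples #

javascript
// Node.js Express webhook receiver
const express = require("express");
const crypto = require("crypto");

const app = express();

// Compare two hex digests in constant time; a length mismatch means no match
function safeEqual(a, b) {
  const left = Buffer.from(a, "utf8");
  const right = Buffer.from(b, "utf8");
  return left.length === right.length && crypto.timingSafeEqual(left, right);
}

app.post("/sentry-webhook", express.json(), (req, res) => {
  // Verify the signature
  const signature = req.headers["sentry-hook-signature"] || "";
  const expectedSignature = crypto
    .createHmac("sha256", process.env.SENTRY_WEBHOOK_SECRET)
    .update(JSON.stringify(req.body))
    .digest("hex");

  if (!safeEqual(signature, expectedSignature)) {
    return res.status(401).send("Invalid signature");
  }

  // Handle the alert
  const { action, data } = req.body;

  switch (action) {
    case "event_alert":
      handleEventAlert(data);
      break;
    case "issue_alert":
      handleIssueAlert(data);
      break;
  }

  res.status(200).send("OK");
});

function handleEventAlert(data) {
  console.log("Event alert:", data.event.title);
  // Forward to other systems, create a ticket, and so on
}

function handleIssueAlert(data) {
  // The payload shape depends on your integration; inspect it before reading fields
  console.log("Issue alert received");
}

app.listen(3000);

python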
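# Python Flask webhook receiver
import hashlib
import hmac
import os

from flask import Flask, request, jsonify

app = Flask(__name__)
# Load the signing secret from the environment (same variable as the Node.js example)
app.config["SENTRY_WEBHOOK_SECRET"] = os.environ["SENTRY_WEBHOOK_SECRET"]

@app.route("/sentry-webhook", methods=["POST"])
def sentry_webhook():
    # Verify the signature
    signature = request.headers.get("sentry-hook-signature", "")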
    expected = hmac.new(
        app.config["SENTRY_WEBHOOK_SECRET"].encode(),
        request.data,
        hashlib.sha256,
    ).hexdigest()
    
    if not hmac.compare_digest(signature, expected):
        return jsonify({"error": "Invalid signature"}), 401
    
    # Handle the alert
    data = request.json
    action = data.get("action")
    
    if action == "event_alert":
        handle_event_alert(data["data"])
    
    return jsonify({"status": "ok"}), 200

def handle_event_alert(data):
    print(f"Event alert: {data['event']['title']}")
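
When testing a receiver locally, it helps to produce a valid signature yourself. The sketch below uses only the standard library and mirrors the HMAC-SHA256 check in the receivers above; the helper names `sign_payload` and `is_valid_signature` are made up for this example.

```python
import hashlib
import hmac


def sign_payload(secret: str, raw_body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, raw_body: bytes, received: str) -> bool:
    """Compare the expected and received digests in constant time."""
    expected = sign_payload(secret, raw_body)
    return hmac.compare_digest(expected, received)
```

A test client can call `sign_payload` on the exact bytes it sends and put the result in the `sentry-hook-signature` header. Any change to the body after signing makes the check fail.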

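Alert Strategy #

Alert Frequency Control #

yaml
# Alert frequency configuration
alert_settings:
  # Minimum interval between alerts for the same issue
  issue_alert_frequency: 5m
  
  # Maximum number of alerts per hour
  max_alerts_per_hour: 10
  
  # Quiet hours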
  quiet_hours:
    start: "22:00"
    end: "08:00"
    timezone: "Asia/Shanghai"
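
A quiet-hours window such as 22:00 to 08:00 crosses midnight, which is easy to get wrong. The following sketch shows one way to test whether a moment falls inside it. `in_quiet_hours` is a hypothetical helper, `moment` must be a timezone-aware datetime, and `zoneinfo` needs time zone data available on the system.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def in_quiet_hours(moment: datetime,
                   start: time = time(22, 0),
                   end: time = time(8, 0),
                   tz: str = "Asia/Shanghai") -> bool:
    """Return True if `moment` falls inside the quiet window in zone `tz`."""
    local = moment.astimezone(ZoneInfo(tz)).time()
    if start <= end:
        # Window within a single day, e.g. 01:00 to 05:00.
        return start <= local < end
    # Window that wraps past midnight, e.g. 22:00 to 08:00.
    return local >= start or local < end
```

For example, 15:00 UTC is 23:00 in Shanghai and therefore inside the window, while 02:00 UTC is 10:00 in Shanghai and outside it.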

Alert Escalation #

yaml
# Alert escalation rules
escalation_policy:
  # Stage 1: notify the development team
  - level: 1
    delay: 0
    actions:
      - type: slack
        channel: "#dev-alerts"
  
  # Stage 2: after 5 minutes without a response, send email
  - level: 2
    delay: 5m
    actions:
      - type: email
        targets: ["dev-team@example.com"]
  
  # Stage 3: after 15 minutes without a response, page the on-call engineer
  - level: 3
    delay: 15m
    actions:
      - type: pagerduty
        service_key: "${PAGERDUTY_SERVICE_KEY}"
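
The escalation policy above boils down to "which stages are due after a given time without acknowledgement". A minimal sketch of that lookup follows; the `ESCALATION_STEPS` table and `due_steps` function are illustrative names, and the target strings are placeholders rather than real integration identifiers.

```python
from datetime import timedelta

# (level, delay before the stage fires, placeholder description of the target)
ESCALATION_STEPS = [
    (1, timedelta(0), "slack:#dev-alerts"),
    (2, timedelta(minutes=5), "email:dev-team@example.com"),
    (3, timedelta(minutes=15), "pagerduty"),
]


def due_steps(unacknowledged_for: timedelta) -> list:
    """Return the levels whose delay has elapsed for an unacknowledged alert."""
    return [level for level, delay, _ in ESCALATION_STEPS
            if unacknowledged_for >= delay]
```

After 7 minutes without acknowledgement, stages 1 and 2 are due; stage 3 joins once 15 minutes have passed.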

Alert Grouping #

yaml
# Alert grouping rules
grouping:
  # Group by project
  by_project: true
  
  # Group by environment
  by_environment: true
  
  # Group by error type
  by_error_type: true
  
  # Merge similar alerts
  merge_similar: true
  merge_window: 1m

Alert Filtering #

Ignoring Specific Errors #

yaml
# Alert filter rules
filters:
  # Ignore specific error types
  ignore_errors:
    - "NetworkError"
    - "Failed to fetch"
  
  # Ignore specific environments
  ignore_environments:
    - "development"
    - "testing"
  
  # Ignore specific users
  ignore_users:
    - "test-user"
    - "bot-*"
  
  # Ignore specific paths
  ignore_paths:
    - "/health"
    - "/metrics"
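
To show how such ignore lists might be applied, here is a small Python sketch. It assumes error entries are matched as substrings of the message and that user entries such as `bot-*` are shell-style wildcards; neither matching rule is specified by the configuration above, and the event dictionary shape is hypothetical.

```python
from fnmatch import fnmatchcase

IGNORE_ERRORS = ("NetworkError", "Failed to fetch")
IGNORE_ENVIRONMENTS = {"development", "testing"}
IGNORE_USERS = ("test-user", "bot-*")
IGNORE_PATHS = {"/health", "/metrics"}


def should_ignore(event: dict) -> bool:
    """Return True when any ignore rule matches the event."""
    message = event.get("message", "")
    if any(fragment in message for fragment in IGNORE_ERRORS):
        return True
    if event.get("environment") in IGNORE_ENVIRONMENTS:
        return True
    user = event.get("user", "")
    if any(fnmatchcase(user, pattern) for pattern in IGNORE_USERS):
        return True
    return event.get("path") in IGNORE_PATHS
```

An event from user `bot-crawler` is dropped by the wildcard rule, while a production error from user `alice` on `/checkout` passes through.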

Conditional Filtering #

yaml
# Conditional alerting
conditions:
  # Alert only in production
  - type: environment
    match: is
    values: [production]
  
  # Alert only on level error and above
  - type: level
    match: greater_or_equal
    value: error
  
  # Alert only on issues that affect at least 5 users
  - type: users_affected
    match: greater_or_equal
    value: 5
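
A `greater_or_equal` comparison on levels only makes sense once the levels are ordered. The sketch below uses Sentry's standard severity names in ascending order and applies the three filters above; the function names and the event dictionary shape are assumptions made for illustration.

```python
# Sentry severity levels, lowest to highest.
LEVEL_ORDER = ["debug", "info", "warning", "error", "fatal"]


def level_at_least(level: str, minimum: str) -> bool:
    """Return True when `level` is at or above `minimum` in LEVEL_ORDER."""
    return LEVEL_ORDER.index(level) >= LEVEL_ORDER.index(minimum)


def passes_filters(event: dict) -> bool:
    """Apply the three conditions: production, error or above, at least 5 users."""
    return (
        event["environment"] == "production"
        and level_at_least(event["level"], "error")
        and event["users_affected"] >= 5
    )
```

A `fatal` event in production affecting 5 users passes; a `warning` with the same reach does not.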

Alert Templates #

Custom Alert Messages #

yaml
# Custom alert templates
templates:
  email:
    subject: "[{{ level }}] {{ issue.title }}"
    body: |
      ## Error details
      
      **Title**: {{ issue.title }}
      **Level**: {{ level }}
      **Environment**: {{ environment }}
      **Release**: {{ release }}
      
      **Users affected**: {{ users_affected }}
      **Event count**: {{ event_count }}
      
      ## Error message
      
      ```
      {{ error.message }}
      ```
      
      ## Stack trace
      
      ```
      {{ error.stacktrace }}
      ```
      
      [View details]({{ issue.url }})
  
  slack:
    blocks:
      - type: header
        text:
          type: plain_text
          text: "🚨 {{ level }}: {{ issue.title }}"
      - type: section
        fields:
          - type: mrkdwn
            text: "*Environment:*\n{{ environment }}"
          - type: mrkdwn
            text: "*Release:*\n{{ release }}"
          - type: mrkdwn
            text: "*Users affected:*\n{{ users_affected }}"
          - type: mrkdwn
            text: "*Event count:*\n{{ event_count }}"
      - type: section
        text:
          type: mrkdwn
          text: |
            ```
            {{ error.message }}
            ```
      - type: actions
        elements:
          - type: button
            text:
              type: plain_text
              text: "View details"
            url: "{{ issue.url }}"
          - type: button
            text:
              type: plain_text
              text: "Ignore"
            url: "{{ issue.url }}/ignore"
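
The `{{ ... }}` placeholders in these templates can be filled with a few lines of Python. The renderer below is a deliberately small sketch, not the template engine Sentry or Slack uses: it supports dotted names such as `issue.title` and leaves unknown placeholders untouched.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace {{ dotted.name }} placeholders with values from `context`."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # unknown name: keep the placeholder
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)
```

Rendering the email subject template with `{"level": "error", "issue": {"title": "TypeError"}}` gives `[error] TypeError`.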

Alerting Best Practices #

1. Tiered Alerting #

yaml
# Handle each level differently
rules:
  # Fatal errors: notify immediately
  - name: "Fatal Errors"
    conditions:
      - level: fatal
    actions:
      - type: pagerduty
      - type: slack
        channel: "#critical-alerts"
  
  # Errors: notify during working hours
  - name: "Regular Errors"
    conditions:
      - level: error
    actions:
      - type: slack
        channel: "#errors"
  
  # Warnings: daily digest
  - name: "Warnings"
    conditions:
      - level: warning
    actions:
      - type: email
        targets: ["dev-team@example.com"]
        frequency: daily

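2. Avoiding Alert Fatigue #

yaml
# Alert frequency control
settings:
  # Alert at most once every 5 minutes for the same issue
  issue_frequency: 5m
  
  # At most 20 alerts per hour
  hourly_limit: 20
  
  # Go silent once the limit is exceeded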
  throttle_mode: silent

3. Routing by Team #

yaml
# Route alerts by project or feature
rules:
  # Payment errors -> payment team
  - name: "Payment Errors"
    conditions:
      - tag: feature
        match: is
        value: payment
    actions:
      - type: slack
        channel: "#payment-team"
  
  # User errors -> user team
  - name: "User Errors"
    conditions:
      - tag: feature
        match: is
        value: user
    actions:
      - type: slack
        channel: "#user-team"

4. On-Call Rotation #

yaml
# On-call rotation configuration
oncall:
  # Use the PagerDuty on-call schedule
  provider: pagerduty
  schedule_id: "${PAGERDUTY_SCHEDULE_ID}"
  
  # Alert rules
  rules:
    - name: "After Hours Critical"
      conditions:
        - level: fatal
      actions:
        - type: pagerduty
          use_oncall: true

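5. Alert Aggregation #

yaml
# Alert aggregation configuration
aggregation:
  # Aggregate by time window
  time_window: 5m
  
  # Aggregate by these fields
  group_by:
    - issue_id
    - environment
  
  # Send a summary after aggregating
  send_summary: true
  summary_template: |
    ## Alert summary
    
    In the last {{ time_window }}, {{ count }} alerts were received:
    
    {{#issues}}
    - {{ title }} ({{ count }} times)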
    {{/issues}}
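
The aggregation described above can be sketched as a count per `(issue_id, environment)` pair followed by a short text summary. The `summarize` function and the alert dictionary keys are assumptions made for this example, and the output format only loosely follows the summary template.

```python
from collections import Counter


def summarize(alerts: list, window_label: str = "5m") -> str:
    """Group alerts by (issue_id, environment) and build a text summary."""
    counts = Counter((a["issue_id"], a["environment"]) for a in alerts)
    titles = {(a["issue_id"], a["environment"]): a["title"] for a in alerts}
    lines = [f"In the last {window_label}, {len(alerts)} alerts were received:"]
    # Most frequent groups first.
    for key, count in counts.most_common():
        lines.append(f"- {titles[key]} [{key[1]}] ({count} times)")
    return "\n".join(lines)
```

Grouping on both fields keeps the same issue in `production` and `staging` on separate lines, which matches the `group_by` setting.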

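Alert Monitoring #

Alert Statistics #

In the Sentry console:

  1. Go to Settings → Projects → select a project → Alerts
  2. Review the alert statistics:
    • Alert volume trend
    • Distribution of alert types
    • Response time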

Alert Health Checks #

yaml
# Alert health check
health_check:
  # Check that alerts are being sent
  test_alert:
    enabled: true
    frequency: daily
    
  # Check that notification channels work
  channel_check:
    enabled: true
    channels:
      - type: slack
        channel: "#alerts"
      - type: email
        targets: ["test@example.com"]

Next Steps #

Now that you know how to configure alerts, continue with Source Maps to learn how to restore error stack traces from minified code.

Last updated: 2026-03-29