Grafana Advanced Topics #

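Overview #

This chapter covers Grafana's advanced features: configuration as code, the HTTP API, plugin development, and the deployment, performance, and security settings needed to run Grafana as an enterprise-grade monitoring platform.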

text
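┌─────────────────────────────────────────────────────────────┐
│                    Advanced Topics Overview                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Configuration management:                                  │
│  ├── Provisioning    Configuration as code                  │
│  └── Configuration file reference                           │
│                                                             │
│  Extension development:                                     │
│  ├── Plugin development   Custom plugins                    │
│  └── API usage            Grafana HTTP API                  │
│                                                             │
│  Deployment and operations:                                 │
│  ├── High availability    Cluster architecture              │
│  ├── Performance tuning   Optimization strategies           │
│  └── Security settings    Security hardening                │
│                                                             │
│  Integrations:                                              │
│  ├── Authentication       LDAP/OAuth                        │
│  ├── Cloud services       AWS/Azure/GCP                     │
│  └── Kubernetes           Cloud-native deployment           │
│                                                             │
└─────────────────────────────────────────────────────────────┘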

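Provisioning (configuration as code) #

Provisioning lets Grafana configure itself from files at startup, so data sources, dashboards, and alerting can be managed as infrastructure as code (IaC).

Directory structure #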

text
┌─────────────────────────────────────────────────────────────┐
│                    Provisioning directory structure         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  /etc/grafana/provisioning/                                 │
│  ├── datasources/                                           │
│  │   └── datasources.yaml    Data sources                   │
│  ├── dashboards/                                            │
│  │   └── dashboards.yaml     Dashboard providers            │
│  ├── plugins/                                               │
│  │   └── plugins.yaml        App plugin settings            │
│  ├── alerting/                                              │
│  │   ├── rules.yaml          Alert rules                    │
│  │   └── contactpoints.yaml  Contact points                 │
│  └── notifiers/                                             │
│      └── notifiers.yaml      Legacy notification channels   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Data source configuration #

yaml
apiVersion: 1

datasources:
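  - name: Prometheus
    type: prometheus
    # A fixed uid lets alert rules and API queries refer to this data source
    # (for example datasourceUid: prometheus)
    uid: prometheus
    access: proxy
    url: http://prometheus:9090
    basicAuth: true
    basicAuthUser: admin
    secureJsonData:
      # Example value only; provisioning files expand $VAR and ${VAR},
      # so real passwords can come from environment variables
      basicAuthPassword: admin123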
    isDefault: true
    editable: false
    jsonData:
      httpMethod: POST
      manageAlerts: true
      prometheusType: Prometheus
      prometheusVersion: "2.40.0"
      cacheLevel: 'High'
    version: 1

  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    database: telegraf
    user: admin
    secureJsonData:
      password: admin123
    jsonData:
      httpMode: POST
      timeInterval: "10s"

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000

deleteDatasources:
  - name: Old-Prometheus
    orgId: 1

Dashboard configuration #

yaml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: 'Monitoring'
    folderUid: 'monitoring'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
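    options:
      path: /var/lib/grafana/dashboards
      # Enable this only when folder and folderUid above are left empty;
      # Grafana then derives folders from the directory layout under path
      foldersFromFilesStructure: false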

  - name: 'infrastructure'
    orgId: 1
    folder: 'Infrastructure'
    type: file
    options:
      path: /var/lib/grafana/dashboards/infrastructure

Plugin configuration #

yaml
apiVersion: 1

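# Only app plugins are configured here. Panel plugins such as
# grafana-piechart-panel or grafana-clock-panel are installed rather than
# provisioned (for example with grafana-cli or GF_INSTALL_PLUGINS).
apps:
  - type: alexanderzobnin-zabbix-app
    org_id: 1
    disabled: false
    jsonData:
      zabbixVersion: 5.0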

Alert rule configuration #

yaml
apiVersion: 1

groups:
  - orgId: 1
    name: system_alerts
    folder: Infrastructure
    interval: 1m
    rules:
      - uid: high_cpu_usage
        title: High CPU Usage
        condition: C
        data:
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
              refId: A
          - refId: B
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: __expr__
            model:
              type: reduce
              expression: A
              reducer: last
              refId: B
          - refId: C
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: __expr__
            model:
              type: threshold
              expression: B
              conditions:
                # A single threshold: fire when the reduced value exceeds 80
                - evaluator:
                    params:
                      - 80
                    type: gt
                  operator:
                    type: and
              refId: C
        noDataState: NoData
        execErrState: Error
        for: 5m
        annotations:
          summary: High CPU usage on {{ $labels.instance }}
          description: CPU usage is {{ $value }}%
        labels:
          severity: warning
          team: infrastructure
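
The rule above chains three steps: query A returns one CPU series per instance, expression B reduces each series to its last value, and expression C fires when that value exceeds the threshold. The sketch below mimics that data flow in plain TypeScript. It is a simplified model, not Grafana's evaluator; the `Series` type and all function names are hypothetical.

```typescript
// Simplified model of the A -> B (reduce) -> C (threshold) pipeline.
// All names are illustrative; Grafana's real evaluator is more involved.
interface Series {
  labels: Record<string, string>;
  values: number[]; // samples ordered oldest to newest
}

// Step B: reduce a series to one number with the "last" reducer.
function reduceLast(series: Series): number | null {
  return series.values.length > 0 ? series.values[series.values.length - 1] : null;
}

// Step C: the "gt" threshold check; an empty series never fires.
function isAbove(value: number | null, threshold: number): boolean {
  return value !== null && value > threshold;
}

// Returns the instances whose reduced value breaches the threshold.
function firingInstances(result: Series[], threshold: number): string[] {
  return result
    .filter((s) => isAbove(reduceLast(s), threshold))
    .map((s) => s.labels["instance"] ?? "unknown");
}

const sample: Series[] = [
  { labels: { instance: "node-a:9100" }, values: [40, 72, 85] },
  { labels: { instance: "node-b:9100" }, values: [95, 60, 30] },
];
console.log(firingInstances(sample, 80)); // only node-a:9100 breaches 80
```

In the real rule the breach must also persist for the `for: 5m` window before the alert moves from pending to firing; this sketch does not model that.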

Contact point configuration #

yaml
apiVersion: 1

contactPoints:
  - orgId: 1
    name: Email Team
    receivers:
      - uid: email-receiver
        type: email
        settings:
          addresses: team@example.com
          singleEmail: false

  - orgId: 1
    name: Slack Team
    receivers:
      - uid: slack-receiver
        type: slack
        settings:
          url: https://hooks.slack.com/services/xxx/xxx/xxx
          # Slack contact points take the channel in the recipient field
          recipient: '#alerts'
          username: Grafana

Grafana HTTP API #

API authentication #

text
┌─────────────────────────────────────────────────────────────┐
│                    API authentication methods               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Basic Auth:                                                │
│  curl -u admin:admin http://localhost:3000/api/search       │
│                                                             │
│  API key (deprecated in favor of service accounts):         │
│  curl -H "Authorization: Bearer xxx" \                      │
│    http://localhost:3000/api/search                         │
│                                                             │
│  Service account token:                                     │
│  curl -H "Authorization: Bearer glsa_xxx" http://...        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
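
The same two header schemes can be built in code. The sketch below is a minimal helper, assuming a runtime that provides the global `btoa` function (browsers and recent Node.js versions do); the `Credentials` type and the function name are hypothetical.

```typescript
// Builds the Authorization header value for the schemes shown above.
// `Credentials` is a hypothetical type for this sketch.
type Credentials =
  | { kind: "basic"; user: string; password: string }
  | { kind: "bearer"; token: string };

function authorizationHeader(cred: Credentials): string {
  if (cred.kind === "basic") {
    // Basic auth is base64("user:password"); btoa handles ASCII input only.
    return "Basic " + btoa(`${cred.user}:${cred.password}`);
  }
  // API keys and service account tokens (glsa_...) are both sent as Bearer tokens.
  return "Bearer " + cred.token;
}

console.log(authorizationHeader({ kind: "bearer", token: "glsa_xxx" }));
```

Prefer tokens over Basic Auth for automation: a token can be scoped to a role and revoked without changing a user's password.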

Common API endpoints #

bash
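# Dashboards
GET /api/dashboards/home
GET /api/dashboards/uid/{uid}
POST /api/dashboards/db
DELETE /api/dashboards/uid/{uid}

# Data sources (id-based routes are deprecated in favor of uid-based ones)
GET /api/datasources
GET /api/datasources/uid/{uid}
POST /api/datasources
PUT /api/datasources/uid/{uid}
DELETE /api/datasources/uid/{uid}

# Search and folders
GET /api/search
GET /api/folders
POST /api/folders

# Legacy alerting only; unified alerting uses /api/v1/provisioning/alert-rules
GET /api/alerts
GET /api/alerts/{id}
POST /api/alerts/test

# Users, teams, organizations
GET /api/users
GET /api/teams/search
GET /api/orgs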

Example: creating a dashboard #

bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer glsa_xxx" \
  -d '{
    "dashboard": {
      "title": "API Dashboard",
      "uid": "api-dashboard",
      "panels": [
        {
          "title": "Requests",
          "type": "timeseries",
          "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
          "targets": [
            {
              "expr": "rate(http_requests_total[5m])",
              "refId": "A"
            }
          ]
        }
      ]
    },
    "overwrite": true
  }' \
  http://localhost:3000/api/dashboards/db
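
When dashboards are generated from code, it helps to build the request as data first and send it separately. The sketch below assembles the same `POST /api/dashboards/db` call for any number of panels. The `HttpRequest` and `PanelSpec` types, the function name, and the two-per-row layout are assumptions made for illustration; the sketch uses the `timeseries` panel type.

```typescript
// Plain description of an HTTP request; sending it is left to the caller.
interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

interface PanelSpec {
  title: string;
  expr: string; // PromQL expression for the panel's single query
}

// Lays panels out two per row on Grafana's 24-column grid.
function buildSaveDashboardRequest(
  baseUrl: string,
  token: string,
  uid: string,
  title: string,
  panels: PanelSpec[],
): HttpRequest {
  const dashboard = {
    uid,
    title,
    panels: panels.map((p, i) => ({
      title: p.title,
      type: "timeseries",
      gridPos: { x: (i % 2) * 12, y: Math.floor(i / 2) * 8, w: 12, h: 8 },
      targets: [{ expr: p.expr, refId: "A" }],
    })),
  };
  return {
    method: "POST",
    url: `${baseUrl}/api/dashboards/db`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    // overwrite: true replaces an existing dashboard with the same uid.
    body: JSON.stringify({ dashboard, overwrite: true }),
  };
}

const request = buildSaveDashboardRequest(
  "http://localhost:3000",
  "glsa_xxx",
  "api-dashboard",
  "API Dashboard",
  [{ title: "Requests", expr: "rate(http_requests_total[5m])" }],
);
console.log(request.url);
```

Keeping the builder free of network calls makes it easy to unit test and to reuse with any HTTP client.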

Example: querying a data source #

bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer glsa_xxx" \
  -d '{
    "queries": [
      {
        "refId": "A",
        "datasource": {
          "type": "prometheus",
          "uid": "prometheus"
        },
        "expr": "up"
      }
    ],
    "from": "now-1h",
    "to": "now"
  }' \
  http://localhost:3000/api/ds/query
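
The response to `/api/ds/query` groups results by `refId` and stores each frame column by column, which is awkward to consume directly. The sketch below converts such frames into row objects. The `Frame` and `QueryResponse` shapes are a simplified assumption of the response format, and the sample payload is hand-written rather than captured from a server, so verify both against your Grafana version.

```typescript
// Assumed (simplified) shape of one frame in a /api/ds/query response:
// field names live in schema.fields, values are stored column by column.
interface Frame {
  schema: { fields: { name: string }[] };
  data: { values: unknown[][] };
}

interface QueryResponse {
  results: Record<string, { frames?: Frame[] }>;
}

// Turns the columnar frames of one refId into an array of row objects.
function frameRows(response: QueryResponse, refId: string): Record<string, unknown>[] {
  const frames = response.results[refId]?.frames ?? [];
  const rows: Record<string, unknown>[] = [];
  for (const frame of frames) {
    const names = frame.schema.fields.map((f) => f.name);
    const rowCount = frame.data.values.length > 0 ? frame.data.values[0].length : 0;
    for (let r = 0; r < rowCount; r++) {
      const row: Record<string, unknown> = {};
      names.forEach((name, c) => {
        row[name] = frame.data.values[c]?.[r];
      });
      rows.push(row);
    }
  }
  return rows;
}

// Hand-written sample, not captured from a real server.
const sampleResponse: QueryResponse = {
  results: {
    A: {
      frames: [
        {
          schema: { fields: [{ name: "Time" }, { name: "Value" }] },
          data: { values: [[1700000000000, 1700000015000], [1, 1]] },
        },
      ],
    },
  },
};
console.log(frameRows(sampleResponse, "A"));
```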

Plugin development #

Plugin types #

text
┌─────────────────────────────────────────────────────────────┐
│                    Plugin types                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Panel plugin:                                              │
│  ├── Custom visualization components                        │
│  └── Examples: Clock, Pie Chart, Worldmap                   │
│                                                             │
│  Data source plugin:                                        │
│  ├── Adds support for a new data source                     │
│  └── Examples: MongoDB, Redis, Zabbix                       │
│                                                             │
│  App plugin:                                                │
│  ├── Full-featured extension                                │
│  ├── Bundles data sources, panels, and dashboards           │
│  └── Examples: Zabbix, Kubernetes                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Plugin development environment #

bash
npx @grafana/create-plugin@latest

cd my-plugin
npm install
npm run dev

Panel plugin structure #

text
my-panel-plugin/
├── src/
│   ├── components/
│   │   └── SimplePanel.tsx
│   ├── module.ts
│   ├── plugin.json
│   └── types.ts
├── dist/
├── package.json
├── tsconfig.json
└── README.md

plugin.json configuration #

json
{
  "type": "panel",
  "name": "My Panel",
  "id": "myorg-mypanel-panel",
  "info": {
    "description": "My custom panel plugin",
    "author": {
      "name": "My Org"
    },
    "keywords": ["panel", "custom"],
    "version": "1.0.0",
    "updated": "2024-01-01"
  },
  "dependencies": {
    "grafanaDependency": ">=10.0.0",
    "plugins": []
  }
}

Panel component example #

typescript
import React from 'react';
import { PanelProps } from '@grafana/data';
import { SimpleOptions } from 'types';

interface Props extends PanelProps<SimpleOptions> {}

export const SimplePanel: React.FC<Props> = ({ options, data, width, height }) => {
  const { showSeriesCount } = options;
  const seriesCount = data.series.length;

  return (
    <div
      style={{
        width,
        height,
        display: 'flex',
        alignItems: 'center',
        justifyContent: 'center',
      }}
    >
      {showSeriesCount && <div>Series Count: {seriesCount}</div>}
    </div>
  );
};
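
The component imports `SimpleOptions` from `types`, which is not shown above. A minimal sketch of what `src/types.ts` could contain follows. Only the `showSeriesCount` field is implied by the component; `defaultOptions` and `seriesCountLabel` are hypothetical additions for illustration.

```typescript
// src/types.ts (sketch): the options object the panel component receives.
export interface SimpleOptions {
  showSeriesCount: boolean;
}

// Hypothetical defaults; the scaffold may define these elsewhere.
export const defaultOptions: SimpleOptions = {
  showSeriesCount: false,
};

// Pure helper mirroring the component's render logic, handy for unit tests.
export function seriesCountLabel(options: SimpleOptions, seriesCount: number): string | null {
  return options.showSeriesCount ? `Series Count: ${seriesCount}` : null;
}
```

Pulling the display logic into a pure function keeps the React component thin and lets the logic be tested without rendering.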

High-availability deployment #

Architecture design #

text
┌─────────────────────────────────────────────────────────────┐
│                    High-availability architecture           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                    ┌─────────────┐                          │
│                    │   LB/Nginx  │                          │
│                    └──────┬──────┘                          │
│                           │                                 │
│           ┌───────────────┼───────────────┐                 │
│           │               │               │                 │
│           ▼               ▼               ▼                 │
│    ┌─────────────┐ ┌─────────────┐ ┌─────────────┐         │
│    │  Grafana 1  │ │  Grafana 2  │ │  Grafana 3  │         │
│    └──────┬──────┘ └──────┬──────┘ └──────┬──────┘         │
│           │               │               │                 │
│           └───────────────┼───────────────┘                 │
│                           │                                 │
│                           ▼                                 │
│                    ┌─────────────┐                          │
│                    │   Database  │                          │
│                    │ PostgreSQL  │                          │
│                    └─────────────┘                          │
│                                                             │
│                    ┌─────────────┐                          │
│                    │   Storage   │                          │
│                    │   (S3/NFS)  │                          │
│                    └─────────────┘                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Database configuration #

ini
[database]
type = postgres
host = postgres:5432
name = grafana
user = grafana
password = ${GRAFANA_DB_PASSWORD}
ssl_mode = require
max_open_conn = 20
max_idle_conn = 10
conn_max_lifetime = 14400

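Session configuration #

ini
# The [session] section no longer exists (it was removed in the Grafana 6.x
# series). Login sessions are tokens stored in the shared [database], so
# all replicas see them without a separate session store.
# Cookie hardening is configured under [security].
[security]
cookie_secure = true
cookie_samesite = strict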

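Storage configuration #

ini
# Uploads rendered panel images (for example for alert notifications) to S3
[external_image_storage]
provider = s3

[external_image_storage.s3]
bucket = grafana-storage
region = us-east-1
access_key = ${AWS_ACCESS_KEY}
secret_key = ${AWS_SECRET_KEY}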

Docker Compose high-availability configuration #

yaml
version: '3.8'

services:
  grafana:
    image: grafana/grafana:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    environment:
      - GF_DATABASE_TYPE=postgres
      - GF_DATABASE_HOST=postgres:5432
      - GF_DATABASE_NAME=grafana
      - GF_DATABASE_USER=grafana
      - GF_DATABASE_PASSWORD=${GRAFANA_DB_PASSWORD}
      # No GF_SESSION_* settings are needed: login tokens live in the shared database
    volumes:
      - grafana-plugins:/var/lib/grafana/plugins
    networks:
      - grafana-net
    depends_on:
      - postgres

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=grafana
      - POSTGRES_USER=grafana
      - POSTGRES_PASSWORD=${GRAFANA_DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - grafana-net

  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    networks:
      - grafana-net
    depends_on:
      - grafana

volumes:
  grafana-plugins:
  postgres-data:

networks:
  grafana-net:

Performance optimization #

Database optimization #

text
┌─────────────────────────────────────────────────────────────┐
│                    Database optimization tips               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Connection pool settings:                                  │
│  ├── max_open_conn: 20-50                                   │
│  ├── max_idle_conn: 10-20                                   │
│  └── conn_max_lifetime: 14400 (seconds)                     │
│                                                             │
│  Database choice:                                           │
│  ├── SQLite: development and testing only                   │
│  ├── MySQL: small to medium deployments                     │
│  └── PostgreSQL: large production environments              │
│                                                             │
│  Index maintenance:                                         │
│  └── Run VACUUM/ANALYZE regularly (PostgreSQL)              │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Query optimization #

ini
[dataproxy]
timeout = 30
dialTimeout = 10
send_user_header = false

[unified_alerting]
execute_alerts = true
evaluation_timeout = 30s
max_concurrent_evaluations = 20

Cache configuration #

ini
# Query caching is a Grafana Enterprise / Grafana Cloud feature
[caching]
enabled = true
backend = redis

[caching.redis]
url = redis://:${REDIS_PASSWORD}@redis:6379/0
prefix = grafana_

Resource limits #

yaml
resources:
  limits:
    cpu: "2"
    memory: "4Gi"
  requests:
    cpu: "500m"
    memory: "1Gi"

Security configuration #

Authentication configuration #

text
┌─────────────────────────────────────────────────────────────┐
│                    Authentication methods                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Basic Auth:                                                │
│  [auth.basic]                                               │
│  enabled = true                                             │
│                                                             │
│  LDAP:                                                      │
│  [auth.ldap]                                                │
│  enabled = true                                             │
│  config_file = /etc/grafana/ldap.toml                       │
│                                                             │
│  OAuth:                                                     │
│  ├── GitHub                                                 │
│  ├── GitLab                                                 │
│  ├── Google                                                 │
│  ├── Azure AD                                               │
│  └── Okta                                                   │
│                                                             │
│  SAML:                                                      │
│  [auth.saml]                                                │
│  enabled = true                                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

LDAP configuration #

toml
[[servers]]
host = "ldap.example.com"
port = 389
use_ssl = false
start_tls = false

bind_dn = "cn=admin,dc=example,dc=com"
bind_password = "${LDAP_PASSWORD}"

search_filter = "(cn=%s)"
search_base_dns = ["dc=example,dc=com"]

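[servers.attributes]
name = "givenName"
surname = "sn"
username = "cn"
# member_of tells Grafana which attribute lists the user's groups;
# the group_mappings below are matched against it
member_of = "memberOf"
email = "email"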

[[servers.group_mappings]]
group_dn = "cn=admin,ou=groups,dc=example,dc=com"
org_role = "Admin"

[[servers.group_mappings]]
group_dn = "cn=editor,ou=groups,dc=example,dc=com"
org_role = "Editor"

[[servers.group_mappings]]
group_dn = "*"
org_role = "Viewer"
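
Order matters in the group mappings above: Grafana applies the first mapping a user matches, so the `"*"` catch-all must come last. The sketch below models that top-to-bottom resolution. It is an illustration only; the `GroupMapping` type, the function name, and the case-insensitive comparison are assumptions of this sketch.

```typescript
interface GroupMapping {
  groupDn: string; // "*" matches every user
  orgRole: "Admin" | "Editor" | "Viewer";
}

// Walks the mappings top to bottom and returns the role of the first match.
// DNs are compared case-insensitively in this sketch.
function resolveOrgRole(memberOf: string[], mappings: GroupMapping[]): GroupMapping["orgRole"] | null {
  const groups = memberOf.map((dn) => dn.toLowerCase());
  for (const mapping of mappings) {
    if (mapping.groupDn === "*" || groups.indexOf(mapping.groupDn.toLowerCase()) !== -1) {
      return mapping.orgRole;
    }
  }
  return null; // no mapping matched
}

// The three mappings from the ldap.toml example, in the same order.
const mappings: GroupMapping[] = [
  { groupDn: "cn=admin,ou=groups,dc=example,dc=com", orgRole: "Admin" },
  { groupDn: "cn=editor,ou=groups,dc=example,dc=com", orgRole: "Editor" },
  { groupDn: "*", orgRole: "Viewer" },
];
console.log(resolveOrgRole(["cn=editor,ou=groups,dc=example,dc=com"], mappings)); // Editor
```

Moving the `"*"` entry to the top of the list would make every user a Viewer, because the more specific mappings would never be reached.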

OAuth configuration #

ini
[auth.github]
enabled = true
allow_sign_up = true
client_id = ${GITHUB_CLIENT_ID}
client_secret = ${GITHUB_CLIENT_SECRET}
scopes = user:email,read:org
auth_url = https://github.com/login/oauth/authorize
token_url = https://github.com/login/oauth/access_token
api_url = https://api.github.com/user
team_ids =
allowed_organizations =

Security hardening #

ini
[security]
admin_user = admin
admin_password = ${GRAFANA_ADMIN_PASSWORD}
secret_key = ${GRAFANA_SECRET_KEY}
disable_initial_admin_creation = false
disable_gravatar = true
cookie_secure = true
cookie_samesite = strict
allow_embedding = false
strict_transport_security = true
strict_transport_security_max_age_seconds = 86400
strict_transport_security_preload = true
strict_transport_security_subdomains = true
x_content_type_options = true
x_xss_protection = true
content_security_policy = true
content_security_policy_template = "script-src 'self' 'unsafe-eval' 'unsafe-inline'; object-src 'none';"
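
The four `strict_transport_security*` settings above combine into a single `Strict-Transport-Security` response header. The sketch below shows how such a header value is assembled. It illustrates the header format, not Grafana's code; the `HstsSettings` type and the function name are hypothetical.

```typescript
interface HstsSettings {
  maxAgeSeconds: number;
  includeSubdomains: boolean;
  preload: boolean;
}

// Assembles a Strict-Transport-Security header value (format per RFC 6797;
// the preload token is a browser preload-list convention, not part of the RFC).
function hstsHeader(s: HstsSettings): string {
  const parts = [`max-age=${s.maxAgeSeconds}`];
  if (s.includeSubdomains) parts.push("includeSubDomains");
  if (s.preload) parts.push("preload");
  return parts.join("; ");
}

console.log(hstsHeader({ maxAgeSeconds: 86400, includeSubdomains: true, preload: true }));
// max-age=86400; includeSubDomains; preload
```

A max-age of 86400 seconds is one day. Browser preload lists generally expect a much longer max-age (on the order of a year), so raise the value before relying on `preload`.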

Kubernetes deployment #

Helm values configuration #

yaml
replicas: 3

persistence:
  enabled: false

admin:
  existingSecret: grafana-admin-secret
  userKey: user
  passwordKey: password

env:
  GF_DATABASE_TYPE: postgres
  GF_DATABASE_HOST: postgres:5432
  GF_DATABASE_NAME: grafana
  GF_DATABASE_USER: grafana

# Every key of this Secret becomes an environment variable, so the Secret
# must contain a GF_DATABASE_PASSWORD key
envFromSecret: grafana-db-secret

resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - grafana.example.com
  tls:
    - secretName: grafana-tls
      hosts:
        - grafana.example.com

sidecar:
  dashboards:
    enabled: true
    label: grafana_dashboard
    folder: /tmp/dashboards
  datasources:
    enabled: true
    label: grafana_datasource

plugins:
  - grafana-piechart-panel
  - grafana-clock-panel

extraObjects:
  - apiVersion: v1
    kind: Secret
    metadata:
      name: grafana-db-secret
    stringData:
      # Placeholder: Helm does not expand ${...} in values files. Substitute
      # the real value before installing (for example with envsubst) or
      # create this Secret outside the chart.
      GF_DATABASE_PASSWORD: ${GRAFANA_DB_PASSWORD}

Installing Grafana #

bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install grafana grafana/grafana \
  -f values.yaml \
  -n monitoring \
  --create-namespace

Monitoring Grafana itself #

Prometheus monitoring configuration #

yaml
grafana:
  env:
    GF_METRICS_ENABLED: "true"
  
  serviceMonitor:
    enabled: true
    labels:
      release: prometheus
    interval: 30s
    path: /metrics

Grafana metrics #

text
┌─────────────────────────────────────────────────────────────┐
│                    Grafana internal metrics                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Names vary by Grafana version; verify against /metrics.    │
│                                                             │
│  Request metrics:                                           │
│  grafana_http_request_duration_seconds_bucket               │
│  grafana_http_request_duration_seconds_count                │
│  grafana_http_requests_total                                │
│                                                             │
│  Database metrics:                                          │
│  grafana_db_query_duration_seconds_bucket                   │
│  grafana_db_query_total                                     │
│                                                             │
│  Session metrics:                                           │
│  grafana_active_sessions_count                              │
│  grafana_session_duration_seconds                           │
│                                                             │
│  Alerting metrics:                                          │
│  grafana_alerting_alerts                                    │
│  grafana_alerting_notifications_total                       │
│                                                             │
│  System metrics:                                            │
│  grafana_build_info                                         │
│  grafana_instance_start_time_seconds                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
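
Metrics ending in `_bucket` are cumulative histogram buckets: each one counts the observations less than or equal to its `le` bound. A latency percentile is estimated by finding the bucket that contains the target rank and interpolating inside it, which is the idea behind PromQL's `histogram_quantile()`. The sketch below works through that arithmetic on made-up bucket counts; the `Bucket` type and the function name are hypothetical.

```typescript
// One cumulative histogram bucket: count of observations <= le.
interface Bucket {
  le: number; // upper bound in seconds; Infinity for the +Inf bucket
  count: number;
}

// Estimates a quantile by linear interpolation inside the matching bucket.
// Buckets must be sorted by le and hold cumulative counts.
function estimateQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length === 0) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  for (let i = 0; i < buckets.length; i++) {
    if (buckets[i].count >= rank) {
      const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
      const lowerCount = i === 0 ? 0 : buckets[i - 1].count;
      // The +Inf bucket has no upper bound to interpolate toward.
      if (buckets[i].le === Infinity) return lowerBound;
      const inBucket = buckets[i].count - lowerCount;
      if (inBucket === 0) return buckets[i].le;
      return lowerBound + (buckets[i].le - lowerBound) * ((rank - lowerCount) / inBucket);
    }
  }
  return NaN;
}

// Made-up counts for illustration: 100 requests in total.
const latencyBuckets: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 99 },
  { le: Infinity, count: 100 },
];
console.log(estimateQuantile(0.5, latencyBuckets)); // median falls at the 0.1 s bound
```

In practice the calculation is left to Prometheus, for example `histogram_quantile(0.95, sum by (le) (rate(grafana_http_request_duration_seconds_bucket[5m])))`.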

Best practices summary #

Configuration management #

text
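┌─────────────────────────────────────────────────────────────┐
│                    Configuration management best practices  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Manage configuration with Provisioning                  │
│     └── Keep all configuration files under version control  │
│                                                             │
│  2. Use environment variables for sensitive data            │
│     └── Passwords, tokens, and similar secrets              │
│                                                             │
│  3. Separate configuration per environment                  │
│     └── Use different settings for dev/staging/prod         │
│                                                             │
│  4. Back up regularly                                       │
│     └── Database and configuration files                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘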

Operations recommendations #

text
┌─────────────────────────────────────────────────────────────┐
│                    Operations best practices                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Monitor Grafana itself                                  │
│     └── Scrape Grafana's metrics with Prometheus            │
│                                                             │
│  2. Set resource limits                                     │
│     └── CPU and memory limits                               │
│                                                             │
│  3. Configure log collection                                │
│     └── Collect and analyze logs centrally                  │
│                                                             │
│  4. Update regularly                                        │
│     └── Keep the Grafana version current                    │
│                                                             │
│  5. Disaster recovery                                       │
│     └── Define a backup and restore plan                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

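Summary #

Congratulations on completing the Grafana complete guide! You have now covered:

  • Grafana's basic concepts, installation, and configuration
  • Creating and managing dashboards
  • Using the various panel types
  • Configuring data sources
  • Configuring the alerting system
  • Advanced features such as Provisioning, plugin development, and high-availability deployment

Keep practicing and exploring to build real expertise in monitoring and visualization with Grafana.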

Last updated: 2026-03-29