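Grafana Advanced Topics
Overview
This chapter covers Grafana's advanced features, from configuration as code and extension development to high-availability deployment and security, so you can build an enterprise-grade monitoring platform.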
text
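┌─────────────────────────────────────────────────────────────┐
│                  Advanced Topics Overview                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Configuration management:                                   │
│ ├── Provisioning        configuration as code               │
│ └── Configuration files in detail                           │
│                                                             │
│ Extension development:                                      │
│ ├── Plugin development  custom plugins                      │
│ └── API usage           Grafana HTTP API                    │
│                                                             │
│ Deployment and operations:                                  │
│ ├── High availability   cluster architecture                │
│ ├── Performance tuning  tuning strategies                   │
│ └── Security config     hardening                           │
│                                                             │
│ Integrations:                                               │
│ ├── Authentication      LDAP/OAuth                          │
│ ├── Cloud services      AWS/Azure/GCP                       │
│ └── Kubernetes          cloud-native deployment             │
│                                                             │
└─────────────────────────────────────────────────────────────┘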
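Provisioning (Configuration as Code)
Provisioning configures Grafana from files on disk, so data sources, dashboards, plugins, and alerting can be version-controlled and managed as infrastructure as code (IaC). Grafana applies these files at startup.
Directory layout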
text
┌─────────────────────────────────────────────────────────────┐
│                Provisioning Directory Layout                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ /etc/grafana/provisioning/                                  │
│ ├── datasources/                                            │
│ │   └── datasources.yaml    data source config              │
│ ├── dashboards/                                             │
│ │   └── dashboards.yaml     dashboard providers             │
│ ├── plugins/                                                │
│ │   └── plugins.yaml        app plugin config               │
│ ├── alerting/                                               │
│ │   ├── rules.yaml          alert rules                     │
│ │   └── contactpoints.yaml  contact points                  │
│ └── notifiers/                                              │
│     └── notifiers.yaml      notification channels (legacy)  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Data sources
yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Fixed uid so alert rules and API calls can reference it as "prometheus"
    uid: prometheus
    access: proxy
    url: http://prometheus:9090
    basicAuth: true
    basicAuthUser: admin
    secureJsonData:
      # Read from the environment instead of hard-coding the password
      basicAuthPassword: ${PROMETHEUS_PASSWORD}
    isDefault: true
    editable: false
    jsonData:
      httpMethod: POST
      manageAlerts: true
      prometheusType: Prometheus
      prometheusVersion: "2.40.0"
      cacheLevel: 'High'
    version: 1
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    user: admin
    secureJsonData:
      password: ${INFLUXDB_PASSWORD}
    jsonData:
      # dbName replaces the deprecated top-level "database" field
      dbName: telegraf
      httpMode: POST
      timeInterval: "10s"
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000
deleteDatasources:
  - name: Old-Prometheus
    orgId: 1
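Grafana expands environment variables written as `$VAR` or `${VAR}` inside provisioning files, which keeps secrets such as `basicAuthPassword` out of version control. The following is a simplified model of that substitution, not Grafana's implementation: the `$$` escape for a literal dollar sign follows the provisioning documentation, while turning unset variables into empty strings is an assumption of this sketch, and `expandEnvPlaceholders` is a hypothetical name.

```typescript
// Simplified model of environment-variable interpolation in provisioning YAML.
// Not Grafana's code: "$$" escapes a literal dollar sign, and unset variables
// becoming empty strings is an assumption of this sketch.
type Env = Record<string, string | undefined>;

const PLACEHOLDER = /\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;

function expandEnvPlaceholders(text: string, env: Env): string {
  return text.replace(PLACEHOLDER, (match: string, braced?: string, bare?: string): string => {
    if (match === "$$") return "$"; // escaped literal dollar sign
    const name = braced ?? bare ?? "";
    return env[name] ?? "";
  });
}

// Usage: resolve the secret the way the provisioning file would see it.
const line = "basicAuthPassword: ${PROMETHEUS_PASSWORD}";
console.log(expandEnvPlaceholders(line, { PROMETHEUS_PASSWORD: "s3cret" }));
// prints "basicAuthPassword: s3cret"
```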
Dashboards
yaml
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: 'Monitoring'
    folderUid: 'monitoring'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    allowUiUpdates: true
    options:
      # A sibling of the infrastructure path below, so no file is loaded twice
      path: /var/lib/grafana/dashboards/monitoring
      # true only takes effect when folder and folderUid are left unset
      foldersFromFilesStructure: false
  - name: 'infrastructure'
    orgId: 1
    folder: 'Infrastructure'
    type: file
    options:
      path: /var/lib/grafana/dashboards/infrastructure
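The file provider also reads dashboards in subdirectories of `options.path`, so a provider whose path sits inside another provider's path (for example `/var/lib/grafana/dashboards/infrastructure` under `/var/lib/grafana/dashboards`) loads the same files twice. A hypothetical pre-flight check for that situation; `findOverlappingProviders` and its types are illustrative and not part of Grafana:

```typescript
// Hypothetical lint for dashboard provider files: report provider pairs whose
// options.path is equal to, or nested inside, another provider's path.
// Equal paths are reported once in each order.
interface DashboardProvider {
  name: string;
  options: { path: string };
}

// Drop trailing slashes so "/a/b/" and "/a/b" compare equal.
function normalisePath(p: string): string {
  return p.replace(/\/+$/, "");
}

function findOverlappingProviders(providers: DashboardProvider[]): Array<[string, string]> {
  const overlaps: Array<[string, string]> = [];
  for (const parent of providers) {
    for (const child of providers) {
      if (parent === child) continue;
      const p = normalisePath(parent.options.path);
      const c = normalisePath(child.options.path);
      // The "/" suffix keeps /dashboards-old from matching /dashboards.
      if (c === p || c.indexOf(p + "/") === 0) overlaps.push([parent.name, child.name]);
    }
  }
  return overlaps;
}

console.log(
  JSON.stringify(
    findOverlappingProviders([
      { name: "default", options: { path: "/var/lib/grafana/dashboards" } },
      { name: "infrastructure", options: { path: "/var/lib/grafana/dashboards/infrastructure" } },
    ]),
  ),
); // prints [["default","infrastructure"]]
```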
Plugins
yaml
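apiVersion: 1
apps:
  # Only app plugins can be provisioned here, and they must already be
  # installed. Panel plugins such as grafana-clock-panel are installed with
  # `grafana cli plugins install grafana-clock-panel` instead.
  - type: alexanderzobnin-zabbix-app
    org_id: 1
    disabled: false
    jsonData:
      zabbixVersion: 5.0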
Alert rules
yaml
apiVersion: 1
groups:
  - orgId: 1
    name: system_alerts
    folder: Infrastructure
    interval: 1m
    rules:
      - uid: high_cpu_usage
        title: High CPU Usage
        condition: C
        data:
          # A: CPU usage per instance, in percent
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus
            model:
              expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
              refId: A
          # B: reduce each series to its last value
          - refId: B
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: __expr__
            model:
              type: reduce
              expression: A
              reducer: last
              refId: B
          # C: true when B > 80. A threshold expression uses a single
          # evaluator; a separate rule can cover > 90 at a higher severity.
          - refId: C
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: __expr__
            model:
              type: threshold
              expression: B
              conditions:
                - evaluator:
                    params:
                      - 80
                    type: gt
              refId: C
        noDataState: NoData
        execErrState: Error
        for: 5m
        annotations:
          summary: High CPU usage on {{ $labels.instance }}
          description: CPU usage is {{ $values.B.Value }}%
        labels:
          severity: warning
          team: infrastructure
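The three `data` entries form a pipeline: A queries Prometheus, B keeps the last value of each series, C compares it with the threshold, and `for: 5m` keeps the rule Pending until the condition has held for five minutes. The sketch below is a simplified single-series model of that flow, not Grafana's evaluator; it ignores `execErrState`, maps an empty series straight to NoData, and `simulateRule` is a hypothetical name.

```typescript
// Simplified single-series model of the rule above (not Grafana's evaluator).
type RuleState = "Normal" | "Pending" | "Alerting" | "NoData";

interface Evaluation {
  atSeconds: number; // evaluation timestamp (the group interval is 1m)
  series: number[];  // values query A returned for one instance
}

// Expression B: reduce the series to its last value.
function reduceLast(series: number[]): number | undefined {
  return series.length > 0 ? series[series.length - 1] : undefined;
}

function simulateRule(evals: Evaluation[], threshold: number, forSeconds: number): RuleState[] {
  const states: RuleState[] = [];
  let breachingSince: number | undefined;
  for (const e of evals) {
    const last = reduceLast(e.series);
    if (last === undefined) {
      breachingSince = undefined;
      states.push("NoData");
      continue;
    }
    if (!(last > threshold)) { // expression C: "gt" threshold not met
      breachingSince = undefined;
      states.push("Normal");
      continue;
    }
    if (breachingSince === undefined) breachingSince = e.atSeconds; // start the "for" clock
    states.push(e.atSeconds - breachingSince >= forSeconds ? "Alerting" : "Pending");
  }
  return states;
}

// CPU at 85% on every 1m evaluation: Pending for five minutes, then Alerting.
const evals = [0, 60, 120, 180, 240, 300].map((t) => ({ atSeconds: t, series: [70, 85] }));
console.log(simulateRule(evals, 80, 300).join(","));
// prints "Pending,Pending,Pending,Pending,Pending,Alerting"
```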
Contact points
yaml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: Email Team
    receivers:
      - uid: email-receiver
        type: email
        settings:
          addresses: team@example.com
          singleEmail: false
  - orgId: 1
    name: Slack Team
    receivers:
      - uid: slack-receiver
        type: slack
        settings:
          url: https://hooks.slack.com/services/xxx/xxx/xxx
          channel: '#alerts'
          username: Grafana
Grafana HTTP API
API authentication
text
┌─────────────────────────────────────────────────────────────┐
│                 API Authentication Methods                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Basic auth:                                                 │
│   curl -u admin:admin http://localhost:3000/api/search      │
│                                                             │
│ API key (legacy, deprecated):                               │
│   curl -H "Authorization: Bearer xxx" \                     │
│     http://localhost:3000/api/search                        │
│                                                             │
│ Service account token:                                      │
│   curl -H "Authorization: Bearer glsa_xxx" \                │
│     http://localhost:3000/api/search                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
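All three methods end up as an HTTP `Authorization` header: `Basic` followed by the Base64-encoded `user:password`, or `Bearer` followed by a token. A minimal, dependency-free sketch that builds the header value; `authorizationHeader` and `base64Ascii` are hypothetical helpers, and the tiny encoder assumes ASCII credentials.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64-encode an ASCII string (enough for typical user:password pairs).
function base64Ascii(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i += 3) {
    const a = input.charCodeAt(i);
    const b = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
    const c = i + 2 < input.length ? input.charCodeAt(i + 2) : 0;
    if (a > 127 || b > 127 || c > 127) throw new Error("non-ASCII credentials");
    const triple = (a << 16) | (b << 8) | c;
    out += B64[(triple >> 18) & 63] + B64[(triple >> 12) & 63];
    out += i + 1 < input.length ? B64[(triple >> 6) & 63] : "=";
    out += i + 2 < input.length ? B64[triple & 63] : "=";
  }
  return out;
}

type GrafanaAuth =
  | { kind: "basic"; user: string; password: string }
  | { kind: "token"; token: string }; // service account token (glsa_...) or legacy API key

function authorizationHeader(auth: GrafanaAuth): string {
  return auth.kind === "basic"
    ? `Basic ${base64Ascii(`${auth.user}:${auth.password}`)}`
    : `Bearer ${auth.token}`;
}

console.log(authorizationHeader({ kind: "basic", user: "admin", password: "admin" }));
// prints "Basic YWRtaW46YWRtaW4="
```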
Common API endpoints
bash
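# Dashboards
GET    /api/dashboards/home
GET    /api/dashboards/uid/{uid}
POST   /api/dashboards/db
DELETE /api/dashboards/uid/{uid}
# Data sources (uid-based paths; the {id} variants are deprecated)
GET    /api/datasources
GET    /api/datasources/uid/{uid}
POST   /api/datasources
PUT    /api/datasources/uid/{uid}
DELETE /api/datasources/uid/{uid}
# Search and folders
GET    /api/search
GET    /api/folders
POST   /api/folders
# Alert rules (unified alerting; /api/alerts belonged to legacy alerting)
GET    /api/v1/provisioning/alert-rules
GET    /api/v1/provisioning/alert-rules/{uid}
POST   /api/v1/provisioning/alert-rules
# Users, teams, and organizations
GET    /api/users
GET    /api/teams/search
GET    /api/orgs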
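Several of these endpoints take query-string parameters. `/api/search`, for example, accepts `query`, a repeatable `tag`, `type` (`dash-db` for dashboards, `dash-folder` for folders), and `limit`. A hypothetical helper that assembles such a URL with proper encoding; the function name and the base URL are assumptions:

```typescript
// Hypothetical builder for /api/search URLs.
interface SearchParams {
  query?: string;
  tags?: string[];
  type?: "dash-db" | "dash-folder";
  limit?: number;
}

function searchUrl(baseUrl: string, params: SearchParams): string {
  const parts: string[] = [];
  const add = (key: string, value: string): void => {
    parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);
  };
  if (params.query !== undefined) add("query", params.query);
  for (const tag of params.tags ?? []) add("tag", tag); // one "tag" pair per tag
  if (params.type !== undefined) add("type", params.type);
  if (params.limit !== undefined) add("limit", String(params.limit));
  const root = baseUrl.replace(/\/+$/, "");
  return parts.length > 0 ? `${root}/api/search?${parts.join("&")}` : `${root}/api/search`;
}

console.log(searchUrl("http://localhost:3000", { query: "cpu", type: "dash-db" }));
// prints "http://localhost:3000/api/search?query=cpu&type=dash-db"
```

The resulting URL can be passed to `fetch` (Node.js 18+ or a browser) together with an `Authorization` header.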
Example: creating a dashboard
bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer glsa_xxx" \
  -d '{
    "dashboard": {
      "title": "API Dashboard",
      "uid": "api-dashboard",
      "panels": [
        {
          "title": "Requests",
          "type": "timeseries",
          "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
          "datasource": {"type": "prometheus", "uid": "prometheus"},
          "targets": [
            {
              "expr": "rate(http_requests_total[5m])",
              "refId": "A"
            }
          ]
        }
      ]
    },
    "overwrite": true
  }' \
  http://localhost:3000/api/dashboards/db
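Hand-writing this JSON quickly becomes unwieldy. A sketch of building the same request body in TypeScript; `timeseriesPanel`, `dashboardPayload`, the second panel's query, and the data source uid `prometheus` are assumptions, while the 24-column width of Grafana's grid is why two `w: 12` panels fit side by side.

```typescript
// Hypothetical builders for the POST /api/dashboards/db request body.
interface DataSourceRef { type: string; uid: string }
interface Panel {
  title: string;
  type: string;
  gridPos: { x: number; y: number; w: number; h: number };
  datasource: DataSourceRef;
  targets: Array<{ refId: string; expr: string }>;
}

const PROMETHEUS: DataSourceRef = { type: "prometheus", uid: "prometheus" }; // assumed uid

function timeseriesPanel(title: string, expr: string, x: number, y: number): Panel {
  return {
    title,
    type: "timeseries", // replaces the deprecated "graph" panel
    gridPos: { x, y, w: 12, h: 8 }, // half of the 24-column grid
    datasource: PROMETHEUS,
    targets: [{ refId: "A", expr }],
  };
}

function dashboardPayload(uid: string, title: string, panels: Panel[]) {
  return { dashboard: { uid, title, panels }, overwrite: true };
}

// Two panels side by side in the first row.
const body = dashboardPayload("api-dashboard", "API Dashboard", [
  timeseriesPanel("Requests", "rate(http_requests_total[5m])", 0, 0),
  timeseriesPanel("Errors", 'rate(http_requests_total{status=~"5.."}[5m])', 12, 0),
]);
console.log(JSON.stringify(body, null, 2)); // usable as curl's -d argument
```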
Example: querying a data source
bash
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer glsa_xxx" \
  -d '{
    "queries": [
      {
        "refId": "A",
        "datasource": {
          "type": "prometheus",
          "uid": "prometheus"
        },
        "expr": "up"
      }
    ],
    "from": "now-1h",
    "to": "now"
  }' \
  http://localhost:3000/api/ds/query
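`/api/ds/query` answers with data frames: under `results.<refId>.frames`, each frame lists its columns in `schema.fields` and stores the values column by column in `data.values`. The sketch below pulls the latest numeric value out of each series. The interfaces cover only the fields it reads and are an assumption about the response shape, so check them against a real response; `latestValues` is a hypothetical name.

```typescript
// Minimal typing for the parts of a /api/ds/query response used here.
interface FieldSchema {
  name: string;
  type?: string; // "time", "number", ...
  labels?: Record<string, string>;
}
interface Frame {
  schema: { fields: FieldSchema[] };
  data: { values: unknown[][] }; // one array per field, same order as schema.fields
}
interface QueryResponse {
  results: Record<string, { frames?: Frame[] }>;
}

// Return the last numeric value of every number field for one refId.
function latestValues(resp: QueryResponse, refId: string) {
  const out: Array<{ labels: Record<string, string>; value: number }> = [];
  for (const frame of resp.results[refId]?.frames ?? []) {
    frame.schema.fields.forEach((field, i) => {
      if (field.type !== "number") return; // skip the time column
      const column = frame.data.values[i] ?? [];
      const last = column[column.length - 1];
      if (typeof last === "number") out.push({ labels: field.labels ?? {}, value: last });
    });
  }
  return out;
}

// A trimmed, hand-written response for the "up" query, with one target.
const sample: QueryResponse = {
  results: {
    A: {
      frames: [{
        schema: { fields: [
          { name: "Time", type: "time" },
          { name: "Value", type: "number", labels: { job: "node" } },
        ] },
        data: { values: [[1700000000000, 1700000015000], [1, 0]] },
      }],
    },
  },
};
console.log(JSON.stringify(latestValues(sample, "A")));
// prints [{"labels":{"job":"node"},"value":0}]
```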
Plugin development
Plugin types
text
┌─────────────────────────────────────────────────────────────┐
│                        Plugin Types                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Panel plugins:                                              │
│ ├── Custom visualization components                         │
│ └── Examples: Clock, Pie Chart, Worldmap                    │
│                                                             │
│ Data source plugins:                                        │
│ ├── Add support for new data sources                        │
│ └── Examples: MongoDB, Redis, Zabbix                        │
│                                                             │
│ App plugins:                                                │
│ ├── Full-featured extensions                                │
│ ├── Can bundle data sources, panels and dashboards          │
│ └── Examples: Zabbix, Kubernetes                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Development environment
bash
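# Scaffold a plugin; the wizard asks for organization, name and type and
# names the directory after them (for example myorg-mypanel-panel)
npx @grafana/create-plugin@latest
cd my-plugin   # replace with the generated directory name
npm install
npm run dev    # build in watch mode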
Panel plugin structure
text
my-panel-plugin/
├── src/
│ ├── components/
│ │ └── SimplePanel.tsx
│ ├── module.ts
│ ├── plugin.json
│ └── types.ts
├── dist/
├── package.json
├── tsconfig.json
└── README.md
plugin.json
json
{
  "type": "panel",
  "name": "My Panel",
  "id": "myorg-mypanel-panel",
  "info": {
    "description": "My custom panel plugin",
    "author": {
      "name": "My Org"
    },
    "keywords": ["panel", "custom"],
    "version": "1.0.0",
    "updated": "2024-01-01"
  },
  "dependencies": {
    "grafanaDependency": ">=10.0.0",
    "plugins": []
  }
}
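Plugin IDs follow an `<organization>-<name>-<type>` convention, and the final segment should match the `type` field, as in `myorg-mypanel-panel`. A hypothetical lint for that convention; `checkPluginId` is illustrative, and restricting IDs to lowercase letters, digits, and hyphens is an assumption rather than a copy of Grafana's validator rules.

```typescript
type PluginType = "panel" | "datasource" | "app";

// Returns a list of problems; an empty list means the id looks conventional.
function checkPluginId(id: string, type: PluginType): string[] {
  const problems: string[] = [];
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(id)) {
    problems.push("use lowercase letters and digits separated by '-'");
  }
  const parts = id.split("-");
  if (parts.length < 3) problems.push("expected <organization>-<name>-<type>");
  if (parts[parts.length - 1] !== type) problems.push(`expected the id to end with "-${type}"`);
  return problems;
}

console.log(checkPluginId("myorg-mypanel-panel", "panel").length); // 0
```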
Example panel component
typescript
import React from 'react';
import { PanelProps } from '@grafana/data';
import { SimpleOptions } from 'types';
interface Props extends PanelProps<SimpleOptions> {}
// Renders the number of series returned by the panel's queries.
export const SimplePanel: React.FC<Props> = ({ options, data, width, height }) => {
  const { showSeriesCount } = options;
  const seriesCount = data.series.length;
  return (
    <div
      style={{
        width,
        height,
        display: 'flex',
        alignItems: 'center',
        justifyContent: 'center',
      }}
    >
      {showSeriesCount && <div>Series Count: {seriesCount}</div>}
    </div>
  );
};
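`SimplePanel` imports `SimpleOptions` from `types`, which the structure above places in `src/types.ts`. A minimal sketch of that file: only `showSeriesCount` is used by the component, so the `text` field, `defaultOptions`, and the `withDefaults` helper are illustrative assumptions. In a real plugin, option defaults are declared through `setPanelOptions` in `module.ts`.

```typescript
// src/types.ts (sketch): the options object passed to the panel as `options`.
// In the plugin this interface would be exported for SimplePanel.tsx to import.
interface SimpleOptions {
  text: string;             // illustrative extra option
  showSeriesCount: boolean; // read by SimplePanel
}

const defaultOptions: SimpleOptions = { text: "Hello", showSeriesCount: false };

// Fill in any options a saved dashboard does not specify.
function withDefaults(saved: Partial<SimpleOptions>): SimpleOptions {
  return { ...defaultOptions, ...saved };
}

console.log(JSON.stringify(withDefaults({ showSeriesCount: true })));
// prints {"text":"Hello","showSeriesCount":true}
```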
High-availability deployment
Architecture
text
┌─────────────────────────────────────────────────────────────┐
│               High Availability Architecture                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                       ┌─────────────┐                       │
│                       │  LB/Nginx   │                       │
│                       └──────┬──────┘                       │
│                              │                              │
│              ┌───────────────┼───────────────┐              │
│              │               │               │              │
│              ▼               ▼               ▼              │
│       ┌─────────────┐ ┌─────────────┐ ┌─────────────┐       │
│       │  Grafana 1  │ │  Grafana 2  │ │  Grafana 3  │       │
│       └──────┬──────┘ └──────┬──────┘ └──────┬──────┘       │
│              │               │               │              │
│              └───────────────┼───────────────┘              │
│                              │                              │
│                              ▼                              │
│                       ┌─────────────┐                       │
│                       │  Database   │                       │
│                       │ PostgreSQL  │                       │
│                       └─────────────┘                       │
│                                                             │
│                       ┌─────────────┐                       │
│                       │   Storage   │                       │
│                       │  (S3/NFS)   │                       │
│                       └─────────────┘                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Database configuration
ini
[database]
type = postgres
host = postgres:5432
name = grafana
user = grafana
password = ${GRAFANA_DB_PASSWORD}
ssl_mode = require
max_open_conn = 20
max_idle_conn = 10
conn_max_lifetime = 14400
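Each Grafana instance keeps its own connection pool, so in the worst case the database must accept `replicas × max_open_conn` connections. A back-of-the-envelope check, assuming PostgreSQL's defaults of `max_connections = 100` with 3 of them held back by `superuser_reserved_connections`; `poolBudget` is a hypothetical helper.

```typescript
// Rough connection budget for an HA Grafana cluster sharing one PostgreSQL.
interface PoolBudget {
  needed: number;    // worst case: every replica opens its full pool
  available: number; // connections left for non-superusers
  fits: boolean;
}

function poolBudget(
  replicas: number,
  maxOpenConn: number,
  pgMaxConnections = 100, // PostgreSQL default max_connections
  superuserReserved = 3,  // PostgreSQL default superuser_reserved_connections
): PoolBudget {
  const needed = replicas * maxOpenConn;
  const available = pgMaxConnections - superuserReserved;
  return { needed, available, fits: needed <= available };
}

console.log(JSON.stringify(poolBudget(3, 20))); // needed 60, available 97, fits true
console.log(JSON.stringify(poolBudget(3, 50))); // needed 150, available 97, fits false
```

Leave headroom for other clients and for rolling updates, during which an extra replica briefly runs alongside the old ones.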
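Session configuration
ini
; Grafana 6.2+ has no [session] section: login sessions are stored as auth
; tokens in the shared [database], so every replica sees them automatically.
; The lifetimes below are the defaults.
[auth]
login_maximum_inactive_lifetime_duration = 7d
login_maximum_lifetime_duration = 30d
[security]
cookie_secure = true
cookie_samesite = strict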
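Storage configuration
ini
; Dashboards, users and sessions live in the [database]; S3 here stores the
; rendered images attached to alert notifications.
[external_image_storage]
provider = s3
[external_image_storage.s3]
bucket = grafana-storage
region = us-east-1
access_key = ${AWS_ACCESS_KEY}
secret_key = ${AWS_SECRET_KEY}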
Docker Compose HA setup
yaml
version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    environment:
      - GF_DATABASE_TYPE=postgres
      - GF_DATABASE_HOST=postgres:5432
      - GF_DATABASE_NAME=grafana
      - GF_DATABASE_USER=grafana
      - GF_DATABASE_PASSWORD=${GRAFANA_DB_PASSWORD}
      # No GF_SESSION_* variables: since Grafana 6.2, login sessions are
      # stored in the shared database configured above.
    volumes:
      - grafana-plugins:/var/lib/grafana/plugins
    networks:
      - grafana-net
    depends_on:
      - postgres
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=grafana
      - POSTGRES_USER=grafana
      - POSTGRES_PASSWORD=${GRAFANA_DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - grafana-net
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    networks:
      - grafana-net
    depends_on:
      - grafana
volumes:
  grafana-plugins:
  postgres-data:
networks:
  grafana-net:
Performance optimization
Database tuning
text
┌─────────────────────────────────────────────────────────────┐
│               Database Tuning Recommendations               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Connection pool:                                            │
│ ├── max_open_conn: 20-50                                    │
│ ├── max_idle_conn: 10-20                                    │
│ └── conn_max_lifetime: 14400 (4 hours)                      │
│                                                             │
│ Database choice:                                            │
│ ├── SQLite: development and testing only                    │
│ ├── MySQL: small to medium deployments                      │
│ └── PostgreSQL: large production deployments                │
│                                                             │
│ Index maintenance:                                          │
│ └── Run VACUUM/ANALYZE regularly (PostgreSQL)               │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Query tuning
ini
[dataproxy]
timeout = 30
dialTimeout = 10
send_user_header = false
[unified_alerting]
execute_alerts = true
evaluation_timeout = 30s
max_concurrent_evaluations = 20
Query caching
ini
; Query caching is a Grafana Enterprise / Grafana Cloud feature
[caching]
enabled = true
backend = redis
[caching.redis]
url = redis://:${REDIS_PASSWORD}@redis:6379/0
prefix = grafana_
Resource limits
yaml
resources:
  limits:
    cpu: "2"
    memory: "4Gi"
  requests:
    cpu: "500m"
    memory: "1Gi"
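Kubernetes CPU quantities are measured in cores, with `m` meaning thousandths (`500m` is half a core), and the memory suffixes `Ki`, `Mi`, and `Gi` are powers of 1024. A small sketch that converts the values above; it deliberately handles only these suffixes, not the full Kubernetes quantity grammar, and the function names are hypothetical.

```typescript
// Convert the Kubernetes quantities used above (a subset of the full grammar).
const BINARY_SUFFIX: Record<string, number> = { Ki: 1024, Mi: 1024 * 1024, Gi: 1024 * 1024 * 1024 };

function parseCpuCores(q: string): number {
  return q.slice(-1) === "m" ? Number(q.slice(0, -1)) / 1000 : Number(q);
}

function parseMemoryBytes(q: string): number {
  const match = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi)?$/.exec(q);
  if (match === null) throw new Error(`unsupported quantity: ${q}`);
  const multiplier = match[2] !== undefined ? BINARY_SUFFIX[match[2]] : 1;
  return Number(match[1]) * multiplier;
}

// The limits allow 4x the requested CPU and memory.
console.log(parseCpuCores("2") / parseCpuCores("500m"));          // 4
console.log(parseMemoryBytes("4Gi") / parseMemoryBytes("1Gi"));   // 4
```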
Security configuration
Authentication
text
┌─────────────────────────────────────────────────────────────┐
│                   Authentication Methods                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Basic auth:                                                 │
│   [auth.basic]                                              │
│   enabled = true                                            │
│                                                             │
│ LDAP:                                                       │
│   [auth.ldap]                                               │
│   enabled = true                                            │
│   config_file = /etc/grafana/ldap.toml                      │
│                                                             │
│ OAuth:                                                      │
│ ├── GitHub                                                  │
│ ├── GitLab                                                  │
│ ├── Google                                                  │
│ ├── Azure AD                                                │
│ └── Okta                                                    │
│                                                             │
│ SAML (Enterprise):                                          │
│   [auth.saml]                                               │
│   enabled = true                                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘
LDAP
toml
[[servers]]
host = "ldap.example.com"
port = 389
use_ssl = false
start_tls = false
bind_dn = "cn=admin,dc=example,dc=com"
bind_password = "${LDAP_PASSWORD}"
search_filter = "(cn=%s)"
search_base_dns = ["dc=example,dc=com"]
[servers.attributes]
name = "givenName"
surname = "sn"
username = "cn"
# Needed for the group mappings below to see the user's groups
member_of = "memberOf"
email = "email"
[[servers.group_mappings]]
group_dn = "cn=admin,ou=groups,dc=example,dc=com"
org_role = "Admin"
[[servers.group_mappings]]
group_dn = "cn=editor,ou=groups,dc=example,dc=com"
org_role = "Editor"
[[servers.group_mappings]]
group_dn = "*"
org_role = "Viewer"
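Group mappings are checked in the order they appear, and a user who matches several groups gets the topmost matching mapping, which is why `group_dn = "*"` comes last as the fallback. A simplified model of that lookup, assuming the user's groups have already been read from `memberOf`; exact DN comparison rules (such as case handling) are not modeled, and `resolveOrgRole` is a hypothetical name.

```typescript
type OrgRole = "Admin" | "Editor" | "Viewer";

interface GroupMapping {
  groupDn: string; // "*" matches every user
  orgRole: OrgRole;
}

// First matching mapping wins; undefined means no mapping applies.
function resolveOrgRole(memberOf: string[], mappings: GroupMapping[]): OrgRole | undefined {
  for (const m of mappings) {
    if (m.groupDn === "*" || memberOf.indexOf(m.groupDn) !== -1) return m.orgRole;
  }
  return undefined;
}

const mappings: GroupMapping[] = [
  { groupDn: "cn=admin,ou=groups,dc=example,dc=com", orgRole: "Admin" },
  { groupDn: "cn=editor,ou=groups,dc=example,dc=com", orgRole: "Editor" },
  { groupDn: "*", orgRole: "Viewer" },
];

console.log(resolveOrgRole(["cn=editor,ou=groups,dc=example,dc=com"], mappings)); // Editor
```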
OAuth
ini
[auth.github]
enabled = true
allow_sign_up = true
client_id = ${GITHUB_CLIENT_ID}
client_secret = ${GITHUB_CLIENT_SECRET}
scopes = user:email,read:org
auth_url = https://github.com/login/oauth/authorize
token_url = https://github.com/login/oauth/access_token
api_url = https://api.github.com/user
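; With allow_sign_up = true, leaving both lists empty lets any GitHub account
; sign in; restrict access with, for example, allowed_organizations = my-org
team_ids =
allowed_organizations =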
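The allow-list logic is easy to get backwards, so here is a simplified model of the organization check: an empty `allowed_organizations` disables it, and otherwise a user passes when they belong to at least one listed organization. Splitting on commas and whitespace and comparing names exactly are assumptions of this sketch; `isOrgAllowed` and `parseList` are hypothetical names.

```typescript
// Parse an ini list value such as "my-org, other-org" (separators assumed).
function parseList(value: string): string[] {
  return value.split(/[\s,]+/).filter((s) => s.length > 0);
}

function isOrgAllowed(allowedOrganizations: string, userOrgs: string[]): boolean {
  const allowed = parseList(allowedOrganizations);
  if (allowed.length === 0) return true; // empty list: no restriction at all
  return userOrgs.some((org) => allowed.indexOf(org) !== -1);
}

console.log(isOrgAllowed("", ["anyone"]));          // true
console.log(isOrgAllowed("my-org", ["other-org"])); // false
```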
Hardening
ini
[security]
admin_user = admin
admin_password = ${GRAFANA_ADMIN_PASSWORD}
secret_key = ${GRAFANA_SECRET_KEY}
disable_initial_admin_creation = false
disable_gravatar = true
cookie_secure = true
cookie_samesite = strict
allow_embedding = false
strict_transport_security = true
; HSTS preloading requires a max-age of at least one year (31536000 s)
strict_transport_security_max_age_seconds = 31536000
strict_transport_security_preload = true
strict_transport_security_subdomains = true
x_content_type_options = true
x_xss_protection = true
content_security_policy = true
content_security_policy_template = "script-src 'self' 'unsafe-eval' 'unsafe-inline'; object-src 'none';"
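The four `strict_transport_security*` settings combine into a single `Strict-Transport-Security` response header. A sketch of that mapping plus the header-level requirements of the HSTS preload list from hstspreload.org (a `max-age` of at least 31536000 seconds, `includeSubDomains`, and `preload`); `hstsHeader` and `preloadEligible` are hypothetical helpers, not Grafana code, and the order of directives in the header is not significant.

```typescript
interface HstsSettings {
  maxAgeSeconds: number;
  includeSubdomains: boolean;
  preload: boolean;
}

// Build the header value described by the settings.
function hstsHeader(s: HstsSettings): string {
  const parts = [`max-age=${s.maxAgeSeconds}`];
  if (s.includeSubdomains) parts.push("includeSubDomains");
  if (s.preload) parts.push("preload");
  return parts.join("; ");
}

const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // 31536000

// Header-level requirements for submitting a domain to the preload list.
function preloadEligible(s: HstsSettings): boolean {
  return s.maxAgeSeconds >= ONE_YEAR_SECONDS && s.includeSubdomains && s.preload;
}

const oneDay: HstsSettings = { maxAgeSeconds: 86400, includeSubdomains: true, preload: true };
console.log(hstsHeader(oneDay));      // max-age=86400; includeSubDomains; preload
console.log(preloadEligible(oneDay)); // false: 86400 s is far below one year
```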
Kubernetes deployment
Helm values
yaml
replicas: 3
persistence:
  enabled: false
admin:
  existingSecret: grafana-admin-secret
  userKey: user
  passwordKey: password
env:
  GF_DATABASE_TYPE: postgres
  GF_DATABASE_HOST: postgres:5432
  GF_DATABASE_NAME: grafana
  GF_DATABASE_USER: grafana
# Helm does not expand ${...} in values files, so GF_DATABASE_PASSWORD is
# read from a Secret created beforehand, for example:
#   kubectl -n monitoring create secret generic grafana-db-secret \
#     --from-literal=GF_DATABASE_PASSWORD="$GRAFANA_DB_PASSWORD"
envFromSecret: grafana-db-secret
resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi
ingress:
  enabled: true
  ingressClassName: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - grafana.example.com
  tls:
    - secretName: grafana-tls
      hosts:
        - grafana.example.com
sidecar:
  dashboards:
    enabled: true
    label: grafana_dashboard
    folder: /tmp/dashboards
  datasources:
    enabled: true
    label: grafana_datasource
plugins:
  - grafana-piechart-panel
  - grafana-clock-panel
Installing Grafana
bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana \
-f values.yaml \
-n monitoring \
--create-namespace
Monitoring Grafana itself
Prometheus scrape configuration
yaml
grafana:
  env:
    GF_METRICS_ENABLED: "true"
  serviceMonitor:
    enabled: true
    labels:
      release: prometheus
    interval: 30s
    path: /metrics
Grafana metrics
text
┌─────────────────────────────────────────────────────────────┐
│               Grafana Self-Monitoring Metrics               │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ HTTP requests:                                              │
│   grafana_http_request_duration_seconds_bucket              │
│   grafana_http_request_duration_seconds_count               │
│   grafana_http_request_in_flight                            │
│                                                             │
│ Database connections:                                       │
│   grafana_database_conn_open                                │
│   grafana_database_conn_in_use                              │
│                                                             │
│ Usage statistics:                                           │
│   grafana_stat_active_users                                 │
│   grafana_stat_totals_dashboard                             │
│                                                             │
│ Alerting:                                                   │
│   grafana_alerting_alerts                                   │
│   grafana_alerting_rule_evaluations_total                   │
│                                                             │
│ Process:                                                    │
│   grafana_build_info                                        │
│   process_start_time_seconds                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
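The `_bucket` series form a cumulative histogram, which `histogram_quantile` turns into latency percentiles, for example `histogram_quantile(0.95, sum by (le) (rate(grafana_http_request_duration_seconds_bucket[5m])))`. A simplified TypeScript version of the linear interpolation it performs, for intuition only: it assumes non-empty buckets sorted by `le` and ending in `+Inf`, and skips Prometheus's edge-case handling; `histogramQuantile` is a hypothetical name.

```typescript
interface Bucket {
  le: number;    // upper bound; Infinity for the +Inf bucket
  count: number; // cumulative count of observations <= le
}

// Find the bucket holding the q-th rank and interpolate linearly inside it.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  const rank = q * total;
  let prevLe = 0;
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      if (b.le === Infinity) return prevLe; // rank falls in the +Inf bucket
      return prevLe + (b.le - prevLe) * ((rank - prevCount) / (b.count - prevCount));
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return NaN;
}

const latency: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 100 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, latency)); // 0.75
```

Rank 95 lands in the (0.5, 1] bucket, which holds observations 91 to 100, so the estimate sits halfway through that bucket.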
Best practices
Configuration management
text
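┌─────────────────────────────────────────────────────────────┐
│           Configuration Management Best Practices           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ 1. Manage configuration with Provisioning                   │
│    └── Keep every config file in version control            │
│                                                             │
│ 2. Pass secrets through environment variables               │
│    └── Passwords, tokens and similar values                 │
│                                                             │
│ 3. Separate per-environment configuration                   │
│    └── Distinct settings for dev/staging/prod               │
│                                                             │
│ 4. Back up regularly                                        │
│    └── Both the database and config files                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘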
Operations
text
┌─────────────────────────────────────────────────────────────┐
│                  Operations Best Practices                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ 1. Monitor Grafana itself                                   │
│    └── Scrape Grafana with Prometheus                       │
│                                                             │
│ 2. Set resource limits                                      │
│    └── CPU and memory limits                                │
│                                                             │
│ 3. Configure log collection                                 │
│    └── Collect and analyze logs centrally                   │
│                                                             │
│ 4. Update regularly                                         │
│    └── Keep Grafana up to date                              │
│                                                             │
│ 5. Disaster recovery                                        │
│    └── Define a backup and restore plan                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
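Summary
Congratulations on completing this Grafana guide! You should now be familiar with:
- Grafana's core concepts, installation, and configuration
- Creating and managing dashboards
- Using the different panel types
- Configuring data sources
- Setting up alerting
- Advanced features such as Provisioning, plugin development, and high-availability deployment
Keep practicing and exploring, and you will be well on your way to becoming a Grafana monitoring and visualization expert!
Last updated: 2026-03-29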