Advanced Runner Topics #

I. Autoscaling #

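Docker Machine Autoscaling #

With the docker+machine executor, the Runner uses Docker Machine to create virtual machines on demand to run jobs and destroys them once they are no longer needed.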

toml
[[runners]]
  name = "autoscale-runner"
  executor = "docker+machine"
  limit = 10
  [runners.docker]
    image = "node:18"
  [runners.machine]
    IdleCount = 2
    IdleTime = 1800
    MaxBuilds = 100
    MachineDriver = "amazonec2"
    MachineName = "runner-%s"
    MachineOptions = [
      "amazonec2-access-key=YOUR_ACCESS_KEY",
      "amazonec2-secret-key=YOUR_SECRET_KEY",
      "amazonec2-region=us-east-1",
      "amazonec2-instance-type=t2.medium",
      "amazonec2-ssh-keypath=~/.ssh/id_rsa"
    ]

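Scaling Parameters #

Parameter   Description
IdleCount   Number of idle machines kept ready for new jobs
IdleTime    Seconds a machine may stay idle before it is destroyed
MaxBuilds   Number of jobs a machine runs before it is destroyed
limit       Maximum number of jobs this runner handles at once, which also caps the number of machines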
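
How these parameters interact can be sketched with a little arithmetic. The function below is a simplified model and not GitLab Runner's actual scheduling code: it assumes the Runner tops the idle pool up to `IdleCount` while never letting the total number of machines exceed `limit` (with `limit = 0` meaning no cap). The function name and its argument order are hypothetical.

```shell
# Simplified model (an assumption, not GitLab Runner's real algorithm):
# create enough machines to reach IdleCount idle ones, but never exceed limit.
# Usage: machines_to_create IN_USE IDLE IDLE_COUNT LIMIT
machines_to_create() {
  local in_use="$1" idle="$2" idle_count="$3" limit="$4"
  local want room

  # Machines missing from the idle pool
  want=$(( idle_count - idle ))
  if [ "$want" -lt 0 ]; then want=0; fi

  # limit = 0 means "no cap", so only the idle target matters
  if [ "$limit" -eq 0 ]; then
    echo "$want"
    return 0
  fi

  # Headroom left before the cap is reached
  room=$(( limit - in_use - idle ))
  if [ "$room" -lt 0 ]; then room=0; fi

  if [ "$want" -lt "$room" ]; then echo "$want"; else echo "$room"; fi
}

machines_to_create 3 0 2 10   # 3 busy, none idle: prints 2
machines_to_create 9 0 2 10   # only one slot left under the cap: prints 1
```

With `IdleCount = 2` and `limit = 10`, the model shows why a nearly saturated runner stops keeping a full idle pool: the cap wins over the idle target.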

AWS EC2 Configuration #

toml
[[runners]]
  [runners.machine]
    MachineDriver = "amazonec2"
    MachineOptions = [
      "amazonec2-access-key=YOUR_ACCESS_KEY",
      "amazonec2-secret-key=YOUR_SECRET_KEY",
      "amazonec2-region=us-east-1",
      "amazonec2-instance-type=t2.medium",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-subnet-id=subnet-xxxxx",
      "amazonec2-security-group=ci-runners",
      "amazonec2-tags=Name,gitlab-runner"
    ]

Google Cloud Configuration #

toml
[[runners]]
  [runners.machine]
    MachineDriver = "google"
    MachineOptions = [
      "google-project=your-project",
      "google-zone=us-central1-a",
      "google-machine-type=n1-standard-2",
      "google-tags=gitlab-runner"
    ]

Azure Configuration #

toml
[[runners]]
  [runners.machine]
    MachineDriver = "azure"
    MachineOptions = [
      "azure-subscription-id=YOUR_SUBSCRIPTION_ID",
      "azure-client-id=YOUR_CLIENT_ID",
      "azure-client-secret=YOUR_CLIENT_SECRET",
      "azure-tenant-id=YOUR_TENANT_ID",
      "azure-location=eastus",
      "azure-size=Standard_D2s_v3"
    ]

II. Kubernetes Autoscaling #

Horizontal Pod Autoscaler #

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-runner-hpa
  namespace: gitlab-runner
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-runner
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
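
The replica count the autoscaler aims for follows from the ratio of observed to target utilization: desired = ceil(current replicas × observed / target), clamped to `minReplicas` and `maxReplicas`. The sketch below works through that formula with the values from the manifest above. It deliberately ignores the controller's tolerance band and stabilization window, and the function name is hypothetical.

```shell
# Usage: hpa_desired_replicas CURRENT_REPLICAS OBSERVED_CPU_PERCENT
hpa_desired_replicas() {
  local current="$1" observed="$2"
  local target=70 min=2 max=10   # values taken from the manifest above
  local desired

  # Integer ceiling of current * observed / target
  desired=$(( (current * observed + target - 1) / target ))

  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

hpa_desired_replicas 2 90    # ceil(2 * 90 / 70) = 3
hpa_desired_replicas 5 200   # ceil(1000 / 70) = 15, clamped to 10
hpa_desired_replicas 2 35    # ceil(70 / 70) = 1, raised to the minimum of 2
```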

Helm Configuration #

yaml
# values.yaml
replicas: 2

concurrent: 10

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

runners:
  config: |
    [[runners]]
      executor = "kubernetes"
      limit = 10
      [runners.kubernetes]
        namespace = "gitlab-runner"
        image = "node:18"

III. Runner Monitoring #

Prometheus Metrics #

toml
# Global setting: address of the embedded HTTP server that serves /metrics
listen_address = ":9252"

[[runners]]
  name = "monitored-runner"

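Key Metrics #

text
# Metric names vary between Runner versions; confirm them against your own /metrics output

# Runner availability (the built-in Prometheus series for each scrape target)
up{job="gitlab-runner"}

# Total number of jobs handled
gitlab_runner_jobs_total

# Jobs currently running
gitlab_runner_jobs

# Job duration
gitlab_runner_job_duration_seconds

# Number of failed jobs
gitlab_runner_failed_jobs_total

# Cache operations
gitlab_runner_cache_operations_total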
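
To inspect one of these metrics without a Prometheus server, the `/metrics` output can be read directly. The helper below is a minimal sketch: it sums every sample of a named metric from Prometheus text-format input. The function name is hypothetical, and it assumes sample lines carry no trailing timestamp.

```shell
# Sum every sample of one metric from Prometheus text-format input on stdin.
# Assumes no timestamps follow the sample values.
# Usage: curl -s http://localhost:9252/metrics | metric_sum METRIC_NAME
metric_sum() {
  awk -v name="$1" '
    /^#/    { next }                   # skip HELP and TYPE comments
    NF == 0 { next }                   # skip blank lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the {label="..."} part
      if (metric == name) { sum += $NF; found = 1 }
    }
    END { if (found) print sum; else exit 1 }
  '
}
```

The function exits with a non-zero status when the metric is absent, which makes it easy to notice a renamed or missing series.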

Prometheus Configuration #

yaml
# prometheus.yml
scrape_configs:
  - job_name: 'gitlab-runner'
    static_configs:
      - targets: ['runner1:9252', 'runner2:9252']

Grafana Dashboard #

Import the GitLab Runner dashboard:

  1. Open Grafana
  2. Import dashboard ID 9630
  3. Configure the data source

Alert Rules #

yaml
groups:
  - name: gitlab-runner
    rules:
      - alert: RunnerDown
        expr: up{job="gitlab-runner"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "GitLab Runner is down"

      - alert: HighJobFailureRate
        expr: rate(gitlab_runner_failed_jobs_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High job failure rate"
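
The `0.1` threshold in the HighJobFailureRate rule is a per-second value, because `rate()` returns a per-second average over the window. The sketch below shows what that means in absolute numbers. It approximates `rate()` as counter increase divided by window length and leaves out the counter-reset handling and extrapolation that Prometheus performs. Both function names are hypothetical.

```shell
# Approximate rate(): counter increase divided by the window length in seconds.
per_second_rate() {
  awk -v increase="$1" -v window="$2" 'BEGIN { printf "%.3f\n", increase / window }'
}

# Succeeds (exit status 0) when RATE is strictly greater than THRESHOLD.
exceeds_threshold() {
  awk -v rate="$1" -v threshold="$2" 'BEGIN { exit !(rate > threshold) }'
}

per_second_rate 30 300   # 30 failures in 5 minutes: prints 0.100
```

Under this approximation the rule needs more than 30 failed jobs in five minutes before it fires, so a low-traffic installation may want a smaller threshold.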

IV. Log Management #

Log Level #

toml
log_level = "debug"
log_format = "json"

Log Level Reference #

Level   Description
debug   Debugging details
info    General information
warn    Warnings
error   Errors
fatal   Fatal errors

JSON Log Format #

toml
log_format = "json"

Sample output:

json
{"level":"info","time":"2024-01-01T00:00:00Z","msg":"Job succeeded","job":12345}
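
One line of JSON per entry makes the log easy to filter with ordinary text tools. The helper below is a rough sketch that counts entries of a given level. It assumes compact JSON exactly as in the sample above (no spaces around the colon), and the function name is hypothetical. A real JSON parser is the safer choice for anything beyond quick checks.

```shell
# Count log entries of one level in JSON-lines input on stdin.
# Assumes compact JSON as in the sample above, for example "level":"info".
count_level() {
  # grep exits with status 1 when nothing matches; "|| true" keeps the
  # function's status at 0 while grep still prints the count of 0
  grep -c "\"level\":\"$1\"" || true
}
```

For example, `sudo journalctl -u gitlab-runner -o cat | count_level error` would count the error entries, assuming the Runner logs to the journal in JSON format.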

Log Collection #

Collect the logs with the ELK stack:

yaml
# filebeat.yml
filebeat.inputs:
- type: log
  paths:
    - /var/log/gitlab-runner/*.log
  json.keys_under_root: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]

V. Runner Maintenance #

Scheduled Restart #

bash
# Crontab entry: restart the Runner every Sunday at 03:00
0 3 * * 0 /usr/bin/systemctl restart gitlab-runner

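Cleaning Caches #

bash
# Remove unused Docker data. Warning: -a deletes every unused image and --volumes deletes unused volumes
docker system prune -af --volumes

# Clear the Runner's local cache directory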
rm -rf /srv/gitlab-runner/cache/*
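
Because both commands discard data that later jobs would otherwise reuse, it can be worth pruning only when disk space is actually tight. The sketch below wraps the cleanup in a usage check. The `/var/lib/docker` path and the 80% threshold are assumptions to adjust for your host, and all three function names are hypothetical.

```shell
# Print the used-space percentage (without the % sign) of the filesystem holding PATH.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when USAGE is at or above THRESHOLD (both whole percentages).
should_prune() {
  [ "$1" -ge "$2" ]
}

# Prune only when the Docker data directory is at least 80% full.
# /var/lib/docker and the 80% threshold are assumptions; adjust to your host.
cleanup_if_needed() {
  local usage
  usage=$(disk_usage_percent /var/lib/docker)
  [ -n "$usage" ] || return 1
  if should_prune "$usage" 80; then
    docker system prune -af --volumes
    rm -rf /srv/gitlab-runner/cache/*
  fi
}
```

Calling `cleanup_if_needed` from a scheduled job keeps caches warm most of the time and only clears them under disk pressure.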

Updating the Runner #

bash
# Stop the service
sudo gitlab-runner stop

# Refresh the package repository and install the latest package
curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | sudo bash
sudo apt-get install gitlab-runner

# Start the service
sudo gitlab-runner start

Verifying the Configuration #

bash
# Verify that the registered runners can connect to GitLab
sudo gitlab-runner verify

# Check the Runner service status
sudo gitlab-runner status

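VI. Performance Tuning #

Concurrency Settings #

toml
# Global: maximum number of jobs running at once across all runners in this file
concurrent = 10

[[runners]]
  # Per runner: maximum number of jobs this runner handles at once
  limit = 5

  # Maximum number of concurrent requests for new jobs from GitLab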
  request_concurrency = 3
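
The global `concurrent` value and the per-runner `limit` values combine into one effective ceiling: the sum of the limits, capped by `concurrent`, where a `limit` of 0 means that runner is bounded only by the global value. The function below is a small sketch of that arithmetic. Its name is hypothetical.

```shell
# Upper bound on simultaneous jobs for one config.toml.
# Usage: effective_concurrency CONCURRENT LIMIT [LIMIT...]
effective_concurrency() {
  local concurrent="$1" total=0 limit
  shift
  for limit in "$@"; do
    # limit = 0 means this runner is bounded only by the global setting
    if [ "$limit" -eq 0 ]; then
      echo "$concurrent"
      return 0
    fi
    total=$(( total + limit ))
  done
  if [ "$total" -gt "$concurrent" ]; then echo "$concurrent"; else echo "$total"; fi
}

effective_concurrency 10 5       # the configuration above: prints 5
effective_concurrency 10 5 5 5   # three such runners: capped at 10
```

In the configuration above, raising `concurrent` beyond 5 has no effect until a second `[[runners]]` entry is added or `limit` is increased.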

Cache Optimization #

toml
[[runners]]
  [runners.docker]
    disable_cache = false
    cache_dir = "/cache"
    
  [runners.cache]
    Type = "s3"
    Shared = true

Image Pull Optimization #

toml
[[runners]]
  [runners.docker]
    pull_policy = ["if-not-present"]
    allowed_pull_policies = ["always", "if-not-present", "never"]

Resource Limits #

toml
[[runners]]
  [runners.docker]
    memory = "2g"
    cpus = "2"
    memory_swap = "4g"

VII. Security Hardening #

Image Allowlist #

toml
[[runners]]
  [runners.docker]
    allowed_images = [
      "node:*",
      "python:*",
      "golang:*"
    ]
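
Before rolling out an allowlist, it helps to check which images a set of patterns would admit. The sketch below tests an image name against wildcard patterns using shell glob matching. That only approximates the Runner's own matcher (for instance, a shell `*` also matches `/`), and the function name is hypothetical.

```shell
# Succeeds when IMAGE matches at least one allowlist pattern.
# Uses shell glob matching, which only approximates the Runner's own matcher.
# Usage: image_allowed IMAGE PATTERN [PATTERN...]
image_allowed() {
  local image="$1" pattern
  shift
  for pattern in "$@"; do
    # An unquoted $pattern in a case label is treated as a glob
    case "$image" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

if image_allowed "node:18" "node:*" "python:*" "golang:*"; then
  echo "node:18 is allowed"
fi
if ! image_allowed "ruby:3.3" "node:*" "python:*" "golang:*"; then
  echo "ruby:3.3 is rejected"
fi
```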

Service Allowlist #

toml
[[runners]]
  [runners.docker]
    allowed_services = [
      "postgres:*",
      "redis:*",
      "mysql:*"
    ]

Disabling Privileged Mode #

toml
[[runners]]
  [runners.docker]
    privileged = false
    security_opt = ["no-new-privileges"]

Network Isolation #

toml
[[runners]]
  [runners.docker]
    network_mode = "none"  # job containers get no network access

Certificate Verification #

toml
[[runners]]
  [runners.docker]
    tls_verify = true
    tls_cert_path = "/certs"

VIII. High Availability #

Multi-Runner Deployment #

text
┌─────────────────────────────────────────────────────────────┐
│                    GitLab Server                             │
└─────────────────────────────────────────────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ Runner 1 │  │ Runner 2 │  │ Runner 3 │
        │ (Active) │  │ (Active) │  │ (Active) │
        └──────────┘  └──────────┘  └──────────┘

Load Balancing #

toml
# Runner 1
[[runners]]
  name = "runner-1"
  limit = 5

# Runner 2
[[runners]]
  name = "runner-2"
  limit = 5

Failover #

yaml
# Deploy on Kubernetes so that failed Runner pods are replaced automatically
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gitlab-runner
  template:
    metadata:
      labels:
        app: gitlab-runner  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: gitlab-runner
        image: gitlab/gitlab-runner:latest

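IX. Troubleshooting #

Common Issues #

1. Runner Cannot Connect #

bash
# Check network connectivity to GitLab
curl -v https://gitlab.example.com

# Check that the registered runners can authenticate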
sudo gitlab-runner verify

2. Job Execution Fails #

bash
# Follow the service log
sudo journalctl -u gitlab-runner -f

# Run in the foreground with debug logging (stop the service first)
sudo gitlab-runner --debug run

3. Out of Memory #

bash
# Check host memory
free -h

# Check per-container resource usage
docker stats

Log Analysis #

bash
# Show log lines that mention errors
sudo journalctl -u gitlab-runner | grep -i error

# Show the most recent 100 log lines
sudo journalctl -u gitlab-runner -n 100

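X. Best Practices #

1. Group Runners with Tags #

yaml
# Job fragment for the production environment
tags:
  - production
  - docker

# Job fragment for the staging environment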
tags:
  - staging
  - docker

2. Set Resource Limits #

toml
[[runners]]
  limit = 5
  [runners.docker]
    memory = "2g"
    cpus = "2"

3. Enable Monitoring #

toml
listen_address = ":9252"

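4. Perform Regular Maintenance #

bash
#!/bin/bash
# Cleanup script: prune Docker data, clear the Runner cache, then verify the runners
docker system prune -af --volumes
rm -rf /srv/gitlab-runner/cache/*
sudo gitlab-runner verify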

5. Back Up the Configuration #

bash
# Back up the configuration file
cp /etc/gitlab-runner/config.toml /etc/gitlab-runner/config.toml.bak
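
A single `.bak` file is overwritten by every later backup. One possible refinement, sketched below, is to add a timestamp to each copy so that earlier versions remain available. The function name and the file naming scheme are hypothetical.

```shell
# Copy FILE into DEST_DIR with a timestamp suffix and print the new path.
# Usage: backup_config FILE DEST_DIR
backup_config() {
  local src="$1" dest_dir="$2" stamp dest
  stamp=$(date +%Y%m%d-%H%M%S)
  dest="$dest_dir/$(basename "$src").$stamp.bak"
  # -p preserves mode and timestamps of the original file
  cp -p "$src" "$dest" && echo "$dest"
}
```

For example, `backup_config /etc/gitlab-runner/config.toml /etc/gitlab-runner` (run with sufficient privileges) would leave a dated copy next to the live file.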

Next Steps #

Now that you have covered the Runner's advanced features, continue with Templates and Inheritance.

Last updated: 2026-03-28