Cluster Management

1. Cluster Planning

1.1 Hardware Planning

text

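Hardware sizing guidelines:

CPU
├── Minimum: 2 cores
├── Recommended: 8+ cores
└── Production: 16+ cores

Memory
├── Minimum: 4 GB
├── Recommended: 16 GB+
└── Production: 32 GB+

Disk
├── Type: SSD (required)
├── Minimum: 50 GB
├── Recommended: 500 GB+
└── Tip: keep data files and the commit log on separate devices

Network
├── Minimum: 100 Mbps
├── Recommended: 1 Gbps
└── Production: 10 Gbps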

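1.2 Cluster Size

text

Cluster sizing guidelines:

Development
├── Nodes: 1
└── RF: 1

Testing
├── Nodes: 3
└── RF: 2

Production
├── Nodes: 3+ (an odd count is not required; quorum depends on RF, not on node count)
├── RF: 3
└── Multiple data centers: 3+ nodes per DC

Capacity planning
├── Per-node storage = (raw data size × RF) / (node count × compression ratio)
├── Reserve 30% of disk space for compaction
└── Reserve 20% of disk space for growth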
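
As a worked example of the capacity rule above, the following shell sketch computes the expected data per node and the disk size that leaves the suggested headroom. The function names `per_node_gb` and `disk_needed_gb` are made up for this illustration. The sketch assumes each row is stored RF times and that the 30% and 20% reserves simply add up to 50%.

```shell
# per_node_gb DATA_GB RF NODES COMPRESSION_RATIO
# Prints the on-disk data each node is expected to hold, in GB.
# COMPRESSION_RATIO is uncompressed size / compressed size (e.g. 2 for 2:1).
per_node_gb() {
  awk -v data="$1" -v rf="$2" -v nodes="$3" -v ratio="$4" \
    'BEGIN { printf "%.1f\n", data * rf / (nodes * ratio) }'
}

# disk_needed_gb PER_NODE_GB RESERVED_PCT
# Prints the disk size needed so that PER_NODE_GB fills only
# (100 - RESERVED_PCT) percent of the disk.
disk_needed_gb() {
  awk -v used="$1" -v reserved="$2" \
    'BEGIN { printf "%.1f\n", used * 100 / (100 - reserved) }'
}
```

For 1000 GB of raw data, RF=3, 6 nodes and a 2:1 compression ratio, `per_node_gb 1000 3 6 2` prints 250.0, and `disk_needed_gb 250 50` prints 500.0, which lines up with the 500 GB+ disk recommendation.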

1.3 Network Topology

text

Network topology design:

Single data center
┌─────────────────────────────────────────────────────────┐
│                    Data Center 1                        │
├─────────────────────────────────────────────────────────┤
│  Rack 1              Rack 2              Rack 3         │
│  ┌─────────┐         ┌─────────┐         ┌─────────┐   │
│  │ Node 1  │         │ Node 2  │         │ Node 3  │   │
│  └─────────┘         └─────────┘         └─────────┘   │
└─────────────────────────────────────────────────────────┘

Multiple data centers
┌───────────────────────┐       ┌───────────────────────┐
│    Data Center 1      │       │    Data Center 2      │
│    (Beijing)          │       │    (Shanghai)         │
├───────────────────────┤       ├───────────────────────┤
│  ┌─────┐  ┌─────┐     │       │  ┌─────┐  ┌─────┐     │
│  │ N1  │  │ N2  │     │       │  │ N4  │  │ N5  │     │
│  └─────┘  └─────┘     │       │  └─────┘  └─────┘     │
│  ┌─────┐              │       │  ┌─────┐              │
│  │ N3  │              │       │  │ N6  │              │
│  └─────┘              │       │  └─────┘              │
└───────────────────────┘       └───────────────────────┘
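
A multi-data-center layout like the one above is typically paired with a keyspace that uses NetworkTopologyStrategy, so that each DC keeps its own set of replicas. The sketch below only prints the CQL statement. The function name `keyspace_cql`, the keyspace name `app`, and the DC names `DC1`/`DC2` are assumptions for illustration; real DC names must match the `dc=` value each node reports through its snitch.

```shell
# keyspace_cql KEYSPACE DC1_NAME DC1_RF DC2_NAME DC2_RF
# Prints a CREATE KEYSPACE statement with one replication factor per DC.
keyspace_cql() {
  printf "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s, '%s': %s};\n" \
    "$1" "$2" "$3" "$4" "$5"
}
```

The statement could then be applied with something like `cqlsh 192.168.1.1 -e "$(keyspace_cql app DC1 3 DC2 3)"`.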

2. Cluster Deployment

2.1 Configuration File

yaml

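# cassandra.yaml core settings

# Cluster name (must be identical on every node)
cluster_name: 'Production Cluster'

# Seed nodes (at least 2; use the same list on every node)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.1.1,192.168.1.2,192.168.1.3"

# Listen addresses (this node's own IP)
listen_address: 192.168.1.1
rpc_address: 192.168.1.1

# Ports
native_transport_port: 9042
storage_port: 7000
ssl_storage_port: 7001

# Data directories (ideally the commit log sits on a different disk than the data files)
data_file_directories:
  - /data/cassandra/data
commitlog_directory: /data/cassandra/commitlog
saved_caches_directory: /data/cassandra/saved_caches

# Memory: heap size is not a cassandra.yaml setting.
# Set it with -Xms/-Xmx in jvm.options, or with MAX_HEAP_SIZE / HEAP_NEWSIZE
# in cassandra-env.sh (for example 8G heap, see section 2.3).

# Concurrency
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

# Snitch
endpoint_snitch: GossipingPropertyFileSnitch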

2.2 Rack Configuration

properties

# cassandra-rackdc.properties
dc=DC1
rack=RAC1

2.3 JVM Configuration

text

# jvm.options

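# Heap size (set the minimum and maximum to the same value)
-Xms8G
-Xmx8G

# Young generation: do not set -Xmn together with G1.
# A fixed young-generation size overrides G1's pause-time target.
# (-Xmn2G is only appropriate for a CMS configuration.)

# GC settings
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200

# GC logging (unified logging syntax, JDK 9 and later)
-Xlog:gc*:file=/var/log/cassandra/gc.log:time,uptime,level,tags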
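
When choosing the -Xms/-Xmx value, a commonly cited starting point is the auto-sizing rule from the stock cassandra-env.sh: max(min(RAM/2, 1 GB), min(RAM/4, 8 GB)). The sketch below reproduces that rule of thumb under that assumption. `suggest_heap_mb` is a hypothetical helper, not part of Cassandra, and its result should be treated as a starting point rather than a tuned value.

```shell
# suggest_heap_mb SYSTEM_MEMORY_MB
# Prints a heap size in MB: max(min(RAM/2, 1024), min(RAM/4, 8192)).
suggest_heap_mb() {
  half=$(( $1 / 2 ))
  quarter=$(( $1 / 4 ))
  if [ "$half" -gt 1024 ]; then
    half=1024
  fi
  if [ "$quarter" -gt 8192 ]; then
    quarter=8192
  fi
  if [ "$half" -gt "$quarter" ]; then
    echo "$half"
  else
    echo "$quarter"
  fi
}
```

On a 32 GB machine, `suggest_heap_mb 32768` prints 8192, which matches the 8G heap used in the sample above.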

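3. Cluster Operations

3.1 Starting the Cluster

bash

# Start a node (the process runs in the background by default)
bin/cassandra

# Run in the foreground instead (useful for debugging or under a process supervisor)
bin/cassandra -f

# Note: -R does not mean "background"; it only allows starting Cassandra as root

# Check cluster status
nodetool status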

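3.2 Stopping the Cluster

bash

# Graceful stop: flush memtables and stop accepting traffic, then stop the process
nodetool drain
bin/stop-server

# Forced stop (not recommended; skips the drain, so the commit log is replayed on the next start)
kill -9 <pid>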

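3.3 Rolling Restart

bash

# Rolling restart procedure (run on one node at a time)

# 1. Stop the node
nodetool drain
bin/stop-server

# 2. Update the configuration (if needed)

# 3. Start the node
bin/cassandra

# 4. Verify its state
nodetool status
nodetool netstats

# 5. Wait until the node reports UN (Up/Normal) before moving on to the next node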
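
The per-node procedure above can be sketched as a loop. This is a minimal illustration, not a hardened tool: the helper names (`run_on`, `wait_until_up`, `rolling_restart`) are invented here, and it assumes passwordless ssh, that nodes are addressed by the same IP shown in `nodetool status`, and that Cassandra lives in the ssh login directory so the relative `bin/` paths resolve. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# run_on HOST COMMAND...
# Runs COMMAND on HOST over ssh, or only prints it when DRY_RUN=1.
run_on() {
  target="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "[$target] $*"
  else
    ssh -n "$target" "$@"
  fi
}

# wait_until_up HOST
# Polls `nodetool status` on HOST until HOST's own line shows UN.
# Note: there is no timeout; a node that never comes back blocks the loop.
wait_until_up() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "[$1] wait for UN"
    return 0
  fi
  until ssh -n "$1" nodetool status |
    awk -v ip="$1" '$1 == "UN" && $2 == ip { ok = 1 } END { exit !ok }'; do
    sleep 10
  done
}

# rolling_restart HOST...
# Restarts the given nodes one at a time and stops at the first failure.
rolling_restart() {
  for node in "$@"; do
    run_on "$node" nodetool drain || return 1
    run_on "$node" bin/stop-server || return 1
    run_on "$node" bin/cassandra || return 1
    wait_until_up "$node" || return 1
  done
}
```

A dry run such as `DRY_RUN=1; rolling_restart 192.168.1.1 192.168.1.2` is a cheap way to review the command order before touching a real cluster.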

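4. Scaling Out

4.1 Adding a Node

bash

# 1. Configure the new node
# Edit cassandra.yaml (the lines below are YAML, not shell commands)
cluster_name: 'Production Cluster'      # must match the existing cluster
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "<existing seed nodes>"   # do not list the new node itself as a seed
listen_address: <new node IP>
auto_bootstrap: true                     # the default; a node listed as a seed skips bootstrap

# 2. Start the new node
bin/cassandra

# 3. Monitor data streaming
nodetool netstats

# 4. Verify that the node has moved from UJ (joining) to UN
nodetool status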

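4.2 Rebalancing Data

bash

# View token ownership and data distribution
nodetool ring

# Remove data a node no longer owns
# (run on each pre-existing node, one at a time, after the new node has finished joining)
nodetool cleanup

# View this node's load
nodetool info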

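5. Scaling In

5.1 Removing a Node

bash

# Option 1: graceful decommission (node is online); run on the node being removed
nodetool decommission

# Option 2: forced removal (node is offline)
# Run on any other live node
nodetool removenode <host_id>

# Look up the host_id (the "Host ID" column)
nodetool status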
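
To avoid copying the host ID by hand, it can be pulled out of the `nodetool status` output. The sketch below assumes the usual column layout, in which Host ID is the second-to-last column; counting from the right keeps it correct even when the Load column of a down node collapses to a single `?`. `host_id_for` is a hypothetical helper name.

```shell
# host_id_for ADDRESS
# Reads `nodetool status` output on stdin and prints the Host ID of ADDRESS.
# Only node rows are considered: their first field is a two-letter state
# such as UN, DN or UJ.
host_id_for() {
  awk -v ip="$1" '$1 ~ /^[UD][NLJM]$/ && $2 == ip { print $(NF - 1) }'
}
```

Usage could look like `nodetool removenode "$(nodetool status | host_id_for 192.168.1.3)"`, run on a live node after confirming the printed ID.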

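5.2 Replacing a Node

bash

# 1. Note the dead node's IP address (its row shows DN)
nodetool status

# 2. Configure the new node: leave bootstrap enabled and do not list it as a seed
#    (a replacement has to stream data; Cassandra refuses to replace with auto_bootstrap: false)
auto_bootstrap: true

# 3. Start with the replace flag (needed on the first start only)
#    -Dcassandra.replace_address_first_boot=<IP> is the variant that is ignored on later restarts
bin/cassandra -Dcassandra.replace_address=<dead node IP>

# 4. Verify
nodetool status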

6. Cluster Maintenance

6.1 Routine Checks

bash

# Cluster status
nodetool status

# Node information
nodetool info

# Table statistics
nodetool tablestats

# Streaming and network status
nodetool netstats

# Compaction status
nodetool compactionstats
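
For an automated check, the status output can be reduced to a single number. The following sketch counts nodes whose state is anything other than UN (Up/Normal). `count_unhealthy` is a hypothetical helper, and it assumes node rows start with a two-letter state code as in current `nodetool status` output.

```shell
# count_unhealthy
# Reads `nodetool status` output on stdin and prints how many nodes are
# in any state other than UN (for example DN, UJ, UL or UM).
count_unhealthy() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { n++ } END { print n + 0 }'
}
```

A monitoring job could then run something like `[ "$(nodetool status | count_unhealthy)" -eq 0 ] || echo "cluster degraded"`.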

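6.2 Periodic Maintenance

bash

# Repair replicas to restore data consistency
# (run on every node; each node should be repaired at least once per gc_grace_seconds)
nodetool repair

# Remove deleted (tombstone-shadowed) data
nodetool garbagecollect

# Flush memtables to disk
nodetool flush

# Force a major compaction (I/O-heavy; usually run only when needed, not on a schedule)
nodetool compact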
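
How often to run repair is bounded by gc_grace_seconds: tombstones become eligible for removal after that period, so a replica that missed a delete and is not repaired in time can bring the deleted data back. The sketch below checks whether a planned repair interval fits inside that window. `repair_interval_ok` is a hypothetical helper, and 864000 seconds (10 days) is assumed as the table default when no value is passed.

```shell
# repair_interval_ok INTERVAL_DAYS [GC_GRACE_SECONDS]
# Exits with status 0 when a repair cycle of INTERVAL_DAYS is strictly
# shorter than gc_grace_seconds, and with a non-zero status otherwise.
repair_interval_ok() {
  gc_grace="${2:-864000}"
  [ $(( $1 * 86400 )) -lt "$gc_grace" ]
}
```

For example, `repair_interval_ok 7` succeeds with the default window, while `repair_interval_ok 14` fails unless the table's gc_grace_seconds has been raised accordingly.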

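7. Summary

Key points of cluster management:

Hardware planning: CPU, memory, disk, network
Cluster size: at least 3 nodes with RF=3 in production
Configuration management: cassandra.yaml and JVM settings
Rolling operations: restart one node at a time to preserve availability
Scaling out and in: the add-node and remove-node procedures
Routine maintenance: status checks and data repair

Next, let's look at node management!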