HBase二级索引 #

一、二级索引概述 #

HBase只支持RowKey索引，二级索引用于加速非RowKey列的查询。

1.1 为什么需要二级索引 #

text

HBase查询限制
├── 只支持RowKey查询
│   └── get/scan基于RowKey
│
├── 非RowKey列查询
│   └── 需要全表扫描
│
└── 二级索引解决
    └── 加速非RowKey列查询

1.2 二级索引方案 #

text

二级索引方案
├── Phoenix索引
│   └── 全局索引、本地索引
│
├── Coprocessor索引
│   └── 使用协处理器维护索引
│
├── 外部索引
│   ├── Elasticsearch
│   └── Lucene
│
└── 双写方案
    └── 应用层维护索引表

二、Phoenix二级索引 #

2.1 全局索引 #

text

全局索引特点
├── 独立的索引表
├── 索引数据分布在所有Region
├── 适合读多写少场景
└── 写入需要跨Region更新索引

sql

-- 创建全局索引
CREATE INDEX idx_user_email ON user (email);

-- 创建覆盖索引
CREATE INDEX idx_user_email_name ON user (email) INCLUDE (name, age);

-- 使用索引查询
SELECT email, name, age FROM user WHERE email = 'test@example.com';

2.2 本地索引 #

text

本地索引特点
├── 索引数据与原数据在同一Region
├── 写入性能好
├── 适合写多读少场景
└── 查询需要访问所有Region

sql

-- 创建本地索引
CREATE LOCAL INDEX idx_local_email ON user (email);

-- 使用索引查询
SELECT * FROM user WHERE email = 'test@example.com';

2.3 函数索引 #

sql

-- 创建函数索引
CREATE INDEX idx_lower_email ON user (LOWER(email));

-- 使用函数索引
SELECT * FROM user WHERE LOWER(email) = 'test@example.com';

2.4 覆盖索引 #

sql

-- 创建覆盖索引
CREATE INDEX idx_cover ON user (email) INCLUDE (name, age);

-- 覆盖索引查询（不需要回表）
SELECT email, name, age FROM user WHERE email = 'test@example.com';

2.5 索引管理 #

sql

-- 查看索引
!indexes user

-- 禁用索引
ALTER INDEX idx_user_email ON user DISABLE;

-- 启用索引
ALTER INDEX idx_user_email ON user ENABLE;

-- 重建索引
ALTER INDEX idx_user_email ON user REBUILD;

-- 删除索引
DROP INDEX idx_user_email ON user;

三、Coprocessor二级索引 #

3.1 实现原理 #

text

Coprocessor索引原理
├── RegionObserver监听数据变更
│   ├── prePut/postPut
│   └── preDelete/postDelete
│
├── 同步更新索引表
│   └── 写入索引数据
│
└── 查询时先查索引表
    └── 获取RowKey后再查原表

3.2 实现示例 #

java

import org.apache.hadoop.hbase.coprocessor.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

public class IndexObserver implements RegionObserver {
    
    private static final byte[] INDEX_TABLE = Bytes.toBytes("user_email_index");
    private static final byte[] CF = Bytes.toBytes("info");
    private static final byte[] EMAIL_COL = Bytes.toBytes("email");
    private static final byte[] IDX_CF = Bytes.toBytes("idx");
    private static final byte[] ROWKEY_COL = Bytes.toBytes("rowkey");
    
    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> c,
                        Put put, WALEdit edit, Durability durability) 
        throws IOException {
        
        // 获取email值
        byte[] emailValue = getValue(put, CF, EMAIL_COL);
        if (emailValue == null) return;
        
        // 获取RowKey
        byte[] rowKey = put.getRow();
        
        // 写入索引表
        Connection conn = c.getEnvironment().getConnection();
        Table indexTable = conn.getTable(TableName.valueOf(INDEX_TABLE));
        
        Put indexPut = new Put(emailValue);
        indexPut.addColumn(IDX_CF, ROWKEY_COL, rowKey);
        indexTable.put(indexPut);
        
        indexTable.close();
    }
    
    @Override
    public void postDelete(ObserverContext<RegionCoprocessorEnvironment> c,
                           Delete delete, WALEdit edit, Durability durability) 
        throws IOException {
        
        // 删除索引数据
        // 实现删除逻辑
    }
    
    private byte[] getValue(Put put, byte[] cf, byte[] qualifier) {
        List<Cell> cells = put.get(cf, qualifier);
        return cells.isEmpty() ? null : CellUtil.cloneValue(cells.get(0));
    }
}

3.3 查询索引 #

java

public byte[] queryByEmail(Connection conn, String email) throws IOException {
    // 查询索引表
    Table indexTable = conn.getTable(TableName.valueOf("user_email_index"));
    Get get = new Get(Bytes.toBytes(email));
    Result result = indexTable.get(get);
    
    if (result.isEmpty()) {
        return null;
    }
    
    // 获取RowKey
    byte[] rowKey = result.getValue(Bytes.toBytes("idx"), Bytes.toBytes("rowkey"));
    
    indexTable.close();
    return rowKey;
}

public Result queryUserByEmail(Connection conn, String email) throws IOException {
    byte[] rowKey = queryByEmail(conn, email);
    if (rowKey == null) {
        return null;
    }
    
    // 查询原表
    Table userTable = conn.getTable(TableName.valueOf("user"));
    Get get = new Get(rowKey);
    Result result = userTable.get(get);
    
    userTable.close();
    return result;
}

四、Elasticsearch索引 #

4.1 架构设计 #

text

Elasticsearch索引架构
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  ┌─────────────┐         ┌─────────────┐                          │
│  │ Application │ ──────► │   HBase     │                          │
│  └─────────────┘         └─────────────┘                          │
│         │                       │                                  │
│         │                       │                                  │
│         ▼                       ▼                                  │
│  ┌─────────────┐         ┌─────────────┐                          │
│  │Elasticsearch│ ◄────── │   Sync      │                          │
│  │   (索引)    │         │  (同步)     │                          │
│  └─────────────┘         └─────────────┘                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

4.2 同步方案 #

java

import org.apache.hadoop.hbase.client.*;
import org.elasticsearch.client.*;
import org.elasticsearch.action.index.IndexRequest;
import com.fasterxml.jackson.databind.ObjectMapper;

public class HBaseESSync {
    
    private RestHighLevelClient esClient;
    private ObjectMapper mapper = new ObjectMapper();
    
    public void syncToES(Connection hbaseConn, String tableName) throws Exception {
        Table table = hbaseConn.getTable(TableName.valueOf(tableName));
        Scan scan = new Scan();
        ResultScanner scanner = table.getScanner(scan);
        
        for (Result result : scanner) {
            // 构建文档
            Map<String, Object> doc = new HashMap<>();
            String rowKey = Bytes.toString(result.getRow());
            doc.put("id", rowKey);
            
            // 添加其他字段
            byte[] nameValue = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            if (nameValue != null) {
                doc.put("name", Bytes.toString(nameValue));
            }
            
            byte[] emailValue = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
            if (emailValue != null) {
                doc.put("email", Bytes.toString(emailValue));
            }
            
            // 索引到ES
            IndexRequest request = new IndexRequest("user_index")
                .id(rowKey)
                .source(mapper.writeValueAsString(doc), XContentType.JSON);
            
            esClient.index(request, RequestOptions.DEFAULT);
        }
        
        scanner.close();
        table.close();
    }
}

4.3 查询流程 #

java

public List<String> searchUserIds(String query) throws Exception {
    // 1. 查询ES获取RowKey列表
    SearchRequest request = new SearchRequest("user_index");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.matchQuery("name", query));
    sourceBuilder.fetchSource(false);  // 只返回ID
    request.source(sourceBuilder);
    
    SearchResponse response = esClient.search(request, RequestOptions.DEFAULT);
    
    List<String> rowKeys = new ArrayList<>();
    for (SearchHit hit : response.getHits()) {
        rowKeys.add(hit.getId());
    }
    
    return rowKeys;
}

public List<Result> queryUsersBySearch(Connection conn, String query) throws Exception {
    List<String> rowKeys = searchUserIds(query);
    
    // 2. 根据RowKey批量查询HBase
    List<Get> gets = new ArrayList<>();
    for (String rowKey : rowKeys) {
        gets.add(new Get(Bytes.toBytes(rowKey)));
    }
    
    Table table = conn.getTable(TableName.valueOf("user"));
    Result[] results = table.get(gets);
    table.close();
    
    return Arrays.asList(results);
}

五、索引方案对比 #

5.1 方案对比 #

方案	优势	劣势	适用场景
Phoenix索引	简单易用、SQL支持	写入性能影响	读多写少
Coprocessor索引	灵活可控	开发复杂	自定义需求
Elasticsearch索引	强大搜索能力	需要同步	全文搜索
双写方案	简单直接	一致性难保证	简单场景

5.2 选择建议 #

text

索引方案选择
├── 简单SQL查询
│   └── Phoenix索引
│
├── 自定义索引逻辑
│   └── Coprocessor索引
│
├── 全文搜索需求
│   └── Elasticsearch索引
│
└── 简单场景
    └── 双写方案

六、索引设计原则 #

6.1 索引列选择 #

text

索引列选择原则
├── 高选择性列
│   └── 值分布均匀
│
├── 高频查询列
│   └── 经常作为查询条件
│
├── 组合索引
│   └── 多列组合查询
│
└── 避免过度索引
    └── 影响写入性能

6.2 索引维护 #

text

索引维护建议
├── 定期重建索引
│   └── 修复不一致
│
├── 监控索引性能
│   └── 关注查询效率
│
├── 索引数据备份
│   └── 防止数据丢失
│
└── 索引空间管理
    └── 控制索引数量

七、索引最佳实践 #

7.1 Phoenix索引最佳实践 #

sql

-- 1. 选择合适的索引类型
-- 读多写少：全局索引
CREATE INDEX idx_global ON user (email);

-- 写多读少：本地索引
CREATE LOCAL INDEX idx_local ON user (email);

-- 2. 使用覆盖索引避免回表
CREATE INDEX idx_cover ON user (email) INCLUDE (name, age);

-- 3. 异步构建大表索引
CREATE INDEX idx_async ON user (email) ASYNC;

7.2 索引一致性保证 #

text

索引一致性保证
├── 使用事务
│   └── Phoenix事务支持
│
├── 定期校验
│   └── 对比原表和索引表
│
├── 重建索引
│   └── 发现不一致时重建
│
└── 监控告警
    └── 监控索引状态

7.3 性能优化 #

text

索引性能优化
├── 批量写入
│   └── 减少索引更新次数
│
├── 异步索引
│   └── 异步构建索引
│
├── 索引压缩
│   └── 减少索引存储空间
│
└── 合理分区
    └── 索引表预分区

八、常见问题 #

8.1 索引不一致 #

sql

-- 问题：索引数据与原数据不一致
-- 解决：重建索引

ALTER INDEX idx_user_email ON user REBUILD;

8.2 索引写入慢 #

sql

-- 问题：写入性能下降
-- 解决：使用本地索引或异步索引

-- 使用本地索引
CREATE LOCAL INDEX idx_local ON user (email);

-- 使用异步索引
CREATE INDEX idx_async ON user (email) ASYNC;

8.3 索引查询不生效 #

sql

-- 问题：查询未使用索引
-- 解决：使用Hint或检查查询条件

-- 使用Hint
SELECT /*+ INDEX(user idx_email) */ * FROM user WHERE email = 'test@example.com';

-- 检查索引
!indexes user

九、总结 #

本节介绍了HBase二级索引：

方案	特点
Phoenix索引	简单易用，SQL支持
Coprocessor索引	灵活可控，自定义逻辑
Elasticsearch索引	强大搜索能力
双写方案	简单直接

下一步，让我们学习管理与运维！