使用纠删码系统

提示

快速体验请看 单机部署

编译构建

$ git clone https://github.com/cubefs/cubefs.git
$ source build/cgo_env.sh
$ make blobstore

构建成功后,将在 build/bin/blobstore 目录中生成以下可执行文件:

├── build/bin/blobstore
│   ├── access
│   ├── clustermgr
│   ├── proxy
│   ├── scheduler
│   ├── blobnode
│   └── blobstore-cli

集群部署

由于模块之间具有一定的关联性,需按照以下顺序进行部署,避免服务依赖导致部署失败。

基础环境

  1. 支持平台

    Linux

  2. 依赖组件

    Kafkaopen in new window

    Consulopen in new window

  3. 语言环境

    Goopen in new window (1.17.x)

启动 Clustermgr

提示

部署 Clustermgr 至少需要三个节点,以保证服务可用性。

启动节点示例如下,节点启动需要更改对应配置文件,并保证集群节点之间的关联配置是一致。

  1. 启动(三节点集群)
# 示例,推荐采用进程托管启停服务,下同
nohup ./clustermgr -f clustermgr.conf
nohup ./clustermgr -f clustermgr1.conf
nohup ./clustermgr -f clustermgr2.conf
  1. 三个节点的集群配置,示例节点一:clustermgr.conf
{
     "bind_addr":":9998",
     "cluster_id":1,
     "idc":["z0"],
     "chunk_size": 16777216,
     "log": {
         "level": "info",
         "filename": "./run/logs/clustermgr.log"
      },
     "auth": {
         "enable_auth": false,
         "secret": "testsecret"
     },
     "region": "test-region",
     "db_path":"./run/db0",
     "code_mode_policies": [ 
         {"mode_name":"EC3P3","min_size":0,"max_size":50331648,"size_ratio":1,"enable":true}
     ],
     "raft_config": {
         "server_config": {
             "nodeId": 1,
             "listen_port": 10110,
             "raft_wal_dir": "./run/raftwal0"
         },
         "raft_node_config":{
             "node_protocol": "http://",
             "members": [
                     {"id":1, "host":"127.0.0.1:10110", "learner": false, "node_host":"127.0.0.1:9998"},
                     {"id":2, "host":"127.0.0.1:10111", "learner": false, "node_host":"127.0.0.1:9999"},
                     {"id":3, "host":"127.0.0.1:10112", "learner": false, "node_host":"127.0.0.1:10000"}]
         }
     },
     "volume_mgr_config": {
       "allocatable_size": 10485760
     },
     "disk_mgr_config": {
         "refresh_interval_s": 10,
         "rack_aware":false,
         "host_aware":false
     }
}

启动成功后,参照 ClusterMgr 管理 API 中提及的后台任务类型,设置初始值

#示例
$> curl -X POST http://127.0.0.1:9998/config/set -d '{"key":"balance","value":"false"}' --header 'Content-Type: application/json'

启动 Proxy

  1. proxy 依赖 kafka 组件,需要提前创建 blob_delete_topic、shard_repair_topic、shard_repair_priority_topic 对应主题

提示

kafka 也可以用其他主题名,需要保证 Proxy 与 Scheduler 两个服务模块的 kafka 一致。

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic blob_delete shard_repair shard_repair_priority
  1. 启动服务。保证可用性,每个机房 idc 至少需要部署一个 proxy 节点
nohup ./proxy -f proxy.conf &
  1. 示例 proxy.conf:
{
   "bind_addr": ":9600",
   "host": "http://127.0.0.1:9600",
   "idc": "z0",
   "cluster_id": 1,
   "clustermgr": {
     "hosts": [
       "http://127.0.0.1:9998",
       "http://127.0.0.1:9999",
       "http://127.0.0.1:10000"
       ]
   },
   "auth": {
       "enable_auth": false,
       "secret": "test"
   },
   "mq": {
     "blob_delete_topic": "blob_delete",
     "shard_repair_topic": "shard_repair",
     "shard_repair_priority_topic": "shard_repair_prior",
     "msg_sender": {
       "broker_list": ["127.0.0.1:9092"]
     }
   },
   "log": {
     "level": "info",
     "filename": "./run/logs/proxy.log"
   }
}

启动 Scheduler

  1. 启动服务
nohup ./scheduler -f scheduler.conf &
  1. 示例 scheduler.conf: 注意 Scheduler 模块单节点部署
{
   "bind_addr": ":9800",
   "cluster_id": 1,
   "services": { 
     "leader": 1,
     "node_id": 1,
     "members": {"1": "127.0.0.1:9800"}
   },
   "service_register": {
     "host": "http://127.0.0.1:9800",
     "idc": "z0"
   },
   "clustermgr": { 
     "hosts": ["http://127.0.0.1:9998", "http://127.0.0.1:9999", "http://127.0.0.1:10000"]
   },
   "kafka": {
     "broker_list": ["127.0.0.1:9092"]
   },
   "blob_delete": {
     "max_batch_size": 10,
     "batch_interval_s": 2,
     "delete_log": {
       "dir": "./run/delete_log"
     }
   },
   "shard_repair": {
     "orphan_shard_log": {
       "dir": "./run/orphan_shard_log"
     }
   },
   "log": {
     "level": "info",
     "filename": "./run/logs/scheduler.log"
   },
   "task_log": {
     "dir": "./run/task_log"
   }
}

启动 BlobNode

  1. 在编译好的 blobnode 二进制目录下 创建相关目录
# 该目录对应配置文件的路径
mkdir -p ./run/disks/disk{1..8} # 每个目录需要挂载磁盘,保证数据收集准确性
  1. 启动服务
nohup ./blobnode -f blobnode.conf
  1. 示例 blobnode.conf:
{
   "bind_addr": ":8899",
   "cluster_id": 1,
   "idc": "z0",
   "rack": "testrack",
   "host": "http://127.0.0.1:8899",
   "dropped_bid_record": {
     "dir": "./run/logs/blobnode_dropped"
   },
   "disks": [
     {
       "path": "./run/disks/disk1",
       "auto_format": true,
       "max_chunks": 1024 
     },
     {
       "path": "./run/disks/disk2",
       "auto_format": true,
       "max_chunks": 1024
     },
     {
       "path": "./run/disks/disk3",
       "auto_format": true,
       "max_chunks": 1024
     },
     {
       "path": "./run/disks/disk4",
       "auto_format": true,
       "max_chunks": 1024
     },
     {
       "path": "./run/disks/disk5",
       "auto_format": true,
       "max_chunks": 1024
     },
     {
       "path": "./run/disks/disk6",
       "auto_format": true,
       "max_chunks": 1024
     },
     {
       "path": "./run/disks/disk7",
       "auto_format": true,
       "max_chunks": 1024
     },
     {
       "path": "./run/disks/disk8",
       "auto_format": true,
       "max_chunks": 1024
     }
   ],
   "clustermgr": {
     "hosts": [
       "http://127.0.0.1:9998",
       "http://127.0.0.1:9999",
       "http://127.0.0.1:10000"
     ]
   },
   "disk_config":{
     "disk_reserved_space_B":1
   },
   "log": {
     "level": "info",
     "filename": "./run/logs/blobnode.log"
   }
}

启动 Access

提示

Access 模块为无状态服务节点,可以部署多个节点

  1. 启动服务。
nohup ./access -f access.conf
  1. 示例 access.conf:
{
     "bind_addr": ":9500",
     "log": {
         "level": "info",
         "filename": "./run/logs/access.log"
      },
     "stream": {
         "idc": "z0",
         "cluster_config": {
             "region": "test-region",
             "clusters":[
                 {"cluster_id":1,"hosts":["http://127.0.0.1:9998","http://127.0.0.1:9999","http://127.0.0.1:10000"]}]
         }
     }
}

配置说明

部署提示

  1. 对于 Clustermgr 和 BlobNode 部署失败后,重新部署需清理残留数据,避免注册盘失败或者数据显示错误,命令如下:
# blobnode示例
rm -f -r ./run/disks/disk*/.*
rm -f -r ./run/disks/disk*/*

# clustermgr示例
rm -f -r /tmp/raftdb0
rm -f -r /tmp/volumedb0
rm -f -r /tmp/clustermgr
rm -f -r /tmp/normaldb0
rm -f -r /tmp/normalwal0
  1. clustermgr 增加 learner 节点

提示

learner 节点一般用于数据备份,故障恢复

  • 在新的节点启用 Clustermgr 服务,将新服务中的配置中加上当前节点的成员信息;
  • 调用 成员添加接口API 将刚启动的 learner 节点加到集群中;
    curl -X POST --header 'Content-Type: application/json' -d '{"peer_id": 4, "host": "127.0.0.1:10113","node_host": "127.0.0.1:10001", "member_type": 1}' "http://127.0.0.1:9998/member/add" 
    
  • 添加成功后,数据会自动同步

参考配置如下 Clustermgr-learner.conf:

{
     "bind_addr":":10001",
     "cluster_id":1,
     "idc":["z0"],
     "chunk_size": 16777216,
     "log": {
         "level": "info",
         "filename": "./run/logs/clustermgr3.log"
      },
     "auth": {
         "enable_auth": false,
         "secret": "testsecret"
     },
     "region": "test-region",
     "db_path":"./run/db3",
     "code_mode_policies": [ 
         {"mode_name":"EC3P3","min_size":0,"max_size":50331648,"size_ratio":1,"enable":true}
     ],
     "raft_config": {
         "server_config": {
             "nodeId": 4,
             "listen_port": 10113,
             "raft_wal_dir": "./run/raftwal3"
         },
         "raft_node_config":{
             "node_protocol": "http://",
             "members": [
                     {"id":1, "host":"127.0.0.1:10110", "learner": false, "node_host":"127.0.0.1:9998"},
                     {"id":2, "host":"127.0.0.1:10111", "learner": false, "node_host":"127.0.0.1:9999"},
                     {"id":3, "host":"127.0.0.1:10112", "learner": false, "node_host":"127.0.0.1:10000"},
                     {"id":4, "host":"127.0.0.1:10113", "learner": true, "node_host": "127.0.0.1:10001"}]
         }
     },
     "disk_mgr_config": {
         "refresh_interval_s": 10,
         "rack_aware":false,
         "host_aware":false
     }
}
  1. 所有模块部署成功后,上传验证需要延缓一段时间,等待创建卷成功。

上传测试

可参考快速使用

修改 Master 配置支持纠删码

修改 Master 配置文件中的 ebsAddr 配置项(更多配置参考),配置为 Access 节点注册的 Consul 地址。

创建纠删码卷

参考 创建卷

附录

  1. 编码策略: 常用策略表
类别描述
EC12P4{N: 12, M: 04, L: 0, AZCount: 1, PutQuorum: 15, GetQuorum: 0, MinShardSize: 2048}
EC3P3{N: 3, M: 3, L: 0, AZCount: 1, PutQuorum: 5, GetQuorum: 0, MinShardSize: 2048}
EC16P20L2{N: 16, M: 20, L: 2, AZCount: 2, PutQuorum: 34, GetQuorum: 0, MinShardSize: 2048}
EC6P10L2{N: 6, M: 10, L: 2, AZCount: 2, PutQuorum: 14, GetQuorum: 0, MinShardSize: 2048}
EC12P9{N: 12, M: 9, L: 0, AZCount: 3, PutQuorum: 20, GetQuorum: 0, MinShardSize: 2048}
EC15P12{N: 15, M: 12, L: 0, AZCount: 3, PutQuorum: 24, GetQuorum: 0, MinShardSize: 2048}
EC6P6{N: 6, M: 6, L: 0, AZCount: 3, PutQuorum: 11, GetQuorum: 0, MinShardSize: 2048}

其中

  • N: 数据块数量,M: 校验块数量, L: 本地校验块数量,AZCount: AZ数量
  • PutQuorum: (N + M) / AZCount + N \<= PutQuorum \<= M + N
  • MinShardSize: 最小shard大小,将数据连续填充到0-N分片中,如果数据大小小于MinShardSize*N,则与零字节对齐,详见代码open in new window
在github上编辑