
CubeFS v3.2.1 Official Release

2023-03-16 · CubeFS

Abstract

CubeFS Release 3.2.1 was released on March 16, 2023, with many updates including new features, stability optimizations, and bugfixes. Major features:

  • Support for directory file-count quotas, preventing excessive memory consumption in a MetaNode meta partition caused by oversized directories.
  • Rate limiting of data partition offline migration concurrency.
  • Master interface rate limiting to prevent overload due to sudden high concurrency.
  • Operation audit logs.
  • Integration with S3 erasure coding subsystem (Blobstore).

Major bugfixes:

  • During master snapshot synchronization, metadata was only added from the leader, so metadata that had already been deleted on the leader was never cleaned up.
  • When there are very many data partitions, a master restart may consume large amounts of memory, posing an OOM risk.
  • A large number of extent cleanup requests could overload DataNode TCP connections.

Version link: https://github.com/cubefs/cubefs/releases/tag/v3.2.1

Feature introduction

Support for directory file number quotas

Limits the number of child files or directories under a single directory, to prevent an oversized directory from consuming excessive memory in the meta partition that holds it, which leads to uneven memory usage across MetaNodes and possible OOM.

  • The default quota for the number of children per directory is 20 million. It is configurable, with a minimum of 1 million and no upper limit; if the number of children created in a single directory exceeds the quota, further creation fails (see the sketch after this list).
  • The configured quota value applies to the entire cluster and is persisted in the master.
  • Full support requires upgrading the Client, MetaNode, and Master to version 3.2.1.
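
A minimal sketch of the quota check, assuming hypothetical identifiers (checkDirQuota, ErrDirQuotaExceeded) rather than the actual CubeFS code:

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minDirChildrenLimit     = 1_000_000  // configurable lower bound
	defaultDirChildrenLimit = 20_000_000 // cluster-wide default; no upper limit
)

var ErrDirQuotaExceeded = errors.New("children limit of directory exceeded")

// checkDirQuota illustrates the check a meta partition would run before
// inserting a new dentry under a parent directory; childCount is the
// parent's current number of children.
func checkDirQuota(childCount, limit uint64) error {
	if limit < minDirChildrenLimit {
		limit = minDirChildrenLimit
	}
	if childCount >= limit {
		return ErrDirQuotaExceeded
	}
	return nil
}

func main() {
	// A directory already at the default quota rejects further creation.
	if err := checkDirQuota(20_000_000, defaultDirChildrenLimit); err != nil {
		fmt.Println("create failed:", err)
	}
}
```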

Data partition offline rate limiting

When a data partition is taken offline, the target node to migrate it to is selected from the nodeset of the source node/disk holding the partition. Each nodeset therefore keeps a decommission list of data partitions waiting to go offline, which is processed periodically (every 5 seconds) to achieve rate limiting.

This limits the number of data partitions that can be taken offline in parallel within each nodeset.


DecommissionDataPartitionList is a FIFO queue. When a data partition finishes going offline (successfully or not) or its decommission is stopped, it is removed from the queue.

The maximum number of data partitions that can be offline in parallel is configurable per DecommissionDataPartitionList.
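
A minimal sketch of this scheduling loop, with hypothetical names (decommissionList, tick, done) standing in for the actual implementation:

```go
package main

import "fmt"

type decommissionList struct {
	queue       []uint64        // data partition IDs waiting to go offline, FIFO order
	running     map[uint64]bool // partitions currently being decommissioned
	maxParallel int             // configurable parallel decommission limit
}

// tick runs on the periodic (5-second) scan: it pops partitions from the
// head of the queue while the number in flight is below the limit.
func (l *decommissionList) tick(start func(dpID uint64)) {
	for len(l.queue) > 0 && len(l.running) < l.maxParallel {
		dpID := l.queue[0]
		l.queue = l.queue[1:]
		l.running[dpID] = true
		start(dpID)
	}
}

// done removes a partition once it finishes (success or failure) or is
// stopped, freeing a slot for the next queued partition.
func (l *decommissionList) done(dpID uint64) {
	delete(l.running, dpID)
}

func main() {
	l := &decommissionList{
		queue:       []uint64{101, 102, 103},
		running:     map[uint64]bool{},
		maxParallel: 2,
	}
	// First scheduling round: admits 101 and 102, leaves 103 queued.
	l.tick(func(dp uint64) { fmt.Println("decommissioning dp", dp) })

	// Partition 101 leaves the list, so the next tick admits 103.
	l.done(101)
	l.tick(func(dp uint64) { fmt.Println("decommissioning dp", dp) })
}
```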

Configuration interface: /admin/updateDecommissionLimit
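
For example, the limit could be updated through that interface roughly as follows; the master address and the decommissionLimit parameter name are assumptions, so consult the v3.2.1 documentation for the exact query form:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	masterAddr := "http://192.168.0.11:17010" // hypothetical master address
	// Raise the per-nodeset parallel decommission limit to 10 (parameter
	// name assumed for illustration).
	url := masterAddr + "/admin/updateDecommissionLimit?decommissionLimit=10"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```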

Master interface rate limiting

To prevent situations where the master becomes unavailable due to a sudden surge of high-concurrency requests, all interfaces now support rate limiting.

  • Interface-level rate limiting controls the rate limit value per interface; an interface with no configured limit is not rate limited.
  • A maximum waiting time can also be configured for a rate-limited interface; if a request waits longer than that, the interface returns a 429 response code to prevent cascading failures (see the sketch after this list).
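
A minimal sketch of this behavior, assuming a token-bucket limiter (golang.org/x/time/rate) rather than CubeFS's actual limiter, with the interface path and port used only as examples:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

// rateLimited wraps one interface's handler: a request waits up to maxWait
// for a token; if the wait budget expires it is rejected with HTTP 429.
func rateLimited(limiter *rate.Limiter, maxWait time.Duration, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), maxWait)
		defer cancel()
		if err := limiter.Wait(ctx); err != nil {
			w.WriteHeader(http.StatusTooManyRequests) // 429: shed load early
			return
		}
		next(w, r)
	}
}

func main() {
	// Example: allow 100 requests/second with a burst of 20 on one interface.
	limiter := rate.NewLimiter(rate.Limit(100), 20)
	http.HandleFunc("/admin/getCluster",
		rateLimited(limiter, 500*time.Millisecond, func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("ok"))
		}))
	panic(http.ListenAndServe(":17010", nil)) // illustrative listen address
}
```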

Operation audit logs

The operation audit trail for a mount point is stored locally in a directory specified by the client, making it easy to integrate with third-party log collection platforms.

  • The client can start or stop the audit functionality independently, without remounting.
  • Audit format:
[Cluster name (master domain name or IP), volume name, subdir, mountpoint, timestamp, clientip, client hostname, operation type (create, delete, rename), source path, target path, error message, operation duration, source file inode, destination file inode]
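
A sketch of one such record in Go; the struct fields mirror the format above, while the field names and the comma-separated encoding are illustrative assumptions:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// auditEntry mirrors the audit format listed above, field for field.
type auditEntry struct {
	Cluster, Volume, Subdir, Mountpoint string
	Timestamp                           time.Time
	ClientIP, Hostname                  string
	Op                                  string // create, delete, rename
	SrcPath, DstPath                    string
	ErrMsg                              string
	Latency                             time.Duration
	SrcInode, DstInode                  uint64
}

func (e auditEntry) String() string {
	return strings.Join([]string{
		e.Cluster, e.Volume, e.Subdir, e.Mountpoint,
		e.Timestamp.Format(time.RFC3339), e.ClientIP, e.Hostname,
		e.Op, e.SrcPath, e.DstPath, e.ErrMsg,
		e.Latency.String(),
		fmt.Sprint(e.SrcInode), fmt.Sprint(e.DstInode),
	}, ", ")
}

func main() {
	fmt.Println(auditEntry{
		Cluster: "cfs-master.example.com", Volume: "vol1",
		Subdir: "/", Mountpoint: "/mnt/cubefs",
		Timestamp: time.Now(), ClientIP: "10.0.0.8", Hostname: "client-01",
		Op: "rename", SrcPath: "/a.txt", DstPath: "/b.txt", ErrMsg: "nil",
		Latency: 2 * time.Millisecond, SrcInode: 1025, DstInode: 1025,
	})
}
```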

Integration with S3 erasure coding subsystem (Blobstore)

  • Support for accessing erasure-coded volumes through the S3 protocol (see the sketch below).
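
Since access goes through the standard S3 protocol, any stock S3 client should work; in this hedged sketch (using aws-sdk-go) the ObjectNode endpoint, credentials, and bucket (volume) name are all placeholders:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://objectnode.example.com:17410"), // placeholder S3 endpoint
		Region:           aws.String("cfs"),                                 // arbitrary; not an AWS region
		Credentials:      credentials.NewStaticCredentials("ak", "sk", ""),  // placeholder keys
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	// A bucket maps to a CubeFS volume; with v3.2.1 this can be an
	// erasure-coded (Blobstore-backed) volume as well.
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket: aws.String("ec-volume"),
		Key:    aws.String("hello.txt"),
		Body:   strings.NewReader("hello cubefs"),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("object written over the S3 protocol")
}
```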

Future version plans

Preview of version 3.2.2 updates:

  • POSIX interface atomicity.
  • Directory space quotas.
  • User UID quotas.
  • Operability and stability improvements.
  • Blobstore caches volume and disk information on Proxy nodes and optimizes background task concurrency strategies.

From the BIGO team:

  • M1eyu, @M1eyu2018
  • litao, @tomscut
  • qinyuren, @liubingxing

From the OPPO team:

  • Patrick Wu, @wuchunhuan
  • Whale Tang, @true1064
  • baijiaruo, @baijiaruo
  • Victor1319, @Victor1319
  • leonchang, @leonrayang
  • zhangtianjiong, @zhangtianjiong
  • AmazingChi, @bboyCH4