
Understanding BTRFS Qgroups: User Space Applications and System Calls

Introduction

I was recently tasked with building an interface to wrap certain Btrfs utilities—such as creating qgroup IDs, setting inheritance, and applying limits—that go beyond the functionality exposed by libbtrfsutil-dev. To address this gap, I studied how btrfs-progs leverages direct system calls to perform operations not covered by the standard utilities.

In this post, I’ll share my findings on Btrfs qgroups (quota groups), a powerful feature for storage management, and walk through how user-space applications can interact with them via available APIs, system calls, and practical implementation techniques.

BTRFS Utilities Library Overview

The BTRFS utilities library (libbtrfsutil, packaged as libbtrfsutil-dev on Debian and Ubuntu) provides functions for managing BTRFS subvolumes and related operations. However, it does not expose qgroup creation and management; my guess is that these operations are considered too critical to wrap and are left to direct kernel interaction.

Available Operations in BTRFS Utilities

The library exposes several categories of operations:

1. GET OPERATIONS

// Get error string
const char *btrfs_util_strerror(enum btrfs_util_error err);

// Get subvolume information
enum btrfs_util_error btrfs_util_subvolume_id(const char *path, uint64_t *id_ret);
enum btrfs_util_error btrfs_util_subvolume_id_fd(int fd, uint64_t *id_ret);
enum btrfs_util_error btrfs_util_subvolume_path(const char *path, uint64_t id, char **path_ret);
enum btrfs_util_error btrfs_util_subvolume_path_fd(int fd, uint64_t id, char **path_ret);
enum btrfs_util_error btrfs_util_subvolume_info(const char *path, uint64_t id, struct btrfs_util_subvolume_info *subvol);
enum btrfs_util_error btrfs_util_subvolume_info_fd(int fd, uint64_t id, struct btrfs_util_subvolume_info *subvol);

// Get read-only status
enum btrfs_util_error btrfs_util_get_subvolume_read_only(const char *path, bool *ret);
enum btrfs_util_error btrfs_util_get_subvolume_read_only_fd(int fd, bool *ret);

// Get default subvolume
enum btrfs_util_error btrfs_util_get_default_subvolume(const char *path, uint64_t *id_ret);
enum btrfs_util_error btrfs_util_get_default_subvolume_fd(int fd, uint64_t *id_ret);

// Get deleted subvolumes
enum btrfs_util_error btrfs_util_deleted_subvolumes(const char *path, uint64_t **ids, size_t *n);
enum btrfs_util_error btrfs_util_deleted_subvolumes_fd(int fd, uint64_t **ids, size_t *n);

// Get qgroup information
void btrfs_util_qgroup_inherit_get_groups(const struct btrfs_util_qgroup_inherit *inherit, const uint64_t **groups, size_t *n);

2. SET OPERATIONS

// Set read-only status
enum btrfs_util_error btrfs_util_set_subvolume_read_only(const char *path, bool read_only);
enum btrfs_util_error btrfs_util_set_subvolume_read_only_fd(int fd, bool read_only);

// Set default subvolume
enum btrfs_util_error btrfs_util_set_default_subvolume(const char *path, uint64_t id);
enum btrfs_util_error btrfs_util_set_default_subvolume_fd(int fd, uint64_t id);

3. CREATE OPERATIONS

// Create subvolumes
enum btrfs_util_error btrfs_util_create_subvolume(const char *path, int flags, uint64_t *unused, struct btrfs_util_qgroup_inherit *qgroup_inherit);
enum btrfs_util_error btrfs_util_create_subvolume_fd(int parent_fd, const char *name, int flags, uint64_t *unused, struct btrfs_util_qgroup_inherit *qgroup_inherit);

// Create snapshots
enum btrfs_util_error btrfs_util_create_snapshot(const char *source, const char *path, int flags, uint64_t *unused, struct btrfs_util_qgroup_inherit *qgroup_inherit);
enum btrfs_util_error btrfs_util_create_snapshot_fd(int fd, const char *path, int flags, uint64_t *unused, struct btrfs_util_qgroup_inherit *qgroup_inherit);
enum btrfs_util_error btrfs_util_create_snapshot_fd2(int fd, int parent_fd, const char *name, int flags, uint64_t *unused, struct btrfs_util_qgroup_inherit *qgroup_inherit);

// Create iterators
enum btrfs_util_error btrfs_util_create_subvolume_iterator(const char *path, uint64_t top, int flags, struct btrfs_util_subvolume_iterator **ret);
enum btrfs_util_error btrfs_util_create_subvolume_iterator_fd(int fd, uint64_t top, int flags, struct btrfs_util_subvolume_iterator **ret);

// Create qgroup inheritance
enum btrfs_util_error btrfs_util_create_qgroup_inherit(int flags, struct btrfs_util_qgroup_inherit **ret);

4. DELETE OPERATIONS

// Delete subvolumes
enum btrfs_util_error btrfs_util_delete_subvolume(const char *path, int flags);
enum btrfs_util_error btrfs_util_delete_subvolume_fd(int parent_fd, const char *name, int flags);
enum btrfs_util_error btrfs_util_delete_subvolume_by_id_fd(int fd, uint64_t subvolid);

// Destroy resources
void btrfs_util_destroy_subvolume_iterator(struct btrfs_util_subvolume_iterator *iter);
void btrfs_util_destroy_qgroup_inherit(struct btrfs_util_qgroup_inherit *inherit);

5. CHECK OPERATIONS

// Check if path is subvolume
enum btrfs_util_error btrfs_util_is_subvolume(const char *path);
enum btrfs_util_error btrfs_util_is_subvolume_fd(int fd);

6. SYNC OPERATIONS

// Sync operations
enum btrfs_util_error btrfs_util_sync(const char *path);
enum btrfs_util_error btrfs_util_sync_fd(int fd);
enum btrfs_util_error btrfs_util_start_sync(const char *path, uint64_t *transid);
enum btrfs_util_error btrfs_util_start_sync_fd(int fd, uint64_t *transid);
enum btrfs_util_error btrfs_util_wait_sync(const char *path, uint64_t transid);
enum btrfs_util_error btrfs_util_wait_sync_fd(int fd, uint64_t transid);

7. ITERATOR OPERATIONS

// Iterator operations
int btrfs_util_subvolume_iterator_fd(const struct btrfs_util_subvolume_iterator *iter);
enum btrfs_util_error btrfs_util_subvolume_iterator_next(struct btrfs_util_subvolume_iterator *iter, char **path_ret, uint64_t *id_ret);
enum btrfs_util_error btrfs_util_subvolume_iterator_next_info(struct btrfs_util_subvolume_iterator *iter, char **path_ret, struct btrfs_util_subvolume_info *subvol);

8. ADD OPERATIONS

// Add to qgroup inheritance
enum btrfs_util_error btrfs_util_qgroup_inherit_add_group(struct btrfs_util_qgroup_inherit **inherit, uint64_t qgroupid);

The Missing Piece: Qgroup Creation Operations

Critical observation: Operations for creating qgroup IDs are not included in the utilities library. This is likely because they are considered more critical, so the library leaves responsibility for them to the caller.

Solution for developers: To manipulate qgroups, developers must call ioctl() directly. In practice this means reading the btrfs-progs source code to find which ioctl requests back each CLI command.

Understanding BTRFS Qgroup ID Structure

BTRFS qgroups follow a hierarchical structure with level-based organization:

Level / ID                          
                          +---+
                          |2/1|
                          +---+
                         /     \
                   +---+/       \+---+    // Higher levels (created manually):
                   |1/1|         |1/2|    // organizational qgroups, not
                   +---+         +---+    // necessarily bound to a subvolume
                  /     \       /     \
            +---+/       \+---+/       \+---+  // Level 0:
qgroups     |0/1|         |0/2|         |0/3|  // qgroups bound to a subvolume
            +-+-+         +---+         +---+  // (the ID matches the subvolume ID)
              |          /     \       /     \
              |         /       \     /       \
              |        /         \   /         \ 
extents       1       2            3            4 // extents track the disk space used
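
The level/ID notation maps onto a single 64-bit value. As far as I can tell from the kernel headers (BTRFS_QGROUP_LEVEL_SHIFT), the level occupies the top 16 bits and the ID the lower 48. A minimal sketch of that packing follows; the helper names are my own, and the shift is mirrored locally rather than taken from a header.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: mirrors BTRFS_QGROUP_LEVEL_SHIFT from the kernel UAPI headers. */
#define QGROUP_LEVEL_SHIFT 48

/* Pack "level/id" into the 64-bit qgroup ID that the ioctls expect. */
static uint64_t qgroupid_make(uint16_t level, uint64_t id)
{
    assert((id >> QGROUP_LEVEL_SHIFT) == 0); /* the ID must fit in 48 bits */
    return ((uint64_t)level << QGROUP_LEVEL_SHIFT) | id;
}

/* Extract the level (the part before the slash). */
static uint16_t qgroupid_level(uint64_t qgroupid)
{
    return (uint16_t)(qgroupid >> QGROUP_LEVEL_SHIFT);
}

/* Extract the ID (the part after the slash). */
static uint64_t qgroupid_id(uint64_t qgroupid)
{
    return qgroupid & ((UINT64_C(1) << QGROUP_LEVEL_SHIFT) - 1);
}

/* Format a qgroup ID in the "level/id" notation used in the diagram. */
static int qgroupid_format(uint64_t qgroupid, char *buf, size_t len)
{
    return snprintf(buf, len, "%u/%" PRIu64,
                    (unsigned)qgroupid_level(qgroupid),
                    qgroupid_id(qgroupid));
}
```

With these helpers, the top node of the diagram would be qgroupid_make(2, 1), and a level-0 qgroup is numerically equal to its subvolume ID.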

Creating a Qgroup Inheritance Tree

To create a qgroup inheritance tree like the one above, you need to:

  1. Create qgroup IDs: This operation requires using ioctl() system calls
  2. Establish parent-child dependencies: Create relationships between child qgroup IDs and parent qgroup IDs
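
The two steps above can be sketched with BTRFS_IOC_QGROUP_CREATE and BTRFS_IOC_QGROUP_ASSIGN from <linux/btrfs.h>. This is a minimal sketch rather than the btrfs-progs code: the wrapper names and the negative-errno return convention are my own, fd is assumed to be any open file or directory on the mounted filesystem, and the caller is assumed to hold CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/*
 * Create a qgroup (create != 0) or remove it (create == 0).
 * Returns 0 on success or a negative errno value.
 */
static int qgroup_create(int fd, uint64_t qgroupid, int create)
{
    struct btrfs_ioctl_qgroup_create_args args = {
        .create = create ? 1 : 0,
        .qgroupid = qgroupid,
    };

    return ioctl(fd, BTRFS_IOC_QGROUP_CREATE, &args) < 0 ? -errno : 0;
}

/*
 * Make `parent` a parent of `child` in the qgroup hierarchy.
 * Returns a negative errno value on failure. btrfs-progs appears to
 * treat a positive return as a hint that the accounting needs a rescan,
 * so the ioctl result is passed through unchanged on success.
 */
static int qgroup_assign(int fd, uint64_t child, uint64_t parent)
{
    /* The parent must sit on a strictly higher level than the child. */
    if ((child >> 48) >= (parent >> 48))
        return -EINVAL;

    struct btrfs_ioctl_qgroup_assign_args args = {
        .assign = 1,      /* 1 = add the relation, 0 = remove it */
        .src = child,
        .dst = parent,
    };
    int ret = ioctl(fd, BTRFS_IOC_QGROUP_ASSIGN, &args);

    return ret < 0 ? -errno : ret;
}
```

Building the tree from the diagram would then be a matter of creating 1/1, 1/2 and 2/1, and assigning each child to its parent from the bottom up.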

System Calls for Qgroup Operations

btrfs-progs provides CLI commands for these operations. By tracing its source code, we can identify the specific ioctl requests that reach the BTRFS kernel module.

Key IOCTL Operations

1. Set Subvolume Quota

// Implementation: set_subvolume_quota
// BTRFS_IOC_QGROUP_LIMIT requires CAP_SYS_ADMIN, so the calling thread is
// temporarily promoted to root for the syscall (user -> root -> user)

ioctl(fd, BTRFS_IOC_QGROUP_LIMIT, &args);

2. Enable Quota

// Implementation: enable_quota
// Quota is enabled on the root of the BTRFS filesystem while BtrfsAdaptor is being set up,
// so quota is enabled by default for the whole BTRFS filesystem

ioctl(fd, BTRFS_IOC_QUOTA_CTL, &args);

3. Disable Quota

// Implementation: disable_quota
// Currently not used

ioctl(fd, BTRFS_IOC_QUOTA_CTL, &args);
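
Enable and disable share one ioctl and differ only in the cmd field of the argument structure. A minimal sketch, assuming the definitions in <linux/btrfs.h>; the quota_ctl name and the negative-errno return value are my own choices, not part of any library.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/*
 * Enable (enable != 0) or disable (enable == 0) quota on the filesystem
 * that fd belongs to. Returns 0 on success or a negative errno value.
 */
static int quota_ctl(int fd, int enable)
{
    struct btrfs_ioctl_quota_ctl_args args = {
        .cmd = enable ? BTRFS_QUOTA_CTL_ENABLE : BTRFS_QUOTA_CTL_DISABLE,
    };

    return ioctl(fd, BTRFS_IOC_QUOTA_CTL, &args) < 0 ? -errno : 0;
}
```

Here fd would typically come from open() on the mount point with O_RDONLY | O_DIRECTORY.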

4. Get Subvolume Quota

// Implementation: get_subvolume_quota
// The kernel's BTRFS module tracks the exact max referenced storage of a subvolume,
// so the user space app only needs to query it.
// Unlike btrfs-progs, the app does not need to build a red-black tree:
// the filesystem admin interface retrieves the quota info of a single subvolume in one shot

ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
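
A one-shot lookup of this kind could look like the sketch below. It assumes, from my reading of the kernel sources, that limit items in the quota tree are keyed as (0, BTRFS_QGROUP_LIMIT_KEY, qgroupid) and carry a little-endian struct btrfs_qgroup_limit_item. The function names are hypothetical, and the tree search ioctl needs CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>

/* Read a little-endian 64-bit value regardless of host byte order. */
static uint64_t load_le64(const void *p)
{
    const unsigned char *b = p;
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Narrow the search range down to the limit item of one qgroup. */
static void limit_search_init(struct btrfs_ioctl_search_args *args,
                              uint64_t qgroupid)
{
    assert(args != NULL);
    memset(args, 0, sizeof(*args));
    args->key.tree_id = BTRFS_QUOTA_TREE_OBJECTID;
    args->key.min_type = BTRFS_QGROUP_LIMIT_KEY;
    args->key.max_type = BTRFS_QGROUP_LIMIT_KEY;
    args->key.min_offset = qgroupid;   /* the qgroup ID is the key offset */
    args->key.max_offset = qgroupid;
    args->key.max_transid = UINT64_MAX;
    args->key.nr_items = 1;            /* objectid range stays 0..0 */
}

/*
 * Fetch the max referenced limit of one qgroup.
 * Returns 0 on success, -ENOENT if no limit item exists,
 * or another negative errno value.
 */
static int qgroup_get_max_rfer(int fd, uint64_t qgroupid, uint64_t *max_rfer)
{
    struct btrfs_ioctl_search_args args;
    struct btrfs_ioctl_search_header sh;

    limit_search_init(&args, qgroupid);
    if (ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args) < 0)
        return -errno;
    if (args.key.nr_items == 0)
        return -ENOENT;

    /* The buffer starts with one header, followed by the item payload. */
    memcpy(&sh, args.buf, sizeof(sh));
    if (sh.len < sizeof(struct btrfs_qgroup_limit_item))
        return -EIO;
    *max_rfer = load_le64(args.buf + sizeof(sh) +
                          offsetof(struct btrfs_qgroup_limit_item, max_rfer));
    return 0;
}
```

Because the range covers exactly one key, no pagination and no lookup structure are needed, which matches the one-shot design described in the comment above.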

Note: Depending on your specific goal and which CLI command you want to replicate, you can trace and find the ioctl() responsible for that command.

To successfully query qgroup data from the kernel, you need to understand how to use tree search operations.

Basic Tree Search via System Call

ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &search);

Search Criteria Structure

To specify what to query, fill in the btrfs_ioctl_search_key embedded in struct btrfs_ioctl_search_args:

/* Search criteria for the BTRFS SEARCH ioctl family. */
struct btrfs_ioctl_search_key {
    /*
     * The tree we're searching in. 1 is the tree of tree roots, 2 is the
     * extent tree, etc...
     *
     * A special tree_id value of 0 will cause a search in the subvolume
     * tree that the inode which is passed to the ioctl is part of.
     */
    __u64 tree_id;      /* in */

    /*
     * When doing a tree search, we're actually taking a slice from a
     * linear search space of 136-bit keys.
     *
     * A full 136-bit tree key is composed as:
     *   (objectid << 72) + (type << 64) + offset
     *
     * The individual min and max values for objectid, type and offset
     * define the min_key and max_key values for the search range. All
     * metadata items with a key in the interval [min_key, max_key] will be
     * returned.
     *
     * Additionally, we can filter the items returned on transaction id of
     * the metadata block they're stored in by specifying a transid range.
     * Be aware that this transaction id only denotes when the metadata
     * page that currently contains the item got written the last time as
     * result of a COW operation. The number does not have any meaning
     * related to the transaction in which an individual item that is being
     * returned was created or changed.
     */
    __u64 min_objectid; /* in */
    __u64 max_objectid; /* in */
    __u64 min_offset;   /* in */
    __u64 max_offset;   /* in */
    __u64 min_transid;  /* in */
    __u64 max_transid;  /* in */
    __u32 min_type;     /* in */
    __u32 max_type;     /* in */

    /*
     * input: The maximum amount of results desired.
     * output: The actual amount of items returned, restricted by any of:
     *  - reaching the upper bound of the search range
     *  - reaching the input nr_items amount of items
     *  - completely filling the supplied memory buffer
     */
    __u32 nr_items;     /* in/out */

    /* align to 64 bits */
    __u32 unused;

    /* some extra for later */
    __u64 unused1;
    __u64 unused2;
    __u64 unused3;
    __u64 unused4;
};

The CLI command btrfs subvolume show actually performs a search for a single item:

struct btrfs_ioctl_search_args search = {
    .key = {
        .tree_id = BTRFS_ROOT_TREE_OBJECTID,
        .min_objectid = id,
        .max_objectid = id,
        .min_type = BTRFS_ROOT_BACKREF_KEY,
        .max_type = BTRFS_ROOT_BACKREF_KEY,
        .min_offset = 0,
        .max_offset = UINT64_MAX,
        .min_transid = 0,
        .max_transid = UINT64_MAX,
        .nr_items = 1,
    },
};
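
Whatever the key range, the kernel writes the results into args.buf as a sequence of struct btrfs_ioctl_search_header records, each followed by len bytes of item data, and reports the number of records in key.nr_items. The sketch below walks such a buffer; the search_cursor type and search_next function are names I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/btrfs.h>

/* Position inside a result buffer; zero-initialize before the first call. */
struct search_cursor {
    size_t off;      /* byte offset of the next header in args->buf */
    __u32 index;     /* number of items already returned */
};

/*
 * Copy the next header into *sh and point *data at its payload.
 * Returns 1 if an item was produced, 0 when the buffer is exhausted
 * or a header points past the end of the buffer.
 */
static int search_next(const struct btrfs_ioctl_search_args *args,
                       struct search_cursor *cur,
                       struct btrfs_ioctl_search_header *sh,
                       const char **data)
{
    assert(args && cur && sh && data);
    if (cur->index >= args->key.nr_items)
        return 0;
    if (cur->off + sizeof(*sh) > sizeof(args->buf))
        return 0;

    /* memcpy avoids unaligned access: headers are packed back to back. */
    memcpy(sh, args->buf + cur->off, sizeof(*sh));
    cur->off += sizeof(*sh);
    if (sh->len > sizeof(args->buf) - cur->off)
        return 0;

    *data = args->buf + cur->off;
    cur->off += sh->len;
    cur->index++;
    return 1;
}
```

For the single-item search above, one call to search_next after a successful ioctl would yield the ROOT_BACKREF item, if there is one.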

The CLI command btrfs qgroup show -pcre searches the whole quota tree, with every key range left wide open, and pages through the results to collect the data of all qgroups. It stores them in a userland red-black tree for quick lookup when printing the output.

In this case, the searching structure looks like:

struct btrfs_ioctl_search_args args = {
    .key = {
        .tree_id = BTRFS_QUOTA_TREE_OBJECTID,
        .max_type = BTRFS_QGROUP_RELATION_KEY,
        .min_type = BTRFS_QGROUP_STATUS_KEY,
        .max_objectid = (u64)-1,
        .max_offset = (u64)-1,
        .max_transid = (u64)-1,
        .nr_items = 4096,
    },
};

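Each call fills a result buffer of just under 4 KB (the whole btrfs_ioctl_search_args structure is 4096 bytes), referred to here as a "page". When a search matches more items than fit in that buffer, the search has to be repeated starting after the last returned key, i.e. paginated.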
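
Pagination can be sketched as follows: after each call, take the key of the last item returned, move the lower bound of the range just past it, and search again until nothing comes back. This is my own sketch modeled loosely on what btrfs-progs does, not a copy of it; search_key_advance and count_quota_items are hypothetical names, and the caller is assumed to hold CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>

/*
 * Move the lower bound of the search range just past `last`, carrying
 * from offset into type and objectid. Returns 0 once the key space is
 * exhausted, 1 otherwise.
 */
static int search_key_advance(struct btrfs_ioctl_search_key *key,
                              const struct btrfs_ioctl_search_header *last)
{
    key->min_objectid = last->objectid;
    key->min_type = last->type;
    key->min_offset = last->offset;

    if (key->min_offset < UINT64_MAX) {
        key->min_offset++;
        return 1;
    }
    key->min_offset = 0;
    if (key->min_type < UINT8_MAX) {    /* the key type is 8 bits wide */
        key->min_type++;
        return 1;
    }
    key->min_type = 0;
    if (key->min_objectid < UINT64_MAX) {
        key->min_objectid++;
        return 1;
    }
    return 0;
}

/* Count every qgroup item in the quota tree, one page at a time. */
static long count_quota_items(int fd)
{
    struct btrfs_ioctl_search_args args;
    long total = 0;

    memset(&args, 0, sizeof(args));
    args.key.tree_id = BTRFS_QUOTA_TREE_OBJECTID;
    args.key.min_type = BTRFS_QGROUP_STATUS_KEY;
    args.key.max_type = BTRFS_QGROUP_RELATION_KEY;
    args.key.max_objectid = UINT64_MAX;
    args.key.max_offset = UINT64_MAX;
    args.key.max_transid = UINT64_MAX;

    for (;;) {
        struct btrfs_ioctl_search_header sh = {0};
        size_t off = 0;

        args.key.nr_items = 4096;   /* in: upper bound, out: items returned */
        if (ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args) < 0)
            return -errno;
        if (args.key.nr_items == 0)
            break;                  /* nothing left in the range */

        /* Skip through the page to find the last header. */
        for (__u32 i = 0; i < args.key.nr_items; i++) {
            if (off + sizeof(sh) > sizeof(args.buf))
                return -EIO;
            memcpy(&sh, args.buf + off, sizeof(sh));
            off += sizeof(sh) + sh.len;
        }
        total += args.key.nr_items;

        if (!search_key_advance(&args.key, &sh))
            break;
    }
    return total;
}
```

Note that nr_items is reset before every call, because the kernel overwrites it with the number of items actually returned.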

Important notes:

  • I could not find official developer documentation for this behavior
  • In my experiments, a page holds approximately 120-150 items, depending on which item types are queried
  • A hierarchy like the one pictured below produces about 1,000 items of four types (status, info, limit, relation)
  • Parent-child relations are queried by type BTRFS_QGROUP_RELATION_KEY
                          +--v--+
                          | 2/1 |
                          +--+--+
                          /  |  \
                 +-------+   |   +--------+
                /            |             \
            +--v--+       +--v--+        +--v-----+
            | 1/1 |       | ... |        | 1/ 256 |
            +--+--+       +--+--+        +--+-----+
               \            |              /
                \           |             /
                 +--        |       +-----
                    \       |      /
                     \      |     /
                      +-----v----+
                      |   0/5    |
                      +----------+

Every item returned by a search is identified by a key with the following layout, taken from the BTRFS sources:

/*
 * the key defines the order in the tree, and so it also defines (optimal)
 * block layout. objectid corresponds to the inode number. The flags
 * tells us things about the object, and is a kind of stream selector.
 * so for a given inode, keys with flags of 1 might refer to the inode
 * data, flags of 2 may point to file data in the btree and flags == 3
 * may point to extents.
 *
 * offset is the starting byte offset for this key in the stream.
 *
 * btrfs_disk_key is in disk byte order. struct btrfs_key is always
 * in cpu native order. Otherwise they are identical and their sizes
 * should be the same (ie both packed)
 */
struct btrfs_disk_key {
    __le64 objectid;
    u8 type;
    __le64 offset;
} __attribute__ ((__packed__));

struct btrfs_key {
    u64 objectid;
    u8 type;
    u64 offset;
} __attribute__ ((__packed__));
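
Items come back sorted by objectid first, then type, then offset, which is the 136-bit ordering described in the search key comment earlier. The sketch below spells out that comparison and maps the four qgroup item types to names. The tree_key struct and both functions are my own illustration; the type constants are taken from <linux/btrfs_tree.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/btrfs_tree.h>

/* CPU-order key, with the same three fields as struct btrfs_key. */
struct tree_key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* Three-way comparison in tree order: objectid, then type, then offset. */
static int tree_key_cmp(const struct tree_key *a, const struct tree_key *b)
{
    assert(a && b);
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Name the qgroup item type stored under a quota-tree key. */
static const char *quota_key_kind(const struct tree_key *k)
{
    assert(k);
    switch (k->type) {
    case BTRFS_QGROUP_STATUS_KEY:
        return "status";
    case BTRFS_QGROUP_INFO_KEY:
        return "info";
    case BTRFS_QGROUP_LIMIT_KEY:
        return "limit";
    case BTRFS_QGROUP_RELATION_KEY:
        return "relation";
    default:
        return "other";
    }
}
```

One consequence of this ordering, if my reading of the key layout is right, is that status, info and limit items (all stored under objectid 0) come back before the relation items of any qgroup with a non-zero ID.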

Setting Qgroup Limits

Beyond modeling inheritance, the primary purpose of qgroups is to enforce limits on how much space a subvolume, or a group of subvolumes, can use.

Setting Limits via Qgroup (btrfs-progs)

CLI Command: btrfs qgroup limit

root@ubuntu24:/btrfs_mount# btrfs qgroup limit --help
usage: btrfs qgroup limit [options] <size>|none [<qgroupid>] <path>

    Set the limits a subvolume quota group.

    -c   limit amount of data after compression. This is the default,
         it is currently not possible to turn off this option.
    -e   limit space exclusively assigned to this qgroup

Complete Command Example

btrfs qgroup limit -c -e <size> <qgroupid> <mountpoint>

Parameter Handling in User Space

The parameters are handled at the user space level by btrfs-progs:

static int cmd_qgroup_limit(const struct cmd_struct *cmd, int argc, char **argv)

Options are structured into flags and sent to the kernel module:

if (compressed)
    args.lim.flags |= BTRFS_QGROUP_LIMIT_RFER_CMPR |
                      BTRFS_QGROUP_LIMIT_EXCL_CMPR;
if (exclusive) {
    args.lim.flags |= BTRFS_QGROUP_LIMIT_MAX_EXCL;
    args.lim.max_exclusive = size;
} else {
    args.lim.flags |= BTRFS_QGROUP_LIMIT_MAX_RFER;
    args.lim.max_referenced = size;
}

fd = btrfs_open_dir(path, &dirstream, 1);
ret = ioctl(fd, BTRFS_IOC_QGROUP_LIMIT, &args);
close_file_or_dir(fd, dirstream);
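
Outside btrfs-progs, the same call can be made with only the kernel's UAPI header. One thing to watch for: in <linux/btrfs.h> the limit fields are named max_rfer and max_excl, whereas the btrfs-progs excerpt above uses max_referenced and max_exclusive from its own copy of the header. The sketch below is a minimal version under that assumption; the function names are mine, and a qgroupid of 0 is, as far as I can tell from the kernel source, taken to mean the subvolume that fd belongs to.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Fill the ioctl argument for a referenced or an exclusive limit. */
static void limit_args_init(struct btrfs_ioctl_qgroup_limit_args *args,
                            uint64_t qgroupid, uint64_t size, int exclusive)
{
    assert(args != NULL);
    memset(args, 0, sizeof(*args));
    args->qgroupid = qgroupid;
    if (exclusive) {
        args->lim.flags |= BTRFS_QGROUP_LIMIT_MAX_EXCL;
        args->lim.max_excl = size;
    } else {
        args->lim.flags |= BTRFS_QGROUP_LIMIT_MAX_RFER;
        args->lim.max_rfer = size;
    }
}

/*
 * Apply a limit of `size` bytes to a qgroup. Requires CAP_SYS_ADMIN.
 * Returns 0 on success or a negative errno value.
 */
static int qgroup_set_limit(int fd, uint64_t qgroupid, uint64_t size,
                            int exclusive)
{
    struct btrfs_ioctl_qgroup_limit_args args;

    limit_args_init(&args, qgroupid, size, exclusive);
    return ioctl(fd, BTRFS_IOC_QGROUP_LIMIT, &args) < 0 ? -errno : 0;
}
```

Only the fields whose flag bit is set are meant to be applied, which is why the sketch sets exactly one flag per call.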

Conclusion

Understanding BTRFS qgroups and their interaction with user space applications requires knowledge of both the utilities library and direct system calls. While the BTRFS utilities library provides convenient functions for many operations, critical qgroup management requires direct ioctl() calls to the kernel module.

The key takeaways are:

  1. Utilities Library: Provides safe, high-level operations for subvolume management
  2. Direct IOCTL: Required for qgroup creation and advanced quota operations
  3. Tree Search: Essential for querying qgroup relationships and metadata
  4. Pagination: Important for handling large datasets in tree searches
  5. System Calls: The foundation for all BTRFS qgroup operations

This understanding enables developers to build robust applications that can effectively manage BTRFS filesystems and implement sophisticated quota and resource tracking systems.


Note: This article covers the technical aspects of BTRFS qgroup management. For production use, always refer to the latest BTRFS documentation and kernel sources.
