Connecting Ceph clients. File access with CephFS

With this article, I close the series on the basics of Ceph deployment. Previously, we looked at how to deploy Ceph, and how block and object access is provided.

This article will briefly describe the procedure for providing file access in Ceph using CephFS. The topic is very extensive and much will inevitably be left out, so please refer to the official documentation for more information.

Note: This article is intended to demonstrate basic functionality only and may contain inaccuracies, including in wording. Fault tolerance and other advanced topics are not discussed here.

My lab environment is built on Ceph Reef (18). The cluster consists of 6 nodes, 3 of which are dedicated to system services, like Ceph Monitor and Ceph Manager, and the remaining 3 nodes are used for Ceph OSD. All systems, including the client, are based on Rocky Linux 9.

Let’s start setting up from the Ceph side. I perform all procedures from the control node. In this case, it is ceph-mon-01.

To begin providing file access functionality, you must perform the following steps:

  1. Create two Ceph pools: one for CephFS metadata and the second for data storage;
  2. Start the Ceph MDS services.

Let’s start and create the required pools:

[root@ceph-mon-01 ~]# ceph osd pool create cephfs_metadata
pool 'cephfs_metadata' created

[root@ceph-mon-01 ~]# ceph osd pool create cephfs_data
pool 'cephfs_data' created

As usual, by default the pools are created with two additional copies of each object (three replicas in total):

[root@ceph-mon-01 ~]# ceph osd pool get cephfs_metadata size
size: 3
[root@ceph-mon-01 ~]# ceph osd pool get cephfs_metadata min_size
min_size: 2

[root@ceph-mon-01 ~]# ceph osd pool get cephfs_data size
size: 3
[root@ceph-mon-01 ~]# ceph osd pool get cephfs_data min_size
min_size: 2
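
If you ever need different replication in a lab, these parameters can be changed per pool. This is only an illustration; lowering them reduces redundancy, so leave the defaults in production:

[root@ceph-mon-01 ~]# ceph osd pool set cephfs_data size 2
[root@ceph-mon-01 ~]# ceph osd pool set cephfs_data min_size 1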

Now let’s try to launch CephFS using the two previously created pools:

[root@ceph-mon-01 ~]# ceph fs new vmik_share cephfs_metadata cephfs_data
  Pool 'cephfs_data' (id '10') has pg autoscale mode 'on' but is not marked as bulk.
  Consider setting the flag by running
    # ceph osd pool set cephfs_data bulk true
  new fs with metadata pool 9 and data pool 10

When creating a file system, you first specify the pool defined for metadata, and then the pool in which the data itself will be stored.

Ceph also recommends marking the newly created data pool as bulk:

[root@ceph-mon-01 ~]# ceph osd pool set cephfs_data bulk true
set pool 10 bulk to true

Let’s look at the list of created file systems:

[root@ceph-mon-01 ~]# ceph fs ls
name: vmik_share, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

From the output, we can see which pool is used for metadata and which pool stores the data.

Pay attention to the current status of the cluster:

[root@ceph-mon-01 ~]# ceph -s
    health: HEALTH_ERR
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds
services:
     mds: 0/0 daemons up

As I mentioned at the beginning, for CephFS we need Ceph MDS services running.

Let’s launch three MDS services, one on each of the service nodes (ceph-mon):

[root@ceph-mon-01 ~]# ceph orch apply mds cephfs --placement="3 ceph-mon-01 ceph-mon-02 ceph-mon-03"
Scheduled mds.cephfs update...

If you list the services in the cluster, you will notice new ones – mds:

[root@ceph-mon-01 ~]# ceph orch ls
NAME                     PORTS                 RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager             ?:9093,9094               1/1  0s ago     2w   count:1
ceph-exporter                                      8/8  7m ago     2w   *
crash                                              8/8  7m ago     2w   *
grafana                  ?:3000                    1/1  0s ago     2w   count:1
ingress.rgw.s3.vmik.lab  10.10.10.22:443,1967      4/4  6m ago     2d   ceph-rgw-01;ceph-rgw-02
mds.cephfs                                         3/3  1s ago     11s  ceph-mon-01;ceph-mon-02;ceph-mon-03;count:3
mgr                                                3/3  1s ago     2w   ceph-mon-01;ceph-mon-02;ceph-mon-03;count:3
mon                                                3/3  1s ago     2w   ceph-mon-01;ceph-mon-02;ceph-mon-03;count:3
node-exporter            ?:9100                    8/8  7m ago     2w   *
osd                                                  9  7m ago     -    <unmanaged>
prometheus               ?:9095                    1/1  0s ago     2w   count:1
rgw.s3.vmik.lab          ?:8080                    2/2  6m ago     2d   count-per-host:1;label:rgw

Now the cluster status does not show any errors:

cluster:
    health: HEALTH_OK
services:
    mds: 1/1 daemons up, 2 standby
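
You can also check the MDS state for a specific file system with ceph fs status; it shows which daemon is active and which are on standby:

[root@ceph-mon-01 ~]# ceph fs status vmik_share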

The whole process above was needed to understand what is required to get started with CephFS. Now let’s look at a shorter version of the initial deployment.

Delete the previously created file system:

[root@ceph-mon-01 ~]# ceph fs volume rm vmik_share --yes-i-really-mean-it
metadata pool: cephfs_metadata data pool: ['cephfs_data'] removed

Please note that the pools that were allocated for CephFS are also deleted.

All the above actions can be performed with one command:

[root@ceph-mon-01 ~]# ceph fs volume create vmik_share

What does this command do? It creates the pools and the root file system (the Volume), and automatically starts the Ceph MDS services. Cool.

The pools are created automatically and named after the file system:

[root@ceph-mon-01 ~]# ceph osd lspools | grep vmik_share
11 cephfs.vmik_share.meta
12 cephfs.vmik_share.data

Important: if you follow the article from top to bottom, the old MDS services were not removed, and the command above automatically started two more on the available nodes. This requires manual intervention: remove the old MDS services and move the new ones to the ceph-mon nodes.
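
One possible cleanup, assuming the old service is still named mds.cephfs and the one created by ceph fs volume create is named mds.vmik_share (check the actual names with ceph orch ls first):

[root@ceph-mon-01 ~]# ceph orch rm mds.cephfs
[root@ceph-mon-01 ~]# ceph orch apply mds vmik_share --placement="3 ceph-mon-01 ceph-mon-02 ceph-mon-03"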

A little terminology, although, to be honest, I don’t fully understand it myself.

Volume – an abstraction over the Ceph file system. This may be a loose interpretation, but I would describe this level as a file server on which shared directories are located, or as the root directory. It is also technically possible to mount it directly on the client side.

Subvolume – an abstraction level within a Volume. Several Subvolumes can be created inside one Volume, and we can separate access permissions and set quotas for each Subvolume individually. This may also be a loose interpretation, but I would describe this element as a file share.

Subvolume group – a grouping of Subvolumes that allows you to apply various policies across a set of Subvolumes.

Inside a Volume, we can create Subvolume groups and Subvolumes. If the Volume is a tree, then Subvolumes are its branches.

First, let’s create a Subvolume group called data:

[root@ceph-mon-01 ~]# ceph fs subvolumegroup create vmik_share data

You don’t have to create one; in that case, all Subvolumes will be placed into the default group _nogroup, which isn’t very pretty.

Now let’s create two Subvolumes, each 1 GiB in size:

[root@ceph-mon-01 ~]# ceph fs subvolume create vmik_share data1 --group_name data --size 1073741824

[root@ceph-mon-01 ~]# ceph fs subvolume create vmik_share data2 --group_name data --size 1073741824

Here we specify the name of the Volume in which we want to create the Subvolume, the Subvolume name (data1 and data2 in this case), the group name, and the size in bytes.
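
To double-check a Subvolume’s parameters, including its quota, path, and data pool, you can query its details (the exact set of fields may vary by release):

[root@ceph-mon-01 ~]# ceph fs subvolume info vmik_share data1 --group_name data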

We can easily view the list of Subvolumes related to a specific group of a specific FS:

[root@ceph-mon-01 ~]# ceph fs subvolume ls vmik_share --group_name data
[
    {
        "name": "data2"
    },
    {
        "name": "data1"
    }
]
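
Similarly, you can list the Subvolume groups of a Volume; in our case, the output should contain only the data group, along these lines:

[root@ceph-mon-01 ~]# ceph fs subvolumegroup ls vmik_share
[
    {
        "name": "data"
    }
]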

Now we will grant access to our two Subvolumes: data1 to user vmik1 and data2 to user vmik2:

[root@ceph-mon-01 ~]# ceph fs subvolume authorize vmik_share --group_name data data1 vmik1 --access-level=rw

[root@ceph-mon-01 ~]# ceph fs subvolume authorize vmik_share --group_name data data2 vmik2 --access-level=rw

Please note that you do not need to create the Ceph accounts in advance.

The command creates them automatically and grants the required level of access:

[root@ceph-mon-01 ~]# ceph auth ls
client.vmik1
        key: AQDp0d5lO7xfLhAAWZ8bSFW1YB5heNgoVTaQAw==
        caps: [mds] allow rw path=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7
        caps: [mon] allow r
        caps: [osd] allow rw pool=cephfs.vmik_share.data
client.vmik2
        key: AQD20d5lDBwNCRAAj8HHVF/I8zAzdnkjWRYUlA==
        caps: [mds] allow rw path=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37
        caps: [mon] allow r
        caps: [osd] allow rw pool=cephfs.vmik_share.data

In the output above, we are interested in the account name, the key field, and also the path field.

All of this will be useful during the mounting of the file system on the client side.
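
If you only need the key, there is no need to read through the full ceph auth ls output; you can request it directly:

[root@ceph-mon-01 ~]# ceph auth get-key client.vmik1
AQDp0d5lO7xfLhAAWZ8bSFW1YB5heNgoVTaQAw==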

By the way, you can get the path to the share with the command below:

[root@ceph-mon-01 ~]# ceph fs subvolume getpath vmik_share data1 --group_name data
/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7

[root@ceph-mon-01 ~]# ceph fs subvolume getpath vmik_share data2 --group_name data
/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37

So, at the moment we have completed the following steps:

  1. Launched CephFS;
  2. Created Volume – Subvolume Group – Subvolume;
  3. Issued the corresponding RW permissions to Subvolume;
  4. Collected the data that needs to be passed to the client (user name, key, and Subvolume path).

Let’s move on to the client side. In this case, we will use the Kernel Driver, available in all modern kernels, to connect. Ceph-Fuse will not be considered.

First, on the client side, you need to add the Ceph repositories:

[root@ceph-client-fs ~]# vi /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph $basearch
baseurl=https://download.ceph.com/rpm-reef/el9/$basearch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.gpg

[Ceph-noarch]
name=Ceph noarch
baseurl=https://download.ceph.com/rpm-reef/el9/noarch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.gpg

[Ceph-source]
name=Ceph SRPMS
baseurl=https://download.ceph.com/rpm-reef/el9/SRPMS
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.gpg

Add the EPEL repository:

[root@ceph-client-fs ~]# dnf install epel-release

And install the ceph-common package:

[root@ceph-client-fs ~]# dnf install ceph-common

Let’s create directories for mounting:

[root@ceph-client-fs ~]# mkdir -p /vmik_data1 /vmik_data2

And mount Ceph Subvolumes with the usual mount command:

[root@ceph-client-fs /]# mount -t ceph vmik1@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7 /vmik_data1/ -o mon_addr=ceph-mon-01:6789/ceph-mon-02:6789/ceph-mon-03:6789 -o secret=AQDp0d5lO7xfLhAAWZ8bSFW1YB5heNgoVTaQAw==

[root@ceph-client-fs /]# mount -t ceph vmik2@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37 /vmik_data2/ -o mon_addr=ceph-mon-01:6789/ceph-mon-02:6789/ceph-mon-03:6789 -o secret=AQD20d5lDBwNCRAAj8HHVF/I8zAzdnkjWRYUlA==

Looks very scary. I’ll try to explain:

  1. mount -t ceph indicates that we are mounting a Ceph file system;
  2. Next, we specify the account for which permissions were issued (vmik1 or vmik2);
  3. After @ we specify the FSID of the cluster. You can get it from the output of the ceph -s command on the control node;
  4. After the cluster FSID there is a mandatory dot (.) character, followed by the name of the file system, or Volume;
  5. Next comes =, followed by the full path of the Subvolume. How to get it was described above;
  6. Next, we specify the directory in which the file system will be mounted;
  7. After -o we specify the list of Ceph Monitor addresses, separated by /, as well as the secret for the user (a generalized template is shown after this list).
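
To summarize, the general form looks like this (a sketch with placeholders in angle brackets; substitute your own values):

mount -t ceph <user>@<fsid>.<fs_name>=<subvolume_path> <mount_point> -o mon_addr=<mon1>:6789/<mon2>:6789/<mon3>:6789 -o secret=<key>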

With this long command, you can mount CephFS:

[root@ceph-client-fs /]# df -h
Filesystem                                                                                                      Size  Used Avail Use% Mounted on
vmik1@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7  1.0G     0  1.0G   0% /vmik_data1
vmik2@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37  1.0G     0  1.0G   0% /vmik_data2

If you try to mount a share with a user who does not have permissions for it, you may receive the following error:

[root@ceph-client-fs /]# mount -t ceph vmik1@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37 /vmik_data2/ -o mon_addr=ceph-mon-01:6789/ceph-mon-02:6789/ceph-mon-03:6789 -o secret=AQD20d5lDBwNCRAAj8HHVF/I8zAzdnkjWRYUlA==
mount error: no mds server is up or the cluster is laggy

If you receive a similar error when working with CephFS, check your permissions.

Let’s simplify the task and shorten the command. To do this, the following is required on the client machine:

  1. Place a minimal Ceph configuration file on the client machine, which will contain the Ceph Monitor addresses, as well as the cluster FSID;
  2. Place the user’s keyring file on the client machine.

Typically, this data is provided by the administrator, but in this example we will fetch it ourselves.

Request the cluster configuration file via SSH:

[root@ceph-client-fs /]# mkdir -p /etc/ceph
[root@ceph-client-fs /]# ssh root@ceph-mon-01 "ceph config generate-minimal-conf" | tee /etc/ceph/ceph.conf
# minimal ceph.conf for 24c20e62-c4da-11ee-ba95-005056aad62a
[global]
        fsid = 24c20e62-c4da-11ee-ba95-005056aad62a
        mon_host = [v2:10.10.10.13:3300/0,v1:10.10.10.13:6789/0] [v2:10.10.10.14:3300/0,v1:10.10.10.14:6789/0] [v2:10.10.10.15:3300/0,v1:10.10.10.15:6789/0]

And also the keyring files for users vmik1 and vmik2:

[root@ceph-client-fs /]# ssh root@ceph-mon-01 "ceph auth get-or-create client.vmik1" | tee /etc/ceph/ceph.client.vmik1.keyring
[client.vmik1]
        key = AQDp0d5lO7xfLhAAWZ8bSFW1YB5heNgoVTaQAw==

[root@ceph-client-fs /]# ssh root@ceph-mon-01 "ceph auth get-or-create client.vmik2" | tee /etc/ceph/ceph.client.vmik2.keyring
[client.vmik2]
        key = AQD20d5lDBwNCRAAj8HHVF/I8zAzdnkjWRYUlA==

Pay attention to the name of the keyring file. It is recommended to keep this naming format (ceph.client.<name>.keyring), since the Ceph client tools look for keyrings in /etc/ceph by this pattern.

Now the /etc/ceph directory should contain the ceph.conf file and two keyring files:

[root@ceph-client-fs /]# ls /etc/ceph/
ceph.client.vmik1.keyring  ceph.client.vmik2.keyring  ceph.conf  rbdmap

Let’s mount file shares:

[root@ceph-client-fs /]# mount -t ceph vmik1@.vmik_share=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7 /vmik_data1/

[root@ceph-client-fs /]# mount -t ceph vmik2@.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37 /vmik_data2/
[root@ceph-client-fs /]# df -h
Filesystem                                                                                                      Size  Used Avail Use% Mounted on
vmik1@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7  1.0G     0  1.0G   0% /vmik_data1
vmik2@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37  1.0G     0  1.0G   0% /vmik_data2

As you can see, the command is now much shorter. Pay attention to the vmik1@. part.

Previously, we specified the cluster FSID between the @ and the dot. Now that we have ceph.conf, there is no need to do this; however, the dot must still be kept in the command.

A curious reader may ask: is it possible to mount automatically after an OS reboot? Yes, you can. The fstab file format is as follows:

### CephFS
vmik1@.vmik_share=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7 /vmik_data1/ ceph    _netdev 0 2
vmik2@.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37 /vmik_data2/ ceph    _netdev 0 2

This works, of course, when the system has ceph.conf and keyring files.
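
Before rebooting, you can verify the fstab entries by unmounting the shares and running mount -a; if the entries are correct, both shares should come back:

[root@ceph-client-fs /]# umount /vmik_data1 /vmik_data2
[root@ceph-client-fs /]# mount -a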

[root@ceph-client-fs /]# reboot

[root@ceph-client-fs /]# df -h
Filesystem                                                                                                      Size  Used Avail Use% Mounted on
vmik2@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37  1.0G     0  1.0G   0% /vmik_data2
vmik1@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7  1.0G     0  1.0G   0% /vmik_data1

That’s all about connecting the client.

A little about management.

You can view the list of access permissions as follows:

[root@ceph-mon-01 ~]# ceph fs subvolume authorized_list vmik_share data2 --group_name data
[
    {
        "vmik2": "rw"
    }
]

How to revoke the permissions?

[root@ceph-mon-01 ~]# ceph fs subvolume deauthorize vmik_share data2 --group_name data vmik2

After executing this command, a client that already has the share mounted can still read and write, but it will no longer be possible to mount the resource again.
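
You can confirm this by listing the authorized clients again; the list should now be empty:

[root@ceph-mon-01 ~]# ceph fs subvolume authorized_list vmik_share data2 --group_name data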

There is also an interesting command, evict, which terminates the client’s current sessions:

[root@ceph-mon-01 ~]# ceph fs subvolume evict vmik_share data2 --group_name data vmik2

After executing this command, the user will not be able to perform further actions on the mounted resource:

[root@ceph-client-fs /]# ls -la
ls: cannot access 'vmik_data2': Permission denied

However, the user can unmount the share, mount it again, and continue working as before.

We can change the Subvolume size online:

[root@ceph-mon-01 ~]# ceph fs subvolume resize vmik_share data1 2073741824 --group_name data
[
    {
        "bytes_used": 0
    },
    {
        "bytes_quota": 2073741824
    },
    {
        "bytes_pcent": "0.00"
    }
]

The user will instantly see the changes:

vmik1@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data1/c4a84311-514d-4517-9e70-e3e9a1078df7  2.0G     0  2.0G   0% /vmik_data1
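
According to the documentation, the quota can also be removed entirely by passing inf as the new size; with the same syntax as above, it would look like this:

[root@ceph-mon-01 ~]# ceph fs subvolume resize vmik_share data1 inf --group_name data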

You can also set a size smaller than the current one, but why would you do that?

[root@ceph-mon-01 ~]# ceph fs subvolume resize vmik_share data2 73741824 --group_name data
[
    {
        "bytes_used": 104857600
    },
    {
        "bytes_quota": 73741824
    },
    {
        "bytes_pcent": "142.20"
    }
]
vmik2@24c20e62-c4da-11ee-ba95-005056aad62a.vmik_share=/volumes/data/data2/988e03e8-4db8-4586-b47f-9547020e9d37   68M   68M     0 100% /vmik_data2

To avoid accidentally setting a size smaller than the space already used, we can add the --no_shrink option, which will prevent such a resize:

[root@ceph-mon-01 ~]# ceph fs subvolume resize vmik_share data2 73741824 --group_name data --no_shrink
Error EINVAL: Can't resize the subvolume. The new size '73741824' would be lesser than the current used size '104857600'

And finally, let’s delete a Subvolume:

[root@ceph-mon-01 ~]# ceph fs subvolume rm vmik_share data2 --group_name data
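
If you want to tear down the whole demo, you could also remove the remaining Subvolume, the group, and then the Volume itself. Keep in mind that, as we saw earlier, removing the Volume deletes its pools and all data:

[root@ceph-mon-01 ~]# ceph fs subvolume rm vmik_share data1 --group_name data
[root@ceph-mon-01 ~]# ceph fs subvolumegroup rm vmik_share data
[root@ceph-mon-01 ~]# ceph fs volume rm vmik_share --yes-i-really-mean-it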

In conclusion:

Until recently, I had not worked with CephFS due to a lack of need. But readers asked for an article on this topic both in the comments and by email, so I couldn’t help but cover this part of Ceph’s functionality.

CephFS is very rich in functionality and what is described in the article is not even the tip of the iceberg. As was said at the beginning, read more in the documentation. I’ll probably read it too.
