Ceph and OpenStack – Best Practices Part I

Ceph and OpenStack have now become a standard pairing in IaaS. According to the June-December 2017 OpenStack User Survey, 57% of all OpenStack deployments use Ceph RBD as the Cinder backend. Naturally, there are best practices for tuning configurations when using Ceph as the backend for OpenStack Glance and Cinder. This article will introduce how to make these adjustments and the reasons behind them.

Using show_image_direct_url in Glance

When using Ceph RBD, the RBD layering feature is enabled by default. You can think of it as a read-write snapshot: Ceph creates a clone of the original image and allocates RADOS objects only for the parts that differ from it. This implies two things:

  1. Space savings: since RADOS objects are created only for data that has changed relative to the original image, significant space is saved when many instance volumes are based on the same image.
  2. Faster reads: unchanged parts still belong to the original image, so regardless of which clone is being read, the same RADOS objects, and thus the same OSDs, are accessed. There is therefore a high probability that these objects will be served from the OSDs' page cache, i.e., from RAM. Since RAM is faster than any modern persistent storage device, reading from a clone is faster than reading from a full volume copy.
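The clone mechanism described above can be observed directly with the rbd CLI. As an illustrative sketch (the pool names images/volumes and image name ubuntu are placeholders, not from this article):

```shell
rbd snap create images/ubuntu@base       # snapshot the original image
rbd snap protect images/ubuntu@base      # protect it so clones may reference it
rbd clone images/ubuntu@base volumes/vm1 # the clone stores only changed objects
```

These commands must be run against a live Ceph cluster with the appropriate cephx credentials.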

Both Cinder and Nova use RBD layering by default, but for Glance it must be enabled by setting show_image_direct_url=true in glance-api.conf, and it requires the Glance v2 API.
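For reference, the relevant glance-api.conf fragment looks like this (check the option's section placement against your OpenStack release):

```ini
[DEFAULT]
# Expose image locations so Cinder/Nova can clone RBD-backed images
show_image_direct_url = true
```

Note the security caveat in the update below this section.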

Update

Due to security concerns, the Ceph community now recommends setting show_image_direct_url to false, since exposing image locations can leak backend storage details to API clients.

Using RBD Cache on Compute Nodes

librbd, the driver for communication between qemu/kvm and RBD storage devices, can utilize the host's RAM as a disk cache for RBD.

Using this type of cache is safe: virtio-blk and the Qemu RBD storage driver ensure that data is flushed correctly. When an application inside the VM signals "I want this file on disk," Qemu and Ceph report the write as complete only after all of the following:

  • The data has been written to the primary OSD
  • The data has been replicated to the other OSDs
  • Every OSD has acknowledged that the data has reached its persistent journal

Ceph also has a built-in fail-safe mechanism: even if the cache is set to write-back mode, Ceph operates in write-through mode until it receives the first flush request from the guest. The corresponding setting, rbd cache writethrough until flush, defaults to true; never, ever disable it.
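On the compute nodes' ceph.conf, the client-side cache options look like this; the values shown are the defaults, listed only for clarity:

```ini
[client]
rbd cache = true
rbd cache writethrough until flush = true
```
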

To enable Ceph caching, you must configure it in the nova.conf file of nova-compute.

[libvirt]
...
images_type = rbd
disk_cachemodes="network=writeback"

Using Separate Pools for Cinder, Glance, and Nova

There are several reasons for having three different services use separate Ceph pools:

  • Using different pools allows for distinct permission settings. This means that in the unfortunate event your nova-compute node is compromised, the attacker can damage or delete your Nova disks but cannot also corrupt your Glance images. That is bad, but not as bad as it could be.
  • Using different pools also allows for specific configurations for each pool, such as size or pg_num settings.
  • Most importantly, you can apply different crush_rulesets to each pool. For example, you can have Cinder use high-speed SSDs, Nova use HDDs, and Glance use an Erasure Coded Pool.
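As a sketch of such a setup, per-service pools and cephx capabilities can be created as follows. The pool names (volumes, images, vms) and PG counts are illustrative, and the rbd auth profiles require Luminous or later:

```shell
ceph osd pool create volumes 128
ceph osd pool create images 128
ceph osd pool create vms 128
ceph auth get-or-create client.glance mon 'profile rbd' \
    osd 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' \
    osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'
```

Here client.cinder gets read-only access to the images pool, which is what allows it to clone Glance images without being able to modify them.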

Some might worry that RBD layering will stop working after separating the pools, but there's no need to worry—clones can be used across different pools.

Therefore, it is very common to see three different Ceph pools in an OpenStack deployment: one for Cinder, one for Glance, and one for Nova.

Using All-Flash OSD Pools

Using SSDs for the BlueStore WAL and DB will not increase read speeds, because reads are served from the data device; the WAL and DB only accelerate writes and metadata operations. To take advantage of the fast read speeds of SSDs, you should set them up as independent OSDs and use a crush rule to configure an All-flash OSD pool. Since the Luminous release, Ceph automatically detects device classes, making it very easy to create All-flash crush rules.

For example, to create an All-flash replicated rule (failure domain: host) named 'fast', use the command: ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> <device-class>

ceph osd crush rule create-replicated fast default host ssd
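Once the rule exists, it can be applied to a new or an existing pool. The pool names below are illustrative:

```shell
# Create a new replicated pool placed by the 'fast' rule
ceph osd pool create volumes-ssd 128 128 replicated fast
# Or retarget an existing pool onto the rule
ceph osd pool set volumes crush_rule fast
```

Retargeting an existing pool triggers data migration onto the matching OSDs, so expect backfill traffic while it rebalances.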

Installing OpenStack and Ceph

For tutorials on installing OpenStack and Ceph, you can refer to my previous articles:

Continue Reading Part II

Reference

The Dos and Don'ts for Ceph in OpenStack
