Ceph and OpenStack – Best Practices Part I

Ceph and OpenStack are now a mature and widely adopted combination in IaaS. According to the OpenStack User Survey 2017 (June to December), 57% of all OpenStack deployments use Ceph RBD as their Cinder backend. Using Ceph as the backend for OpenStack Glance and Cinder offers several advantages, and this article explains how to configure them properly and why these configurations are recommended.

Use show_image_direct_url in Glance

When using Ceph RBD, RBD layering is enabled by default. You can think of a volume created from an image as a snapshot-based clone: Ceph stores only the RADOS objects that differ from the original image. This has two important consequences:

  1. Space savings: because only the RADOS objects that differ from the original image are stored, basing multiple volumes on the same image saves a significant amount of space.
  2. For unchanged portions, data is always read from the original image. This means that regardless of which clone is accessed, the same RADOS objects, and therefore the same OSDs, are hit. Because these objects are accessed frequently, they are likely to sit in the OSDs' page cache and be served from RAM. RAM is much faster than any persistent storage, so reads from clones are significantly faster than reads from a full, independent copy of the image.
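
The same layering mechanism can be reproduced manually with the rbd CLI; the pool and image names below are assumptions for illustration only:

# Create a snapshot of the source image and protect it so it cannot be deleted
rbd snap create images/my-image@base
rbd snap protect images/my-image@base

# Create a copy-on-write clone; only the objects that later diverge from the
# parent are stored as new RADOS objects
rbd clone images/my-image@base volumes/my-volume

# List all clones that depend on the snapshot
rbd children images/my-image@base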

Cinder and Nova both rely on RBD layering, but this requires setting show_image_direct_url = true in glance-api.conf and using the Glance v2 API.
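
A minimal glance-api.conf sketch of this setting, together with a typical RBD store configuration (the pool name images and the user glance are assumptions for illustration):

[DEFAULT]
# Expose the RBD location of images so Cinder/Nova can create COW clones
show_image_direct_url = true

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf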

Update

Due to security concerns, the Ceph community now recommends setting show_image_direct_url to false.

Use RBD Cache on the Compute Node

librbd, the RBD storage driver used by QEMU/KVM, can use the compute host's RAM as a disk cache for RBD.

Using this cache is safe: virtio-blk and the QEMU RBD storage driver make sure data is properly flushed. When an application inside the VM says, "I need this file on disk," QEMU and Ceph only acknowledge that the data has been written after the following steps have completed:

  • The write has reached the primary OSD
  • The write has been replicated to the other OSDs
  • All OSDs have acknowledged that the write has been committed to their persistent journals

Ceph also has a built-in safety mechanism: even if the cache is configured in write-back mode, Ceph operates in write-through mode until it receives the first flush request from the guest. This behavior is controlled by the rbd cache writethrough until flush option, whose default is true; never disable it.
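
For reference, a minimal sketch of the corresponding client-side options in ceph.conf; the values shown are the defaults and are listed only for illustration:

[client]
# Enable the librbd cache on the compute host
rbd cache = true
# Stay in write-through mode until the guest issues its first flush
rbd cache writethrough until flush = true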

The RBD cache mode must also be configured in nova.conf on each nova-compute node:

[libvirt]
...
images_type = rbd
disk_cachemodes="network=writeback"

Have Cinder, Glance, and Nova use different pools

There are several reasons why three different services should use different Ceph pools:

  • Using different pools allows you to set different access restrictions on each pool. If things go very wrong and your nova-compute host is compromised, attackers could corrupt or delete your Nova disks. That is dangerous enough, but the situation becomes even worse if they can also corrupt your Glance images at the same time.
  • Using different pools also allows you to set different configurations, such as size or pg_num.
  • Most importantly, you can assign different crush_rulesets to pools. For example, you can configure Cinder to use fast SSDs, Nova to use HDDs, and Glance to use Erasure Coded Pools.

Some may be concerned that after splitting into separate pools, RBD layering will no longer work, but there is no need to worry: clones still work across pools.

Therefore, it's quite common in OpenStack deployments to see three different Ceph pools: one for Cinder, one for Glance, and one for Nova.
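
A minimal sketch of such a layout, assuming the pool names images, volumes, and vms and the client.glance / client.cinder users commonly used in the Ceph documentation (adjust the names and pg_num values to your environment):

# Create one pool per service (pg_num values are examples only)
ceph osd pool create images 64
ceph osd pool create volumes 128
ceph osd pool create vms 128

# Restrict each OpenStack service to the pools it actually needs
ceph auth get-or-create client.glance mon 'profile rbd' \
    osd 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' \
    osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'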

Use All-Flash OSD Pool

Using SSDs only for the WAL and DB will not increase read speed. To take full advantage of SSDs' fast read performance, set up the SSDs as separate OSDs and use crush_ruleset to define an All-flash OSD Pool. Since the Luminous release, Ceph automatically detects each device's class, so creating an All-flash crush rule is very easy.

For example, to create an All-flash replicated crush rule (failure domain: host) named fast, use the command ceph osd crush rule create-replicated <rule-name> <root> <failure-domain-type> <device-class>:

ceph osd crush rule create-replicated fast default host ssd
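
Pools can then be created with, or moved to, this rule. A minimal sketch, assuming a hypothetical pool named volumes-ssd and example pg_num values:

# Create a new replicated pool that uses the fast (all-flash) rule
ceph osd pool create volumes-ssd 128 128 replicated fast

# Or point an existing pool at the rule
ceph osd pool set volumes crush_rule fast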

Install OpenStack and Ceph

To install OpenStack and Ceph, you can refer to previously published articles for guidance:

Continue to Part II

Reference

The Dos and Don'ts for Ceph for OpenStack
