How to Monitor Container PSI Information


Introduction

In the previous article, we explored PSI (Pressure Stall Information) and how to monitor system-wide PSI data. This article delves into how to monitor PSI information for a single container.

PSI and cgroupv2

On systems with the cgroup2 filesystem mounted, pressure stall information can be tracked per cgroup. Each cgroup's directory under the cgroupfs mount point contains cpu.pressure, memory.pressure, and io.pressure files.

You can query the PSI of a specific cgroup by running the following command. This example queries the cpu.pressure of a cgroup named cg1:

cat /sys/fs/cgroup/cg1/cpu.pressure
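Each line of these pressure files has the form `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`, with a second `full` line where applicable. As a minimal sketch, the following hypothetical helper (its name and the sample values are illustrative, not part of any official API) parses that format into a dictionary:

```python
def parse_pressure(text: str) -> dict:
    """Parse the contents of a cgroup *.pressure file.

    Each line has the form:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        result[kind] = {key: float(value)
                        for key, value in (f.split("=") for f in fields)}
    return result

# Illustrative sample; on a real system, read /sys/fs/cgroup/cg1/cpu.pressure
sample = (
    "some avg10=1.50 avg60=0.40 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=201\n"
)
print(parse_pressure(sample)["some"]["avg10"])  # -> 1.5
```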

PSI and runc

runc will support retrieving pressure stall information for container cgroups in the upcoming version 1.2.0. The following command can be used to obtain this information:

runc --root <container_root> events --stats <container_id>

Here, container_root is the directory where container state is stored: for example, /var/run/docker/runtime-runc/moby/ for Docker, or /var/run/containerd/runc for containerd.

After execution, the output will be presented in the following JSON format:

{
  "type": "stats",
  "id": "9eef3a09b21e11a6c54823ecdbe7b71a204d439acfeb7392a97e60a4baf64a74",
  "data": {
    "cpu": {
      "usage": {
        ...
      },
      "throttling": {},
      "psi": {
        "some": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 201
        },
        "full": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 201
        }
      }
    },
    "cpuset": {
        ...
    },
    "memory": {
      "usage": {
        ...
      },
      "swap": {
        ...
      },
      "kernel": {
        ...
      },
      "kernelTCP": {
        ...
      },
      "raw": {
         ...
      },
      "psi": {
        "some": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        },
        "full": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        }
      }
    },
    "pids": {
        ...
    },
    "blkio": {
      "psi": {
        "some": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        },
        "full": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        }
      }
    },
    "hugetlb": {},
    "intel_rdt": {},
    "network_interfaces": null
  }
}

The psi fields within cpu, memory, and blkio represent their corresponding PSI data.
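This JSON can also be consumed programmatically. The following sketch (the helper name and the truncated sample are illustrative) pulls the psi block for one resource out of the `runc events --stats` output:

```python
import json

def psi_from_stats(stats_json: str, resource: str) -> dict:
    """Extract the psi block for one resource ("cpu", "memory", or "blkio")
    from the JSON emitted by `runc events --stats`."""
    stats = json.loads(stats_json)
    return stats["data"][resource]["psi"]

# Truncated sample mirroring the structure of the output shown above
sample = """{
  "type": "stats",
  "id": "9eef3a09b21e",
  "data": {
    "cpu": {
      "psi": {
        "some": {"avg10": 0, "avg60": 0, "avg300": 0, "total": 201},
        "full": {"avg10": 0, "avg60": 0, "avg300": 0, "total": 201}
      }
    }
  }
}"""
print(psi_from_stats(sample, "cpu")["some"]["total"])  # -> 201
```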

Prometheus Support

Currently, cAdvisor is waiting for the official release of runc 1.2.0 before adding support; for more details, please refer to this PR. Once support is implemented, PSI can be read via cAdvisor in conjunction with Prometheus.

Additionally, other tools are already available, such as psi_exporter developed by Cloudflare and cgroups-exporter created by mosquito.
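Until such support lands, one lightweight option is to emit PSI values yourself in the Prometheus text exposition format. The metric names below are purely illustrative and are not those used by cAdvisor or the exporters mentioned above:

```python
def psi_to_prom(resource: str, kind: str, fields: dict, cgroup: str) -> str:
    """Render PSI fields as Prometheus gauge samples.

    Metric naming here is illustrative; real exporters define their own.
    """
    return "\n".join(
        f'psi_{resource}_{kind}_{field}{{cgroup="{cgroup}"}} {value}'
        for field, value in fields.items()
    )

print(psi_to_prom("cpu", "some", {"avg10": 1.5, "total": 201}, "/cg1"))
```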

Summary

Monitoring container PSI information is vital for understanding and optimizing resource management in containerized environments. With support from tools like runc and cAdvisor, we can now capture this data more accurately, enabling more effective management and adjustment of resource allocation to ensure high system efficiency.
