How to Monitor a Container's PSI Information


Introduction

In the previous article, we looked at PSI (Pressure Stall Information) and how to monitor it for the system as a whole. This article looks at how to monitor PSI for a single container.

PSI and cgroup v2

On a system with the cgroup2 file system mounted, pressure stall information is also tracked per cgroup. Each cgroup's subdirectory under the cgroupfs mount point contains the files cpu.pressure, memory.pressure, and io.pressure.

You can query a cgroup's PSI by reading these files. The following example reads cpu.pressure for a cgroup named cg1:

cat /sys/fs/cgroup/cg1/cpu.pressure
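For programmatic use, the same file can be read and parsed. The following is a minimal Python sketch, assuming the pressure file follows the kernel's documented layout (one `some` line and one `full` line, each with `avg10`, `avg60`, `avg300`, and `total` fields). The function names `parse_pressure` and `read_cgroup_pressure` are hypothetical helpers introduced here for illustration, and the sample values are made up.

```python
from pathlib import Path


def parse_pressure(text):
    """Parse the contents of a PSI pressure file into nested dicts."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is a cumulative counter; the avg* fields are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_cgroup_pressure(cgroup, resource="cpu", root="/sys/fs/cgroup"):
    """Read <root>/<cgroup>/<resource>.pressure, e.g. /sys/fs/cgroup/cg1/cpu.pressure."""
    path = Path(root) / cgroup / f"{resource}.pressure"
    return parse_pressure(path.read_text())


# Illustrative input only; real values come from the cgroup's pressure file.
sample = (
    "some avg10=0.00 avg60=0.00 avg300=0.00 total=201\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=201\n"
)
psi = parse_pressure(sample)
print(psi["some"]["total"])
```

On a host where the cg1 cgroup exists, `read_cgroup_pressure("cg1")` would return the same structure for its cpu.pressure file.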

PSI and runc

runc will support retrieving pressure stall information for a container's cgroup in the upcoming 1.2.0 release. The following command retrieves it:

runc --root <container_root> events --stats <container_id>

Here, container_root is the directory where container information is stored. With Docker it may be /var/run/docker/runtime-runc/moby/, and with containerd it may be /var/run/containerd/runc, among others.
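If the command needs to be called from a script, it can be wrapped as in the sketch below. It assumes a runc build with PSI support is on the PATH and that the caller has permission to read the container state. `runc_stats_command` and `fetch_stats` are hypothetical helper names, and the container ID in the example is a placeholder.

```python
import json
import subprocess


def runc_stats_command(container_root, container_id):
    """Build the argv for: runc --root <container_root> events --stats <container_id>."""
    return ["runc", "--root", container_root, "events", "--stats", container_id]


def fetch_stats(container_root, container_id):
    """Run runc once and decode the JSON it prints (requires runc on the host)."""
    completed = subprocess.run(
        runc_stats_command(container_root, container_id),
        check=True,           # raise CalledProcessError if runc exits non-zero
        capture_output=True,  # collect stdout instead of printing it
        text=True,
    )
    return json.loads(completed.stdout)


# Only build the command here; fetch_stats() needs a real container to query.
cmd = runc_stats_command("/var/run/docker/runtime-runc/moby/", "my-container")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with unusual container IDs or paths.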

The command prints its output in the following JSON format:

{
  "type": "stats",
  "id": "9eef3a09b21e11a6c54823ecdbe7b71a204d439acfeb7392a97e60a4baf64a74",
  "data": {
    "cpu": {
      "usage": {
        ...
      },
      "throttling": {},
      "psi": {
        "some": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 201
        },
        "full": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 201
        }
      }
    },
    "cpuset": {
        ...
    },
    "memory": {
      "usage": {
        ...
      },
      "swap": {
        ...
      },
      "kernel": {
        ...
      },
      "kernelTCP": {
        ...
      },
      "raw": {
         ...
      },
      "psi": {
        "some": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        },
        "full": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        }
      }
    },
    "pids": {
        ...
    },
    "blkio": {
      "psi": {
        "some": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        },
        "full": {
          "avg10": 0,
          "avg60": 0,
          "avg300": 0,
          "total": 0
        }
      }
    },
    "hugetlb": {},
    "intel_rdt": {},
    "network_interfaces": null
  }
}

The psi objects under cpu, memory, and blkio hold the PSI for the corresponding resource.
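To pull just those values out of the output, a small helper is enough. This is a sketch that assumes the JSON has the shape shown above, with `data.cpu.psi`, `data.memory.psi`, and `data.blkio.psi`. `extract_psi` is a hypothetical name, and the sample document is a trimmed copy of the output above with a placeholder ID.

```python
import json


def extract_psi(event):
    """Return {controller: psi} for each controller that reports a psi object."""
    data = event.get("data") or {}
    psi = {}
    for controller in ("cpu", "memory", "blkio"):
        section = data.get(controller) or {}
        if "psi" in section:
            psi[controller] = section["psi"]
    return psi


# Trimmed sample in the same shape as the runc output; the ID is a placeholder.
sample = json.loads("""
{
  "type": "stats",
  "id": "example-container",
  "data": {
    "cpu": {
      "psi": {
        "some": {"avg10": 0, "avg60": 0, "avg300": 0, "total": 201},
        "full": {"avg10": 0, "avg60": 0, "avg300": 0, "total": 201}
      }
    },
    "memory": {
      "psi": {
        "some": {"avg10": 0, "avg60": 0, "avg300": 0, "total": 0},
        "full": {"avg10": 0, "avg60": 0, "avg300": 0, "total": 0}
      }
    },
    "hugetlb": {},
    "network_interfaces": null
  }
}
""")

psi_by_controller = extract_psi(sample)
print(sorted(psi_by_controller))
```

Controllers without a psi object are skipped rather than reported as zero, so a missing value is not mistaken for an idle one.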

Prometheus Support

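cAdvisor is currently waiting for the official runc 1.2.0 release before it adds support; see the related PR for details. Once that support lands, PSI can be read through cAdvisor together with Prometheus.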

Other tools are also available today, such as psi_exporter, developed by Cloudflare, and cgroups-exporter, made by Mosquito.

Summary

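Monitoring a container's PSI is important for understanding and optimizing resource management in containerized environments. As tools such as runc and cAdvisor add support, this information becomes easier to obtain accurately, which helps in tuning resource allocation and keeping the system running efficiently.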
