Deploying Prometheus monitoring system on Arista switches

Overview

This article explains how to run Docker containers on Arista switches, using node_exporter and snmp_exporter to demonstrate monitoring switch status with Prometheus.

System Architecture

This monitoring solution includes the following main components:

  • node_exporter: Collects host-level metrics from the switch's underlying system
  • snmp_exporter: Collects network device metrics via SNMP
  • Prometheus: Time-series database that scrapes and stores the metrics for monitoring

Container Management Configuration

Basic Container Setup

container-manager
   container-profile default
      networking mode host

Node Exporter Configuration

container node-exporter
   image prom/node-exporter
   no shutdown
   profile default
   command --collector.disable-defaults --collector.cpu --collector.hwmon --collector.meminfo --collector.vmstat --collector.stat
   persist storage
      mount src file:/ dst /host

This configuration:

  • Uses the official node_exporter image
  • Disables the default collectors and enables only specific ones for CPU, hardware sensors (hwmon), memory, and kernel statistics
  • Exposes host system metrics through the mount of the host filesystem at /host

SNMP Exporter Configuration

container snmp-exporter
   image prom/snmp-exporter:latest
   no shutdown
   profile default

snmp_exporter acts as a centralized monitoring proxy: a single instance can monitor thousands of devices simultaneously.
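
To illustrate the proxy model: device metrics are not served on the exporter's own /metrics page. The scraper requests /snmp and names the device to poll in a target query parameter. The following is a minimal sketch, assuming the if_mib module from the snmp.yml bundled with the image; snmp_probe_url is a hypothetical helper name, and newer exporter releases may also expect an auth parameter.

```shell
#!/bin/sh
# snmp_probe_url is a hypothetical helper, not part of snmp_exporter.
# It builds the probe URL from: exporter address, SNMP target, module name.
snmp_probe_url() {
    exporter="$1"   # exporter listen address, e.g. localhost:9116
    target="$2"     # device to poll over SNMP
    module="$3"     # module defined in snmp.yml; if_mib is an assumption
    printf 'http://%s/snmp?target=%s&module=%s\n' "$exporter" "$target" "$module"
}

# Example (run on the switch; polls the switch itself through the exporter):
#   curl -s "$(snmp_probe_url localhost:9116 127.0.0.1 if_mib)" | head -n 5
```

Because the target is a request parameter, the same exporter instance can be pointed at any number of devices by changing only the URL.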

Network Access Control Configuration

Building on Arista's default control-plane ACL, we need to explicitly open the ports used by node_exporter and snmp_exporter. The complete ACL configuration is as follows:

ip access-list default-with-exporter
   counters per-entry
   10 permit icmp any any
   20 permit ip any any tracked
   30 permit udp any any eq bfd ttl eq 255
   40 permit udp any any eq bfd-echo ttl eq 254
   50 permit udp any any eq multihop-bfd micro-bfd sbfd
   60 permit udp any eq sbfd any eq sbfd-initiator
   70 permit ospf any any
   80 permit tcp any any eq ssh telnet www snmp bgp https msdp ldp netconf-ssh gnmi
   90 permit udp any any eq bootps bootpc ntp snmp ptp-event ptp-general rip ldp
   100 permit tcp any any eq mlag ttl eq 255
   110 permit udp any any eq mlag ttl eq 255
   120 permit vrrp any any
   130 permit ahp any any
   140 permit pim any any
   150 permit igmp any any
   160 permit tcp any any range 5900 5910
   170 permit tcp any any range 50000 50100
   180 permit udp any any range 51000 51100
   190 permit tcp any any eq 3333
   200 permit tcp any any eq nat ttl eq 255
   210 permit tcp any eq bgp any
   220 permit rsvp any any
   230 permit tcp any any eq 9340
   240 permit tcp any any eq 9559
   250 permit udp any any eq 8503
   260 permit udp any any eq lsp-ping
   270 permit udp any eq lsp-ping any
   280 permit tcp any any eq 9116
   290 permit tcp any any eq 9100

This ACL configuration includes:

  • The entries of Arista's default control-plane ACL
  • The ports required by the Prometheus exporters:
    • 9100: default port for node_exporter
    • 9116: default port for snmp_exporter

Apply the ACL to the control plane:

system control-plane
   ip access-group default-with-exporter in

SNMP Configuration

snmp-server community public ro

This defines a read-only SNMP community string to allow data collection.

Deployment Verification

After deployment, we can use curl to verify that the exporters are running normally.

Test node_exporter

# Test whether node_exporter is running normally
curl -s localhost:9100/metrics | head -n 5

If the output looks similar to the following, node_exporter is running normally and ready to serve Prometheus scrape requests:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 8953.12
node_cpu_seconds_total{cpu="0",mode="system"} 245.45
node_cpu_seconds_total{cpu="0",mode="user"} 189.76
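
For a scripted check rather than a visual one, a single series can be pulled out of this text output. The following is a minimal sketch: metric_value is a hypothetical helper, and it assumes the label values contain no spaces, which holds for the sample above.

```shell
#!/bin/sh
# metric_value is a hypothetical helper: it reads Prometheus text-format
# metrics on stdin and prints the value of the first series whose
# name-plus-labels string exactly matches the argument.
metric_value() {
    SERIES="$1" awk '
        /^#/ { next }                                # skip HELP/TYPE lines
        $1 == ENVIRON["SERIES"] { print $2; exit }   # field 2 is the value
    '
}

# Example (run on the switch):
#   curl -s localhost:9100/metrics | \
#       metric_value 'node_cpu_seconds_total{cpu="0",mode="idle"}'
```

The series name is passed through the environment instead of awk -v so that the quotes inside the label set reach awk unchanged.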

Test snmp_exporter

# Test whether snmp_exporter is running normally
curl -s localhost:9116/metrics | head -n 5

If the output looks similar to the following, snmp_exporter is running normally and ready to serve Prometheus scrape requests:

# HELP snmp_exporter_build_info A metric with a constant '1' value labeled by version
# TYPE snmp_exporter_build_info gauge
snmp_exporter_build_info{version="0.20.0"} 1

Summary

Through these configurations, we can:

  1. Run the monitoring exporters directly on Arista switches
  2. Collect complete system and network performance metrics
  3. Integrate with existing Prometheus monitoring systems

This solution is especially suitable for large-scale network infrastructure monitoring, enabling centralized management and monitoring of multiple network devices. After deployment, only the corresponding target needs to be configured on the Prometheus server to start collecting monitoring metrics.
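
As a sketch of that final step, the helper below prints a scrape_configs fragment for one switch running both exporters. Everything not taken from the article is an assumption: print_scrape_config is a hypothetical helper, the job names and the hostname in the usage line are illustrative, the if_mib module is assumed from the default snmp.yml, and 127.0.0.1 as the SNMP target assumes the exporter can reach the switch's SNMP agent over the host network.

```shell
#!/bin/sh
# print_scrape_config is a hypothetical helper; it prints a Prometheus
# scrape_configs fragment for one switch running both exporters.
print_scrape_config() {
    switch="$1"   # switch address as seen from the Prometheus server
    cat <<EOF
scrape_configs:
  - job_name: arista-node
    static_configs:
      - targets: ['${switch}:9100']
  - job_name: arista-snmp
    metrics_path: /snmp
    params:
      module: [if_mib]
    static_configs:
      - targets: ['127.0.0.1']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: instance
        replacement: '${switch}'
      - target_label: __address__
        replacement: '${switch}:9116'
EOF
}

# Example: print_scrape_config switch1.example.net > arista-scrape.yml
```

The first job scrapes node_exporter directly. In the second job, the relabel rules pass the listed target to the exporter as the target query parameter, label the series with the switch name, and redirect the actual HTTP scrape to snmp_exporter on port 9116.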
