
Monitoring ESS Hardware metrics

hwassman edited this page Aug 23, 2025 · 1 revision

In version 5.2.2, IBM Storage Scale introduced a new sensor, 'GPFSHardware', which collects metrics about fan rotation speed, power supply unit (PSU) power consumption, and the temperature of various enclosure components.

Background:

The IBM Storage Scale System 6000 (ISS 6000) is a hardware appliance that runs the IBM Storage Scale software and is optimised for the most demanding AI, HPC, analytics and hybrid cloud workloads.

It includes node-to-node communication through an internal private Ethernet network and a non-transparent bridge (NTB) for peer-node diagnostics and control. A remote console through serial over LAN (SOL), provided by a Baseboard Management Controller (BMC) and the Intelligent Platform Management Interface (IPMI), is available for monitoring and controlling the ISS 6000 enclosure and for assisting with deployment and installation.

IPMI (Intelligent Platform Management Interface) is a standardised, message-based hardware management interface. At the core of IPMI is a hardware chip known as the Baseboard Management Controller (BMC) or Management Controller (MC). The BMC provides the interfaces needed to monitor hardware health indicators such as temperature, voltage and fan speed.
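To make the kind of data a BMC exposes more concrete, the following minimal sketch parses pipe-separated sensor lines in the layout printed by the `ipmitool sensor` command (name, reading, unit, status, followed by threshold columns). It is only an illustration and not the way the 'GPFSHardware' sensor collects its data. The sensor names and readings in `SAMPLE` are hypothetical, as are the helper names `SensorReading`, `parse_sensor_line` and `parse_sensor_output`.

```python
from typing import List, NamedTuple, Optional


class SensorReading(NamedTuple):
    """One row of sensor output (hypothetical helper type)."""
    name: str
    value: Optional[float]  # None when the sensor reports no numeric reading
    unit: str
    status: str


def parse_sensor_line(line: str) -> Optional[SensorReading]:
    """Parse one pipe-separated line; return None for lines that do not fit."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4 or not fields[0]:
        return None
    try:
        value = float(fields[1])
    except ValueError:
        # Non-numeric readings such as "na" or "0x0" are kept as None.
        value = None
    return SensorReading(fields[0], value, fields[2], fields[3])


def parse_sensor_output(text: str) -> List[SensorReading]:
    """Parse a whole block of sensor output, skipping unparsable lines."""
    readings = []
    for line in text.splitlines():
        reading = parse_sensor_line(line)
        if reading is not None:
            readings.append(reading)
    return readings


# Hypothetical sample data, made up for illustration only.
SAMPLE = """\
FAN1             | 5400.000   | RPM        | ok    | na | 500.000 | 1000.000 | na | na | na
CPU Temp         | 42.000     | degrees C  | ok    | na | na | na | 85.000 | 90.000 | 95.000
PSU1 Power       | 180.000    | Watts      | ok    | na | na | na | na | na | na
PSU2 Power       | na         | Watts      | na    | na | na | na | na | na | na
"""

if __name__ == "__main__":
    for reading in parse_sensor_output(SAMPLE):
        print(reading)
```

Readings without a numeric value are kept with `value=None` rather than dropped, so a missing power supply or fan remains visible to whatever consumes the parsed data.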

Each IBM Storage Scale System cluster requires a minimum of one Enterprise Management Server (EMS). The EMS, also referred to as the Utility Node, is the central management component in an IBM Storage Scale System. The EMS is implemented as a RHEL KVM VM and serves as the control hub for the Storage Scale System 6000. It also provides access to the hardware health and performance data that is captured from the hardware abstraction layer (HAL) and stored using the IBM Performance Monitoring tool.

You can use the Grafana software and the ESS hardware sample dashboard bundle to explore hardware metrics for key components directly in your web browser.

