HPE Revolutionizes Data Center Efficiency with Cutting-Edge AI Ops Research and Development
Hewlett Packard Enterprise (HPE) Partners with NREL to Advance AI in Data Centers
Today, Hewlett Packard Enterprise (HPE) announced a collaboration with the U.S. Department of Energy’s National Renewable Energy Laboratory (NREL) focused on AI Ops research and development. The partnership aims to develop Artificial Intelligence (AI) and Machine Learning (ML) technologies that automate processes and improve operational efficiency, including resiliency and energy consumption, in data centers built for the exascale era. The initiative aligns with NREL’s mission as a global leader in energy efficiency and renewable energy technologies: devising and deploying cutting-edge strategies that significantly reduce energy usage and operational costs.
The project spans a three-year collaboration that will introduce monitoring and predictive analytics into the power and cooling systems at NREL’s Energy Systems Integration Facility (ESIF) HPC Data Center.
In this endeavor, HPE and NREL will draw on more than five years of historical data, totaling over 16 terabytes, gathered from sensors embedded in NREL’s supercomputers, Peregrine and Eagle. This data will be used to train models that detect anomalies, so that emerging issues can be identified and addressed before they disrupt operations.
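The project’s actual models and feature set are not public, but as a rough illustration of anomaly detection on facility telemetry, a minimal sketch using scikit-learn (with made-up sensor features) might look like this:

```python
# Illustrative only: trains an unsupervised anomaly detector on synthetic
# facility telemetry; the real HPE/NREL features and models are not public.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical telemetry columns: [supply_air_temp_C, coolant_flow_lpm, rack_power_kW]
historical = rng.normal(loc=[18.0, 120.0, 12.0], scale=[0.5, 5.0, 1.0], size=(5000, 3))

# Fit on historical data assumed to represent mostly normal operation.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(historical)

# Score new readings; -1 flags an anomaly (e.g., a cooling excursion).
new_readings = np.array([
    [18.2, 119.0, 12.3],   # typical operation
    [27.5,  40.0, 12.1],   # high temperature with low coolant flow
])
print(detector.predict(new_readings))   # e.g. [ 1 -1]
```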
This collaboration is poised to address the anticipated growth in data center water and energy consumption; in the U.S. alone, data centers are projected to consume around 73 billion kWh of electricity and 174 billion gallons of water by 2020. HPE and NREL will focus on monitoring energy usage to improve efficiency and sustainability, measured by key performance indicators such as Power Usage Effectiveness (PUE), Water Usage Effectiveness (WUE), and Carbon Usage Effectiveness (CUE).
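For reference, these metrics have standard definitions from The Green Grid: PUE divides total facility energy by IT equipment energy, WUE divides site water usage by IT equipment energy, and CUE divides carbon emissions attributable to facility energy by IT equipment energy. A minimal sketch, with made-up figures:

```python
# Standard data center efficiency metrics (Green Grid definitions);
# the example figures below are illustrative, not NREL's.
def pue(total_facility_energy_kwh: float, it_equipment_energy_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    return total_facility_energy_kwh / it_equipment_energy_kwh

def wue(site_water_usage_liters: float, it_equipment_energy_kwh: float) -> float:
    """Water Usage Effectiveness: site water usage / IT equipment energy (L/kWh)."""
    return site_water_usage_liters / it_equipment_energy_kwh

def cue(total_co2_emissions_kg: float, it_equipment_energy_kwh: float) -> float:
    """Carbon Usage Effectiveness: CO2 from facility energy / IT equipment energy (kgCO2/kWh)."""
    return total_co2_emissions_kg / it_equipment_energy_kwh

# Example: a facility drawing 1.06 MWh total for every 1.0 MWh of IT load has a PUE of 1.06.
print(round(pue(1_060_000, 1_000_000), 2))  # 1.06
```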
Preliminary results from models trained with historical data have already successfully forecasted or recognized past events within NREL’s data center, affirming the potential of predictive analytics for future data center operations.
The AI Ops project grew out of HPE’s ongoing R&D for PathForward, a Department of Energy-supported program created to accelerate the United States’ technology roadmap for exascale computing, the next major evolution in supercomputing. Recognizing the critical need for AI and automation capabilities, HPE is dedicated to improving the management and optimization of data center environments for exascale operations. Integrating AI-driven operational processes into exascale supercomputers, which promise a thousandfold performance increase over today’s systems, will not only enable energy-efficient operation but also bolster resiliency and reliability through intelligent automation.
“We are passionate about crafting new technologies that significantly impact the future of innovation with exascale computing and its vast operational demands,” stated Mike Vildibill, Vice President of the Advanced Technologies Group at HPE. “Our collaboration with NREL, a long-standing and innovative partner, signifies a pivotal journey towards developing and testing AI Ops, enabling the industry to construct and sustain more intelligent and efficient supercomputing data centers that can scale in power and performance.”
Kristin Munch, Manager for Data, Analysis, and Visualization Group at NREL, added, “Our research collaboration will encompass data management, analytics, and AI/ML optimization strategies for both manual and autonomous interventions in data center operations. We look forward to collaborating with HPE in this expansive, multi-staged project, with the ultimate goal of establishing capabilities for a state-of-the-art smart facility, based on the successful demonstration of these sophisticated techniques in our current data center.”
The project will utilize open-source software and libraries, including TensorFlow, NumPy, and Scikit-learn, to develop its machine learning algorithms; a rough sketch of how these pieces fit together follows the list below. The focus areas of the project include:
- Monitoring: Gathering, processing, and analyzing large volumes of IT and facility telemetry from various sources, then applying algorithms to that data in real time.
- Analytics: Employing big data analytics and machine learning to scrutinize data from diverse tools and devices across the data center facility.
- Control: Deploying algorithms to empower machines to autonomously resolve issues and intelligently automate repetitive tasks, alongside executing predictive maintenance on both IT and facility operations.
- Data Center Operations: AI Ops will evolve into a validation tool enabling continuous integration (CI) and continuous deployment (CD) for essential IT functions throughout the modern data center environment.
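As a rough, hypothetical sketch of the monitoring-analytics-control loop outlined above (the project’s actual features, models, and control hooks are not public), the libraries named earlier could be combined along these lines:

```python
# Illustrative pipeline only: predicts near-term cooling load from telemetry
# and flags readings that warrant intervention. Feature names, thresholds,
# and model choice are assumptions, not the project's actual design.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Monitoring: hypothetical telemetry [inlet_temp_C, rack_power_kW, pump_speed_pct]
X = rng.normal(loc=[20.0, 12.0, 60.0], scale=[1.0, 2.0, 8.0], size=(10_000, 3))
# Synthetic target: cooling load roughly tracks rack power and inlet temperature.
y = 0.8 * X[:, 1] + 0.3 * (X[:, 0] - 18.0) + rng.normal(0, 0.2, size=10_000)

# Analytics: train and validate a regressor on the "historical" data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")

# Control: a naive rule that escalates to a predictive-maintenance action
# when the forecast cooling load exceeds an assumed capacity threshold.
COOLING_CAPACITY_KW = 14.0
for load in model.predict(X_test[:5]):
    action = "schedule maintenance check" if load > COOLING_CAPACITY_KW else "no action"
    print(f"forecast load {load:.1f} kW -> {action}")
```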
Looking ahead, HPE plans to introduce additional capabilities through enhancements to the HPE High Performance Cluster Management (HPCM) system, enabling full provisioning, management, and monitoring of cluster configurations scaling up to 100,000 nodes at increased speed. HPE will also explore integrating HPE InfoSight, a cloud-based, AI-driven management tool that monitors, collects, and analyzes IT infrastructure data to predict and prevent performance issues and maintain overall system health.
This solution will be showcased at HPE booth 1325 during Supercomputing 2019 (SC19) in Denver, Colorado.