At ISC High Performance 2024, Intel, in collaboration with Argonne National Laboratory and Hewlett Packard Enterprise (HPE), announced that the Aurora supercomputer has surpassed the exascale barrier at 1.012 exaflops and is the fastest AI system in the world dedicated to open science, achieving 10.6 AI exaflops. Intel also highlighted the role of open ecosystems in driving AI-accelerated high-performance computing (HPC).
Why It Matters
Aurora, designed as an AI-centric system from the start, will enable researchers to leverage generative AI models for faster scientific discovery. Argonne’s early AI-driven research has made significant strides, including mapping the human brain’s 80 billion neurons, enhancing high-energy particle physics with deep learning, and accelerating drug design and discovery through machine learning.
Aurora Supercomputer’s Details
Aurora is a vast system comprising 166 racks, 10,624 compute blades, 21,248 Intel® Xeon® CPU Max Series processors and 63,744 Intel® Data Center GPU Max Series units, making it one of the world’s largest GPU clusters. It features the largest open, Ethernet-based supercomputing interconnect, with 84,992 HPE Slingshot fabric endpoints. Aurora achieved 1.012 exaflops using 9,234 nodes, only 87% of the system’s capacity. It also placed third on the high-performance conjugate gradient (HPCG) benchmark at 5,612 teraflops (TF/s), using 39% of the machine. HPCG assesses more realistic workloads, providing insight into the communication and memory access patterns essential for real-world HPC applications and complementing benchmarks such as LINPACK.
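The utilisation figures above can be sanity-checked with quick arithmetic (a sketch assuming one node per compute blade, which matches the published counts):

```python
# Quick check of Aurora's published utilisation figures.
# Assumption: one node per compute blade (10,624 blades -> 10,624 nodes).

total_nodes = 10624
hpl_nodes = 9234              # nodes used for the 1.012-exaflop HPL run
hpl_exaflops = 1.012

utilisation = hpl_nodes / total_nodes
per_node_tf = hpl_exaflops * 1e6 / hpl_nodes  # 1 exaflop = 1,000,000 teraflops

print(f"HPL run used {utilisation:.0%} of the machine")   # -> 87%
print(f"about {per_node_tf:.0f} TF sustained per node")
```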
How AI is Optimised
The Intel Data Center GPU Max Series is central to Aurora’s performance. Built on the Intel Xe GPU architecture, it features specialised hardware for matrix and vector computations, optimising both AI and HPC tasks. This design has propelled Aurora to the top spot in the high-performance LINPACK mixed-precision (HPL-MxP) benchmark, underlining the growing importance of AI workloads in HPC.
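The idea behind HPL-MxP, iterative refinement, can be sketched in a few lines: solve cheaply in low precision, then correct the answer using residuals computed in full precision. Below is a toy 2×2 version in Python; `low_precision`, which truncates results to three significant figures, is an illustrative stand-in for genuine reduced-precision hardware arithmetic, not the benchmark’s actual solver.

```python
# Iterative refinement, the technique behind mixed-precision benchmarks
# such as HPL-MxP: a cheap low-precision solve is repeatedly corrected
# with residuals computed in full precision.

def solve_2x2(a, b):
    """Direct 2x2 solve via Cramer's rule (stands in for a fast solver)."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det]

def low_precision(x):
    """Simulate reduced precision: keep only 3 significant figures."""
    return [float('%.3g' % v) for v in x]

def refine(a, b, iters=5):
    x = low_precision(solve_2x2(a, b))            # cheap approximate solve
    for _ in range(iters):
        # Residual r = b - A x, computed in full precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(2)) for i in range(2)]
        d = low_precision(solve_2x2(a, r))        # cheap correction solve
        x = [x[i] + d[i] for i in range(2)]
    return x

# Exact solution of this system is [1/11, 7/11].
x = refine([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Each pass recovers a few more digits, so nearly all of the arithmetic happens at low precision while the final answer matches full-precision accuracy. This pattern is what lets a GPU’s fast low-precision matrix engines carry the bulk of the work.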
The Xe architecture excels at the parallel processing required for the matrix-vector operations at the heart of neural networks, with its compute cores accelerating the matrix operations vital to deep learning models. Intel’s software stack, including the Intel® oneAPI DPC++/C++ Compiler, performance libraries and optimised AI frameworks, supports a flexible, scalable open ecosystem for developers.
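As a rough illustration of that row-parallel pattern, the sketch below partitions a matrix-vector product across worker threads. Pure-Python threads here only stand in for the massive parallelism of GPU compute cores, and all names are illustrative rather than taken from any Intel library.

```python
from concurrent.futures import ThreadPoolExecutor

def matvec_rows(a, x, rows):
    """Compute the selected rows of the product A @ x."""
    return [(i, sum(a[i][j] * x[j] for j in range(len(x)))) for i in rows]

def parallel_matvec(a, x, workers=4):
    """Row-partitioned matrix-vector product: each worker owns a strided
    slice of rows, mirroring how GPU cores divide a matrix operation."""
    n = len(a)
    chunks = [range(w, n, workers) for w in range(workers)]
    y = [0.0] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda rows: matvec_rows(a, x, rows), chunks):
            for i, value in part:
                y[i] = value
    return y
```

For example, `parallel_matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])` returns `[3.0, 7.0]`.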
Advancing Accelerated Computing with Open Software and Compute Capacity
In a special session at ISC 2024, Andrew Richards, CEO of Codeplay (an Intel company), will discuss the growing demand for accelerated computing and software in HPC and AI, highlighting oneAPI, which offers a unified programming model across architectures. Built on open standards, oneAPI lets developers write code that runs on different hardware without extensive modification or vendor lock-in. The Linux Foundation’s Unified Acceleration Foundation (UXL), whose growing membership includes Arm, Google, Intel and Qualcomm, is working towards an open ecosystem for all accelerators, free of proprietary lock-in.
Intel® Tiber™ Developer Cloud is expanding its compute capacity with new hardware and service capabilities, allowing enterprises and developers to evaluate the latest Intel architectures, quickly innovate and optimise AI models and workloads, and then deploy those models at scale. New hardware includes previews of Intel® Xeon® 6 E-core and P-core systems for select customers, plus large-scale clusters based on Intel® Gaudi® 2 and the Intel® Data Center GPU Max Series. New capabilities include Intel® Kubernetes Service for cloud-native AI training and inference workloads, as well as multi-user accounts.
What’s Next
New supercomputers featuring Intel Xeon CPU Max Series and Intel Data Center GPU Max Series technologies will advance HPC and AI. These systems include the Euro-Mediterranean Centre on Climate Change’s (CMCC) Cassandra for climate change modelling, the Italian National Agency for New Technologies’ (ENEA) CRESCO 8 for fusion energy breakthroughs, the Texas Advanced Computing Center (TACC) for data analysis in various scientific fields, and the United Kingdom Atomic Energy Authority (UKAEA) for solving memory-bound problems in future fusion power plant design.
The mixed-precision AI benchmark results will be crucial for Intel’s next-generation GPU for AI and HPC, code-named Falcon Shores. This next-gen GPU will combine the best of Intel® Xe architecture and Intel® Gaudi®, providing a unified programming interface.
Early performance results show that Intel® Xeon® 6 with P-cores and Multiplexer Combined Ranks (MCR) memory at 8800 megatransfers per second (MT/s) delivers up to a 2.3x performance improvement over the previous generation on real-world HPC applications such as the Nucleus for European Modelling of the Ocean (NEMO). This establishes a strong foundation for Xeon 6 as the preferred host CPU for HPC solutions.