TECHNOLOGY AND CODE article

Front. High Perform. Comput.
Sec. Parallel and Distributed Software
Volume 2 - 2024 | doi: 10.3389/fhpcp.2024.1444337

Parallel and scalable AI in HPC systems for CFD applications and beyond

Provisionally accepted
  • Jülich Supercomputing Center, Institute for Advanced Simulation, Jülich Research Center, Helmholtz Association of German Research Centers (HZ), Jülich, Germany

The final, formatted version of the article will be published soon.

    This manuscript presents the library AI4HPC with its architecture and components. The library enables large-scale training of AI models on High-Performance Computing (HPC) systems. It addresses challenges in handling non-uniform datasets through data manipulation routines, model complexity through specialized ML architectures, scalability through extensive performance-oriented code optimizations, HyperParameter Optimization (HPO), and performance monitoring. The scalability of the library is demonstrated by strong scaling experiments on up to 3,664 Graphics Processing Units (GPUs), resulting in a scaling efficiency of 96% with the performance on one node as the baseline. Furthermore, code optimizations and communication/computation bottlenecks are discussed for training a neural network on an actuated Turbulent Boundary Layer (TBL) simulation dataset (8.3 TB) on the HPC system JURECA at the Jülich Supercomputing Centre.

    The distributed training approach significantly influences the accuracy, which can be drastically compromised by varying mini-batch sizes. Therefore, AI4HPC implements learning rate scaling and adaptive summation algorithms, which are tested and evaluated in this work. For the TBL use case, results scaled up to 64 workers are shown. A further increase in the number of workers causes additional overhead, as the dataset samples per worker become too small. Finally, the library is applied to the reconstruction of TBL flows with a convolutional autoencoder-based architecture and a diffusion model. In the case of the autoencoder, a modal decomposition shows that the network provides accurate reconstructions of the underlying field and achieves a mean drag prediction error of ≈ 5%. With the diffusion model, a reconstruction error of ≈ 4% is achieved when super-resolution is applied to five-fold coarsened velocity fields. The AI4HPC library is agnostic to the underlying network and can be adapted across various scientific and technical disciplines.
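    The two quantities above can be made concrete with a short sketch. The strong-scaling efficiency reported (96% on up to 3,664 GPUs) is conventionally the measured speedup over the baseline divided by the ideal speedup, and the learning rate scaling mentioned for varying mini-batch sizes commonly follows the linear scaling rule (base learning rate multiplied by the growth of the global batch). Note that the function names and the linear rule here are illustrative assumptions, not the exact AI4HPC implementation.

    ```python
    def strong_scaling_efficiency(t_base, n_base, t_n, n):
        """Strong-scaling efficiency relative to a baseline run.

        t_base: wall time on the baseline (e.g., 1 node with n_base GPUs)
        t_n:    wall time on n GPUs for the same total problem
        Ideal speedup is n / n_base; efficiency = measured / ideal.
        """
        measured_speedup = t_base / t_n
        ideal_speedup = n / n_base
        return measured_speedup / ideal_speedup

    def scale_learning_rate(base_lr, n_workers, base_workers=1):
        """Linear learning-rate scaling (a common heuristic, assumed here):
        with data parallelism, the global batch grows with the number of
        workers, so the learning rate is scaled by the same factor."""
        return base_lr * (n_workers / base_workers)
    ```

    For example, a run that is 60 times faster on 64 times the GPUs has an efficiency of 60/64 ≈ 94%; with a base learning rate of 1e-3, 64 workers would train with a scaled rate of 6.4e-2 under the linear rule.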

    Keywords: Distributed training, high-performance computing, artificial intelligence, computational fluid dynamics, Turbulent boundary layer, Autoencoder

    Received: 05 Jun 2024; Accepted: 10 Sep 2024.

    Copyright: © 2024 Sarma, Inanc, Aach and Lintermann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Rakesh Sarma, Jülich Supercomputing Center, Institute for Advanced Simulation, Jülich Research Center, Helmholtz Association of German Research Centers (HZ), Jülich, Germany

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.