Host: Patrick Diehl
Co-Host: Philipp Edelmann
The Lightweight Communication Interface (LCI) is an experimental communication library aiming for better asynchronous multithreaded communication support, both in terms of performance and programmability. It is also a research tool that helps us understand how to design communication libraries to better fit the needs of dynamic programming systems and applications. It features a simple, incrementally refinable interface that unifies all common point-to-point communication primitives, and an atomic-based runtime for maximum threading efficiency. It has been integrated into established asynchronous many-task systems such as HPX and has shown significant performance improvements on microbenchmarks and real-world applications. This talk will present an overview of its interface and software design and showcase its performance.
Bio: Jiakun Yan is a fifth-year Ph.D. student at UIUC, advised by Prof. Marc Snir. His research involves exploring better communication library designs for highly dynamic/irregular programming systems and applications. He is the main contributor to the Lightweight Communication Interface (LCI) Project and the HPX LCI parcelport.
In the realm of High-Performance Computing (HPC), achieving peak system performance while maintaining code safety and concurrency has always been a challenging endeavor. Traditional HPC frameworks often struggle with memory safety issues, race conditions, and the complexities of parallel programming. In this talk, I introduce Lamellar, a new HPC runtime that leverages the modern programming language Rust to address these enduring challenges. Rust, known for its powerful type system and memory safety guarantees without a garbage collector, is rapidly gaining traction within the systems programming community. However, its potential in the HPC domain has yet to be fully explored. The Lamellar runtime harnesses Rust's strengths, providing a robust, scalable, and safe environment for developing and executing high-performance applications. This talk is designed for computational domain scientists and computer scientists who are well acquainted with HPC concepts but may be new to Rust; it will cover both the core concepts of the language and the design and use of the Lamellar runtime.
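For an audience new to Rust, the following minimal standard-library sketch (deliberately generic, not Lamellar's API) gives a flavor of the guarantee referred to above: shared mutable state must be wrapped in explicit synchronization types such as Arc and Mutex, so an unsynchronized data race is rejected at compile time rather than debugged at run time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // A plain Vec<u64> cannot be mutated from several threads at once:
    // the compiler rejects such code. Arc gives shared ownership and
    // Mutex enforces exclusive access, making the synchronization explicit.
    let histogram = Arc::new(Mutex::new(vec![0u64; 8]));

    let handles: Vec<_> = (0..4)
        .map(|t| {
            let histogram = Arc::clone(&histogram);
            thread::spawn(move || {
                for i in 0..1000 {
                    let bin = ((t * 1000 + i) % 8) as usize;
                    histogram.lock().unwrap()[bin] += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    println!("final counts: {:?}", histogram.lock().unwrap());
}
```

The talk discusses how Lamellar carries this kind of compile-time discipline into distributed, asynchronous execution across nodes.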
Bio: Dr. Ryan Friese is a senior computer scientist at Pacific Northwest National Laboratory in the Future Computing Technologies group. His research interests span hardware/software co-design of runtime and system software for novel architectures, HPC (high-performance computing) network simulation and modeling, the analysis and optimization of data movement in large-scale distributed workflows, and performance modeling of irregular applications. His recent work has focused on enabling memory-safe programming on HPC systems by leading the development of the Lamellar Runtime, an asynchronous distributed runtime written in the Rust programming language. He received his PhD in Electrical & Computer Engineering in 2015 from Colorado State University.
Julia has been supported as a first-class language on NERSC's systems for over 10 years [1]. In this talk we will discuss how Julia has been deployed at scale, the technical issues encountered and how they were eventually overcome. We will also explore the challenges faced with developing intuitive and portable distributed HPC applications that are capable of targeting modern GPU architectures, and NERSC's vision for supporting modern interdisciplinary and multi-facility workflows.
[1] https://info.juliahub.com/case-studies/celeste
Bio: Johannes Blaschke is an HPC workflow performance expert leading the NERSC Science Acceleration Program (NESAP). His research interests include urgent and interactive HPC, and programming environments and models for cross-facility workflows. Johannes supports Julia on NERSC's systems, including one of the first examples of integrating MPI.jl and Distributed.jl with HPE's Slingshot network technology. Johannes is a zealous advocate for Julia as an HPC programming language, and a contributor to and organizer of Julia tutorials and BoFs at SC, JuliaCon, and within the DOE.
We present a summary of our research and community efforts exploring the Julia language for the scientific mission of the US Department of Energy (DOE) at the intersection of high-performance computing (HPC) and high productivity. Powered by the LLVM compiler infrastructure combined with a unifying ecosystem and friendly scientific syntax, Julia attempts to lower the cost of the “two-language and multiple ecosystems” paradigm (e.g., Python plus a compiled language). Along with the Julia intro and HPC hands-on tutorials, we present our efforts on: (i) building an accessible, performance-portable CPU/GPU library, JACC.jl; (ii) the outcomes of external venues (SC BoFs, tutorials) and workshops at Oak Ridge National Laboratory (ORNL); (iii) our research (best paper at SC23 WORKS) on the unifying value of using a single front-end language on Frontier, the second-fastest supercomputer in the world; and (iv) our work (best paper at SC24 XLOOP) connecting ORNL’s experimental and computational facilities using JACC.jl. Julia thus aspires to make the future landscape of heterogeneous, AI-driven, and energy-aware computing more accessible by leveraging existing investments outside DOE in LLVM and commercial applications of the language.
Bio: William Godoy is a senior computer scientist in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL). His interests are in high-performance computing, parallel programming systems, scientific software, and workflows. At ORNL, he contributed to the Exascale Computing Project in both the applications (QMCPACK) and software technologies (ADIOS2, Julia/LLVM) portfolios, as well as to projects impacting ORNL’s computing and neutron science facilities. Godoy currently works across research projects funded by the US Department of Energy Advanced Scientific Computing Research (ASCR) program. Prior to ORNL, he was a staff member at Intel Corporation and a postdoctoral fellow at NASA Langley Research Center. Godoy received PhD and MSc degrees from the University at Buffalo, The State University of New York, and a BSc from the National Engineering University (UNI) Lima, Peru, in mechanical engineering. He is a senior member of the IEEE and a member of ACM, ASME, and US-RSE, serving in several venues and technical committees.
Traditional static resource allocation in supercomputers (jobs retain a fixed set of resources) leads to inefficiencies. Resource adaptivity (jobs can change resources at runtime) significantly increases supercomputer efficiency.
This talk builds on Asynchronous Many-Task (AMT) programming, which is well suited for adaptivity thanks to its transparent resource management. An AMT runtime system dynamically assigns small, user-defined tasks to workers to achieve load balancing and adapt to resource changes.
We will discuss techniques for malleability and evolving capabilities that allow programs to dynamically change resources without interrupting computation. Automatic load detection heuristics determine when to start or terminate processes, which is particularly beneficial for unpredictable workloads. Practicality is demonstrated by adapting the GLB library. A generic communication interface allows interaction between programs and resource managers. Evaluations with a prototype resource manager show significant improvements in batch makespan, node utilization, and job turnaround time for both malleable and evolving jobs.
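To make the pull-based task model concrete, here is a deliberately simplified, single-node Rust sketch (generic standard-library code, not GLB or any specific AMT runtime): because workers pull small tasks from a shared queue, an extra worker can join mid-run and immediately share the load, with no repartitioning of work.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A task is just a boxed closure; real AMT runtimes attach data and locality hints.
type Task = Box<dyn FnOnce() + Send + 'static>;

// A worker repeatedly pulls tasks from the shared queue. Because work is
// pulled rather than statically partitioned, workers can join or retire
// at runtime without any redistribution of work.
fn spawn_worker(id: usize, queue: Arc<Mutex<mpsc::Receiver<Task>>>) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        // Hold the lock while waiting for the next task from the shared queue.
        let next = queue.lock().unwrap().recv();
        match next {
            Ok(task) => {
                println!("worker {id} runs a task");
                task();
            }
            Err(_) => break, // queue closed: this worker retires
        }
    })
}

fn main() {
    let (tx, rx) = mpsc::channel::<Task>();
    let rx = Arc::new(Mutex::new(rx));

    // Start with two workers; a resource manager could request more or fewer later.
    let mut workers: Vec<_> = (0..2).map(|id| spawn_worker(id, Arc::clone(&rx))).collect();

    // Submit many small, user-defined tasks.
    for i in 0..8u64 {
        tx.send(Box::new(move || {
            let _ = i * i; // placeholder for real computation
        }))
        .unwrap();
    }

    // "Evolving" upwards: a third worker joins mid-run and immediately shares the load.
    workers.push(spawn_worker(2, Arc::clone(&rx)));

    drop(tx); // no further tasks: workers drain the queue and exit
    for worker in workers {
        worker.join().unwrap();
    }
}
```

Real AMT systems layer distributed scheduling, inter-node load balancing, and the resource-manager interface discussed above on top of this basic pattern.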
Bio: Jonas is a dedicated computer scientist specializing in high-performance computing. He received his Bachelor’s and Master’s degrees from the University of Kassel, Germany, where he also earned his Ph.D. in 2022. He is currently serving as a substitute chair of the Software Engineering Group at the same university and is writing his habilitation.
Jonas' research interests include load balancing, fault tolerance, and resource adaptivity for Asynchronous Many-Task (AMT) systems. Recently, he has focused on resource adaptivity in general to optimize the efficient use of supercomputing resources. His work covers a broad spectrum, including the development of advanced job scheduling algorithms, the improvement of application programming using AMT systems, and the interaction between resource managers and jobs.
Rust is a modern language that provides type- and memory-safety without a garbage collector, using a concept called lifetimes. Many see Rust as a successor to languages like C and C++, and there are many interested individuals in the computational science community, yet few major projects have made the switch. I'll introduce the language and its ecosystem, including the state of scientific computing libraries. We'll discuss what soundness means for libraries and examine rsmpi, which safely exposes MPI and allows catching many common bugs at compile-time. We'll also discuss type-system approaches to collective semantics, and conclude with an outlook on Rust for scientific computing.
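To give a taste of the compile-time checking mentioned above, the following minimal point-to-point example uses the rsmpi (`mpi`) crate; it is a sketch assuming a recent rsmpi release, not an excerpt from the talk.

```rust
// Run with a standard MPI launcher, e.g.: mpiexec -n 2 ./example
use mpi::traits::*;

fn main() {
    let universe = mpi::initialize().expect("MPI initialization failed");
    let world = universe.world();
    assert!(world.size() >= 2, "this example needs at least two ranks");

    match world.rank() {
        0 => {
            // Only types with an MPI-equivalent datatype (the `Equivalence`
            // trait) can be sent, and buffer lengths and datatypes are derived
            // from the Rust types, eliminating the manual pointer/count/MPI_DOUBLE
            // bookkeeping of the C API.
            let payload = vec![1.0f64, 2.0, 3.0];
            world.process_at_rank(1).send(&payload[..]);
        }
        1 => {
            let (msg, status) = world.process_at_rank(0).receive_vec::<f64>();
            println!("rank 1 received {:?} from rank {}", msg, status.source_rank());
        }
        _ => {} // extra ranks sit this one out
    }
}
```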
Bio: Jed leads the Physical Prediction, Inference, and Design group at CU Boulder. He is a maintainer of PETSc, libCEED, and rsmpi (Rust bindings to MPI), and is active in many open source communities. He works on high-performance numerical software infrastructure for computational science and engineering, as well as applications such as structural mechanics and materials science, non-Newtonian and turbulent flows, and plasmas. He is co-director of the PSAAP-3 Multidisciplinary Simulation Center for Micromorphic Multiphysics Porous and Particulate Materials Simulations Within Exascale Computing Workflows.
The thermonuclear supernova modeling pipeline has been refined for over four decades and has achieved substantial success in modeling various supernova subtypes. Nonetheless, continuous innovation is essential for maintaining supernova modeling at the forefront of computational astrophysics. In this work, we examine a novel scenario: so-called thermonuclear electron-capture supernovae. Originally proposed by Jones et al. (2016), this scenario consists of a collapsing super-AGB (sAGB) star that only narrowly escapes collapse to a neutron star by runaway thermonuclear burning. Here, we explore the specific circumstances under which such a thermonuclear explosion can occur and under which conditions the collapse can be averted by nuclear burning. Subsequently, we leverage this scenario to motivate a long-overdue update to the thermonuclear supernova modeling pipeline, both by increasing the complexity of the included physics and by updating the underlying codebase for the latest exascale computing clusters. In particular, we advocate the integration of radiation hydrodynamics and the transition towards a performance-portable programming model.
The number of RISC-V commercial products increased substantially this past year. This presentation is an orientation to the range of RISC-V hardware, HPC software support, the community, and the current state of HPC-relevant ISA extensions. Acquiring RISC-V hardware is no longer a question of when: it is possible now.
Bio: Chris is a senior principal research engineer at Tactical Computing Labs. His work experience includes compilers, runtime systems, systems-level software, numerical libraries, applied math problems, and hardware simulation. He has a master's degree in Computer Science from Georgia Tech and an undergraduate degree in Computer Science from Clemson.
Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics simulations, which must resolve the physics precisely in localized areas across expansive domains. The extreme heterogeneity and large node counts of today's supercomputers present a significant challenge for such dynamically adaptive codes, making both scalability and performance portability essential. Our research focuses on addressing this by integrating the asynchronous many-task runtime system HPX with the performance-portability framework Kokkos and SIMD types. To demonstrate and benchmark our solutions at scale, we incorporated them into Octo-Tiger, an adaptive, massively parallel application for the simulation of binary star systems and their outcomes. Thanks to this, Octo-Tiger now supports a diverse set of processors, accelerators, and network backends, and can scale on various supercomputers, such as Perlmutter, Frontier, and Fugaku. In this talk, we outline our various integrations between HPX and Kokkos. Furthermore, we show the challenges we encountered when using these frameworks together in Octo-Tiger and how we addressed them, ultimately achieving scalability on a selection of current supercomputers.
Bio: Gregor Daiß is a PhD student at the University of Stuttgart, specializing in high-performance computing. His main interests include task-based runtime systems, distributed computing, and performance portability, as well as refactoring large-scale simulations and porting them to accelerators. His current work mostly involves Kokkos (for performance portability) and HPX (a task-based runtime system) for these purposes.
The Journal of Open Source Software (JOSS) is an open-access, no-fee scholarly journal that publishes quality open-source research software based on open peer review. JOSS was founded in 2016 with the dual objectives of giving traditional academic publication credit for software work and improving the quality of research software. Since its founding, JOSS has published over 2500 software papers—and counting!—with over 80 active editors spread across seven topic-area tracks. To handle this volume of submissions and publishing with a fully volunteer team, JOSS relies on GitHub and a system of open tools for reviewing and publishing submissions, driven by chatbot commands. Authors submit short Markdown papers, which are compiled to PDF via Pandoc, along with links to their software's repository. JOSS’s editorial bot performs automated health checks on submissions, and reviews take place in GitHub issues, with authors, editors, and reviewers issuing bot commands via comments. This talk will describe the publication experience of JOSS and its machinery, and how it can be adapted by other communities.
Bio: Kyle E. Niemeyer is an Associate Professor at Oregon State University in the School of Mechanical, Industrial, and Manufacturing Engineering. He also serves as the Associate School Head for Undergraduate Programs. He leads the Niemeyer Research Group, which uses computational modeling to study various phenomena involving fluid flows, including combustion and chemical kinetics, and related topics like numerical methods and parallel computing. He is also a strong advocate of open access, open source software, and open science in general, and has contributed in the area of standardizing research software citation. Kyle has received multiple prestigious fellowships throughout his career, including the AAAS Science & Technology Policy Fellowship in 2022, the Better Scientific Software (BSSw) Fellowship in 2019, the NSF Graduate Research Fellowship in 2010, and the National Defense Science and Engineering Graduate Fellowship in 2009. Kyle received his Ph.D. in Mechanical Engineering from Case Western Reserve University in 2013. He received BS and MS degrees in Aerospace Engineering from Case Western Reserve University in 2009 and 2010, respectively.
The Message Passing Interface standard has long been the lingua franca of HPC. Its design has enabled the development of many distributed parallel applications. After 30 years, the field of high-performance computing has seen several programming paradigms come and go. However, MPI has yet to address the challenges of accelerator-based computing, the advent of modern languages such as Rust, Python, and C++, and fully asynchronous programming models. This talk will provide insights into current efforts on modernizing MPI, from accelerator integration to improved datatype handling for modern languages.
Bio:Joseph Schuchart is a Senior Research Scientist at the Institute for Advanced Computational Sciences at Stony Brook University. His research revolves around distributed asynchronous and task-based programming models, communication libraries, and design aspects of integrating different models. Joseph received his M.Sc. in Computer Science from Dresden University of Technology in 2012 and his PhD from the University of Stuttgart in 2020. He is an active member of the MPI Forum and a contributor to the Open MPI project.