Publications
Publications by category in reverse chronological order.
Journal Articles
2025
- A Survey of Distributed Asynchronous Many-Task Models and Their Applications. Joseph Schuchart, Patrick Diehl, Michael Bauer, Aurelien Bouteiller, Gregor Daiss, Engin Kayraklioglu, Shreyas Khandekar, Thomas Herault, John Holmen, Ritvik Rao, Alexander Strack, Elliott Slaughter, Jennifer Spinti, Jeremy Thornock, Alex Aiken, Olivier Aumage, Martin Berzins, George Bosilca, Bradford Chamberlain, and Laxmikant Kale. Preprint, Dec 2025
@article{Schuchart2025,
  author = {Schuchart, Joseph and Diehl, Patrick and Bauer, Michael and Bouteiller, Aurelien and Daiss, Gregor and Kayraklioglu, Engin and Khandekar, Shreyas and Herault, Thomas and Holmen, John and Rao, Ritvik and Strack, Alexander and Slaughter, Elliott and Spinti, Jennifer and Thornock, Jeremy and Aiken, Alex and Aumage, Olivier and Berzins, Martin and Bosilca, George and Chamberlain, Bradford and Kale, Laxmikant},
  journal = {Preprint},
  year = {2025},
  month = dec,
  title = {A Survey of Distributed Asynchronous Many-Task Models and Their Applications},
  doi = {10.36227/techrxiv.176652588.81044275/v1},
  keywords = {journal},
}
- HARD: A performance portable radiation hydrodynamics code based on FleCSI framework. Julien Loiseau, Hyun Lim, Andrés Yagüe López, Mammadbaghir Baghirzade, Shihab Shahriar Khan, Yoonsoo Kim, Sudarshan Neopane, Alexander Strack, Farhana Taiyebah, and Ben Bergen. SoftwareX, 2025
Hydrodynamics And Radiation Diffusion (HARD) is an open-source application for high-performance simulations of compressible hydrodynamics with radiation-diffusion coupling. Built on the FleCSI (Bergen et al., 2021 [1]) (Flexible Computational Science Infrastructure) framework, HARD expresses its computational units as tasks whose execution can be orchestrated by multiple back-end runtimes, including Legion (Bauer et al., 2012 [2]), MPI (Forum, 1994 [3]), and HPX (Kaiser et al., 2020 [4]). Node-level parallelism is handled through Kokkos (Edwards et al., 2014 [5]), providing a single-source, portable code base that runs efficiently on laptops, small homogeneous clusters, and the largest heterogeneous supercomputers currently available. To ensure scientific reliability, HARD includes a regression test suite that automatically reproduces canonical verification problems such as the Sod and LeBlanc shock tubes, and the Sedov blast wave, comparing numerical solutions against known analytical results. The project is distributed under an OSI-approved license, hosted on GitHub, and accompanied by reproducible build scripts and continuous integration workflows. This combination of performance portability, verification infrastructure, and community-focused development makes HARD a sustainable platform for advancing radiation hydrodynamics research across multiple domains.
@article{Loiseau2025_hard,
  title = {HARD: A performance portable radiation hydrodynamics code based on FleCSI framework},
  journal = {SoftwareX},
  volume = {32},
  pages = {102441},
  year = {2025},
  issn = {2352-7110},
  doi = {10.1016/j.softx.2025.102441},
  url = {https://www.sciencedirect.com/science/article/pii/S2352711025004078},
  author = {Loiseau, Julien and Lim, Hyun and {Yagüe López}, Andrés and Baghirzade, Mammadbaghir and Khan, Shihab Shahriar and Kim, Yoonsoo and Neopane, Sudarshan and Strack, Alexander and Taiyebah, Farhana and Bergen, Ben},
  keywords = {journal},
}
Conference and Workshop Articles
2026
- Parallel FFTW on RISC-V: A Comparative Study Including OpenMP, MPI, and HPX. Alexander Strack, Christopher Taylor, and Dirk Pflüger. In High Performance Computing, 2026
Rapid advancements in RISC-V hardware development shift the focus from low-level optimizations to higher-level parallelization. Recent RISC-V processors, such as the 64-core SOPHON SG2042, reach core counts that make efficient parallelization as crucial on RISC-V as on established architectures such as x86-64.
@inproceedings{Strack2026_fft_riscv,
  author = {Strack, Alexander and Taylor, Christopher and Pfl{\"u}ger, Dirk},
  editor = {Neuwirth, Sarah and Paul, Arnab Kumar and Weinzierl, Tobias and Carson, Erin Claire},
  title = {Parallel FFTW on RISC-V: A Comparative Study Including OpenMP, MPI, and HPX},
  booktitle = {High Performance Computing},
  year = {2026},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {586--597},
  isbn = {978-3-032-07612-0},
  doi = {10.1007/978-3-032-07612-0_45},
  keywords = {paper},
}
- GPRat: Gaussian Process Regression with Asynchronous Tasks. Maksim Helmann, Alexander Strack, and Dirk Pflüger. In Asynchronous Many-Task Systems and Applications, 2026
Python is the de facto language for software development in artificial intelligence (AI). Commonly used libraries, such as PyTorch and TensorFlow, rely on parallelization built into their BLAS backends to achieve speedup on CPUs. However, only applying parallelization in a low-level backend can lead to performance and scaling degradation. In this work, we present a novel way of binding task-based C++ code built on the asynchronous runtime model HPX to a high-level Python API using pybind11. We develop a parallel Gaussian process (GP) library as an application. The resulting Python library GPRat combines the ease of use of commonly available GP libraries with the performance and scalability of asynchronous runtime systems. We evaluate the performance on a mass-spring-damper system, a standard benchmark from control theory, for varying numbers of regressors (features). The results show almost no binding overhead when binding the asynchronous HPX code using pybind11. Compared to GPyTorch and GPflow, GPRat shows superior scaling on up to 64 cores on an AMD EPYC 7742 CPU for training. Furthermore, our library achieves a prediction speedup of 7.63 over GPyTorch and 25.25 over GPflow. If we increase the number of features from eight to 128, we observe speedups of 29.62 and 21.19, respectively. These results showcase the potential of using asynchronous tasks within Python-based AI applications.
@inproceedings{Helmann2026_gprat,
  author = {Helmann, Maksim and Strack, Alexander and Pfl{\"u}ger, Dirk},
  editor = {Diehl, Patrick and Cao, Qinglei and Herault, Thomas and Bosilca, George},
  title = {GPRat: Gaussian Process Regression with Asynchronous Tasks},
  booktitle = {Asynchronous Many-Task Systems and Applications},
  year = {2026},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {83--94},
  isbn = {978-3-031-97196-9},
  doi = {10.1007/978-3-031-97196-9_7},
  keywords = {paper},
}
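The core computation behind a GP library like the one described above is the posterior mean obtained through a Cholesky factorization of the covariance matrix. A minimal NumPy sketch of that computation follows; it is illustrative only and is not GPRat's API — the RBF kernel, lengthscale, and noise values are assumptions:

```python
import numpy as np

def gp_predict(X, y, Xs, lengthscale=1.0, noise=1e-2):
    """GP posterior mean with an RBF kernel, computed via a Cholesky
    factorization of the covariance matrix -- the O(n^3) kernel that a
    task-based GP library tiles and parallelizes."""
    def rbf(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)
    K = rbf(X, X) + noise * np.eye(len(X))   # noisy covariance matrix
    L = np.linalg.cholesky(K)                # dominant O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xs, X) @ alpha                # posterior mean at test points

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (50, 1))              # 50 training points, 1 feature
y = np.sin(X[:, 0])
mean = gp_predict(X, y, np.array([[0.0]]))
```

A sequential sketch like this makes it clear why the Cholesky step dominates and is the natural target for asynchronous task parallelism.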
2025
- A HPX Communication Benchmark: Distributed FFT Using Collectives. Alexander Strack and Dirk Pflüger. In Euro-Par 2024: Parallel Processing Workshops, 2025
Due to increasing core counts in modern processors, several task-based runtimes have emerged, including the C++ Standard Library for Concurrency and Parallelism (HPX). Although the asynchronous many-task runtime HPX allows implicit communication via an Active Global Address Space, it also supports explicit collective operations. Collectives are an efficient way to realize complex communication patterns.
@inproceedings{Strack2025_parcelports,
  author = {Strack, Alexander and Pfl{\"u}ger, Dirk},
  editor = {Caino-Lores, Silvina and Zeinalipour, Demetris and Doudali, Thaleia Dimitra and Singh, David E. and Garz{\'o}n, Gracia Ester Mart{\'i}n and Sousa, Leonel and Andrade, Diego and Cucinotta, Tommaso and D'Ambrosio, Donato and Diehl, Patrick and Dolz, Manuel F. and Jukan, Admela and Montella, Raffaele and Nardelli, Matteo and Garcia-Gasulla, Marta and Neuwirth, Sarah},
  title = {A HPX Communication Benchmark: Distributed FFT Using Collectives},
  booktitle = {Euro-Par 2024: Parallel Processing Workshops},
  year = {2025},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {271--274},
  isbn = {978-3-031-90203-1},
  doi = {10.1007/978-3-031-90203-1_25},
  keywords = {paper},
}
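A distributed FFT relies on a collective to redistribute data between its two transform phases. The following NumPy sketch simulates that all-to-all pattern on a single node, with `blocks[r]` standing in for rank r's data; it is illustrative only and is not HPX code:

```python
import numpy as np

def all_to_all_transpose(blocks):
    """Simulate the all-to-all collective that redistributes a
    row-partitioned matrix so every rank ends up owning a block of
    columns -- the communication step between the two 1-D FFT phases
    of a distributed 2-D FFT."""
    P = len(blocks)
    # each rank splits its row strip into P column chunks ...
    chunks = [np.hsplit(b, P) for b in blocks]
    # ... and the collective delivers chunk j of every rank to rank j
    return [np.vstack([chunks[r][j] for r in range(P)]) for j in range(P)]

A = np.arange(64, dtype=float).reshape(8, 8)
row_blocks = np.vsplit(A, 4)            # 4 "ranks", 2 rows each
col_blocks = all_to_all_transpose(row_blocks)
```

After the exchange, each simulated rank holds a contiguous block of columns, so the second batch of 1-D FFTs can run locally — this is the pattern an explicit collective expresses in one operation.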
2024
- Experiences Porting Shared and Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-Study. Alexander Strack, Christopher Taylor, Patrick Diehl, and Dirk Pflüger. In Asynchronous Many-Task Systems and Applications, 2024
Parallel algorithms relying on synchronous parallelization libraries often experience adverse performance due to global synchronization barriers. Asynchronous many-task runtimes offer task futurization capabilities that minimize or remove the need for global synchronization barriers. This paper conducts a case study of the multidimensional Fast Fourier Transform to identify which applications will benefit from the asynchronous many-task model. Our basis is the popular FFTW library [7]. We use the asynchronous many-task model HPX and a one-dimensional FFTW backend to implement multiple versions using different HPX features and highlight overheads and pitfalls during migration. Furthermore, we add an HPX threading backend to FFTW. The case study analyzes shared memory scaling properties between our HPX-based parallelization and FFTW with its pthreads, OpenMP, and HPX backends. The case study also compares FFTW's MPI+X backend to a purely HPX-based distributed implementation. The FFT application does not profit from asynchronous task execution. In contrast, enforcing task synchronization results in better cache performance and thus better runtime. Nonetheless, the HPX backend for FFTW is competitive with existing backends. Our distributed HPX implementation based on HPX collectives using the MPI parcelport has similar performance to FFTW's MPI+OpenMP. However, the LCI parcelport of HPX accelerated communication by up to a factor of 5.
@inproceedings{Strack2024_hpxfft,
  author = {Strack, Alexander and Taylor, Christopher and Diehl, Patrick and Pfl{\"u}ger, Dirk},
  editor = {Diehl, Patrick and Schuchart, Joseph and Valero-Lara, Pedro and Bosilca, George},
  title = {Experiences Porting Shared and Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-Study},
  booktitle = {Asynchronous Many-Task Systems and Applications},
  year = {2024},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {111--122},
  isbn = {978-3-031-61763-8},
  doi = {10.1007/978-3-031-61763-8_11},
  keywords = {paper},
}
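The multidimensional FFT in the case study above is built from batches of one-dimensional transforms: transform along one axis, then along the other. A minimal NumPy sketch of that row-column decomposition (illustrative only, not the paper's HPX/FFTW code):

```python
import numpy as np

def fft2_row_column(a):
    """2-D FFT via the row-column decomposition: a batch of 1-D FFTs
    along every row, then a batch of 1-D FFTs along every column.
    Each batch is the unit of parallel work in the case study."""
    rows = np.fft.fft(a, axis=1)     # 1-D transforms over rows
    return np.fft.fft(rows, axis=0)  # then over columns

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))
```

The hard synchronization point between the two batches is exactly where asynchronous tasks buy little for FFT, matching the paper's observation that enforced synchronization yields better cache behavior.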
2023
- Scalability of Gaussian Processes Using Asynchronous Tasks: A Comparison Between HPX and PETSc. Alexander Strack and Dirk Pflüger. In Asynchronous Many-Task Systems and Applications, 2023
Gaussian processes are a widely used alternative to neural networks for non-linear system identification. The method requires computing the inversion of a large covariance matrix. In this work, we introduce our new task-based asynchronous implementation, focusing on its most popular solver, the Cholesky decomposition. Our implementation is based on HPX, utilizing its asynchronous many-task runtime system. We can therefore investigate its scaling on multi-core hardware and for GPU offloading. Furthermore, we compare our HPX implementation against a high-level reference implementation based on PETSc. We demonstrate that the HPX implementation’s performance is directly tied to the chosen tile size. Compared to the PETSc reference, our task-based implementation is faster in the entire node-level strong scaling experiment on EPYC ROME, showing better parallel efficiency.
@inproceedings{Strack2023_cholesky,
  author = {Strack, Alexander and Pfl{\"u}ger, Dirk},
  editor = {Diehl, Patrick and Thoman, Peter and Kaiser, Hartmut and Kale, Laxmikant},
  title = {Scalability of Gaussian Processes Using Asynchronous Tasks: A Comparison Between HPX and PETSc},
  booktitle = {Asynchronous Many-Task Systems and Applications},
  year = {2023},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {52--64},
  isbn = {978-3-031-32316-4},
  doi = {10.1007/978-3-031-32316-4_5},
  keywords = {paper},
}
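The tile-size sensitivity noted in the abstract above comes from the blocked structure of the tiled Cholesky decomposition, where each per-tile operation becomes one schedulable task. A sequential NumPy sketch of the blocked algorithm follows; it is illustrative only, not the paper's HPX implementation, and `ts` names the tile size as an assumption:

```python
import numpy as np

def tiled_cholesky(A, ts):
    """Blocked right-looking Cholesky on ts-by-ts tiles. Each tile
    operation (POTRF / TRSM / SYRK-GEMM update) is what a task-based
    runtime would schedule asynchronously; here they run in order."""
    L = A.copy()
    nt = L.shape[0] // ts                      # number of tile rows
    tile = lambda i, j: L[i*ts:(i+1)*ts, j*ts:(j+1)*ts]
    for k in range(nt):
        # POTRF: factor the diagonal tile
        tile(k, k)[:] = np.linalg.cholesky(tile(k, k))
        for i in range(k + 1, nt):
            # TRSM: L_ik = A_ik * L_kk^{-T}
            tile(i, k)[:] = np.linalg.solve(tile(k, k), tile(i, k).T).T
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                # SYRK/GEMM: trailing-matrix update A_ij -= L_ik L_jk^T
                tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
    return np.tril(L)

rng = np.random.default_rng(1)
M = rng.standard_normal((64, 64))
spd = M @ M.T + 64 * np.eye(64)                # symmetric positive definite
```

Smaller tiles expose more task-level parallelism but raise scheduling overhead, while larger tiles improve per-task BLAS efficiency — the trade-off behind the tile-size dependence the paper reports.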
Posters and Extended Abstracts
2025
- Radiation Hydrodynamics at Scale with FleCSI-HARD. Alexander Strack, Mammadbaghir Baghirzade, Shihab Shahriar Khan, Yoonsoo Kim, Sudarshan Neopane, Farhana Taiyebah, Julien Loiseau, and Hyun Lim. Salishan Conference on High Speed Computing, 2025
@conference{StrackPosterSalishan25,
  author = {Strack, Alexander and Baghirzade, Mammadbaghir and Khan, Shihab Shahriar and Kim, Yoonsoo and Neopane, Sudarshan and Taiyebah, Farhana and Loiseau, Julien and Lim, Hyun},
  title = {Radiation Hydrodynamics at Scale with FleCSI-HARD},
  booktitle = {Salishan Conference on High Speed Computing},
  year = {2025},
  keywords = {poster},
}
2024
- Computational Radiation Hydrodynamics with FleCSI. Mammadbaghir Baghirzade, Shihab Shahriar Khan, Yoonsoo Kim, Sudarshan Neopane, Alexander Strack, Farhana Taiyebah, Julien Loiseau, and Hyun Lim. The International Conference for High Performance Computing, Networking, Storage, and Analysis, 2024
@conference{BaghirzadePosterSC24,
  author = {Baghirzade, Mammadbaghir and Khan, Shihab Shahriar and Kim, Yoonsoo and Neopane, Sudarshan and Strack, Alexander and Taiyebah, Farhana and Loiseau, Julien and Lim, Hyun},
  title = {Computational Radiation Hydrodynamics with FleCSI},
  booktitle = {The International Conference for High Performance Computing, Networking, Storage, and Analysis},
  year = {2024},
  keywords = {poster},
}