Research Projects List
ECRP: PerfGen: Synthesizing Performance Data using GenAI
The PerfGen project addresses the critical need for extensive performance data in HPC environments, where traditional data collection is often time-consuming and resource-intensive. Using generative AI, specifically GAN-based and LLM-based approaches, PerfGen synthesizes high-fidelity performance data, with the LLM-based methods proving the most effective in our evaluations. The framework also introduces a new dissimilarity metric to evaluate the quality of the generated data, ensuring it supports accurate and effective downstream machine learning tasks. Together, these techniques enable scalable and efficient performance optimization in HPC systems.
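PerfGen's dissimilarity metric is defined in the project's publications; as a rough sketch of the idea only, the illustration below (function and parameter names are hypothetical) scores synthetic samples by comparing per-feature quantiles against real measurements:

```python
import numpy as np

def feature_dissimilarity(real: np.ndarray, synth: np.ndarray, n_q: int = 100) -> float:
    """Compare real vs. synthetic performance samples (rows = samples,
    columns = features) via range-normalized per-feature quantile distance.
    NOTE: an illustrative stand-in, not PerfGen's published metric."""
    qs = np.linspace(0.0, 1.0, n_q)
    dists = []
    for j in range(real.shape[1]):
        rq = np.quantile(real[:, j], qs)   # quantiles of the real feature
        sq = np.quantile(synth[:, j], qs)  # quantiles of the synthetic feature
        scale = np.ptp(real[:, j]) or 1.0  # normalize by the feature's range
        dists.append(np.mean(np.abs(rq - sq)) / scale)
    return float(np.mean(dists))  # 0.0 means the distributions closely match
```

A low score suggests the synthetic data preserves the marginal distributions that downstream models learn from.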
Fractal: Few-Shot Transfer Learning for Performance
Few-shot transfer learning and generative AI significantly enhance the prediction of relative performance in HPC environments. By using a few samples from the target application, few-shot learning adapts the source model to improve generalizability, while generative AI synthesizes performance samples to mitigate data scarcity, ensuring accurate and efficient knowledge transfer across different platforms.
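A minimal sketch of the few-shot adaptation step, assuming a PyTorch regression model whose final layer is the output head (Fractal's actual adaptation scheme may differ):

```python
import torch
import torch.nn as nn

def few_shot_adapt(source_model: nn.Sequential, x_tgt: torch.Tensor,
                   y_tgt: torch.Tensor, epochs: int = 50, lr: float = 1e-3):
    """Adapt a source performance model with a handful of target-platform
    samples: freeze the feature layers and retrain only the output head."""
    for p in source_model.parameters():
        p.requires_grad = False
    head = source_model[-1]   # assumption: last layer is the regressor head
    for p in head.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(source_model(x_tgt), y_tgt)  # few target samples only
        loss.backward()
        opt.step()
    return source_model
```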
RECUP: Performance Reproducibility
Studying performance reproducibility is vital in the era of heterogeneous supercomputing due to increased performance variation and reduced consistency across runs. Understanding how factors like network traffic, power limits, concurrency tuning, and job interference impact performance is essential for achieving both optimal and reproducible outcomes in HPC environments.
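One standard way to quantify run-to-run variation, shown here purely as illustration, is the coefficient of variation over repeated runs of the same configuration:

```python
import numpy as np

def run_to_run_variation(times: np.ndarray) -> float:
    """Coefficient of variation (stddev / mean) of repeated run times;
    a simple, common proxy for performance reproducibility."""
    return float(np.std(times, ddof=1) / np.mean(times))

# e.g., five runs of the same job under nominally identical conditions:
print(run_to_run_variation(np.array([102.4, 98.7, 110.3, 101.9, 125.6])))
```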
ECRP: Performance-in-a-Graph (PinG)
The performance analytics domain in High Performance Computing (HPC) uses tabular data to solve regression problems, such as predicting execution time. Existing Machine Learning (ML) techniques leverage the correlations among features in tabular datasets but do not directly leverage the relationships between samples. Moreover, since high-quality embeddings from raw features improve the fidelity of downstream predictive models, existing methods rely on extensive feature engineering and pre-processing steps, costing time and manual effort. To fill these two gaps, we propose a novel idea of transforming tabular performance data into graphs to leverage advances in Graph Neural Network (GNN) techniques for capturing complex relationships between features and samples. In contrast to other ML application domains, such as social networks, the graph is not given; instead, we need to build it. We therefore propose graph-building methods where nodes represent samples and edges are automatically inferred iteratively based on the similarity between the features in the samples. We evaluate the effectiveness of the embeddings generated by GNNs based on how well they enable even a simple feed-forward neural network to perform on regression tasks, compared to other state-of-the-art representation learning techniques. Our evaluation demonstrates that even with up to 25% random missing values in each dataset, our method outperforms commonly used graph- and Deep Neural Network (DNN)-based approaches, achieving up to 61.67% and 78.56% improvement in MSE loss over the DNN baseline for the HPC and machine learning datasets, respectively.
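The project infers edges iteratively; the simplified, one-shot sketch below conveys the core idea by connecting each sample to its k most similar samples under cosine similarity:

```python
import numpy as np

def build_sample_graph(X: np.ndarray, k: int = 5):
    """Turn tabular performance data into a graph: each row (sample) becomes
    a node, and edges connect each node to its k most similar samples.
    One-shot approximation of PinG's iterative edge inference."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    sim = Xn @ Xn.T                  # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)   # disallow self-loops
    edges = []
    for i in range(X.shape[0]):
        for j in np.argsort(sim[i])[-k:]:  # k nearest neighbors of sample i
            edges.append((i, int(j)))
    return edges  # edge list consumable by a GNN library
```

The resulting graph is what a GNN consumes to produce the embeddings fed into the downstream feed-forward regressor.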
Performance Characterization
Performance characterization is crucial as it identifies the key interactions between applications and hardware, captured by performance counters, which are essential for predicting execution times and guiding optimization efforts. It serves as a prerequisite to performance optimization and autotuning by providing insights into bottlenecks and resource usage patterns, allowing for informed adjustments that enhance overall system performance. Without this foundational understanding, tuning efforts are less effective and may not fully leverage the potential of HPC systems.
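As a hedged illustration of this workflow, the sketch below fits a regression model on hardware-counter features and ranks the counters most predictive of runtime (the data is randomly generated placeholder input):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# X: rows = application runs, columns = hardware counters (cache misses,
# branch mispredictions, memory traffic, ...); y: measured execution times.
# Placeholder random data stands in for real measurements.
X, y = np.random.rand(200, 12), np.random.rand(200)
model = RandomForestRegressor(n_estimators=100).fit(X, y)
ranked = np.argsort(model.feature_importances_)[::-1]
print("counters most predictive of runtime:", ranked[:3])
```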
Cache Bottleneck Characterization
In this project, I built a robust low-level library for collecting Intel's Precise Event-Based Sampling (PEBS) counters in user space and developed an analysis approach that correlates PEBS data with application-specific information, such as source line numbers, to characterize the cache-access bottlenecks of applications.
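Once PEBS samples are decoded to source lines, ranking bottlenecks is straightforward; a minimal sketch, assuming samples arrive as (source_line, load_latency) pairs from the collection library:

```python
from collections import defaultdict

def rank_hot_lines(samples):
    """Aggregate decoded PEBS samples, given as (source_line, load_latency)
    pairs, and rank source lines by total latency to expose cache-access
    bottlenecks. The collection and decoding layer is the custom library
    described above."""
    latency_by_line = defaultdict(int)
    for line, latency in samples:
        latency_by_line[line] += latency
    return sorted(latency_by_line.items(), key=lambda kv: kv[1], reverse=True)

# e.g., three samples attributed to two source lines:
print(rank_hot_lines([(42, 310), (42, 280), (57, 95)]))  # line 42 dominates
```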
Data-driven decision making
This new direction of my research investigates the viability of leveraging automated data-driven analysis in decision making across domains including job scheduling, resource management, health care, transportation planning, and more. Several collaborative research projects are currently being pursued in my lab, and I am looking for students interested in optimization to join my team.
Reliable and Efficient Checkpointing for Cycle-sharing Systems
This research developed failure models for high-throughput systems (e.g., Grids) and a scalable data-transfer and checkpoint-storage mechanism to provide high fault tolerance in volatile environments.
Proxy Application Development and Validation
Proxy applications are written to represent subsets of the performance behaviors of larger, more complex applications that often have distribution restrictions. In this research, we developed a systematic methodology for quantitatively comparing how well proxies match their parent applications.
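As a simplified stand-in for that methodology, one can compare hardware-counter profile vectors of a proxy and its parent, for instance with cosine similarity:

```python
import numpy as np

def profile_similarity(proxy: np.ndarray, parent: np.ndarray) -> float:
    """Cosine similarity between counter-profile vectors of a proxy and its
    parent application (1.0 = identical behavior under this view).
    A toy surrogate for the full quantitative comparison methodology."""
    return float(proxy @ parent /
                 (np.linalg.norm(proxy) * np.linalg.norm(parent)))
```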
Scalable Checkpoint/Restart
My Ph.D. thesis built scalable checkpoint/restart systems for both high-throughput (Grids using Condor) and high-performance computing environments. A significant part of my thesis contributed to the Scalable Checkpoint/Restart (SCR) framework, which won an R&D 100 award in 2019.
Performance-Aware Application Development
My group is developing machine learning models to predict the impact of a code change on application performance. This project aims to help application developers assess how a proposed code change will affect the application's performance before the code is executed. We envision that performance information collected over time from nightly tests, provided as feedback to the model, will significantly improve its accuracy.
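A toy sketch of the kind of features such a model might consume from a code diff; the real features would come from static analysis and nightly-test history, and everything named here is hypothetical:

```python
def diff_features(diff_text: str) -> list[float]:
    """Extract crude performance-relevant features from a unified diff to
    feed a performance-impact model. Purely illustrative."""
    added = [l for l in diff_text.splitlines()
             if l.startswith("+") and not l.startswith("+++")]
    return [
        float(len(added)),                                        # lines added
        float(sum("for" in l or "while" in l for l in added)),    # new loops
        float(sum("malloc" in l or "new " in l for l in added)),  # new allocations
    ]
```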
Power-Aware Resilience
This project developed several algorithms for shifting power in an I/O-aware manner among applications in an HPC system to improve performance. Since the I/O phases of applications draw less power, shifting the surplus power to applications or processes that are crunching numbers accelerates their computation.
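A heuristic sketch of the idea, with hypothetical job records; the project's published algorithms are more sophisticated:

```python
def rebalance_power(jobs, total_budget_w: float):
    """I/O-aware power shifting: jobs in an I/O phase draw little power, so
    cap them near their observed draw and grant the surplus to compute-bound
    jobs. Illustrative heuristic only."""
    io_jobs = [j for j in jobs if j["phase"] == "io"]
    cpu_jobs = [j for j in jobs if j["phase"] == "compute"]
    caps = {j["id"]: j["observed_watts"] for j in io_jobs}  # trim I/O jobs
    surplus = total_budget_w - sum(caps.values())
    for j in cpu_jobs:                                      # split surplus evenly
        caps[j["id"]] = surplus / max(len(cpu_jobs), 1)
    return caps

jobs = [{"id": "A", "phase": "io", "observed_watts": 80.0},
        {"id": "B", "phase": "compute", "observed_watts": 250.0}]
print(rebalance_power(jobs, total_budget_w=400.0))  # B receives the 320 W surplus
```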