Huy Nguyen's Portfolio

Ph.D. Application Countdown: counting down to Dec 15, 2025

Every second brings me closer to my dream

Introduction


Hi there! I'm Huy Nguyen, a fourth-year undergraduate researcher at the University of Virginia pursuing a Bachelor's degree in Computer Science with a minor in Applied Mathematics. Originally from Ho Chi Minh City, Vietnam, I moved to the United States as an international student with aspirations to pursue a Ph.D. in Computer Science.

I am currently conducting research in Hardware Architecture and Acceleration at the LAVA Lab under Prof. Kevin Skadron, where I'm working on modernizing the Rodinia benchmark suite. Previously, I was involved in GPU performance and optimization research at the Insight Lab under Prof. Adwait Jog, focusing on LLM-driven CUDA optimization and MLPerf benchmarking across multi-GPU systems.

My research interests lie in GPU architecture, benchmark-driven performance evaluation, and memory systems, with particular expertise in CUDA programming, architectural simulation (GPGPU-Sim), and characterizing workload behavior across GPU generations. I am passionate about building reproducible evaluation methodologies that bridge the gap between theoretical hardware capabilities and practical performance.

Beyond research, I serve as a Teaching Assistant and Grader for Computer Science and Applied Mathematics courses, where I find fulfillment in mentoring and supporting fellow students.

I am actively applying to Ph.D. programs for the upcoming Fall 2026 cycle! I am specifically interested in research groups working on GPU architecture, memory systems, performance modeling, hardware accelerators, and reproducible benchmarking methodologies.


Research Interests & Experience


My Motivation: Moore's Law is running into physical limits; we can no longer count on shrinking transistors to squeeze more performance out of a single core. To keep improving performance, we have to go beyond scaling, with smarter microarchitectures, tighter memory and interconnect design, and software that actually exploits these features. That motivates my work in Computer Architecture and Hardware Acceleration, with a specific emphasis on GPUs. Currently, I am interested in developing reproducible benchmarking suites and feature-isolating microbenchmarks that systematically characterize where performance comes from, identify architectural bottlenecks, and inform design decisions for emerging GPU generations. My work on modernizing Rodinia v4.0 addresses these challenges by creating evaluation tools that stress modern GPUs and enable meaningful architectural research.

Below are my research projects addressing these challenges, advised by amazing professors and labs at UVA.

Rodinia Benchmark v4.0 — Modernizing Benchmarks for Emerging GPU Architectures

Advised by Prof. Kevin Skadron

I am leading the modernization of Rodinia for CUDA 12+, larger datasets, and contemporary GPU features such as Tensor Cores, Cooperative Groups, and multi-GPU execution. I built a reproducible harness to compare legacy versus modernized implementations across GPU generations and created microbenchmarks that isolate memory hierarchy, warp scheduling, and synchronization effects (see the sketch below). Using Nsight tooling and GPGPU-Sim, I focus on making results explainable, so that architectural choices map cleanly to application-level behavior.
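To give a concrete flavor of what "feature-isolating" means here, below is a minimal sketch of one such microbenchmark: a single-thread pointer chase that measures average load latency in cycles, isolating the memory hierarchy from parallelism and scheduling effects. The kernel name, chain layout, and sizes are illustrative assumptions, not the actual Rodinia v4.0 code.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Single-thread pointer chase: every load depends on the previous one,
    // so the measured time reflects pure memory latency, not bandwidth.
    __global__ void pointer_chase(const unsigned* chain, int iters,
                                  unsigned* sink, long long* cycles) {
        unsigned idx = 0;
        long long start = clock64();
        for (int i = 0; i < iters; ++i)
            idx = chain[idx];                 // serialized, dependent loads
        long long stop = clock64();
        *sink = idx;                          // keep the chase from being optimized away
        *cycles = (stop - start) / iters;     // average cycles per load
    }

    int main() {
        const int n = 1 << 20;                // 4 MiB working set; vary to walk L1/L2/DRAM
        unsigned* h = new unsigned[n];
        for (int i = 0; i < n; ++i)
            h[i] = (i + 97) % n;              // fixed-stride chain (97 is coprime to n)

        unsigned *d_chain, *d_sink;
        long long* d_cycles;
        cudaMalloc(&d_chain, n * sizeof(unsigned));
        cudaMalloc(&d_sink, sizeof(unsigned));
        cudaMalloc(&d_cycles, sizeof(long long));
        cudaMemcpy(d_chain, h, n * sizeof(unsigned), cudaMemcpyHostToDevice);

        pointer_chase<<<1, 1>>>(d_chain, 100000, d_sink, d_cycles);  // one thread, no overlap

        long long cycles;
        cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
        printf("avg latency: %lld cycles per load\n", cycles);

        cudaFree(d_chain); cudaFree(d_sink); cudaFree(d_cycles);
        delete[] h;
        return 0;
    }

Sweeping the working-set size walks the measured latency through the L1, L2, and DRAM regimes, which is exactly the kind of mapping from architectural feature to observed behavior the harness is built around.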

GPU L2 Cache Architecture Exploration with GPGPU-Sim

Advised by Prof. Kevin Skadron

I simulated extended L2 cache designs in GPGPU-Sim and co-modeled area and latency with CACTI to study bandwidth-latency trade-offs, hit-rate impacts, and workload sensitivity. I evaluated design points with Rodinia and ML kernels, scripted parameter sweeps, and analyzed how cache capacity and organization shape end-to-end performance. This study complements my benchmark modernization by tying memory-system choices to observed kernel behavior.
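As an illustration of this kind of sweep, here is a minimal sketch that probes L2 capacity on real hardware, complementing the GPGPU-Sim experiments: it repeatedly streams over a growing working set and reports effective read bandwidth, which drops sharply once the footprint spills out of L2. Buffer sizes, launch configuration, and pass counts are illustrative assumptions.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Grid-stride streaming reads over a working set of n floats, repeated
    // `passes` times so a footprint that fits in L2 is served from L2.
    __global__ void stream_read(const float* __restrict__ in, float* out,
                                size_t n, int passes) {
        size_t stride = (size_t)gridDim.x * blockDim.x;
        float acc = 0.f;
        for (int p = 0; p < passes; ++p)
            for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
                acc += in[i];
        if (acc == 12345.f) *out = acc;       // unprovable branch keeps the loads alive
    }

    int main() {
        const int passes = 50;
        const size_t max_bytes = 256u << 20;  // sweep 1 MiB .. 256 MiB footprints
        float *d_in, *d_out;
        cudaMalloc(&d_in, max_bytes);
        cudaMalloc(&d_out, sizeof(float));
        cudaMemset(d_in, 0, max_bytes);

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);

        for (size_t bytes = 1u << 20; bytes <= max_bytes; bytes <<= 1) {
            cudaEventRecord(t0);
            stream_read<<<1024, 256>>>(d_in, d_out, bytes / sizeof(float), passes);
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);
            float ms;
            cudaEventElapsedTime(&ms, t0, t1);
            // bandwidth falls toward DRAM levels once `bytes` exceeds L2 capacity
            printf("%8zu KiB: %7.1f GB/s\n", bytes >> 10,
                   (double)bytes * passes / ms / 1e6);
        }
        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }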

LLM-Driven CUDA Optimization & MLPerf Profiling

Advised by Prof. Adwait Jog

Modern ML frameworks abstract away low-level GPU details, often masking inefficiencies in memory access, scheduling, and kernel fusion. I investigate the performance gap between hand-optimized CUDA, PyTorch implementations, and LLM-generated code by directly implementing core ML operators (matrix multiplication, convolution). I found that an iterative feedback pipeline, which feeds profiling insights back to the model, can guide LLMs to automatically refine their kernels. I also extended this work with MLPerf-based evaluation across architectures and workloads. This reveals not just differences in execution time, but the architectural bottlenecks, such as memory access patterns, warp divergence, and synchronization overhead, that explain why performance varies.

GPU-Accelerated Molecular Dynamics & NEP Modeling

Advised by Prof. Keivan Esfarjani

I built a reproducible, GPU-accelerated MD workflow for entropy-stabilized oxides using GPUMD and a neuroevolution machine learning potential (NEP). Additionally, I automated large simulation batches with SLURM job arrays, improved NEP training on GPUs to reduce energy and force errors, and analyzed thermal trends across temperatures and compositions. The project led to a Journal of Applied Physics publication and seeded my interest in the hardware–software interface, especially the GPU computing that ultimately drives performance.


Publications


Neuroevolution Machine Learning Potential for High-Temperature Deformation Studies of Entropy-Stabilized Oxide MgNiCoCuZnO₅
Authors: B. Timalsina, H. G. Nguyen, K. Esfarjani
Published in Journal of Applied Physics, October 2024
This study presents the development of a neuroevolution machine learning potential (NEP) for the entropy-stabilized oxide MgNiCoCuZnO₅ (J14) to explore its lattice distortion, elastic properties, and thermal conductivity across a wide temperature range. The NEP potential demonstrates high accuracy compared to density functional theory (DFT) calculations and experimental data. [J. Appl. Phys. 136, 155109 (2024), DOI: 10.1063/5.0224282]