Introduction
Hi there! I'm Huy Nguyen, a fourth-year undergraduate researcher at the University of Virginia pursuing a Bachelor's degree in Computer Science with a minor in Applied Mathematics. Originally from Ho Chi Minh City, Vietnam, I moved to the United States as an international student with aspirations to pursue a Ph.D. in Computer Science.
I am currently conducting research in Hardware Architecture and Acceleration at the LAVA Lab under Prof. Kevin Skadron, where I'm working on modernizing the Rodinia benchmark suite. Previously, I was involved in GPU performance and optimization research at the Insight Lab, focusing on LLM-driven CUDA optimization and MLPerf benchmarking across multi-GPU systems. In parallel, from April 2023 to March 2025, I conducted GPU-accelerated simulation and machine learning research with Prof. Keivan Esfarjani, applying molecular dynamics and neural potentials to materials modeling.
My research interests lie in GPU architecture, benchmark-driven performance evaluation, and memory systems, with particular expertise in CUDA programming, architectural simulation (GPGPU-Sim), and characterizing workload behavior across GPU generations. I am passionate about building reproducible evaluation methodologies that bridge the gap between theoretical hardware capabilities and practical performance.
Beyond research, I serve as a Teaching Assistant and Grader for Computer Science and Applied Mathematics courses, where I find fulfillment in mentoring and supporting fellow students.
I am actively searching for PhD programs for this upcoming Fall 2026 cycle! I am specifically interested in research groups working on GPU architecture, memory systems, performance modeling, hardware accelerators, and reproducible benchmarking methodologies.
Education
University of Virginia - School of Engineering, 2026
B.S. in Computer Science, Minor in Applied Mathematics (GPA: 3.98)
Research Interests & Experience
My Motivation: Moore's Law is running into physical limits; we can no longer keep shrinking transistors to pack ever more of them onto a chip. Future performance gains must therefore come from architectural innovation: squeezing more useful work from every transistor on the die, not just making transistors smaller.
This mesmerizing idea drew me toward GPUs and cemented my research interests along two directions: (1) GPU architecture and benchmarking and (2) GPU-accelerated AI systems. In particular, I am interested in understanding where GPU performance is lost in real workloads and in designing architectures, systems, and benchmarks that expose and close those gaps.
Below are my research projects addressing these challenges, advised by amazing professors and labs at UVA.
Rodinia v4.0: Modernizing Benchmark Design and Datasets for Emerging GPU Architectures
I am leading the modernization of Rodinia for CUDA 12+, larger datasets, and contemporary GPU features such as Tensor Cores, Cooperative Groups, and multi-GPU. I built a reproducible harness to compare legacy versus modernized implementations across GPU generations and created microbenchmarks that isolate memory hierarchy, warp scheduling, and synchronization effects. Using Nsight tooling and GPGPU-Sim, I focus on making results explainable so architectural choices map cleanly to application-level behavior.
Presentation: University of Virginia - Undergraduate Engineering Research and Design Symposium, Charlottesville, VA (Expected April 2026)
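To give a flavor of what these microbenchmarks look like, below is a minimal sketch of a strided-read kernel of the kind used to isolate memory-coalescing and cache-line-utilization effects. It is illustrative only: the kernel name, launch configuration, and problem sizes are my own choices here, not the harness's actual code.

```cuda
// Hypothetical strided-read microbenchmark (illustrative, not the harness code).
// Sweeping the stride shows how coalescing and cache-line utilization
// change effective memory bandwidth across GPU generations.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void strided_read(const float* __restrict__ in, float* __restrict__ out,
                             size_t n, int stride) {
    size_t start = (blockIdx.x * (size_t)blockDim.x + threadIdx.x) * stride;
    size_t step  = (size_t)gridDim.x * blockDim.x * stride;
    float acc = 0.0f;
    // Large strides touch one element per cache line, wasting most of each transaction.
    for (size_t j = start; j < n; j += step)
        acc += in[j];
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;  // keep the loads from being optimized away
}

int main() {
    const size_t n = 1 << 26;  // 64M floats = 256 MiB
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, 1024 * 256 * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    for (int stride = 1; stride <= 32; stride *= 2) {
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0);
        strided_read<<<1024, 256>>>(in, out, n, stride);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        // Useful bytes actually consumed divided by elapsed time.
        printf("stride %2d: %6.2f ms  ~%6.1f GB/s effective\n",
               stride, ms, (n / (double)stride) * sizeof(float) / ms / 1e6);
        cudaEventDestroy(t0);
        cudaEventDestroy(t1);
    }
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

As the stride grows, each memory transaction delivers fewer useful bytes, so the measured effective bandwidth falls sharply; single-cause measurements like this are what make results straightforward to map back to architectural behavior.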
GPU L2 Cache Architecture Exploration with GPGPU-Sim
I simulated extended L2 cache designs in GPGPU-Sim and co-modeled area and latency with CACTI to study bandwidth-latency trade-offs, hit-rate impacts, and workload sensitivity. I evaluated design points with Rodinia and ML kernels, scripted parameter sweeps, and analyzed how cache capacity and organization shape end-to-end performance. This study complements my benchmark modernization by tying memory-system choices to observed kernel behavior.
Presentation: University of Virginia Engineering Research Expo, Charlottesville, VA (November 2025)
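To illustrate the workload-sensitivity side of this study in runnable form, the sketch below re-reads a buffer whose footprint is swept around a typical L2 capacity: when the working set fits, later passes are served from L2, and once it spills, throughput falls toward DRAM bandwidth. The footprints, launch parameters, and the assumed ~40-50 MiB L2 are illustrative assumptions, not the simulated design points themselves.

```cuda
// Hypothetical working-set sweep (illustrative footprints, not the GPGPU-Sim design points).
// Re-reading a buffer whose size is swept around the L2 capacity exposes the
// hit-rate cliff that different cache capacities and organizations move or flatten.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void reread(const float* __restrict__ buf, float* out, size_t elems, int passes) {
    size_t tid      = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t nthreads = (size_t)gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (int p = 0; p < passes; ++p)                  // later passes hit in L2 only if the footprint fits
        for (size_t i = tid; i < elems; i += nthreads)
            acc += buf[i];
    if (tid == 0) out[0] = acc;                       // keep the loads live
}

int main() {
    float* out;
    cudaMalloc(&out, sizeof(float));
    // Footprints from 8 MiB to 128 MiB, bracketing an assumed ~40-50 MiB L2.
    for (size_t mib = 8; mib <= 128; mib *= 2) {
        size_t elems = mib * (1 << 20) / sizeof(float);
        float* buf;
        cudaMalloc(&buf, elems * sizeof(float));
        cudaMemset(buf, 0, elems * sizeof(float));

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0);
        reread<<<2048, 256>>>(buf, out, elems, 8);    // 8 passes over the buffer
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("%4zu MiB footprint: %7.2f ms  ~%5.0f GiB/s effective\n",
               mib, ms, 8.0 * mib / 1024.0 / (ms / 1e3));
        cudaEventDestroy(t0);
        cudaEventDestroy(t1);
        cudaFree(buf);
    }
    cudaFree(out);
    return 0;
}
```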
LLM-Driven CUDA Code Optimization: A Comparative Study with PyTorch
I developed an iterative optimization pipeline using LLM-based prompting to accelerate GPT-4o-generated CUDA code, documenting recurring bottlenecks and best practices for LLM-assisted GPU programming. I profiled and compared LLM-optimized CUDA against PyTorch implementations on NVIDIA H100/A100 GPUs, revealing framework trade-offs for ML workload selection. Additionally, I stress-tested large-scale models (ResNet-50, BERT) with RAPIDS AI libraries (cuML, cuDF, TensorRT) to assess scalability and performance across different optimization strategies.
Presentations:
• University of Virginia - Undergraduate Engineering Research and Design Symposium, Charlottesville, VA (April 2025)
• University of Virginia Engineering Research Expo, Charlottesville, VA (October 2024)
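One recurring bottleneck from this study is easy to show in isolation: first-pass generated code often reduces a vector through a single global atomic per element, and the standard fix is a warp-shuffle plus shared-memory reduction that issues one atomic per block. The sketch below is a generic illustration of that pattern, not actual output from the pipeline, and the kernel names are my own.

```cuda
// Generic illustration of a recurring fix: replace a per-element global atomic
// with a warp-shuffle + shared-memory reduction and one atomic per block.
#include <cuda_runtime.h>

// Naive pattern often seen in first-pass generated code: every thread
// serializes on a single global atomic.
__global__ void sum_naive(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Optimized pattern: reduce within each warp, then across warps in shared
// memory, and issue one atomic per block.
__global__ void sum_opt(const float* __restrict__ in, float* out, int n) {
    __shared__ float warp_sums[32];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;

    for (int offset = 16; offset > 0; offset >>= 1)   // warp-level tree reduction
        v += __shfl_down_sync(0xffffffff, v, offset);

    int lane = threadIdx.x & 31, warp = threadIdx.x >> 5;
    if (lane == 0) warp_sums[warp] = v;
    __syncthreads();

    if (warp == 0) {                                  // first warp reduces the per-warp sums
        v = (lane < (blockDim.x + 31) / 32) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffff, v, offset);
        if (lane == 0) atomicAdd(out, v);             // one atomic per block instead of per element
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));
    cudaMemset(out, 0, sizeof(float));
    sum_opt<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```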
Profiling GPU Performance Across MLCommons Tools
I reproduced the MLPerf v4.0 ResNet-50 Training benchmark on UVA GPU servers, identifying throughput bottlenecks and tuning multi-GPU NVLink scaling with mixed-precision Tensor Cores and XLA. I am currently comparing MLPerf Training/Inference with MLPerf Tiny to characterize scaling inefficiencies between edge deployments and large-scale GPU clusters.
GPU-Accelerated Molecular Dynamics & NEP Modeling
I built a reproducible, GPU-accelerated molecular dynamics (MD) workflow for entropy-stabilized oxides using GPUMD and a neuroevolution potential (NEP) machine-learning model. Additionally, I automated large simulation batches with SLURM job arrays, improved NEP training on GPUs to reduce energy and force errors, and analyzed thermal trends across temperatures and compositions. The project led to a Journal of Applied Physics publication and seeded my interest in the hardware–software interface, especially the GPU computing that ultimately drives performance.
Presentation: University of Virginia Research Computing Exhibition, Charlottesville, VA (April 2025)
Selected Hardware Projects
I modeled extended GPU L2 cache architectures in GPGPU-Sim using memory-bound Rodinia benchmarks (BFS, K-means, Gaussian Elimination, Needleman–Wunsch) to evaluate cache behavior under realistic workloads. I evaluated trade-offs between larger L2 capacity, higher associativity, and prefetching, pinpointing configurations where extra cache lowers miss rates and runtime without prohibitive area or energy overheads.
I designed and tuned multiple CUDA kernels for sparse matrix–sparse matrix multiplication (SpMSpM) using shared-memory hashing and dynamic scheduling across NVIDIA GPUs to balance parallelism, occupancy, and memory traffic, achieving significant speedups over CPU baselines by exploiting GPU memory hierarchies and warp-level operations. The project involved profiling with Nsight Compute to identify bottlenecks and iteratively optimizing for different matrix sizes and sparsity levels.
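A minimal sketch of the shared-memory hashing idea is shown below. It assumes CSR inputs and a fixed power-of-two hash table per output row, and it omits the symbolic sizing pass, table-overflow fallback, and host-side setup that a production SpMSpM kernel needs; all names and sizes are illustrative rather than the project's actual code.

```cuda
// Simplified sketch of a row-per-block SpMSpM accumulation with a
// shared-memory hash table (kernel only; CSR host setup is omitted).
#include <cuda_runtime.h>

#define TABLE_SIZE 1024   // per-row table; a real kernel sizes this per row or falls back to global memory
#define EMPTY (-1)

// C = A * B with A, B in CSR form. Each thread block accumulates one row of C.
// c_col / c_val are assumed to hold up to TABLE_SIZE entries per row, and
// c_nnz is assumed zero-initialized by the host.
__global__ void spmspm_row_hash(const int* __restrict__ a_rowptr, const int* __restrict__ a_col,
                                const float* __restrict__ a_val,
                                const int* __restrict__ b_rowptr, const int* __restrict__ b_col,
                                const float* __restrict__ b_val,
                                int* c_col, float* c_val, int* c_nnz) {
    __shared__ int   keys[TABLE_SIZE];
    __shared__ float vals[TABLE_SIZE];
    int row = blockIdx.x;

    for (int t = threadIdx.x; t < TABLE_SIZE; t += blockDim.x) {  // clear the table
        keys[t] = EMPTY;
        vals[t] = 0.0f;
    }
    __syncthreads();

    // For each nonzero A(row,k), threads cooperatively walk row k of B and
    // scatter A(row,k) * B(k,j) into the hash slot for output column j.
    for (int ai = a_rowptr[row]; ai < a_rowptr[row + 1]; ++ai) {
        int   k = a_col[ai];
        float a = a_val[ai];
        for (int bi = b_rowptr[k] + threadIdx.x; bi < b_rowptr[k + 1]; bi += blockDim.x) {
            int j    = b_col[bi];
            int slot = j & (TABLE_SIZE - 1);                       // power-of-two table, linear probing
            while (true) {
                int prev = atomicCAS(&keys[slot], EMPTY, j);
                if (prev == EMPTY || prev == j) break;             // slot claimed for j, or already holds j
                slot = (slot + 1) & (TABLE_SIZE - 1);              // collision: probe the next slot
            }
            atomicAdd(&vals[slot], a * b_val[bi]);
        }
    }
    __syncthreads();

    // Compact occupied slots into the output row (left unsorted here).
    for (int t = threadIdx.x; t < TABLE_SIZE; t += blockDim.x) {
        if (keys[t] != EMPTY) {
            int pos = atomicAdd(&c_nnz[row], 1);
            c_col[row * TABLE_SIZE + pos] = keys[t];
            c_val[row * TABLE_SIZE + pos] = vals[t];
        }
    }
}
```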
CPU/GPU Memory & Near-Data Processing
I explored CPU/GPU memory hierarchies, cache modeling, DRAM simulation, CUDA programming, and near-data processing using PIMeval-PIMbench to build intuition for memory-centric architectures.
Publications
Published in Journal of Applied Physics, October 2024
Honors and Awards

The Raven Society (UVA's highest honor for academic leadership) 2025, 2026
The Raven Society is UVA's oldest and most prestigious honorary society, recognizing exceptional students for academic excellence, leadership, and service to the University and Charlottesville community. Membership is highly selective and considered one of the highest honors a student can receive at UVA.

UVA Computer Science Summer Research Fellowship Summer 2025
Awarded $5,000 for a 10-week full-time research project with additional engagement in professional development workshops and presentation of findings at UVA Research Symposium.

Dean's Research Scholarship Summer 2024
Awarded $4,800 for a 10-week full-time research project with additional engagement in professional development workshops and presentation of findings at Fall Undergraduate Research Expo.

UVA Engineering Dean's List All Semesters
Achieved Dean's List recognition for maintaining a GPA of 3.40 or higher and completing over 15 credits of graded coursework in the preceding semester.
