Huy Nguyen's Portfolio

Introduction


Hi there! I'm Huy Nguyen, a fourth-year undergraduate researcher at the University of Virginia pursuing a Bachelor's degree in Computer Science with a minor in Applied Mathematics. Originally from Ho Chi Minh City, Vietnam, I moved to the United States as an international student with the aspiration of pursuing a Ph.D. in Computer Science.

I am currently conducting research in hardware architecture and acceleration at the LAVA Lab under Prof. Kevin Skadron, where I am working on modernizing the Rodinia benchmark suite. Previously, I was involved in GPU performance and optimization research at the Insight Lab, focusing on LLM-driven CUDA optimization and MLPerf benchmarking across multi-GPU systems. In parallel, from April 2023 to March 2025, I conducted GPU-accelerated simulation and machine learning research with Prof. Keivan Esfarjani, applying molecular dynamics and neural potentials to materials modeling.

My research interests lie in GPU architecture, benchmark-driven performance evaluation, and memory systems, with particular expertise in CUDA programming, architectural simulation (GPGPU-Sim), and characterizing workload behavior across GPU generations. I am passionate about building reproducible evaluation methodologies that bridge the gap between theoretical hardware capabilities and practical performance.

Beyond research, I serve as a Teaching Assistant and Grader for Computer Science and Applied Mathematics courses, where I find fulfillment in mentoring and supporting fellow students.

I am actively applying to Ph.D. programs for the upcoming Fall 2026 cycle! I am specifically interested in research groups working on GPU architecture, memory systems, performance modeling, hardware accelerators, and reproducible benchmarking methodologies.


Education


University of Virginia - School of Engineering, 2026
B.S. in Computer Science with a minor in Applied Mathematics - GPA: 3.98

Research Interests & Experience


My Motivation: Moore's Law is running into physical limits: we can no longer keep shrinking transistors to pack more of them onto a single die. Future performance gains must therefore come from architectural innovation, squeezing more useful work from every transistor on the chip rather than just making transistors smaller.

This mesmerizing idea drew me toward GPUs and cemented my research interests along two directions: (1) GPU architecture and benchmarking, and (2) GPU-accelerated AI systems. In particular, I want to understand where GPU performance is lost in real workloads and to design architectures, systems, and benchmarks that expose and close those gaps.

Below are my research projects addressing these challenges, advised by amazing professors and labs at UVA.

Rodinia v4.0: Modernizing Benchmark Design and Datasets for Emerging GPU Architectures

Advised by Prof. Kevin Skadron @ LAVA Lab · UVA Computer Science Summer Research Fellowship, Summer 2025

I am leading the modernization of Rodinia for CUDA 12+, with larger datasets and contemporary GPU features such as Tensor Cores, Cooperative Groups, and multi-GPU execution. I built a reproducible harness to compare legacy and modernized implementations across GPU generations, and created microbenchmarks that isolate memory-hierarchy, warp-scheduling, and synchronization effects. Using Nsight tooling and GPGPU-Sim, I focus on making results explainable, so that architectural choices map cleanly to application-level behavior.
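
To give a flavor of these microbenchmarks, below is a minimal sketch (illustrative only, not the actual harness) of a strided-read kernel: sweeping the stride degrades coalescing and moves traffic between L2 and DRAM, the kind of isolated memory-hierarchy effect the harness is built to expose across GPU generations.

```cuda
// stride_bench.cu: illustrative microbenchmark (simplified sketch).
// Sweeping the stride shows how coalescing and the cache hierarchy
// shape effective bandwidth on a given GPU generation.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void strided_read(const float* __restrict__ in, float* out,
                             size_t n, size_t stride) {
    size_t tid  = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t step = (size_t)gridDim.x * blockDim.x * stride;
    float acc = 0.0f;
    // Larger strides break coalescing: each warp touches more cache
    // lines per request, shifting L2 hits toward DRAM fetches.
    for (size_t j = tid * stride; j < n; j += step)
        acc += in[j];
    out[tid] = acc;  // keep the loads from being optimized away
}

int main() {
    const size_t n = 1u << 26;  // 64M floats = 256 MiB
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, 1024 * 256 * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));
    for (size_t stride = 1; stride <= 32; stride *= 2) {
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0); cudaEventCreate(&t1);
        cudaEventRecord(t0);
        strided_read<<<1024, 256>>>(in, out, n, stride);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("stride %2zu: %6.2f ms\n", stride, ms);
        cudaEventDestroy(t0); cudaEventDestroy(t1);
    }
    cudaFree(in); cudaFree(out);
    return 0;
}
```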

Presentation: University of Virginia - Undergraduate Engineering Research and Design Symposium, Charlottesville, VA (Expected April 2026)

GPU L2 Cache Architecture Exploration with GPGPU-Sim

Advised by Prof. Kevin Skadron @ LAVA Lab

I simulated extended L2 cache designs in GPGPU-Sim and co-modeled area and latency with CACTI to study bandwidth-latency trade-offs, hit-rate impacts, and workload sensitivity. I evaluated design points with Rodinia and ML kernels, scripted parameter sweeps, and analyzed how cache capacity and organization shape end-to-end performance. This study complements my benchmark modernization by tying memory-system choices to observed kernel behavior.
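
For context, design points in such a sweep are expressed through GPGPU-Sim's L2 cache option. The excerpt below sketches the kind of line being varied; the exact field layout differs across GPGPU-Sim versions, so treat the values as placeholders rather than a drop-in configuration.

```
# gpgpusim.config excerpt (illustrative; field format is version-dependent).
# The first tuple encodes sets : line size : associativity; the remaining
# fields set replacement/write/allocation policies and MSHR resources.
# A capacity sweep varies the set count and associativity.
-gpgpu_cache:dl2 64:128:16,L:B:m:W:L,A:192:4,32:0,32
-gpgpu_cache:dl2_texture_only 0
```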

Presentation: University of Virginia Engineering Research Expo, Charlottesville, VA (November 2025)

LLM-Driven CUDA Code Optimization: A Comparative Study with PyTorch

Advised by Prof. Adwait Jog @ Insight Lab · Dean's Research Scholarship, Summer 2024

I developed an iterative optimization pipeline that uses LLM-based prompting to accelerate GPT-4o-generated CUDA code, documenting recurring bottlenecks and best practices for LLM-assisted GPU programming. I profiled and compared LLM-optimized CUDA against PyTorch implementations on NVIDIA H100/A100 GPUs, revealing trade-offs that inform framework selection for ML workloads. Additionally, I stress-tested large-scale models (ResNet-50, BERT) with RAPIDS libraries (cuML, cuDF) and TensorRT to assess scalability and performance across different optimization strategies.
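
To make "recurring bottlenecks" concrete, the pair of kernels below shows a representative rewrite of the kind the pipeline steers the LLM toward (hypothetical kernels, not the study's code): the naive version issues two global loads per multiply-add, while the tiled version stages operands through shared memory.

```cuda
#define TILE 16

// Before: every multiply-add issues two global-memory loads.
__global__ void matmul_naive(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < n; ++k)
        acc += A[row * n + k] * B[k * n + col];
    C[row * n + col] = acc;
}

// After: operands staged through shared memory, cutting global
// traffic roughly TILE-fold; assumes n is a multiple of TILE.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE], Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}
```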

Presentations:
• University of Virginia - Undergraduate Engineering Research and Design Symposium, Charlottesville, VA (April 2025)
• University of Virginia Engineering Research Expo, Charlottesville, VA (October 2024)

Profiling GPU Performance Across MLCommons Tools

Advised by Prof. Adwait Jog @ Insight Lab

I reproduced the MLPerf v4.0 ResNet-50 Training benchmark on UVA GPU servers, identifying throughput bottlenecks and tuning multi-GPU scaling over NVLink using mixed-precision Tensor Core execution and XLA. I am currently comparing MLPerf Training/Inference with MLPerf Tiny to characterize scaling inefficiencies between edge deployments and large-scale GPU clusters.
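
Beneath the framework-level flags, mixed-precision Tensor Core math reduces to warp-level matrix operations. Here is a minimal sketch using CUDA's WMMA API (illustrative only; requires compute capability 7.0+ and a one-warp launch such as <<<1, 32>>>):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp multiplies a pair of 16x16 FP16 tiles and accumulates in
// FP32, the mixed-precision pattern that Tensor Cores accelerate.
__global__ void wmma_tile(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);   // leading dimension 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);     // runs on Tensor Cores
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}
```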

GPU-Accelerated Molecular Dynamics & NEP Modeling

Advised by Prof. Keivan Esfarjani @ ELM Group

I built a reproducible GPU-based MD workflow for entropy-stabilized oxides using GPUMD and a neuroevolution machine learning potential (NEP). Additionally, I automated large simulation batches with SLURM job arrays, improved NEP training on GPUs to reduce energy and force errors, and analyzed thermal trends across temperatures and compositions. The project led to a Journal of Applied Physics publication and seeded my interest in the hardware–software interface, especially the GPU computing that ultimately drives performance.

Presentation: University of Virginia Research Computing Exhibition, Charlottesville, VA (April 2025)


Selected Hardware Projects


GPU L2 Cache Optimizer

CS 6501 – Spring 2025 · In revision for integration into Rodinia v4.0 at the LAVA Lab

I modeled extended GPU L2 cache architectures in GPGPU-Sim using memory-bound Rodinia benchmarks (BFS, K-means, Gaussian Elimination, Needleman–Wunsch) to evaluate cache behavior under realistic workloads. I evaluated trade-offs between larger L2 capacity, higher associativity, and prefetching, pinpointing configurations where extra cache lowers miss rates and runtime without prohibitive area or energy overheads.
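
One way to reason about where extra cache pays off is a capacity probe that re-reads a buffer whose footprint sweeps across the simulated L2 size. A hypothetical, simplified sketch:

```cuda
// L2 capacity probe (illustrative). Threads repeatedly re-read a buffer
// of `ws` floats: while ws * 4 bytes fits in L2, the re-reads hit cache;
// past capacity, runtime jumps as traffic spills to DRAM. Sweeping ws
// brackets the effective L2 size, in silicon or in GPGPU-Sim.
__global__ void reread(const float* __restrict__ buf, float* out,
                       size_t ws, int passes) {
    size_t tid = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t nthreads = (size_t)gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (int p = 0; p < passes; ++p)
        for (size_t i = tid; i < ws; i += nthreads)
            acc += buf[i];
    out[tid] = acc;  // keep the loads from being optimized away
}
```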

GPU Sparse Matrix – Sparse Matrix (SpMSpM) Multiplication

CS 4444: Intro to Parallel Computing – Fall 2025

I designed and tuned multiple CUDA kernels for SpMSpM using shared-memory hashing and dynamic scheduling across NVIDIA GPUs, balancing parallelism, occupancy, and memory traffic to achieve significant speedups over CPU baselines by exploiting the GPU memory hierarchy and warp-level operations. I profiled with Nsight Compute to identify bottlenecks and iteratively optimized for different matrix sizes and sparsity levels.
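
A stripped-down sketch of the shared-memory hashing idea (hypothetical and much simplified relative to the tuned kernels): one block expands one CSR row of C = A × B, with linear probing and atomics resolving collisions in the table.

```cuda
#include <cuda_runtime.h>

#define TABLE 1024   // hash slots per row; assumes nnz of a C row < TABLE
#define EMPTY (-1)

// One block per row of A; launch as spmspm_row_hash<<<num_rows_A, 128>>>.
// Row i of C is written, unsorted, at offset i * TABLE in ccol/cval with
// its length in cnnz[i] (cnnz zeroed beforehand). Tuned versions size the
// table per row and fall back to global memory on overflow.
__global__ void spmspm_row_hash(
    const int* arow, const int* acol, const float* aval,  // A in CSR
    const int* brow, const int* bcol, const float* bval,  // B in CSR
    int* ccol, float* cval, int* cnnz)
{
    __shared__ int   keys[TABLE];
    __shared__ float vals[TABLE];
    int row = blockIdx.x;
    for (int s = threadIdx.x; s < TABLE; s += blockDim.x) {
        keys[s] = EMPTY;
        vals[s] = 0.0f;
    }
    __syncthreads();

    // For each nonzero A(row,k), threads scatter A(row,k) * B(k,j) into
    // the hash table in parallel across row k of B.
    for (int p = arow[row]; p < arow[row + 1]; ++p) {
        int   k  = acol[p];
        float av = aval[p];
        for (int q = brow[k] + threadIdx.x; q < brow[k + 1]; q += blockDim.x) {
            int   col  = bcol[q];
            float prod = av * bval[q];
            int slot = col & (TABLE - 1);   // TABLE is a power of two
            while (true) {                  // linear probing
                int prev = atomicCAS(&keys[slot], EMPTY, col);
                if (prev == EMPTY || prev == col) {
                    atomicAdd(&vals[slot], prod);
                    break;
                }
                slot = (slot + 1) & (TABLE - 1);
            }
        }
    }
    __syncthreads();

    // Compact occupied slots into the output row.
    for (int s = threadIdx.x; s < TABLE; s += blockDim.x) {
        if (keys[s] != EMPTY) {
            int o = atomicAdd(&cnnz[row], 1);
            ccol[row * TABLE + o] = keys[s];
            cval[row * TABLE + o] = vals[s];
        }
    }
}
```

Hashing keeps each partial-product accumulation near O(1) at the cost of atomic contention, which is where the occupancy and memory-traffic tuning described above comes in.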

CPU/GPU Memory & Near-Data Processing

CS 6501: CPU/GPU Memory & Near-Data Processing – Spring 2025

I explored CPU/GPU memory hierarchies, cache modeling, DRAM simulation, CUDA programming, and near-data processing using PIMeval-PIMbench to build intuition for memory-centric architectures.


Publications


Neuroevolution Machine Learning Potential for High-Temperature Deformation Studies of Entropy-Stabilized Oxide MgNiCoCuZnO₅
Authors: B. Timalsina, H. G. Nguyen, K. Esfarjani
Published in Journal of Applied Physics, October 2024
This study presents the development of a neuroevolution machine learning potential (NEP) for the entropy-stabilized oxide MgNiCoCuZnO₅ (J14) to explore its lattice distortion, elastic properties, and thermal conductivity across a wide temperature range. The NEP demonstrates high accuracy compared to density functional theory (DFT) calculations and experimental data. [J. Appl. Phys. 136, 155109 (2024), DOI: 10.1063/5.0224282]


Honors and Awards


The Raven Society (UVA's highest honor for academic leadership), 2025, 2026

The Raven Society is UVA's oldest and most prestigious honorary society, recognizing exceptional students for academic excellence, leadership, and service to the University and Charlottesville community. Membership is highly selective and considered one of the highest honors a student can receive at UVA.

UVA Computer Science Summer Research Fellowship, Summer 2025

Awarded $5,000 for a 10-week full-time research project, with additional engagement in professional development workshops and presentation of findings at the UVA Research Symposium.

Dean's Research Scholarship, Summer 2024

Awarded $4,800 for a 10-week full-time research project, with additional engagement in professional development workshops and presentation of findings at the Fall Undergraduate Research Expo.

UVA Engineering Dean's List, All Semesters

Achieved Dean's List recognition for maintaining a GPA of 3.40 or higher and completing over 15 credits of graded coursework in the preceding semester.