Introduction
Hi there! I'm Huy Nguyen, a fourth-year undergraduate researcher at the University of Virginia pursuing a Bachelor's degree in Computer Science with a minor in Applied Mathematics. Originally from Ho Chi Minh City, Vietnam, I moved to the United States as an international student with aspirations to pursue a Ph.D. in Computer Science.
I am currently conducting research in Hardware Architecture and Acceleration at the LAVA Lab under Prof. Kevin Skadron, where I'm working on modernizing the Rodinia benchmark suite. Previously, I was involved in GPU performance and optimization research at the Insight Lab, focusing on LLM-driven CUDA optimization and MLPerf benchmarking across multi-GPU systems. In parallel, from April 2023 to March 2025, I conducted GPU-accelerated simulation and machine learning research with Prof. Keivan Esfarjani, applying molecular dynamics and neural potentials to materials modeling.
My research interests lie in GPU architecture, benchmark-driven performance evaluation, and memory systems, with particular expertise in CUDA programming, architectural simulation (GPGPU-Sim), and characterizing workload behavior across GPU generations. I am passionate about building reproducible evaluation methodologies that bridge the gap between theoretical hardware capabilities and practical performance.
Beyond research, I serve as a Teaching Assistant and Grader for Computer Science and Applied Mathematics courses, where I find fulfillment in mentoring and supporting fellow students.
I am actively searching for PhD programs for this upcoming Fall 2026 cycle! I am specifically interested in research groups working on GPU architecture, memory systems, performance modeling, hardware accelerators, and reproducible benchmarking methodologies.
Education
University of Virginia - School of Engineering, 2026
B.S. in Computer Science, Minor in Applied Mathematics (GPA: 3.98)
Research Interests & Experience
My Motivation: Moore's Law is running into physical limits; we can no longer keep shrinking transistors to pack ever more of them onto a chip. Future performance gains must therefore come from architectural innovation: squeezing more useful work from every transistor on the die, not just making transistors smaller.
This mesmerizing idea drew me toward GPUs and cemented my research interests along two directions: (1) GPU architecture and benchmarking and (2) GPU-accelerated AI systems. In particular, I am interested in understanding where GPU performance is lost in real workloads and in designing architectures, systems, and benchmarks that expose and close those gaps.
Below are my research projects addressing these challenges, advised by amazing professors and labs at UVA.
Rodinia v4.0: Modernizing Benchmark Design and Datasets for Emerging GPU Architectures
I am leading the modernization of Rodinia for CUDA 12+, larger datasets, and contemporary GPU features such as Tensor Cores, Cooperative Groups, and multi-GPU. I built a reproducible harness to compare legacy versus modernized implementations across GPU generations and created microbenchmarks that isolate memory hierarchy, warp scheduling, and synchronization effects. Using Nsight tooling and GPGPU-Sim, I focus on making results explainable so architectural choices map cleanly to application-level behavior.
Presentation: University of Virginia - Undergraduate Engineering Research and Design Symposium, Charlottesville, VA (Expected April 2026)
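To give a flavor of what these microbenchmarks look like, below is a minimal sketch of a strided-read kernel of the kind used to isolate memory-coalescing and cache-line-utilization effects. It is illustrative only: the kernel name, launch configuration, and problem sizes are my own choices here, not the harness's actual code.

```cuda
// Hypothetical strided-read microbenchmark (illustrative, not the harness code).
// Sweeping the stride shows how coalescing and cache-line utilization
// change effective memory bandwidth across GPU generations.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void strided_read(const float* __restrict__ in, float* __restrict__ out,
                             size_t n, int stride) {
    size_t start = (blockIdx.x * (size_t)blockDim.x + threadIdx.x) * stride;
    size_t step  = (size_t)gridDim.x * blockDim.x * stride;
    float acc = 0.0f;
    // Large strides touch one element per cache line, wasting most of each transaction.
    for (size_t j = start; j < n; j += step)
        acc += in[j];
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;  // keep the loads from being optimized away
}

int main() {
    const size_t n = 1 << 26;  // 64M floats = 256 MiB
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, 1024 * 256 * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    for (int stride = 1; stride <= 32; stride *= 2) {
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0);
        strided_read<<<1024, 256>>>(in, out, n, stride);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        // Useful bytes actually consumed divided by elapsed time.
        printf("stride %2d: %6.2f ms  ~%6.1f GB/s effective\n",
               stride, ms, (n / (double)stride) * sizeof(float) / ms / 1e6);
        cudaEventDestroy(t0);
        cudaEventDestroy(t1);
    }
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

As the stride grows, each memory transaction delivers fewer useful bytes, so the measured effective bandwidth falls sharply; single-cause measurements like this are what make results straightforward to map back to architectural behavior.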
GPU L2 Cache Architecture Exploration with GPGPU-Sim
I simulated extended L2 cache designs in GPGPU-Sim and co-modeled area and latency with CACTI to study bandwidth-latency trade-offs, hit-rate impacts, and workload sensitivity. I evaluated design points with Rodinia and ML kernels, scripted parameter sweeps, and analyzed how cache capacity and organization shape end-to-end performance. This study complements my benchmark modernization by tying memory-system choices to observed kernel behavior.
Presentation: University of Virginia Engineering Research Expo, Charlottesville, VA (November 2025)
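To illustrate the workload-sensitivity side of this study in runnable form, the sketch below re-reads a buffer whose footprint is swept around a typical L2 capacity: when the working set fits, later passes are served from L2, and once it spills, throughput falls toward DRAM bandwidth. The footprints, launch parameters, and the assumed ~40-50 MiB L2 are illustrative assumptions, not the simulated design points themselves.

```cuda
// Hypothetical working-set sweep (illustrative footprints, not the GPGPU-Sim design points).
// Re-reading a buffer whose size is swept around the L2 capacity exposes the
// hit-rate cliff that different cache capacities and organizations move or flatten.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void reread(const float* __restrict__ buf, float* out, size_t elems, int passes) {
    size_t tid      = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t nthreads = (size_t)gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (int p = 0; p < passes; ++p)                  // later passes hit in L2 only if the footprint fits
        for (size_t i = tid; i < elems; i += nthreads)
            acc += buf[i];
    if (tid == 0) out[0] = acc;                       // keep the loads live
}

int main() {
    float* out;
    cudaMalloc(&out, sizeof(float));
    // Footprints from 8 MiB to 128 MiB, bracketing an assumed ~40-50 MiB L2.
    for (size_t mib = 8; mib <= 128; mib *= 2) {
        size_t elems = mib * (1 << 20) / sizeof(float);
        float* buf;
        cudaMalloc(&buf, elems * sizeof(float));
        cudaMemset(buf, 0, elems * sizeof(float));

        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0);
        reread<<<2048, 256>>>(buf, out, elems, 8);    // 8 passes over the buffer
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("%4zu MiB footprint: %7.2f ms  ~%5.0f GiB/s effective\n",
               mib, ms, 8.0 * mib / 1024.0 / (ms / 1e3));
        cudaEventDestroy(t0);
        cudaEventDestroy(t1);
        cudaFree(buf);
    }
    cudaFree(out);
    return 0;
}
```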
LLM-Driven CUDA Code Optimization: A Comparative Study with PyTorch
I developed an iterative optimization pipeline using LLM-based prompting to accelerate GPT-4o-generated CUDA code, documenting recurring bottlenecks and best practices for LLM-assisted GPU programming. I profiled and compared LLM-optimized CUDA against PyTorch implementations on NVIDIA H100/A100 GPUs, revealing framework trade-offs for ML workload selection. Additionally, I stress-tested large-scale models (ResNet-50, BERT) with RAPIDS AI libraries (cuML, cuDF, TensorRT) to assess scalability and performance across different optimization strategies.
Presentations:
• University of Virginia - Undergraduate Engineering Research and Design Symposium, Charlottesville, VA (April 2025)
• University of Virginia Engineering Research Expo, Charlottesville, VA (October 2024)
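One recurring bottleneck from this study is easy to show in isolation: first-pass generated code often reduces a vector through a single global atomic per element, and the standard fix is a warp-shuffle plus shared-memory reduction that issues one atomic per block. The sketch below is a generic illustration of that pattern, not actual output from the pipeline, and the kernel names are my own.

```cuda
// Generic illustration of a recurring fix: replace a per-element global atomic
// with a warp-shuffle + shared-memory reduction and one atomic per block.
#include <cuda_runtime.h>

// Naive pattern often seen in first-pass generated code: every thread
// serializes on a single global atomic.
__global__ void sum_naive(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Optimized pattern: reduce within each warp, then across warps in shared
// memory, and issue one atomic per block.
__global__ void sum_opt(const float* __restrict__ in, float* out, int n) {
    __shared__ float warp_sums[32];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;

    for (int offset = 16; offset > 0; offset >>= 1)   // warp-level tree reduction
        v += __shfl_down_sync(0xffffffff, v, offset);

    int lane = threadIdx.x & 31, warp = threadIdx.x >> 5;
    if (lane == 0) warp_sums[warp] = v;
    __syncthreads();

    if (warp == 0) {                                  // first warp reduces the per-warp sums
        v = (lane < (blockDim.x + 31) / 32) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffff, v, offset);
        if (lane == 0) atomicAdd(out, v);             // one atomic per block instead of per element
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));
    cudaMemset(out, 0, sizeof(float));
    sum_opt<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```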
Profiling GPU Performance Across MLCommons Tools
I reproduced the MLPerf v4.0 ResNet-50 Training benchmark on UVA GPU servers, identifying throughput bottlenecks and tuning multi-GPU NVLink scaling with mixed-precision Tensor Cores and XLA. I am currently comparing MLPerf Training/Inference with MLPerf Tiny to characterize scaling inefficiencies between edge deployments and large-scale GPU clusters.
GPU-Accelerated Molecular Dynamics & NEP Modeling
I built a reproducible, GPU-accelerated molecular dynamics (MD) workflow for entropy-stabilized oxides using GPUMD and a neuroevolution potential (NEP) machine-learning model. Additionally, I automated large simulation batches with SLURM job arrays, improved NEP training on GPUs to reduce energy and force errors, and analyzed thermal trends across temperatures and compositions. The project led to a Journal of Applied Physics publication and seeded my interest in the hardware–software interface, especially the GPU computing that ultimately drives performance.
Presentation: University of Virginia Research Computing Exhibition, Charlottesville, VA (April 2025)
Selected Hardware Projects
I modeled extended GPU L2 cache architectures in GPGPU-Sim using memory-bound Rodinia benchmarks (BFS, K-means, Gaussian Elimination, Needleman–Wunsch) to evaluate cache behavior under realistic workloads. I evaluated trade-offs between larger L2 capacity, higher associativity, and prefetching, pinpointing configurations where extra cache lowers miss rates and runtime without prohibitive area or energy overheads.
I designed and tuned multiple CUDA kernels for sparse matrix–sparse matrix multiplication (SpMSpM) using shared-memory hashing and dynamic scheduling across NVIDIA GPUs to balance parallelism, occupancy, and memory traffic, achieving significant speedups over CPU baselines by exploiting GPU memory hierarchies and warp-level operations. The project involved profiling with Nsight Compute to identify bottlenecks and iteratively optimizing for different matrix sizes and sparsity levels.
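A minimal sketch of the shared-memory hashing idea is shown below. It assumes CSR inputs and a fixed power-of-two hash table per output row, and it omits the symbolic sizing pass, table-overflow fallback, and host-side setup that a production SpMSpM kernel needs; all names and sizes are illustrative rather than the project's actual code.

```cuda
// Simplified sketch of a row-per-block SpMSpM accumulation with a
// shared-memory hash table (kernel only; CSR host setup is omitted).
#include <cuda_runtime.h>

#define TABLE_SIZE 1024   // per-row table; a real kernel sizes this per row or falls back to global memory
#define EMPTY (-1)

// C = A * B with A, B in CSR form. Each thread block accumulates one row of C.
// c_col / c_val are assumed to hold up to TABLE_SIZE entries per row, and
// c_nnz is assumed zero-initialized by the host.
__global__ void spmspm_row_hash(const int* __restrict__ a_rowptr, const int* __restrict__ a_col,
                                const float* __restrict__ a_val,
                                const int* __restrict__ b_rowptr, const int* __restrict__ b_col,
                                const float* __restrict__ b_val,
                                int* c_col, float* c_val, int* c_nnz) {
    __shared__ int   keys[TABLE_SIZE];
    __shared__ float vals[TABLE_SIZE];
    int row = blockIdx.x;

    for (int t = threadIdx.x; t < TABLE_SIZE; t += blockDim.x) {  // clear the table
        keys[t] = EMPTY;
        vals[t] = 0.0f;
    }
    __syncthreads();

    // For each nonzero A(row,k), threads cooperatively walk row k of B and
    // scatter A(row,k) * B(k,j) into the hash slot for output column j.
    for (int ai = a_rowptr[row]; ai < a_rowptr[row + 1]; ++ai) {
        int   k = a_col[ai];
        float a = a_val[ai];
        for (int bi = b_rowptr[k] + threadIdx.x; bi < b_rowptr[k + 1]; bi += blockDim.x) {
            int j    = b_col[bi];
            int slot = j & (TABLE_SIZE - 1);                       // power-of-two table, linear probing
            while (true) {
                int prev = atomicCAS(&keys[slot], EMPTY, j);
                if (prev == EMPTY || prev == j) break;             // slot claimed for j, or already holds j
                slot = (slot + 1) & (TABLE_SIZE - 1);              // collision: probe the next slot
            }
            atomicAdd(&vals[slot], a * b_val[bi]);
        }
    }
    __syncthreads();

    // Compact occupied slots into the output row (left unsorted here).
    for (int t = threadIdx.x; t < TABLE_SIZE; t += blockDim.x) {
        if (keys[t] != EMPTY) {
            int pos = atomicAdd(&c_nnz[row], 1);
            c_col[row * TABLE_SIZE + pos] = keys[t];
            c_val[row * TABLE_SIZE + pos] = vals[t];
        }
    }
}
```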
CPU/GPU Memory & Near-Data Processing
I explored CPU/GPU memory hierarchies, cache modeling, DRAM simulation, CUDA programming, and near-data processing using PIMeval-PIMbench to build intuition for memory-centric architectures.
Publications
Published in Journal of Applied Physics, October 2024
Honors and Awards

The Raven Society (UVA's highest honor for academic leadership) 2025, 2026
The Raven Society is UVA's oldest and most prestigious honorary society, recognizing exceptional students for academic excellence, leadership, and service to the University and Charlottesville community. Membership is highly selective and considered one of the highest honors a student can receive at UVA.

UVA Computer Science Summer Research Fellowship Summer 2025
Awarded $5,000 for a 10-week full-time research project with additional engagement in professional development workshops and presentation of findings at UVA Research Symposium.

Dean's Research Scholarship Summer 2024
Awarded $4,800 for a 10-week full-time research project with additional engagement in professional development workshops and presentation of findings at Fall Undergraduate Research Expo.

UVA Engineering Dean's List All Semesters
Achieved Dean's List recognition for maintaining a GPA of 3.40 or higher and completing over 15 credits of graded coursework in the preceding semester.
