Recent News & Highlights
- [Aug 2025] New Research Stream: Advancing Autonomous Driving AI.
  - Developed "Efficient Virtuoso," a state-of-the-art latent diffusion Transformer for goal-conditioned trajectory planning, achieving minADE of 0.25 on the Waymo Open Motion Dataset. [Paper] [GitHub]
  - Submitted new research on "Mining the Long Tail" for robust Offline RL in AVs to arXiv, demonstrating significant safety improvements via data curation. [Paper] [GitHub]
  - Submitted new research on "From Imitation to Optimization" for Offline RL in AVs to arXiv, demonstrating 3.2x higher success rates than BC baselines. [Paper] [GitHub]
- [Early 2025] Initiated a new research stream at HPE exploring LLM-based agents for dynamic system control.
- [Nov 2024] Our Hierarchical RL work, GreenDCC, was accepted to the AAAI 2025 Demonstration Track.
- [Oct 2024] Our MARL benchmark, SustainDC, was accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
- [May 2024] Filed a new U.S. patent on "Real-Time Carbon Footprint Reduction Controller".
- [Dec 2023] Received the Best ML Innovation Award at the NeurIPS 2023 Climate Change AI Workshop.
- [Sep 2022] Joined HPE AI Labs as an AI Research/Applied Scientist.
- [Jun 2022] Awarded Ph.D. in Computer Science with Cum Laude distinction.
About Me
I am a Ph.D. AI Research Scientist and Engineer focused on creating the intelligent systems and foundational models needed for robust, general-purpose autonomy. My work is built on the following principles:
- Scientific Foundation: My research centers on Reinforcement Learning (RL), Multi-Agent Systems, and Imitation Learning to solve complex coordination and control problems, resulting in novel algorithms like LfOD and publications at venues like NeurIPS and AAAI.
- Engineering Execution: I architect and implement the necessary infrastructure to bring research to life, from large-scale, open-source simulations (SustainDC) to distributed training pipelines (Ray/RLlib) with hundreds of parallel workers.
- Future Focus: My recent work explores the frontier of LLM-based agents, using fine-tuning (LoRA) and novel refinement techniques to build more capable and adaptable decision-makers for real-world systems.
Featured Projects

Our Sparse Route model (right) generates precise, unbiased trajectories, outperforming other goal representations.
Qualitative comparison of goal representations in a challenging turning scenario.
Efficient Virtuoso: Latent Diffusion Transformer for Trajectory Planning
Developed a state-of-the-art conditional latent diffusion model for goal-conditioned trajectory planning, achieving a minADE of 0.25 on the Waymo Open Motion Dataset. Introduced a novel two-stage normalization pipeline and provided key insights into optimal goal representation for AVs.
My Key Contributions:
- Pioneered a two-stage normalization pipeline for stable latent diffusion training.
- Designed a Transformer-based StateEncoder for rich scene context fusion (a minimal sketch follows this list).
- Conducted rigorous ablation on goal representation, proving multi-step routes are critical for tactical precision.
- Achieved state-of-the-art minADE of 0.25 on WOMD.
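Below is a minimal, illustrative PyTorch sketch of a Transformer-based scene encoder of this kind. The layer sizes, learned summary token, and input layout are assumptions for illustration only, not the Efficient Virtuoso implementation.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Illustrative Transformer encoder: fuses agent-history and map tokens into a
    single context vector that can condition a diffusion-based trajectory planner."""

    def __init__(self, feat_dim: int = 64, d_model: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)                 # embed raw scene tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.summary = nn.Parameter(torch.zeros(1, 1, d_model))  # learned summary token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_scene_elements, feat_dim)
        x = self.proj(tokens)
        summary = self.summary.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([summary, x], dim=1))
        return x[:, 0]                                           # (batch, d_model) scene context

print(StateEncoder()(torch.randn(2, 32, 64)).shape)  # torch.Size([2, 128])
```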

Baseline CQL: Collision in merge scenario.

Heuristic-Weighted: Suboptimal, reactive merge.

Uncertainty-Weighted: Proactive, successful merge.
Qualitative comparison of data curation strategies in a challenging highway merge.
Mining the Long Tail: Data Curation for Robust Offline RL in AVs
Systematically investigated six data curation strategies (heuristic, uncertainty, behavior-based) to tackle the long-tail problem in autonomous driving, achieving nearly a three-fold reduction in collision rate with uncertainty-based methods.
My Key Contributions:
- Developed novel, data-driven criticality metrics (e.g., model disagreement via ensemble scouts) for non-uniform data sampling.
- Designed specialized PyTorch `Dataset` implementations for timestep- and scenario-level weighting (a minimal sketch follows this list).
- Conducted a large-scale comparative study demonstrating that all curation methods significantly outperform uniform sampling.
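As a concrete illustration of criticality-weighted sampling, the sketch below pairs a PyTorch `Dataset` with a `WeightedRandomSampler` so that timesteps with high ensemble disagreement are drawn more often than under uniform sampling. The class name, tensor shapes, and variance-based disagreement score are assumptions, not the paper's exact implementation.

```python
import torch
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler

class WeightedTransitions(Dataset):
    """Illustrative dataset that stores a per-timestep criticality score."""

    def __init__(self, states, actions, ensemble_preds):
        # ensemble_preds: (num_models, N, action_dim) action predictions from "scout" models
        self.states, self.actions = states, actions
        # Criticality = disagreement (variance) across the ensemble, averaged over action dims.
        self.criticality = ensemble_preds.var(dim=0).mean(dim=-1)

    def __len__(self):
        return len(self.states)

    def __getitem__(self, i):
        return self.states[i], self.actions[i]

N, obs_dim, act_dim, n_models = 1000, 10, 2, 5
data = WeightedTransitions(torch.randn(N, obs_dim), torch.randn(N, act_dim),
                           torch.randn(n_models, N, act_dim))
weights = data.criticality + 1e-6                       # keep every sample reachable
sampler = WeightedRandomSampler(weights, num_samples=N, replacement=True)
loader = DataLoader(data, batch_size=64, sampler=sampler)
```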

BC-S (MLP): Fails due to complex interactions.

BC-T (Transformer): Brittle, leads to "circling" failure.

CQL (Offline RL): Robust recovery, successfully navigates.
Qualitative comparison of Behavioral Cloning baselines vs. robust Offline RL (CQL) agent.
From Imitation to Optimization: Offline Learning for Autonomous Driving
Pioneered an end-to-end pipeline applying state-of-the-art Offline Reinforcement Learning (CQL) to the Waymo Open Motion Dataset, demonstrating significantly superior robustness over Behavioral Cloning baselines for long-horizon AV control.
My Key Contributions:
- Engineered a robust, parallelized data processing pipeline for the Waymo Open Motion Dataset.
- Conducted a rigorous comparative study demonstrating CQL's 3.2x higher success rate and 7.4x lower collision rate compared with Transformer-based BC.
- Designed an effective multi-objective reward function for Offline RL training in autonomous driving (a minimal sketch follows this list).
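The sketch below illustrates the general shape of such a multi-objective driving reward; the terms and weights (progress, comfort, collision penalty) are placeholder assumptions, not the values used in the paper.

```python
def driving_reward(progress_m: float, jerk: float, collided: bool,
                   w_progress: float = 1.0, w_comfort: float = 0.1,
                   collision_penalty: float = 100.0) -> float:
    """Illustrative multi-objective reward: reward progress, penalize discomfort,
    and apply a dominant penalty for collisions."""
    reward = w_progress * progress_m      # distance advanced along the route this step
    reward -= w_comfort * abs(jerk)       # penalize jerky, uncomfortable control
    if collided:
        reward -= collision_penalty       # safety term dominates all others
    return reward

print(driving_reward(progress_m=2.5, jerk=0.3, collided=False))  # ~2.47
print(driving_reward(progress_m=2.5, jerk=0.3, collided=True))   # ~-97.53
```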

MARL agents coordinating to cross an intersection safely.
Autonomous Intersection Management
Designed and implemented a MARL system where autonomous vehicles learn to coordinate and safely cross intersections without traffic lights, significantly improving traffic flow.
My Key Contributions:
- Designed the end-to-end MARL system architecture, including a novel LSTM-based state encoder (sketched below).
- Engineered the multi-objective reward function to balance efficiency and safety.
- The final system reduced vehicle travel time by up to 59% in simulation.
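A minimal PyTorch sketch of the LSTM-based state-encoder idea: each vehicle summarizes its recent observation history into a fixed-size embedding consumed by a small policy head. The dimensions and three-action head are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class LSTMStateEncoder(nn.Module):
    """Illustrative per-vehicle encoder: observation history -> action logits."""

    def __init__(self, obs_dim: int = 16, hidden: int = 64, n_actions: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)  # e.g. accelerate / hold / brake

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch_of_vehicles, timesteps, obs_dim)
        _, (h_n, _) = self.lstm(obs_seq)
        return self.policy_head(h_n[-1])                 # logits per vehicle

print(LSTMStateEncoder()(torch.randn(8, 10, 16)).shape)  # torch.Size([8, 3])
```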

System architecture for the SustainDC benchmark.
SustainDC: A NeurIPS Benchmark
Co-led the creation of an open-source, Gym-compatible benchmark for developing MARL controllers to optimize the energy and carbon footprint of data centers.
My Key Contributions:
- Co-led the architectural design and open-source implementation.
- Engineered the Python-based physics models for cooling and power (an illustrative sketch follows this list).
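To illustrate the kind of Python physics modeling the benchmark exposes behind its Gym-compatible interface, here is a deliberately simplistic cooling/power relation; the COP curve and numbers are toy placeholders, not SustainDC's actual models.

```python
def cooling_cop(setpoint_c: float) -> float:
    """Toy coefficient of performance: warmer CRAC setpoints cool more efficiently."""
    return max(1.0, 0.15 * setpoint_c)

def facility_power_kw(it_load_kw: float, setpoint_c: float) -> float:
    """Toy total facility power: IT power plus the cooling power to reject that heat."""
    return it_load_kw + it_load_kw / cooling_cop(setpoint_c)

for sp in (18.0, 22.0, 26.0):
    print(f"setpoint {sp:.0f} C -> {facility_power_kw(1000.0, sp):.0f} kW total")
```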

Conceptual framework for Learning from Oracle Demonstrations.
Learning from Oracle Demonstrations (LfOD)
Developed a novel Imitation Learning paradigm to accelerate DRL training by using a learned "Oracle" agent to provide corrective demonstrations to the primary agent.
My Key Contributions:
- Engineered the core LfOD methodology from first principles.
- Implemented the TD3fOD algorithm to integrate oracle advice (a minimal sketch of the correction loop follows this list).
- Demonstrated a 5x speedup in training convergence on complex tasks.
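A small, self-contained sketch of the corrective-demonstration loop at the core of LfOD: the oracle intervenes when the learner's action drifts too far from its own, and the resulting transition is flagged as a demonstration. The random stand-in policies, deviation threshold, and toy dynamics are illustrative assumptions, not the published TD3fOD code.

```python
import numpy as np

class RandomPolicy:
    """Stand-in for the learner and oracle policies (illustrative only)."""

    def __init__(self, action_dim: int, seed: int):
        self.rng = np.random.default_rng(seed)
        self.action_dim = action_dim

    def act(self, obs: np.ndarray) -> np.ndarray:
        return self.rng.uniform(-1.0, 1.0, self.action_dim)

def collect_episode(learner, oracle, horizon: int = 50, threshold: float = 0.8):
    """Flag transitions where the oracle had to correct the learner as demonstrations."""
    transitions, obs = [], np.zeros(4)
    for _ in range(horizon):
        a_learner, a_oracle = learner.act(obs), oracle.act(obs)
        intervene = np.linalg.norm(a_learner - a_oracle) > threshold
        action = a_oracle if intervene else a_learner
        transitions.append((obs.copy(), action, intervene))  # reward/next_obs omitted for brevity
        obs = obs + 0.1 * action                              # toy dynamics
    return transitions

demos = collect_episode(RandomPolicy(4, seed=0), RandomPolicy(4, seed=1))
print(sum(t[-1] for t in demos), "oracle corrections out of", len(demos))
```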

Animation showing the CNN prediction.
3D CNN Surrogate for Accelerating Physics Simulations
Developed a 3D CNN (U-Net) as a fast surrogate for computationally expensive CFD simulations, predicting 3D heat distribution in data centers and achieving a >2800x inference speedup over the original simulator.
My Key Contributions:
- Evaluated 3D U-Net architectures for spatial heat prediction in data centers (a minimal sketch follows this list).
- Engineered the data pipeline to process and voxelize raw CFD simulation data.
- Used the surrogate model to optimize workload placement via a genetic algorithm, reducing maximum temperature by 7.7% and energy consumption by 2.5%.
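A heavily simplified PyTorch sketch of a 3D U-Net-style surrogate with a single down/up level and one skip connection; the channel counts and dummy voxel grid are illustrative assumptions, not the actual surrogate architecture.

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """Illustrative 3D U-Net: voxelized inputs (e.g., heat sources) -> 3D temperature field."""

    def __init__(self, in_ch: int = 1, out_ch: int = 1, base: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(base, base, 3, padding=1), nn.ReLU())
        self.down = nn.Conv3d(base, base * 2, kernel_size=2, stride=2)
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv3d(base * 2, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(base, out_ch, 3, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e = self.enc(x)                               # full-resolution features
        d = self.up(self.down(e))                     # downsample, then restore resolution
        return self.dec(torch.cat([e, d], dim=1))     # skip connection, then decode

# Quick shape check on a dummy voxel grid (assumes even spatial dimensions)
print(TinyUNet3D()(torch.randn(1, 1, 16, 16, 16)).shape)  # torch.Size([1, 1, 16, 16, 16])
```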
Core Technical Skills
Reinforcement & Decision Science
- Foundations: Sequential Decision-Making, MDPs, Multi-Objective Optimization, Credit Assignment
- Paradigms: Deep RL, Multi-Agent RL (MARL), Hierarchical RL (HRL), Imitation Learning (IL), Behavioral Cloning (BC), Learning from Demonstrations (LfD), Offline Reinforcement Learning (Offline RL)
- Algorithms: Policy Gradient (PPO, A2C), Actor-Critic (SAC, TD3), Value-Based (Q-Learning, Conservative Q-Learning / CQL)
- Techniques: Model-Based RL, Off-Policy Learning, Exploration Strategies, Reward Function Design & Shaping, Policy Optimization, RLHF, Data-Centric RL (Curation, Sampling, Weighting)
- Applications: Autonomous Driving Planning, Behavior Prediction, Robot Control, System Optimization
Deep Learning & Generative AI
- LLM Agents: Agentic Frameworks, Tool Use, Planning, Fine-Tuning (PEFT, LoRA)
- Architectures: Transformers & Attention, CNNs (U-Net, V-Net), RNNs (LSTM)
- Generative Techniques: Surrogate Modeling, Data-Driven World Models, Diffusion Models (Latent Diffusion for Trajectory Planning)
- Frameworks: PyTorch, TensorFlow, Hugging Face Transformers
- Training Techniques: Transfer Learning, Fine-Tuning, Hyperparameter Optimization
- Evolutionary & Search Methods: Genetic Algorithms (for optimization), PSO, Bayesian Optimization
High-Performance ML Engineering
- Distributed Systems: Large-Scale Training (Ray: RLlib, Tune), Parallel Computing, Distributed Data Processing
- Infrastructure: Scalable ML Pipelines, MLOps Concepts, HPC Environments, High-Throughput Data Loaders
- Performance: Model Evaluation & Benchmarking, Debugging Large ML Codebases, Performance Profiling
Simulation & Embodied AI
- Environment Development: Digital Twins, World Models, Gymnasium, PettingZoo, Waymax Simulator
- Robotics Concepts: Motion & Behavioral Planning, Control Systems, Perception Pipeline, Trajectory Prediction, Safety & Robustness
- Tools & Data: Physics Simulators (SUMO, CARLA), Waymo Open Motion Dataset (WOMD), Synthetic Data Generation, Real-World Data Integration, Large-Scale Datasets
Expert In
Python, PyTorch, Ray (RLlib, Tune)
Proficient With
NumPy, Pandas, Scikit-learn, Stable Baselines3, Docker, Git, Linux, Waymo Open Motion Dataset (WOMD), Waymax, C++ (Basic)
Selected Publications
Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning
arXiv preprint, August 2025
Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning
arXiv preprint, August 2025
From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving
arXiv preprint, July 2025
SustainDC: Benchmarking for Sustainable Data Center Control
Advances in Neural Information Processing Systems (NeurIPS), 2024
Learning From Oracle Demonstrations—A New Approach to Develop Autonomous Intersection Management...
IEEE Access, 2022
Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow's Intersections
IEEE Transactions on Vehicular Technology, 2022
N-CRITICS: Self-Refinement of Large Language Models with Ensemble of Critics
NeurIPS 2023 Workshop on Robustness of Foundation Models
For a full list of publications, please visit my Google Scholar profile.