Recent News & Highlights
- [Early 2025] Initiated a new research stream at HPE exploring LLM-based agents for dynamic system control.
- [Nov 2024] Our work on Hierarchical RL, GreenDCC, accepted to the AAAI 2025 Demonstration Track.
- [Oct 2024] Our MARL benchmark, SustainDC, accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
- [May 2024] Filed a new U.S. patent on "Real-Time Carbon Footprint Reduction Controller".
- [Dec 2023] Received the Best ML Innovation Award at the NeurIPS 2023 Climate Change AI Workshop.
- [Sep 2022] Joined HPE AI Labs as an AI Research/Applied Scientist.
- [Jun 2022] Awarded Ph.D. in Computer Science with Cum Laude distinction.
About Me
I am an AI Research Scientist and Engineer with a Ph.D. in Computer Science, focused on creating the intelligent systems and foundational models needed for robust, general-purpose autonomy. My work is built on the following principles:
- Scientific Foundation: My research centers on Reinforcement Learning (RL), Multi-Agent Systems, and Imitation Learning to solve complex coordination and control problems, resulting in novel algorithms like LfOD and publications at venues like NeurIPS and AAAI.
- Engineering Execution: I architect and implement the necessary infrastructure to bring research to life, from large-scale, open-source simulations (SustainDC) to distributed training pipelines (Ray/RLlib) with hundreds of parallel workers.
- Future Focus: My recent work explores the frontier of LLM-based agents, using fine-tuning (LoRA) and novel refinement techniques to build more capable and adaptable decision-makers for real-world systems.
Featured Projects

MARL agents coordinating to cross an intersection safely.
Autonomous Intersection Management
Designed and implemented a MARL system where autonomous vehicles learn to coordinate and safely cross intersections without traffic lights, significantly improving traffic flow.
My Key Contributions:
- Designed the end-to-end MARL system architecture, including a novel LSTM-based state encoder.
- Engineered the multi-objective reward function to balance efficiency and safety.
- The final system reduced vehicle travel time by up to 59% in simulation.
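A multi-objective reward that trades efficiency against safety can be sketched as a weighted sum. This is a minimal illustration and not the reward used in the project: the `StepInfo` fields, the weights, the safe-gap threshold, and the collision penalty are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step observation summary for one vehicle."""
    speed: float       # current speed, m/s
    max_speed: float   # speed limit, m/s
    min_gap: float     # distance to the nearest other vehicle, m
    collided: bool     # True if a collision happened this step


def intersection_reward(info, w_eff=1.0, w_safe=2.0,
                        safe_gap=5.0, collision_penalty=100.0):
    """Weighted-sum reward: efficiency term minus a safety penalty.

    All weights and thresholds are illustrative assumptions.
    """
    if info.collided:
        # A collision dominates every other consideration.
        return -collision_penalty
    # Efficiency: fraction of the speed limit, capped at 1.
    efficiency = min(info.speed / info.max_speed, 1.0)
    # Safety: how far the vehicle intrudes into the safe gap, in [0, 1].
    gap_shortfall = max(0.0, safe_gap - info.min_gap) / safe_gap
    return w_eff * efficiency - w_safe * gap_shortfall
```

Raising `w_safe` relative to `w_eff` makes the learned policy more conservative, which is the kind of trade-off such a reward has to tune.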

System architecture for the SustainDC benchmark.
SustainDC: A NeurIPS Benchmark
Co-led the creation of an open-source, Gym-compatible benchmark for developing MARL controllers to optimize the energy and carbon footprint of data centers.
My Key Contributions:
- Co-led the architectural design and open-source implementation.
- Engineered the Python-based physics models for cooling and power.

Conceptual framework for Learning from Oracle Demonstrations.
Learning from Oracle Demonstrations (LfOD)
Developed a novel Imitation Learning paradigm to accelerate DRL training by using a learned "Oracle" agent to provide corrective demonstrations to the primary agent.
My Key Contributions:
- Engineered the core LfOD methodology from first principles.
- Implemented the TD3fOD algorithm to integrate oracle advice.
- Demonstrated a 5x speedup in training convergence on complex tasks.
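One way to picture oracle-provided corrective demonstrations is a replay buffer that mixes the agent's own transitions with oracle corrections. The sketch below is an assumption-laden illustration, not the published LfOD or TD3fOD procedure: the deviation `tolerance`, the `oracle_fraction`, and the class and function names are all hypothetical.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Keeps agent and oracle transitions apart and samples a fixed mix."""

    def __init__(self, capacity=10000, seed=0):
        self.agent = deque(maxlen=capacity)
        self.oracle = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition, from_oracle=False):
        (self.oracle if from_oracle else self.agent).append(transition)

    def sample(self, batch_size, oracle_fraction=0.25):
        # Reserve part of each batch for oracle demonstrations when available.
        n_oracle = min(len(self.oracle), int(batch_size * oracle_fraction))
        n_agent = min(len(self.agent), batch_size - n_oracle)
        return (self.rng.sample(list(self.oracle), n_oracle)
                + self.rng.sample(list(self.agent), n_agent))


def choose_action(agent_action, oracle_action, tolerance=0.2):
    """Return (action, corrected).

    If the agent's scalar action deviates from the oracle's by more than
    `tolerance`, the oracle's action is used as a corrective demonstration.
    """
    if abs(agent_action - oracle_action) > tolerance:
        return oracle_action, True
    return agent_action, False
```

In a training loop under these assumptions, transitions flagged as corrected would be stored with `from_oracle=True`, so that an off-policy learner such as TD3 sees them in every batch.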

Animation showing the CNN prediction.
3D CNN Surrogate for Accelerating Physics Simulations
Developed a 3D CNN (U-Net) as a fast surrogate for computationally expensive CFD simulations. The model predicts the 3D heat distribution in data centers and achieves a >2800x inference speedup over the original simulator.
My Key Contributions:
- Evaluated 3D U-Net architectures for spatial heat prediction in data centers.
- Engineered the data pipeline to process and voxelize raw CFD simulation data.
- Used the surrogate model to optimize the workload placement using a genetic algorithm, reducing the maximum temperature by 7.7% and the energy consumption by 2.5%.
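The surrogate-in-the-loop optimization can be sketched with a small genetic algorithm over workload-to-rack permutations. This is only an illustration: `toy_surrogate_max_temp` is a hypothetical stand-in for the trained CNN, and the population size, mutation rate, and temperature formula are assumptions made for the example.

```python
import random


def toy_surrogate_max_temp(placement, loads, cooling):
    """Hypothetical stand-in for the CNN surrogate.

    placement[r] is the workload index assigned to rack r; the hottest
    rack temperature is returned (lower is better).
    """
    return max(20.0 + loads[w] / cooling[r] for r, w in enumerate(placement))


def order_crossover(a, b, rng):
    """Keep a slice of parent a in place, fill the rest in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]


def optimize_placement(loads, cooling, pop_size=30, generations=60,
                       mutation_rate=0.3, seed=0):
    rng = random.Random(seed)
    n = len(loads)

    def fitness(p):
        return toy_surrogate_max_temp(p, loads, cooling)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    population[0] = list(range(n))  # include the current placement as a baseline
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            child = order_crossover(rng.choice(survivors),
                                    rng.choice(survivors), rng)
            if rng.random() < mutation_rate:
                i, j = rng.sample(range(n), 2)  # swap mutation
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)
```

Because every fitness evaluation is a single surrogate call rather than a full CFD run, the optimizer can afford thousands of evaluations, which is where the inference speedup pays off.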
Core Technical Skills
Reinforcement & Decision Science
- Foundations: Sequential Decision-Making, MDPs, Multi-Objective Optimization
- Paradigms: Deep RL, Multi-Agent RL (MARL), Hierarchical RL (HRL), Imitation Learning (IL), Behavioral Cloning (BC), Learning from Demonstrations (LfD)
- Algorithms: Policy Gradient (PPO, A2C), Off-Policy Actor-Critic (SAC, TD3), Value-Based (Q-Learning)
- Techniques: Model-Based RL, Off-Policy Learning, Exploration Strategies (Epsilon-Greedy, Thompson Sampling)
- Applications: Reward Function Design, Policy Optimization, RLHF
Deep Learning & Generative AI
- LLM Agents: Agentic Frameworks, Tool Use, Planning, Fine-Tuning (PEFT, LoRA)
- Architectures: Transformers & Attention, CNNs (U-Net, V-Net), RNNs (LSTM)
- Generative Techniques: Surrogate Modeling, Data-Driven World Models, Diffusion Models (Conceptual)
- Frameworks: PyTorch, TensorFlow, Hugging Face Transformers
- Training Techniques: Transfer Learning, Fine-Tuning, Hyperparameter Optimization
- Evolutionary & Search Methods: Genetic Algorithms (for optimization), PSO, Bayesian Optimization
High-Performance ML Engineering
- Distributed Systems: Large-Scale Training (Ray: RLlib, Tune), Parallel Computing
- Infrastructure: Scalable ML Pipelines, MLOps Concepts, HPC Environments
- Performance: Model Evaluation & Benchmarking, Debugging, Performance Profiling
Simulation & Embodied AI
- Environment Development: Digital Twins, World Models, Environment APIs (Gymnasium, PettingZoo)
- Robotics Concepts: Motion & Behavioral Planning, Control Systems, Perception Pipeline
- Tools & Data: Physics Simulators (SUMO, CARLA), Synthetic Data Generation
Expert In
Python, PyTorch, Ray (RLlib, Tune)
Proficient With
NumPy, Pandas, Scikit-learn, Stable Baselines3, Docker, Git, Linux, C++ (Basic)
Selected Publications
SustainDC: Benchmarking for Sustainable Data Center Control
Advances in Neural Information Processing Systems (NeurIPS), 2024
Learning From Oracle Demonstrations—A New Approach to Develop Autonomous Intersection Management...
IEEE Access, 2022
Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow's Intersections
IEEE Transactions on Vehicular Technology, 2022
N-CRITICS: Self-Refinement of Large Language Models with Ensemble of Critics
NeurIPS 2023 Workshop on Robustness of Foundation Models
For a full list of publications, please visit my Google Scholar profile.