Recent News & Highlights

  • [Aug 2025] New Research Stream: Advancing Autonomous Driving AI.
    • Developed "Efficient Virtuoso," a state-of-the-art latent diffusion Transformer for goal-conditioned trajectory planning, achieving a minADE of 0.25 on the Waymo Open Motion Dataset. [Paper] [GitHub]
    • Submitted new research on "Mining the Long Tail" for robust Offline RL in AVs to arXiv, demonstrating significant safety improvements via data curation. [Paper] [GitHub]
    • Submitted new research on "From Imitation to Optimization" for Offline RL in AVs to arXiv, demonstrating 3.2x higher success rates than BC baselines. [Paper] [GitHub]
  • [Early 2025] Initiated a new research stream at HPE exploring LLM-based agents for dynamic system control.
  • [Nov 2024] Our work on Hierarchical RL, GreenDCC, accepted to the AAAI 2025 Demonstration Track.
  • [Oct 2024] Our MARL benchmark, SustainDC, accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
  • [May 2024] Filed a new U.S. patent on "Real-Time Carbon Footprint Reduction Controller".
  • [Dec 2023] Received the Best ML Innovation Award at the NeurIPS 2023 Climate Change AI Workshop.
  • [Sep 2022] Joined HPE AI Labs as an AI Research/Applied Scientist.
  • [Jun 2022] Awarded Ph.D. in Computer Science with Cum Laude distinction.

About Me

I am an AI Research Scientist and Engineer with a Ph.D., focused on creating the intelligent systems and foundation models needed for robust, general-purpose autonomy. My work is built on the following principles:

  • Scientific Foundation: My research centers on Reinforcement Learning (RL), Multi-Agent Systems, and Imitation Learning to solve complex coordination and control problems, yielding novel algorithms such as LfOD and publications at venues including NeurIPS and AAAI.
  • Engineering Execution: I architect and implement the infrastructure needed to bring research to life, from large-scale, open-source simulations (SustainDC) to distributed training pipelines (Ray/RLlib) with hundreds of parallel workers (a minimal configuration sketch follows this list).
  • Future Focus: My recent work explores the frontier of LLM-based agents, using fine-tuning (LoRA) and novel refinement techniques to build more capable and adaptable decision-makers for real-world systems.
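
For context, here is a minimal sketch of the kind of Ray/RLlib setup the engineering bullet refers to, scaled down to a standard Gym task. It assumes the Ray 2.x `AlgorithmConfig` API; exact method and result-key names vary across Ray versions, and the worker count is illustrative rather than a production value.

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Scale experience collection across many parallel rollout workers.
config = (
    PPOConfig()
    .environment("CartPole-v1")         # stand-in env; real pipelines used custom simulators
    .rollouts(num_rollout_workers=256)  # hundreds of parallel sampling workers
    .training(train_batch_size=131072, lr=3e-4)
    .resources(num_gpus=1)
)

algo = config.build()
for i in range(10):
    result = algo.train()
    print(i, result["episode_reward_mean"])
```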

Featured Projects

Qualitative comparison of goal representations in a challenging turning scenario: our Sparse Route model (right) generates precise, unbiased trajectories, outperforming other goal representations.

Efficient Virtuoso: Latent Diffusion Transformer for Trajectory Planning

Developed a state-of-the-art conditional latent diffusion model for goal-conditioned trajectory planning, achieving a minADE of 0.25 on the Waymo Open Motion Dataset. Introduced a novel normalization scheme and provided key insights into optimal goal representation for AVs.

My Key Contributions:
  • Pioneered a two-stage normalization pipeline for stable latent diffusion training (see the sketch after this list).
  • Designed a Transformer-based StateEncoder for rich scene context fusion.
  • Conducted rigorous ablation on goal representation, proving multi-step routes are critical for tactical precision.
  • Achieved a state-of-the-art minADE of 0.25 on WOMD.
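
As a concrete illustration of the two-stage idea (the exact pipeline is described in the paper), the sketch below shows one plausible decomposition: an ego-centric frame transform followed by z-score standardization fit on the training set. The stage boundaries and function names here are assumptions for illustration.

```python
import numpy as np

def to_ego_frame(traj, ego_pose):
    """Stage 1 (assumed): express a world-frame trajectory in the ego frame.
    traj: (T, 2) xy points; ego_pose: (x, y, yaw)."""
    x, y, yaw = ego_pose
    c, s = np.cos(-yaw), np.sin(-yaw)
    R = np.array([[c, -s], [s, c]])            # rotation by -yaw
    return (traj - np.array([x, y])) @ R.T

class Standardizer:
    """Stage 2 (assumed): z-score normalization with statistics fit once on
    the training set, so the diffusion model sees unit-scale targets."""
    def fit(self, trajs):                       # trajs: (N, T, 2)
        flat = trajs.reshape(-1, trajs.shape[-1])
        self.mean, self.std = flat.mean(0), flat.std(0) + 1e-6
        return self

    def transform(self, trajs):
        return (trajs - self.mean) / self.std

    def inverse(self, trajs_norm):              # undo before evaluation (e.g., minADE)
        return trajs_norm * self.std + self.mean
```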

Qualitative comparison of data curation strategies in a challenging highway merge: the baseline CQL agent collides, the heuristic-weighted agent merges reactively and suboptimally, and the uncertainty-weighted agent merges proactively and successfully.

Mining the Long Tail: Data Curation for Robust Offline RL in AVs

Systematically investigated six data curation strategies (heuristic, uncertainty, behavior-based) to tackle the long-tail problem in autonomous driving, achieving nearly a three-fold reduction in collision rate with uncertainty-based methods.

My Key Contributions:
  • Developed novel, data-driven criticality metrics (e.g., model disagreement via ensemble scouts) for non-uniform data sampling (see the sketch after this list).
  • Designed specialized PyTorch `Dataset` implementations for timestep- and scenario-level weighting.
  • Conducted a large-scale comparative study demonstrating that all curation methods significantly outperform uniform sampling.
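
A minimal sketch of the ensemble-disagreement idea, using toy tensors and stand-in "scout" models rather than the actual WOMD features and architectures: criticality is scored as prediction variance across the ensemble and fed to PyTorch's `WeightedRandomSampler`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

@torch.no_grad()
def disagreement_scores(models, states):
    """Criticality as epistemic uncertainty: variance of the ensemble's
    action predictions, averaged over action dimensions."""
    preds = torch.stack([m(states) for m in models])  # (K, N, act_dim)
    return preds.var(dim=0).mean(dim=-1)              # (N,)

# toy data and a small scout ensemble (illustrative shapes)
states = torch.randn(1000, 16)
actions = torch.randn(1000, 2)
scouts = [torch.nn.Linear(16, 2) for _ in range(5)]

scores = disagreement_scores(scouts, states)
weights = scores / scores.sum()                       # non-uniform sampling probabilities
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(TensorDataset(states, actions), batch_size=64, sampler=sampler)
```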

Qualitative comparison of Behavioral Cloning baselines vs. a robust Offline RL (CQL) agent: BC-S (MLP) fails under complex interactions, BC-T (Transformer) is brittle and falls into a "circling" failure, while CQL recovers robustly and navigates successfully.

From Imitation to Optimization: Offline Learning for Autonomous Driving

Pioneered an end-to-end pipeline applying state-of-the-art Offline Reinforcement Learning (CQL) to the Waymo Open Motion Dataset, demonstrating significantly superior robustness over Behavioral Cloning baselines for long-horizon AV control.

My Key Contributions:
  • Engineered a robust, parallelized data processing pipeline for the Waymo Open Motion Dataset.
  • Conducted a rigorous comparative study, demonstrating CQL's 3.2x higher success rate and 7.4x lower collision rate over Transformer-based BC.
  • Designed an effective multi-objective reward function for Offline RL training in autonomous driving (an illustrative sketch follows this list).
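
An illustrative multi-objective reward in this spirit; the terms, weights, and thresholds below are assumptions for exposition, not the function used in the study.

```python
def reward(progress_m, jerk, min_gap_m, collided,
           w_prog=1.0, w_comfort=0.1, w_safety=0.5, collision_penalty=100.0):
    """Toy multi-objective reward balancing progress, comfort, and safety."""
    r = w_prog * progress_m                    # reward progress along the route (meters)
    r -= w_comfort * abs(jerk)                 # penalize uncomfortable jerk
    r -= w_safety * max(0.0, 2.0 - min_gap_m)  # penalize gaps closing below ~2 m
    if collided:
        r -= collision_penalty                 # dominant terminal penalty
    return r
```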

MARL agents coordinating to cross an intersection safely.

Autonomous Intersection Management

Designed and implemented a MARL system where autonomous vehicles learn to coordinate and safely cross intersections without traffic lights, significantly improving traffic flow.

My Key Contributions:
  • Designed the end-to-end MARL system architecture, including a novel LSTM-based state encoder (sketched after this list).
  • Engineered the multi-objective reward function to balance efficiency and safety.
  • The final system reduced vehicle travel time by up to 59% in simulation.
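
A minimal sketch of an LSTM-based state encoder of this kind; the dimensions and pooling choice are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class LSTMStateEncoder(nn.Module):
    """Encodes a short history of per-vehicle observations into a fixed-size
    embedding for the policy network (sizes are illustrative)."""
    def __init__(self, obs_dim=10, hidden=64, embed=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed)

    def forward(self, obs_seq):               # obs_seq: (B, T, obs_dim)
        _, (h, _) = self.lstm(obs_seq)        # h: (1, B, hidden), last hidden state
        return torch.tanh(self.head(h[-1]))   # (B, embed)

enc = LSTMStateEncoder()
z = enc(torch.randn(4, 8, 10))                # 4 vehicles, 8 past timesteps
```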

System architecture for the SustainDC benchmark.

SustainDC: A NeurIPS Benchmark

Co-led the creation of an open-source, Gym-compatible benchmark for developing MARL controllers to optimize the energy and carbon footprint of data centers.

My Key Contributions:
  • Co-led the architectural design and open-source implementation (a minimal Gym-style environment skeleton follows this list).
  • Engineered the Python-based physics models for cooling and power.
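
For flavor, a minimal Gymnasium-compatible environment skeleton in the spirit of SustainDC; the real benchmark couples multiple interacting sub-environments (e.g., cooling and power), and the dynamics below are toy placeholders.

```python
import gymnasium as gym
import numpy as np

class ToyDCEnv(gym.Env):
    """Toy single-agent stand-in for a data-center control environment."""
    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.observation_space.sample()
        return self.state, {}

    def step(self, action):
        energy = float(np.abs(action).sum())  # toy proxy: cooling effort costs energy
        self.state = np.clip(self.state - 0.1 * action.mean(), -1, 1).astype(np.float32)
        reward = -energy                      # minimize energy use
        return self.state, reward, False, False, {}
```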

Conceptual framework for Learning from Oracle Demonstrations (LfOD).

Learning from Oracle Demonstrations (LfOD)

Developed a novel Imitation Learning paradigm to accelerate DRL training by using a learned "Oracle" agent to provide corrective demonstrations to the primary agent.

My Key Contributions:
  • Engineered the core LfOD methodology from first principles.
  • Implemented the TD3fOD algorithm to integrate oracle advice (one common way to blend such advice is sketched after this list).
  • Demonstrated a 5x speedup in training convergence on complex tasks.
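
The paper defines the actual TD3fOD update; as a hedged illustration, the sketch below shows one common way to fold oracle demonstrations into a TD3-style actor objective via an auxiliary imitation term. The stand-in actor/critic and the 0.5 weight are assumptions.

```python
import torch
import torch.nn.functional as F

def actor_loss_with_oracle(actor, critic, states, oracle_actions, bc_weight=0.5):
    """TD3-style actor objective plus an imitation term pulling the policy
    toward the oracle's corrective demonstrations (weight is illustrative)."""
    actions = actor(states)
    dpg_loss = -critic(states, actions).mean()     # standard deterministic policy gradient term
    bc_loss = F.mse_loss(actions, oracle_actions)  # match oracle advice on demonstrated states
    return dpg_loss + bc_weight * bc_loss

# toy stand-ins to show the call shape
actor = torch.nn.Sequential(torch.nn.Linear(8, 2), torch.nn.Tanh())
critic = lambda s, a: torch.cat([s, a], dim=-1).sum(dim=-1, keepdim=True)
loss = actor_loss_with_oracle(actor, critic, torch.randn(32, 8), torch.randn(32, 2))
```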

Animation showing the CNN's prediction of the 3D heat distribution.

3D CNN Surrogate for Accelerating Physics Simulations

Developed a 3D CNN (U-Net) surrogate for computationally expensive CFD simulations, predicting 3D heat distribution in data centers with a >2800x inference speedup over the original simulator.

My Key Contributions:
  • Evaluated 3D U-Net architectures for spatial heat prediction in data centers (a minimal variant is sketched after this list).
  • Engineered the data pipeline to process and voxelize raw CFD simulation data.
  • Used the surrogate with a genetic algorithm to optimize workload placement, reducing the maximum temperature by 7.7% and energy consumption by 2.5%.
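
A scaled-down sketch of a 3D U-Net surrogate of this kind; the depth, channel counts, and single skip connection are illustrative, not the deployed model.

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """Minimal 3D encoder-decoder with one skip connection, mapping a
    voxelized input field to a predicted 3D temperature field."""
    def __init__(self, in_ch=1, out_ch=1, base=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(base, base, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool3d(2)
        self.mid = nn.Sequential(nn.Conv3d(base, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose3d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv3d(base * 2, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(base, out_ch, 3, padding=1))

    def forward(self, x):                          # x: (B, 1, D, H, W) voxel grid
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        return self.dec(torch.cat([u, e], dim=1))  # skip connection

model = TinyUNet3D()
heat = model(torch.randn(1, 1, 16, 16, 16))        # predicted 3D temperature field
```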

Core Technical Skills

Reinforcement & Decision Science

  • Foundations: Sequential Decision-Making, MDPs, Multi-Objective Optimization, Credit Assignment
  • Paradigms: Deep RL, Multi-Agent RL (MARL), Hierarchical RL (HRL), Imitation Learning (IL), Behavioral Cloning (BC), Learning from Demonstrations (LfD), Offline Reinforcement Learning (Offline RL)
  • Algorithms: Policy Gradient (PPO, A2C), Actor-Critic (SAC, TD3), Value-Based (Q-Learning, Conservative Q-Learning / CQL)
  • Techniques: Model-Based RL, Off-Policy Learning, Exploration Strategies, Reward Function Design & Shaping, Policy Optimization, RLHF, Data-Centric RL (Curation, Sampling, Weighting)
  • Applications: Autonomous Driving Planning, Behavior Prediction, Robot Control, System Optimization

Deep Learning & Generative AI

  • LLM Agents: Agentic Frameworks, Tool Use, Planning, Fine-Tuning (PEFT, LoRA)
  • Architectures: Transformers & Attention, CNNs (U-Net, V-Net), RNNs (LSTM)
  • Generative Techniques: Surrogate Modeling, Data-Driven World Models, Diffusion Models (Latent Diffusion)
  • Frameworks: PyTorch, TensorFlow, Hugging Face Transformers
  • Training Techniques: Transfer Learning, Fine-Tuning, Hyperparameter Optimization
  • Evolutionary & Search Methods: Genetic Algorithms (for optimization), PSO, Bayesian Optimization

High-Performance ML Engineering

  • Distributed Systems: Large-Scale Training (Ray: RLlib, Tune), Parallel Computing, Distributed Data Processing
  • Infrastructure: Scalable ML Pipelines, MLOps Concepts, HPC Environments, High-Throughput Data Loaders
  • Performance: Model Evaluation & Benchmarking, Debugging Large ML Codebases, Performance Profiling

Simulation & Embodied AI

  • Environment Development: Digital Twins, World Models, Custom Environments (Gymnasium, PettingZoo), Waymax Simulator
  • Robotics Concepts: Motion & Behavioral Planning, Control Systems, Perception Pipeline, Trajectory Prediction, Safety & Robustness
  • Tools & Data: Physics Simulators (SUMO, CARLA), Waymo Open Motion Dataset (WOMD), Synthetic Data Generation, Real-World Data Integration, Large-Scale Datasets

Expert In

Python, PyTorch, Ray (RLlib, Tune)

Proficient With

NumPy, Pandas, Scikit-learn, Stable Baselines3, Docker, Git, Linux, Waymo Open Motion Dataset (WOMD), Waymax, C++ (Basic)

Selected Publications

Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning

A. Guillen-Perez

arXiv preprint, August 2025

Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning

A. Guillen-Perez

arXiv preprint, August 2025

From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving

A. Guillen-Perez

arXiv preprint, July 2025

SustainDC: Benchmarking for Sustainable Data Center Control

A. Naug*, A. Guillen-Perez*, R. Luna Gutierrez*, V. Gundecha, et al.

Advances in Neural Information Processing Systems (NeurIPS), 2024

Learning From Oracle Demonstrations—A New Approach to Develop Autonomous Intersection Management...

A. Guillen-Perez, M.D. Cano

IEEE Access, 2022

Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow's Intersections

A. Guillen-Perez, M.D. Cano

IEEE Transactions on Vehicular Technology, 2022

N-CRITICS: Self-Refinement of Large Language Models with Ensemble of Critics

S. Mousavi, R. Luna Gutierrez, A. Guillen-Perez, et al.

NeurIPS 2023 Workshop on Robustness of Foundation Models