Recent News & Highlights

  • [Early 2025] Initiated a new research stream at HPE exploring LLM-based agents for dynamic system control.
  • [Nov 2024] Our hierarchical RL work, GreenDCC, was accepted to the AAAI 2025 Demonstration Track.
  • [Oct 2024] Our MARL benchmark, SustainDC, was accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
  • [May 2024] Filed a new U.S. patent on "Real-Time Carbon Footprint Reduction Controller".
  • [Dec 2023] Received the Best ML Innovation Award at the NeurIPS 2023 Climate Change AI Workshop.
  • [Sep 2022] Joined HPE AI Labs as an AI Research/Applied Scientist.
  • [Jun 2022] Awarded Ph.D. in Computer Science with Cum Laude distinction.

About Me

I am an AI Research Scientist and Engineer with a Ph.D. in Computer Science, focused on creating the intelligent systems and foundational models needed for robust, general-purpose autonomy. My work is built on the following principles:

  • Scientific Foundation: My research centers on Reinforcement Learning (RL), Multi-Agent Systems, and Imitation Learning to solve complex coordination and control problems, resulting in novel algorithms like LfOD and publications at venues like NeurIPS and AAAI.
  • Engineering Execution: I architect and implement the infrastructure needed to bring research to life, from large-scale, open-source simulations (SustainDC) to distributed training pipelines (Ray/RLlib) with hundreds of parallel workers (a minimal sketch follows this list).
  • Future Focus: My recent work explores the frontier of LLM-based agents, using fine-tuning (LoRA) and novel refinement techniques to build more capable and adaptable decision-makers for real-world systems.
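
For concreteness, below is a minimal sketch of this kind of distributed training setup, assuming Ray 2.x with PPO from RLlib; the environment, worker count, and batch size are placeholders rather than a production configuration.

    import ray
    from ray.rllib.algorithms.ppo import PPOConfig

    ray.init()

    # Build a PPO trainer that scales rollout collection across parallel workers.
    # Config method names vary across Ray versions (newer releases use .env_runners()).
    config = (
        PPOConfig()
        .environment("CartPole-v1")           # placeholder environment
        .rollouts(num_rollout_workers=128)    # scale out parallel rollout workers
        .training(train_batch_size=65_536, lr=3e-4)
    )

    algo = config.build()
    for _ in range(10):
        result = algo.train()  # dict of training metrics (keys vary by Ray version)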

Featured Projects

[Animation: MARL agents coordinating to cross an intersection safely.]

Autonomous Intersection Management

Designed and implemented a MARL system where autonomous vehicles learn to coordinate and safely cross intersections without traffic lights, significantly improving traffic flow.

My Key Contributions:
  • Designed the end-to-end MARL system architecture, including a novel LSTM-based state encoder.
  • Engineered the multi-objective reward function to balance efficiency and safety (an illustrative sketch follows this list).
  • The final system reduced vehicle travel time by up to 59% in simulation.
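
The exact reward terms are project-specific; the snippet below is only an illustrative sketch of how efficiency and safety objectives can be folded into a single scalar reward, with hypothetical term names and weights.

    # Illustrative multi-objective reward for a single vehicle at one timestep.
    # Term names and weights are hypothetical, chosen only to show how efficiency
    # and safety objectives can be combined into one scalar signal.

    def intersection_reward(delay_s, speed_mps, target_speed_mps, collided,
                            w_delay=0.1, w_speed=0.5, collision_penalty=100.0):
        efficiency = -w_delay * delay_s                           # penalize waiting time
        progress = -w_speed * abs(speed_mps - target_speed_mps)   # encourage smooth flow
        safety = -collision_penalty if collided else 0.0          # heavily penalize crashes
        return efficiency + progress + safety

    # Example: a vehicle 4 s behind schedule, moving 8 m/s toward a 10 m/s target.
    r = intersection_reward(delay_s=4.0, speed_mps=8.0, target_speed_mps=10.0, collided=False)
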
[Figure: System architecture for the SustainDC benchmark.]

SustainDC: A NeurIPS Benchmark

Co-led the creation of an open-source, Gym-compatible benchmark for developing MARL controllers to optimize the energy and carbon footprint of data centers.

My Key Contributions:
  • Co-led the architectural design and open-source implementation (a generic rollout loop is sketched after this list).
  • Engineered the Python-based physics models for cooling and power.
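
As a usage illustration, here is a generic multi-agent rollout loop of the kind a Gym-compatible benchmark like SustainDC supports; the agent identifiers and the Gymnasium-style step signature are assumptions, not necessarily the benchmark's exact API.

    # Generic multi-agent rollout loop for a Gym-compatible benchmark.
    # Agent identifiers and the Gymnasium-style step signature are assumptions,
    # not necessarily SustainDC's exact API.

    def run_episode(env, policies):
        """Roll out one episode with one policy callable per agent."""
        obs, info = env.reset()
        episode_return = {agent_id: 0.0 for agent_id in obs}
        done = False
        while not done:
            actions = {agent_id: policies[agent_id](agent_obs)
                       for agent_id, agent_obs in obs.items()}
            obs, rewards, terminated, truncated, info = env.step(actions)
            for agent_id, r in rewards.items():
                episode_return[agent_id] += r
            done = terminated or truncated
        return episode_return

Trained MARL controllers (or random actions for a smoke test) plug in through the policies mapping.
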
[Figure: Conceptual framework for Learning from Oracle Demonstrations.]

Learning from Oracle Demonstrations (LfOD)

Developed a novel Imitation Learning paradigm to accelerate DRL training by using a learned "Oracle" agent to provide corrective demonstrations to the primary agent.

My Key Contributions:
  • Engineered the core LfOD methodology from first principles.
  • Implemented the TD3fOD algorithm to integrate oracle advice (a conceptual sketch follows this list).
  • Demonstrated a 5x speedup in training convergence on complex tasks.
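
TD3fOD itself is detailed in the IEEE Access paper; the snippet below is only a conceptual sketch of the underlying idea, assuming the oracle's corrective actions enter the actor update through an auxiliary imitation term with a hypothetical fixed weight.

    import torch.nn.functional as F

    # Conceptual sketch only: blend oracle demonstrations into a TD3-style actor
    # update via an auxiliary imitation term. The names and the fixed bc_weight
    # are placeholders, not the published TD3fOD update rule.

    def actor_loss_with_oracle(actor, critic, states, oracle_actions, bc_weight=0.5):
        policy_actions = actor(states)
        rl_loss = -critic(states, policy_actions).mean()       # maximize Q under the policy
        bc_loss = F.mse_loss(policy_actions, oracle_actions)   # stay near oracle corrections
        return rl_loss + bc_weight * bc_loss
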
[Animation: The CNN's predicted 3D heat distribution.]

3D CNN Surrogate for Accelerating Physics Simulations

Developed a 3D CNN (U-Net) surrogate that replaces computationally expensive CFD simulations for predicting the 3D heat distribution in data centers, achieving a >2800x inference speedup over the original simulator.

My Key Contributions:
  • Evaluated 3D U-Net architectures for spatial heat prediction in data centers (a simplified block is sketched after this list).
  • Engineered the data pipeline to process and voxelize raw CFD simulation data.
  • Combined the surrogate with a genetic algorithm to optimize workload placement, reducing the maximum temperature by 7.7% and energy consumption by 2.5%.
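
As a rough illustration, here is a heavily simplified 3D encoder-decoder in the spirit of the 3D U-Net surrogate; channel counts, depth, and the input channels are placeholders, far smaller than a practical model.

    import torch
    import torch.nn as nn

    # Heavily simplified 3D encoder-decoder with one skip connection, in the
    # spirit of a 3D U-Net surrogate. Channel counts, depth, and the two input
    # channels are placeholders, far smaller than a practical model.

    class TinyUNet3D(nn.Module):
        def __init__(self, in_ch=2, out_ch=1, base=16):
            super().__init__()
            self.enc = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1), nn.ReLU(),
                                     nn.Conv3d(base, base, 3, padding=1), nn.ReLU())
            self.down = nn.MaxPool3d(2)
            self.mid = nn.Sequential(nn.Conv3d(base, 2 * base, 3, padding=1), nn.ReLU())
            self.up = nn.ConvTranspose3d(2 * base, base, kernel_size=2, stride=2)
            self.dec = nn.Sequential(nn.Conv3d(2 * base, base, 3, padding=1), nn.ReLU(),
                                     nn.Conv3d(base, out_ch, 3, padding=1))

        def forward(self, x):
            e = self.enc(x)                     # full-resolution encoder features
            m = self.mid(self.down(e))          # bottleneck at half resolution
            u = self.up(m)                      # upsample back to full resolution
            return self.dec(torch.cat([u, e], dim=1))  # skip connection + decode

    # Example: map a 2-channel 32x32x32 voxel grid to a single temperature field.
    y = TinyUNet3D()(torch.randn(1, 2, 32, 32, 32))  # shape: (1, 1, 32, 32, 32)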

Core Technical Skills

Reinforcement & Decision Science

  • Foundations: Sequential Decision-Making, MDPs, Multi-Objective Optimization
  • Paradigms: Deep RL, Multi-Agent RL (MARL), Hierarchical RL (HRL), Imitation Learning (IL), Behavioral Cloning (BC), Learning from Demonstrations (LfD)
  • Algorithms: Policy Gradient (PPO, A2C), Value-Based (SAC, TD3, Q-Learning)
  • Techniques: Model-Based RL, Off-Policy Learning, Exploration Strategies (Epsilon-Greedy, Thompson Sampling)
  • Applications: Reward Function Design, Policy Optimization, RLHF

Deep Learning & Generative AI

  • LLM Agents: Agentic Frameworks, Tool Use, Planning, Fine-Tuning (PEFT, LoRA)
  • Architectures: Transformers & Attention, CNNs (U-Net, V-Net), RNNs (LSTM)
  • Generative Techniques: Surrogate Modeling, Data-Driven World Models, Diffusion Models (Conceptual)
  • Frameworks: PyTorch, TensorFlow, Hugging Face Transformers
  • Training Techniques: Transfer Learning, Fine-Tuning, Hyperparameter Optimization
  • Evolutionary & Search Methods: Genetic Algorithms (for optimization), PSO, Bayesian Optimization

High-Performance ML Engineering

  • Distributed Systems: Large-Scale Training (Ray: RLlib, Tune), Parallel Computing
  • Infrastructure: Scalable ML Pipelines, MLOps Concepts, HPC Environments
  • Performance: Model Evaluation & Benchmarking, Debugging, Performance Profiling

Simulation & Embodied AI

  • Environment Development: Digital Twins, World Models (Gymnasium, PettingZoo)
  • Robotics Concepts: Motion & Behavioral Planning, Control Systems, Perception Pipeline
  • Tools & Data: Physics Simulators (SUMO, CARLA), Synthetic Data Generation

Expert In

Python, PyTorch, Ray (RLlib, Tune)

Proficient With

NumPy, Pandas, Scikit-learn, Stable Baselines3, Docker, Git, Linux, C++ (Basic)

Selected Publications

SustainDC: Benchmarking for Sustainable Data Center Control

A. Naug*, A. Guillen-Perez*, R. Luna Gutierrez*, V. Gundecha, et al.

Advances in Neural Information Processing Systems (NeurIPS), 2024

Learning From Oracle Demonstrations—A New Approach to Develop Autonomous Intersection Management...

A. Guillen-Perez, M.D. Cano

IEEE Access, 2022

Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow's Intersections

A. Guillen-Perez, M.D. Cano

IEEE Transactions on Vehicular Technology, 2022

N-CRITICS: Self-Refinement of Large Language Models with Ensemble of Critics

S. Mousavi, R. Luna Gutierrez, A. Guillen-Perez, et al.

NeurIPS 2023 Workshop on Robustness of Foundation Models