Recent News & Highlights

  • [Aug 2025] New Research Stream: Advancing Autonomous Driving AI.
    • Developed "Efficient Virtuoso," a state-of-the-art latent diffusion Transformer for goal-conditioned trajectory planning, achieving a minADE of 0.25 on the Waymo Open Motion Dataset. [Paper] [GitHub]
    • Submitted new research on "Mining the Long Tail" for robust Offline RL in AVs to arXiv, demonstrating significant safety improvements via data curation. [Paper] [GitHub]
    • Submitted new research on "From Imitation to Optimization" for Offline RL in AVs to arXiv, demonstrating 3.2x higher success rates than BC baselines. [Paper] [GitHub]
  • [Early 2025] Initiated a new research stream at HPE exploring LLM-based agents for dynamic system control.
  • [Nov 2024] Our work on Hierarchical RL, GreenDCC, accepted to the AAAI 2025 Demonstration Track.
  • [Oct 2024] Our MARL benchmark, SustainDC, accepted to the NeurIPS 2024 Datasets and Benchmarks Track.
  • [May 2024] Filed a new U.S. patent on "Real-Time Carbon Footprint Reduction Controller".
  • [Dec 2023] Received the Best ML Innovation Award at the NeurIPS 2023 Climate Change AI Workshop.
  • [Sep 2022] Joined HPE AI Labs as an AI Research/Applied Scientist.
  • [Jun 2022] Awarded Ph.D. in Computer Science with Cum Laude distinction.

About Me

I am an AI Research Scientist and Engineer with a Ph.D., focused on creating the intelligent systems and foundation models needed for robust, general-purpose autonomy. My work is built on the following principles:

  • Scientific Foundation: My research centers on Reinforcement Learning (RL), Multi-Agent Systems, and Imitation Learning to solve complex coordination and control problems, yielding novel algorithms such as LfOD and publications at venues including NeurIPS and AAAI.
  • Engineering Execution: I architect and implement the infrastructure needed to bring research to life, from large-scale, open-source simulations (SustainDC) to distributed training pipelines (Ray/RLlib) with hundreds of parallel workers (a minimal configuration sketch follows this list).
  • Future Focus: My recent work explores the frontier of LLM-based agents, using fine-tuning (LoRA) and novel refinement techniques to build more capable and adaptable decision-makers for real-world systems.
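
For context, here is a minimal sketch of the kind of Ray/RLlib setup the engineering bullet refers to, scaled down to a standard Gym task. It assumes the Ray 2.x `AlgorithmConfig` API; exact method and result-key names vary across Ray versions, and the worker count is illustrative rather than a production value.

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Scale experience collection across many parallel rollout workers.
config = (
    PPOConfig()
    .environment("CartPole-v1")         # stand-in env; real pipelines used custom simulators
    .rollouts(num_rollout_workers=256)  # hundreds of parallel sampling workers
    .training(train_batch_size=131072, lr=3e-4)
    .resources(num_gpus=1)
)

algo = config.build()
for i in range(10):
    result = algo.train()
    print(i, result["episode_reward_mean"])
```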

Featured Projects

Qualitative comparison of goal representations in a challenging turning scenario: our Sparse Route model (right) generates precise, unbiased trajectories, outperforming other goal representations.

Efficient Virtuoso: Latent Diffusion Transformer for Trajectory Planning

Developed a state-of-the-art conditional latent diffusion model for goal-conditioned trajectory planning, achieving a minADE of 0.25 on the Waymo Open Motion Dataset. Introduced a novel normalization scheme and provided key insights into optimal goal representation for AVs.

My Key Contributions:
  • Pioneered a two-stage normalization pipeline for stable latent diffusion training (see the sketch after this list).
  • Designed a Transformer-based StateEncoder for rich scene context fusion.
  • Conducted rigorous ablation on goal representation, proving multi-step routes are critical for tactical precision.
  • Achieved a state-of-the-art minADE of 0.25 on WOMD.
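
As a concrete illustration of the two-stage idea (the exact pipeline is described in the paper), the sketch below shows one plausible decomposition: an ego-centric frame transform followed by z-score standardization fit on the training set. The stage boundaries and function names here are assumptions for illustration.

```python
import numpy as np

def to_ego_frame(traj, ego_pose):
    """Stage 1 (assumed): express a world-frame trajectory in the ego frame.
    traj: (T, 2) xy points; ego_pose: (x, y, yaw)."""
    x, y, yaw = ego_pose
    c, s = np.cos(-yaw), np.sin(-yaw)
    R = np.array([[c, -s], [s, c]])            # rotation by -yaw
    return (traj - np.array([x, y])) @ R.T

class Standardizer:
    """Stage 2 (assumed): z-score normalization with statistics fit once on
    the training set, so the diffusion model sees unit-scale targets."""
    def fit(self, trajs):                       # trajs: (N, T, 2)
        flat = trajs.reshape(-1, trajs.shape[-1])
        self.mean, self.std = flat.mean(0), flat.std(0) + 1e-6
        return self

    def transform(self, trajs):
        return (trajs - self.mean) / self.std

    def inverse(self, trajs_norm):              # undo before evaluation (e.g., minADE)
        return trajs_norm * self.std + self.mean
```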

Qualitative comparison of data curation strategies in a challenging highway merge: the baseline CQL agent collides, the heuristic-weighted agent merges reactively and suboptimally, and the uncertainty-weighted agent merges proactively and successfully.

Mining the Long Tail: Data Curation for Robust Offline RL in AVs

Systematically investigated six data curation strategies (heuristic, uncertainty, behavior-based) to tackle the long-tail problem in autonomous driving, achieving nearly a three-fold reduction in collision rate with uncertainty-based methods.

My Key Contributions:
  • Developed novel, data-driven criticality metrics (e.g., model disagreement via ensemble scouts) for non-uniform data sampling (see the sketch after this list).
  • Designed specialized PyTorch `Dataset` implementations for timestep- and scenario-level weighting.
  • Conducted a large-scale comparative study demonstrating that all curation methods significantly outperform uniform sampling.
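
A minimal sketch of the ensemble-disagreement idea, using toy tensors and stand-in "scout" models rather than the actual WOMD features and architectures: criticality is scored as prediction variance across the ensemble and fed to PyTorch's `WeightedRandomSampler`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

@torch.no_grad()
def disagreement_scores(models, states):
    """Criticality as epistemic uncertainty: variance of the ensemble's
    action predictions, averaged over action dimensions."""
    preds = torch.stack([m(states) for m in models])  # (K, N, act_dim)
    return preds.var(dim=0).mean(dim=-1)              # (N,)

# toy data and a small scout ensemble (illustrative shapes)
states = torch.randn(1000, 16)
actions = torch.randn(1000, 2)
scouts = [torch.nn.Linear(16, 2) for _ in range(5)]

scores = disagreement_scores(scouts, states)
weights = scores / scores.sum()                       # non-uniform sampling probabilities
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(TensorDataset(states, actions), batch_size=64, sampler=sampler)
```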

Qualitative comparison of Behavioral Cloning baselines vs. a robust Offline RL (CQL) agent: BC-S (MLP) fails under complex interactions, BC-T (Transformer) is brittle and falls into a "circling" failure, while CQL recovers robustly and navigates successfully.

From Imitation to Optimization: Offline Learning for Autonomous Driving

Pioneered an end-to-end pipeline applying state-of-the-art Offline Reinforcement Learning (CQL) to the Waymo Open Motion Dataset, demonstrating significantly superior robustness over Behavioral Cloning baselines for long-horizon AV control.

My Key Contributions:
  • Engineered a robust, parallelized data processing pipeline for the Waymo Open Motion Dataset.
  • Conducted a rigorous comparative study, demonstrating CQL's 3.2x higher success rate and 7.4x lower collision rate over Transformer-based BC.
  • Designed an effective multi-objective reward function for Offline RL training in autonomous driving (an illustrative sketch follows this list).
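
An illustrative multi-objective reward in this spirit; the terms, weights, and thresholds below are assumptions for exposition, not the function used in the study.

```python
def reward(progress_m, jerk, min_gap_m, collided,
           w_prog=1.0, w_comfort=0.1, w_safety=0.5, collision_penalty=100.0):
    """Toy multi-objective reward balancing progress, comfort, and safety."""
    r = w_prog * progress_m                    # reward progress along the route (meters)
    r -= w_comfort * abs(jerk)                 # penalize uncomfortable jerk
    r -= w_safety * max(0.0, 2.0 - min_gap_m)  # penalize gaps closing below ~2 m
    if collided:
        r -= collision_penalty                 # dominant terminal penalty
    return r
```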

MARL agents coordinating to cross an intersection safely.

Autonomous Intersection Management

Designed and implemented a MARL system where autonomous vehicles learn to coordinate and safely cross intersections without traffic lights, significantly improving traffic flow.

My Key Contributions:
  • Designed the end-to-end MARL system architecture, including a novel LSTM-based state encoder (sketched after this list).
  • Engineered the multi-objective reward function to balance efficiency and safety.
  • The final system reduced vehicle travel time by up to 59% in simulation.
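
A minimal sketch of an LSTM-based state encoder of this kind; the dimensions and pooling choice are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class LSTMStateEncoder(nn.Module):
    """Encodes a short history of per-vehicle observations into a fixed-size
    embedding for the policy network (sizes are illustrative)."""
    def __init__(self, obs_dim=10, hidden=64, embed=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed)

    def forward(self, obs_seq):               # obs_seq: (B, T, obs_dim)
        _, (h, _) = self.lstm(obs_seq)        # h: (1, B, hidden), last hidden state
        return torch.tanh(self.head(h[-1]))   # (B, embed)

enc = LSTMStateEncoder()
z = enc(torch.randn(4, 8, 10))                # 4 vehicles, 8 past timesteps
```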

System architecture for the SustainDC benchmark.

SustainDC: A NeurIPS Benchmark

Co-led the creation of an open-source, Gym-compatible benchmark for developing MARL controllers to optimize the energy and carbon footprint of data centers.

My Key Contributions:
  • Co-led the architectural design and open-source implementation (a minimal Gym-style environment skeleton follows this list).
  • Engineered the Python-based physics models for cooling and power.
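
For flavor, a minimal Gymnasium-compatible environment skeleton in the spirit of SustainDC; the real benchmark couples multiple interacting sub-environments (e.g., cooling and power), and the dynamics below are toy placeholders.

```python
import gymnasium as gym
import numpy as np

class ToyDCEnv(gym.Env):
    """Toy single-agent stand-in for a data-center control environment."""
    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.observation_space.sample()
        return self.state, {}

    def step(self, action):
        energy = float(np.abs(action).sum())  # toy proxy: cooling effort costs energy
        self.state = np.clip(self.state - 0.1 * action.mean(), -1, 1).astype(np.float32)
        reward = -energy                      # minimize energy use
        return self.state, reward, False, False, {}
```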

Conceptual framework for Learning from Oracle Demonstrations (LfOD).

Learning from Oracle Demonstrations (LfOD)

Developed a novel Imitation Learning paradigm to accelerate DRL training by using a learned "Oracle" agent to provide corrective demonstrations to the primary agent.

My Key Contributions:
  • Engineered the core LfOD methodology from first principles.
  • Implemented the TD3fOD algorithm to integrate oracle advice (one common way to blend such advice is sketched after this list).
  • Demonstrated a 5x speedup in training convergence on complex tasks.
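
The paper defines the actual TD3fOD update; as a hedged illustration, the sketch below shows one common way to fold oracle demonstrations into a TD3-style actor objective via an auxiliary imitation term. The stand-in actor/critic and the 0.5 weight are assumptions.

```python
import torch
import torch.nn.functional as F

def actor_loss_with_oracle(actor, critic, states, oracle_actions, bc_weight=0.5):
    """TD3-style actor objective plus an imitation term pulling the policy
    toward the oracle's corrective demonstrations (weight is illustrative)."""
    actions = actor(states)
    dpg_loss = -critic(states, actions).mean()     # standard deterministic policy gradient term
    bc_loss = F.mse_loss(actions, oracle_actions)  # match oracle advice on demonstrated states
    return dpg_loss + bc_weight * bc_loss

# toy stand-ins to show the call shape
actor = torch.nn.Sequential(torch.nn.Linear(8, 2), torch.nn.Tanh())
critic = lambda s, a: torch.cat([s, a], dim=-1).sum(dim=-1, keepdim=True)
loss = actor_loss_with_oracle(actor, critic, torch.randn(32, 8), torch.randn(32, 2))
```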

Animation showing the CNN's prediction of the 3D heat distribution.

3D CNN Surrogate for Accelerating Physics Simulations

Developed a 3D CNN (U-Net) surrogate for computationally expensive CFD simulations, predicting 3D heat distribution in data centers with a >2800x inference speedup over the original simulator.

My Key Contributions:
  • Evaluated 3D U-Net architectures for spatial heat prediction in data centers (a minimal variant is sketched after this list).
  • Engineered the data pipeline to process and voxelize raw CFD simulation data.
  • Used the surrogate with a genetic algorithm to optimize workload placement, reducing the maximum temperature by 7.7% and energy consumption by 2.5%.
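
A scaled-down sketch of a 3D U-Net surrogate of this kind; the depth, channel counts, and single skip connection are illustrative, not the deployed model.

```python
import torch
import torch.nn as nn

class TinyUNet3D(nn.Module):
    """Minimal 3D encoder-decoder with one skip connection, mapping a
    voxelized input field to a predicted 3D temperature field."""
    def __init__(self, in_ch=1, out_ch=1, base=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(base, base, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool3d(2)
        self.mid = nn.Sequential(nn.Conv3d(base, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose3d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv3d(base * 2, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(base, out_ch, 3, padding=1))

    def forward(self, x):                          # x: (B, 1, D, H, W) voxel grid
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        return self.dec(torch.cat([u, e], dim=1))  # skip connection

model = TinyUNet3D()
heat = model(torch.randn(1, 1, 16, 16, 16))        # predicted 3D temperature field
```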

Core Technical Skills

Reinforcement & Decision Science

  • Foundations: Sequential Decision-Making, MDPs, Multi-Objective Optimization, Credit Assignment
  • Paradigms: Deep RL, Multi-Agent RL (MARL), Hierarchical RL (HRL), Imitation Learning (IL), Behavioral Cloning (BC), Learning from Demonstrations (LfD), Offline Reinforcement Learning (Offline RL)
  • Algorithms: Policy Gradient (PPO, A2C), Actor-Critic (SAC, TD3), Value-Based (Q-Learning, Conservative Q-Learning / CQL)
  • Techniques: Model-Based RL, Off-Policy Learning, Exploration Strategies, Reward Function Design & Shaping, Policy Optimization, RLHF, Data-Centric RL (Curation, Sampling, Weighting)
  • Applications: Autonomous Driving Planning, Behavior Prediction, Robot Control, System Optimization

Deep Learning & Generative AI

  • LLM Agents: Agentic Frameworks, Tool Use, Planning, Fine-Tuning (PEFT, LoRA)
  • Architectures: Transformers & Attention, CNNs (U-Net, V-Net), RNNs (LSTM)
  • Generative Techniques: Surrogate Modeling, Data-Driven World Models, Diffusion Models (Latent Diffusion)
  • Frameworks: PyTorch, TensorFlow, Hugging Face Transformers
  • Training Techniques: Transfer Learning, Fine-Tuning, Hyperparameter Optimization
  • Evolutionary & Search Methods: Genetic Algorithms (for optimization), PSO, Bayesian Optimization

High-Performance ML Engineering

  • Distributed Systems: Large-Scale Training (Ray: RLlib, Tune), Parallel Computing, Distributed Data Processing
  • Infrastructure: Scalable ML Pipelines, MLOps Concepts, HPC Environments, High-Throughput Data Loaders
  • Performance: Model Evaluation & Benchmarking, Debugging Large ML Codebases, Performance Profiling

Simulation & Embodied AI

  • Environment Development: Digital Twins, World Models, Custom Environments (Gymnasium, PettingZoo), Waymax Simulator
  • Robotics Concepts: Motion & Behavioral Planning, Control Systems, Perception Pipeline, Trajectory Prediction, Safety & Robustness
  • Tools & Data: Physics Simulators (SUMO, CARLA), Waymo Open Motion Dataset (WOMD), Synthetic Data Generation, Real-World Data Integration, Large-Scale Datasets

Expert In

Python, PyTorch, Ray (RLlib, Tune)

Proficient With

NumPy, Pandas, Scikit-learn, Stable Baselines3, Docker, Git, Linux, Waymo Open Motion Dataset (WOMD), Waymax, C++ (Basic)

Selected Publications

Efficient Virtuoso: A Latent Diffusion Transformer Model for Goal-Conditioned Trajectory Planning

A. Guillen-Perez

arXiv preprint, August 2025

Mining the Long Tail: A Comparative Study of Data-Centric Criticality Metrics for Robust Offline Reinforcement Learning in Autonomous Motion Planning

A. Guillen-Perez

arXiv preprint, August 2025

From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving

A. Guillen-Perez

arXiv preprint, July 2025

SustainDC: Benchmarking for Sustainable Data Center Control

A. Naug*, A. Guillen-Perez*, R. Luna Gutierrez*, V. Gundecha, et al.

Advances in Neural Information Processing Systems (NeurIPS), 2024

Learning From Oracle Demonstrations—A New Approach to Develop Autonomous Intersection Management...

A. Guillen-Perez, M.D. Cano

IEEE Access, 2022

Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow's Intersections

A. Guillen-Perez, M.D. Cano

IEEE Transactions on Vehicular Technology, 2022

N-CRITICS: Self-Refinement of Large Language Models with Ensemble of Critics

S. Mousavi, R. Luna Gutierrez, A. Guillen-Perez, et al.

NeurIPS 2023 Workshop on Robustness of Foundation Models