This repository contains the code and experiments developed for the Master's Thesis project:

“A Safe Soft Actor-Critic Controller for Power Systems”

- Department of Computer, Control and Management Engineering, Sapienza University of Rome
- Academic Year: 2023–2024
- Candidate: Emanuele De Bianchi
- Advisor: Alessandro Giuseppi
- Co-advisor: Danilo Menegatti
🧠 Overview

This project explores the use of reinforcement learning (RL) to design a controller capable of stabilizing a power system under cyber-physical attacks while guaranteeing safety at all times. It builds upon the Soft Actor-Critic (SAC) algorithm, which is modified with Control Barrier Functions (CBFs) that enforce safety constraints during both learning and execution.
🎯 Objectives

- Stabilize the power system under normal and perturbed (attack) conditions.
- Ensure system safety is maintained throughout training and deployment.
⚙️ Problem Setting

- Linearized model of a power system (e.g., the WSCC 9-bus system); a reference form of such a model is sketched after this list.
- Attacks are modeled as exogenous disturbances about which only limited information is available.
- Standard RL methods may violate safety constraints during exploration.
- Need for fast, model-free control that is safe by design.
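For reference, a disturbed linear state-space model of the generic form below is commonly used in this setting; the actual matrices, state variables, and attack channel in the thesis may differ, so the symbols A, B, B_a, and a(t) are purely illustrative.

```latex
% Linearized power system dynamics, with the attack a(t) entering as an
% exogenous disturbance through an (assumed) input matrix B_a.
\dot{x}(t) = A\,x(t) + B\,u(t) + B_a\,a(t), \qquad x(0) = x_0
```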
🏗️ Controller Architecture

- Actor network: proposes actions based on the current observation.
- Critic network: evaluates the value of the proposed actions.
- Safety layer (CBF): corrects the actions to enforce safety constraints via a Quadratic Program (QP) solver; a minimal sketch follows.
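Since the dependency list includes cvxpylayers for the differentiable QP, a safety layer of this kind can be sketched as follows. The action dimension, the parameter names (u_rl, G, h_term), and the exact constraint terms are assumptions for illustration, not the repository's actual formulation.

```python
# Minimal sketch of a differentiable CBF-QP safety layer built with cvxpylayers.
# The action dimension, parameter names, and constraint terms are illustrative
# assumptions, not the exact formulation used in this repository.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n_u = 2   # action dimension (assumed)
n_c = 1   # number of CBF constraints (assumed)

# QP: project the RL action onto the set of actions satisfying the CBF constraints.
u_safe = cp.Variable(n_u)
u_rl = cp.Parameter(n_u)          # action proposed by the SAC actor
G = cp.Parameter((n_c, n_u))      # action-dependent constraint terms, e.g. L_g h(x)
h_term = cp.Parameter(n_c)        # state-dependent terms, e.g. L_f h(x) + alpha * h(x)

objective = cp.Minimize(cp.sum_squares(u_safe - u_rl))
constraints = [G @ u_safe + h_term >= 0]
problem = cp.Problem(objective, constraints)

# The layer is differentiable w.r.t. its parameters, so the RL loss can be
# backpropagated through the safety correction.
safety_layer = CvxpyLayer(problem, parameters=[u_rl, G, h_term], variables=[u_safe])

# Forward pass with dummy tensors; gradients flow back to the actor via u.
u = torch.randn(n_u, requires_grad=True)
G_t = torch.randn(n_c, n_u)
h_t = torch.ones(n_c)
(u_corrected,) = safety_layer(u, G_t, h_t)
u_corrected.sum().backward()   # gradients reach the actor's action u
```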
🛡️ Safety via Control Barrier Functions

- Safety constraints are represented as smooth, state-dependent functions called Control Barrier Functions (CBFs).
- The CBFs define a safe set in the state space.
- Actions are corrected by solving a QP so that the state remains within this set.
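For reference, the standard (non-robust) CBF formulation for control-affine dynamics is sketched below; the robust variant used in the cited approach additionally accounts for the disturbance, so this is only the baseline form.

```latex
% Safe set induced by a CBF h, for dynamics \dot{x} = f(x) + g(x)u
\mathcal{C} = \{\, x : h(x) \ge 0 \,\}

% CBF condition (alpha is a class-K function)
\sup_{u} \big[ L_f h(x) + L_g h(x)\,u \big] \ge -\alpha\big(h(x)\big)

% Pointwise QP: minimally correct the RL action u_RL while satisfying the condition
u^{*}(x) = \arg\min_{u} \lVert u - u_{\mathrm{RL}} \rVert^{2}
\quad \text{s.t.} \quad L_f h(x) + L_g h(x)\,u + \alpha\big(h(x)\big) \ge 0
```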
🏋️ Training

- SAC with modified entropy regularization and off-policy updates.
- Safety-corrected actions during both training and evaluation.
- Gradients are backpropagated through the QP solution to enable safe end-to-end learning.
- The attack is learned quickly by a Gaussian Process (GP) regressor in the early stages of training (see the sketch after this list).
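A GP regressor of this kind can be set up with gpytorch along the following lines; the kernel choice, data shapes, and training loop are illustrative assumptions rather than the repository's exact implementation.

```python
# Minimal gpytorch sketch of fitting a GP regressor to (state, disturbance) samples
# collected early in training. Kernel, data shapes, and hyperparameters are assumed.
import torch
import gpytorch


class DisturbanceGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Dummy dataset: 4-dimensional states and one scalar disturbance component (assumed).
train_x = torch.randn(64, 4)
train_y = torch.randn(64)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DisturbanceGP(train_x, train_y, likelihood)

# Standard exact-GP training loop: maximize the marginal log likelihood.
model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(50):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()

# Predict the disturbance (mean and variance) at new states.
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred = likelihood(model(torch.randn(5, 4)))
    print(pred.mean, pred.variance)
```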
📄 Citation
Emam, Y., Notomista, G., Glotfelter, P., Kira, Z., & Egerstedt, M. (2022). Safe Reinforcement Learning Using Robust Control Barrier Functions. IEEE Robotics and Automation Letters, PP(99), 1–8.
🧪 Experiments

- A toy example for interpretability and debugging.
- The WSCC 9-bus system for realistic power system dynamics.
- Both non-destabilizing and destabilizing cyber-physical disturbances.
- Attacks with low disclosure but high disruption power.
📊 Results

- The proposed Safe-SAC maintains safety better than a naive safe-SAC baseline, while vanilla SAC fails to guarantee safety.
- It learns a stable policy under perturbations.
- It is robust to disturbances without prior knowledge of the attack model.
- No safety violations occur during training when the CBF correction is active.
📦 Requirements

- Python 3.8+
- PyTorch
- NumPy, SciPy, Matplotlib
- Gymnasium
- stable-baselines3 (reference PyTorch SAC implementation)
- cvxpylayers (differentiable QP solver)
- gpytorch (GP regressor)
Install dependencies:

```bash
pip install -r requirements.txt
```
📝 Notes

- Scripts are provided to reproduce all figures from the thesis.
- Random seeds can be set for reproducibility (a generic seeding sketch follows).
- Environments are modular and easy to extend.
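The repository's scripts may expose their own seeding mechanism (e.g., a command-line argument); the snippet below is only a generic illustration of what setting the seed typically covers for this stack.

```python
# Generic seeding sketch for reproducibility; treat as an illustration, not the
# repository's actual seeding entry point.
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


set_seed(42)
# Gymnasium environments are seeded at reset time, e.g.: env.reset(seed=42)
```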