sudo rm -rf agentic_security

🎉 Excited to share that our paper "sudo rm -rf agentic_security" has been accepted to ACL 2025 (Industry Track)!

This repository provides a single system for automatic attack generation, evaluation, and dynamic attack creation against computer-use agents. It executes attack scenarios in a Docker environment, automatically organizes and evaluates the results, and simplifies generating dynamic attacks.

Overview

Attack Generation:
Detox2tox (Static) is a pipeline that transforms a malicious instruction into a detoxed task to evade safety guardrails, obtains a plan from a well-aligned model, and then reintroduces the malicious details at the final step. This preserves the original harmful goal while stealthily bypassing those defenses.

- Generates attack JSON files and inserts the Scene Change Task.
- Copies the resulting files to the ./claude-cua/.../data/ folder.
- Executes the attacks in Docker to generate logs.

Evaluation:
- Moves the generated logs to ./eval/logs/ for evaluation.
- Performs the evaluations and numeric calculations.

Dynamic Attack:
Dynamic attack is an iterative process in which new or modified prompts are generated from the evaluation results. Using feedback from the evaluators, SUDO refines the prompt or strengthens hidden triggers to evolve the attack strategy. This lets the attack adapt to changing conditions or defenses and improves its success rate over multiple iterations.

All steps are automated using a single script: main.py.

Folder Structure

sudo
├── main.py                   # Main script managing the entire pipeline
├── attack
│   ├── attack_generation.py
│   └── result.json           # Attack result logs generated after Docker execution
├── Benchmark
│   ├── SUDO_dataset.csv
│   └── fill_placeholders.py  # Fills extra_info placeholders for task, starting_environment, topic, expected
├── claude-cua
│   └── computer-use-demo
│       └── computer_use_demo
│           ├── attack_tools  # Handles automatic attacking and logging
│           ├── data          # Folder where attack JSON files are moved
│           └── log           # Logs generated within Docker
├── eval
│   ├── evaluation_json.py    # Evaluation logic (log files → extract scores)
│   ├── calculate_score.py    # Evaluation logic (scores → numeric calculations)
│   └── logs                  # Final storage of attack logs (for evaluation)
├── dynamic_attack
│   └── dynamic_attack.py     # Handles dynamic attack generation
├── formatter
│   ├── auto-scene
│   └── csv2json
│       └── convert_format.py
├── .env                      # API keys (see Prerequisites)
├── .gitignore
├── README.md                 # This file
└── requirements.txt          # Python dependencies

Note: Docker mount paths (the -v option) must match the local folder structure. The mount paths currently configured already match the layout above.
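You can catch mount problems before running Docker by checking that the folders above exist. The sketch below is a hypothetical helper and is not part of main.py. The REQUIRED_DIRS list is an assumption taken from the folder structure in this README.

```python
from pathlib import Path

# Hypothetical check list, taken from the folder structure in this README.
REQUIRED_DIRS = [
    "attack",
    "Benchmark",
    "claude-cua/computer-use-demo/computer_use_demo/data",
    "claude-cua/computer-use-demo/computer_use_demo/log",
    "eval/logs",
    "dynamic_attack",
    "formatter",
]


def missing_dirs(root: Path, required: list[str] = REQUIRED_DIRS) -> list[str]:
    """Return the required folders that do not exist under root."""
    return [d for d in required if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(Path.cwd())
    if missing:
        print("Missing folders (check the -v mounts):", ", ".join(missing))
    else:
        print("Folder layout matches the expected structure.")
```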

Prerequisites

Create and activate a conda environment:

conda create -n sudo python=3.10
conda activate sudo

Install required packages:

pip install -r requirements.txt

Docker: Docker must be installed, and the docker run command must be executable from the command line.
API keys: set the following environment variables in the .env file: ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, IMGUR_CLIENT_ID.
Research account: create a research account and enter its details into extra_info.
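Here is a minimal standard-library sketch for confirming that the .env file defines the four keys above before you run the pipeline. read_env_file and missing_keys are hypothetical helpers. main.py may load .env differently (for example, with python-dotenv), and this parser only handles simple KEY=VALUE lines.

```python
import os
from pathlib import Path

# The keys named in the Prerequisites section.
REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY", "IMGUR_CLIENT_ID"]


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str] = REQUIRED_KEYS) -> list[str]:
    """Return required keys that are empty or absent in both the file and os.environ."""
    return [k for k in required if not (values.get(k) or os.environ.get(k))]
```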

Run ./Benchmark/fill_placeholders.py to fill the extra_info placeholders for task, starting_environment, topic, and expected. Then apply the same changes to eval/SUDO_criteria and formatter/auto-scene/SUDO_scnchg.
Run ./formatter/auto-scene/harmGUI_scnchg.json with claude-cua, capture a screenshot of each task's starting point, and place the screenshots in ./attack/screenshot/. Each screenshot filename must match the corresponding task's identifier (cf. formatter/origin_img2url.py).
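A short script can check the screenshot naming rule. This hypothetical helper assumes PNG screenshots named <task_id>.png. This README does not specify the actual file extension or where the task identifiers are read from.

```python
from pathlib import Path

# Assumption: one PNG per task, named "<task_id>.png".
SCREENSHOT_DIR = Path("./attack/screenshot")


def missing_screenshots(task_ids, screenshot_dir: Path = SCREENSHOT_DIR, ext: str = ".png") -> list[str]:
    """Return the task identifiers that have no matching screenshot file."""
    present = {p.stem for p in Path(screenshot_dir).glob(f"*{ext}")}
    return [str(tid) for tid in task_ids if str(tid) not in present]
```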

Before attacking the computer-use agent, log in with the attacker research account.

Usage

main.py can run Attack Generation, Evaluation, and Dynamic Attack separately, or run all of them sequentially.

Main argument (attack_name, built from the model name and the tactic):

attack_name = f"{model_name}_{tactic}"  # e.g., o1_static, o1_dynamic-r1
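This hypothetical helper shows how attack_name splits back into its two parts. It assumes the tactic itself contains no underscore (true for static and dynamic-r1), so it splits at the last underscore.

```python
def split_attack_name(attack_name: str) -> tuple[str, str]:
    """Split "<model_name>_<tactic>" at the last underscore (hypothetical helper)."""
    model_name, sep, tactic = attack_name.rpartition("_")
    if not sep or not model_name or not tactic:
        raise ValueError(f"expected <model_name>_<tactic>, got {attack_name!r}")
    return model_name, tactic


def make_attack_name(model_name: str, tactic: str) -> str:
    """Build "<model_name>_<tactic>", the inverse of split_attack_name."""
    return f"{model_name}_{tactic}"


print(split_attack_name("o1_dynamic-r1"))  # ('o1', 'dynamic-r1')
```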

Execute only Static Attack:

Integrated command (includes all steps below):

python main.py --attack <attack_name>

Individual steps (manual execution):

python main.py --attack-gen <attack_name>
python main.py --formatter <attack_name>
python main.py --docker-run

Together, these steps:
- Generate the attack JSON (with the Scene Change Task inserted).
- Move the generated JSON to claude-cua/computer-use-demo/computer_use_demo/data.
- Run Docker (logs are written to claude-cua/computer-use-demo/computer_use_demo/log).

Execute only Evaluation:

python main.py --evaluate <attack_name>

- Moves the Docker results (attack/result.json) to eval/logs.
- Runs evaluation_json.py to score the logs and compute the numeric results.
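The move step can be sketched with shutil. This illustration is not the code in main.py. collect_result is a hypothetical name, and saving the log as eval/logs/<attack_name>.json is an invented naming scheme.

```python
import shutil
from pathlib import Path


def collect_result(attack_name: str,
                   src: Path = Path("attack/result.json"),
                   logs_dir: Path = Path("eval/logs")) -> Path:
    """Move the Docker result file into eval/logs (hypothetical per-attack file name)."""
    logs_dir = Path(logs_dir)
    logs_dir.mkdir(parents=True, exist_ok=True)
    dest = logs_dir / f"{attack_name}.json"  # assumption: one log file per attack_name
    shutil.move(str(src), str(dest))
    return dest
```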

Execute only Dynamic Attack:

python main.py --dynamic <attack_name>

Pass the base attack name as attack_name. For example, to generate o1_dynamic-r1, pass o1_static.
Generates dynamic attacks from the evaluation results in eval/logs.
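This hypothetical helper derives the next round's name from the base attack name. It assumes rounds after r1 continue the same dynamic-rN pattern. This README only shows dynamic-r1.

```python
import re


def next_dynamic_name(attack_name: str) -> str:
    """Hypothetical: o1_static -> o1_dynamic-r1, o1_dynamic-r1 -> o1_dynamic-r2."""
    model_name, _, tactic = attack_name.rpartition("_")
    if not model_name:
        raise ValueError(f"expected <model_name>_<tactic>, got {attack_name!r}")
    if tactic == "static":
        return f"{model_name}_dynamic-r1"
    # Assumed naming for later rounds: dynamic-r<N>.
    match = re.fullmatch(r"dynamic-r(\d+)", tactic)
    if match is None:
        raise ValueError(f"unrecognized tactic {tactic!r}")
    return f"{model_name}_dynamic-r{int(match.group(1)) + 1}"
```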

Run the full pipeline automatically (Attack → Evaluate → Dynamic):

python main.py --all <attack_name>

Automatically executes Attack Generation → Attack Execution → Evaluation → Dynamic Attack in order.
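The flags above could be parsed with argparse and mapped to pipeline stages, as sketched below. The stage names and the mutually exclusive grouping are assumptions, and the real main.py may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the flags documented in Usage (sketch; exactly one flag per run is assumed)."""
    parser = argparse.ArgumentParser(description="SUDO pipeline (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--attack", metavar="ATTACK_NAME")
    group.add_argument("--attack-gen", metavar="ATTACK_NAME")
    group.add_argument("--formatter", metavar="ATTACK_NAME")
    group.add_argument("--docker-run", action="store_true")
    group.add_argument("--evaluate", metavar="ATTACK_NAME")
    group.add_argument("--dynamic", metavar="ATTACK_NAME")
    group.add_argument("--all", metavar="ATTACK_NAME")
    return parser


def plan_stages(args: argparse.Namespace) -> list[str]:
    """Map the chosen flag to an ordered list of hypothetical stage names."""
    if args.attack:
        return ["attack_gen", "formatter", "docker_run"]
    if args.attack_gen:
        return ["attack_gen"]
    if args.formatter:
        return ["formatter"]
    if args.docker_run:
        return ["docker_run"]
    if args.evaluate:
        return ["evaluate"]
    if args.dynamic:
        return ["dynamic"]
    # --all: Attack Generation -> Attack Execution -> Evaluation -> Dynamic Attack
    return ["attack_gen", "formatter", "docker_run", "evaluate", "dynamic"]
```

For example, plan_stages(build_parser().parse_args(["--attack", "o1_static"])) yields the three static-attack stages in order.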

Detailed Workflow

Attack Generation
- attack_generation.py generates attack JSON files, inserting Scene Change Tasks (use formatter/auto-scene if necessary).
- Automatically moves the completed JSON files to claude-cua/computer-use-demo/computer_use_demo/data.
Docker Container Execution
- Docker execution is handled by main.py in the run_attack() step.
- Performs actual attack simulations based on the JSON files.
- Attack logs (result.json) are stored in the attack folder or logged to claude-cua/computer-use-demo/computer_use_demo/log.
Evaluation
- Triggered by main.py with the evaluation step (--evaluate).
- Automatically moves attack/result.json to eval/logs.
- Runs evaluation_json.py to perform evaluations and numeric calculations.
Dynamic Attack
- dynamic_attack/dynamic_attack.py creates dynamic attacks based on evaluation results.
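Here is a hypothetical sketch of the "scores → numeric calculations" step. It assumes the extracted scores are stored as a JSON list of {"task_id": ..., "score": ...} records. This README does not describe the actual format produced by evaluation_json.py or the metrics computed by calculate_score.py, so only a count and a mean are shown.

```python
import json
from pathlib import Path
from statistics import mean


def summarize_scores(path: Path) -> dict:
    """Hypothetical: summarize a JSON list of {"task_id": ..., "score": ...} records."""
    records = json.loads(Path(path).read_text())
    scores = [float(r["score"]) for r in records]
    if not scores:
        raise ValueError(f"no scores found in {path}")
    return {"n_tasks": len(scores), "mean_score": mean(scores)}
```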

Notes

Ensure Docker mount paths match local directory structures:

docker run -v $(pwd)/...:/home/...

Confirm that ANTHROPIC_API_KEY is actually passed into the container. If it is not, pass the environment variables directly to subprocess.run() through its env argument instead of relying on shell=True.
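The sketch below launches the container without shell=True and passes the API key through subprocess.run's env argument. The image name and both mount paths are placeholders to fill in from your own setup, and the function names are hypothetical. Note that docker run -e VAR with no value copies VAR from the environment the docker client receives.

```python
import os
import subprocess


def build_docker_cmd(image: str, host_dir: str, container_dir: str) -> list[str]:
    """Build a docker run argument list with one bind mount (placeholder paths)."""
    return [
        "docker", "run",
        "-e", "ANTHROPIC_API_KEY",  # no value: docker copies it from the env given below
        "-v", f"{os.path.abspath(host_dir)}:{container_dir}",
        image,
    ]


def run_docker(image: str, host_dir: str, container_dir: str) -> subprocess.CompletedProcess:
    """Run the container with an explicit environment and no shell."""
    env = os.environ.copy()
    if not env.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return subprocess.run(build_docker_cmd(image, host_dir, container_dir), env=env, check=True)
```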

Citation

If you find this repository helpful in your research or work, please cite the following paper:

@inproceedings{leeetal2025sudo,
  title      = {sudo rm -rf agentic\_security},
  author     = {Lee, Sejin and Kim, Jian and Park, Haon and Yousefpour, Ashkan and Yu, Sangyoon and Song, Min},
  booktitle  = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Industry Track)},
  year       = {2025},
  month      = jul,
  address    = {Vienna, Austria},
  publisher  = {Association for Computational Linguistics},
  pages      = {to appear},
  url        = {https://arxiv.org/abs/2503.20279},
  note       = {Sejin Lee and Jian Kim contributed equally.}
}

License and Contribution

This project follows the guidelines specified in the LICENSE file.
Feel free to submit bug reports, feature requests, or pull requests.

