unstable-baselines

unstable‑baselines is an experimental, asynchronous, online reinforcement‑learning framework for rapid prototyping of multi‑turn / multi‑agent algorithms on TextArena environments.

The main focus of unstable-baselines is to enable fast prototyping and research. For something a bit more production-ready, we recommend oat or verifiers.

Work in progress — interfaces will change.

Key Features

Asynchronous collection & learning – actors generate data while learners train.
Multi‑agent, multi‑turn focus with self‑play or fixed opponents.
LoRA‑first fine‑tuning workflow for fast, lightweight updates.
Composable reward transforms at step, final, and sampling stages.
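
To illustrate what "composable reward transforms at step, final, and sampling stages" could look like, here is a minimal sketch in plain Python. Every name in it (`compose`, `clip_step`, `scale_step`, `zero_sum`, `normalize_batch`) is hypothetical and chosen for illustration; none of them is the unstable-baselines API, so check the `unstable` package for the actual interfaces.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases (not part of unstable-baselines):
# a step transform rewrites one per-step reward,
# a final transform rewrites the per-player end-of-game rewards.
StepTransform = Callable[[float], float]
FinalTransform = Callable[[Dict[int, float]], Dict[int, float]]


def compose(*transforms: Callable) -> Callable:
    """Chain transforms left to right into a single callable."""
    def chained(value):
        for transform in transforms:
            value = transform(value)
        return value
    return chained


def clip_step(reward: float) -> float:
    """Step stage: clip a per-step reward into [-1, 1]."""
    return max(-1.0, min(1.0, reward))


def scale_step(reward: float) -> float:
    """Step stage: halve the per-step reward."""
    return 0.5 * reward


def zero_sum(rewards: Dict[int, float]) -> Dict[int, float]:
    """Final stage: subtract the mean so player rewards sum to zero."""
    mean = sum(rewards.values()) / len(rewards)
    return {pid: r - mean for pid, r in rewards.items()}


def normalize_batch(rewards: List[float]) -> List[float]:
    """Sampling stage: standardize rewards across a sampled batch."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # fall back to 1.0 when all rewards are equal
    return [(r - mean) / std for r in rewards]


step_pipeline = compose(clip_step, scale_step)
print(step_pipeline(3.0))              # clipped to 1.0, then halved
print(zero_sum({0: 1.0, 1: 0.0}))      # player 0 above the mean, player 1 below
print(normalize_batch([1.0, 3.0]))     # standardized batch rewards
```

The point of the sketch is that each stage is an ordinary function, so a pipeline for any stage is just a composition of small, independently testable pieces.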

Collaboration

Developed in partnership with PlasticLabs.

Installation

# clone the repo
git clone https://github.com/LeonGuertler/unstable-baselines.git
cd unstable-baselines

# install Python dependencies
pip install -r requirements.txt

# build TextArena v0.6.9 (until it’s on PyPI)
git clone https://github.com/LeonGuertler/TextArena.git
cd TextArena
git checkout v0.6.9
pip install -e .
cd ..
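
After the steps above, a short check can confirm that the editable TextArena install is visible to the interpreter. This is a sketch under two assumptions not stated in this README: that the distribution is registered under the name `textarena`, and that the `v0.6.9` tag reports its version as `0.6.9`. The helper `installed_version` is a hypothetical name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name and version string; adjust if they differ.
    found = installed_version("textarena")
    if found is None:
        print("textarena is not installed in this environment")
    elif found != "0.6.9":
        print(f"textarena {found} is installed, but v0.6.9 was expected")
    else:
        print("textarena 0.6.9 is installed")
```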

Quick Start

python3 example.py

