LeonGuertler/UnstableBaselines

unstable-baselines

unstable‑baselines is an experimental, asynchronous, online reinforcement‑learning framework for rapid prototyping of multi‑turn / multi‑agent algorithms on TextArena environments.

The main focus of unstable-baselines is to enable fast prototyping and research. For something more production-ready, we recommend oat or verifiers.

Work in progress — interfaces will change.


Key Features

  • Asynchronous collection & learning – actors generate data while learners train.
  • Multi‑agent, multi‑turn focus with self‑play or fixed opponents.
  • LoRA‑first fine‑tuning workflow for fast, lightweight updates.
  • Composable reward transforms at step, final, and sampling stages.
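The first feature above, actors generating data while a learner trains, can be illustrated with a minimal, framework-independent sketch. This is not the unstable-baselines API: `actor`, `learner`, `main`, and the trajectory dictionary are hypothetical names, and `asyncio` is only a stand-in for whatever concurrency mechanism the framework actually uses.

```python
import asyncio
import random


async def actor(actor_id, queue, n_episodes):
    """Hypothetical actor: plays episodes and pushes trajectories to a shared buffer."""
    for episode in range(n_episodes):
        await asyncio.sleep(0)  # stand-in for environment and model latency
        trajectory = {
            "actor": actor_id,
            "episode": episode,
            "reward": random.choice([-1.0, 1.0]),  # placeholder outcome
        }
        await queue.put(trajectory)


async def learner(queue, batch_size, n_batches):
    """Hypothetical learner: consumes batches as soon as enough data has arrived."""
    batches = []
    for _ in range(n_batches):
        batch = [await queue.get() for _ in range(batch_size)]
        batches.append(batch)  # a real learner would run a gradient step here
    return batches


async def main(n_actors=2, n_episodes=4, batch_size=4):
    queue = asyncio.Queue()
    n_batches = n_actors * n_episodes // batch_size
    # Actors run concurrently with the learner instead of in alternating phases.
    actors = [
        asyncio.create_task(actor(i, queue, n_episodes)) for i in range(n_actors)
    ]
    batches = await learner(queue, batch_size, n_batches)
    await asyncio.gather(*actors)
    return batches


batches = asyncio.run(main())
print(f"learner consumed {len(batches)} batches")
```

The queue decouples the two sides: the learner never waits for a full collection phase to finish, and actors never wait for a training step to complete.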

Collaboration

Developed in partnership with PlasticLabs.

Installation

# clone the repo
git clone https://github.com/LeonGuertler/unstable-baselines.git
cd unstable-baselines

# install Python dependencies
pip install -r requirements.txt

# build TextArena v0.6.9 (until it’s on PyPI)
git clone https://github.com/LeonGuertler/TextArena.git
cd TextArena
git checkout v0.6.9
pip install -e .
cd ..
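
To confirm that the editable TextArena install is visible to Python, a small check such as the following may help. It is a sketch that assumes the package registers under the distribution name `textarena`; `installed_version` is a hypothetical helper, not part of either project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "textarena" is an assumed distribution name; adjust if pip reports another.
found = installed_version("textarena")
print("textarena:", found or "not installed")
```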

Quick Start

python3 example.py
