For our environment configuration and required additional packages, please refer to "environment.yml".
Our pipeline for GRPO training is built upon the Open-r1 framework, for which we express our sincere appreciation. For details on setting up the experimental environment, please refer to Open-r1.
For models that initially lack any understanding or reasoning ability in the domain (e.g., sports), such as Qwen2.5 1.5B, knowledge distillation is a necessary prerequisite before applying reinforcement learning. The input data is structured as follows:
[
{
"key": "ORL_CLE_Apr 20, 2024.csv",
"instruction": "You are an assistant for NBA basketball task. We will .....",
"input": "Below is provided win probabilities .... ",
"process": "<think>\nAlright, let's try to figure out .... </think>\n**d**",
"label": "d",
"pred": "d"
},
]
(The 'process' field is the solution distilled from DeepSeek-R1-Distill-Qwen-32B, and 'instruction' is our question.)
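Before launching the warm-up, it can help to sanity-check the distillation data. Below is a minimal Python sketch, not part of our pipeline; the file name `warmup_data.json` is a placeholder, and the field names follow the example above:

```python
import json

# Placeholder path; point this at your own distilled dataset.
DATA_PATH = "warmup_data.json"

REQUIRED_FIELDS = {"key", "instruction", "input", "process", "label", "pred"}

with open(DATA_PATH, "r", encoding="utf-8") as f:
    records = json.load(f)

for i, rec in enumerate(records):
    missing = REQUIRED_FIELDS - rec.keys()
    if missing:
        raise ValueError(f"Record {i} is missing fields: {missing}")
    # The distilled reasoning should be wrapped in <think> ... </think>.
    if "<think>" not in rec["process"] or "</think>" not in rec["process"]:
        print(f"Warning: record {i} ({rec['key']}) has no <think> block")

print(f"Loaded {len(records)} distillation records")
```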
To start the warm-up stage (Slurm), use the following command:
sbatch warm-up/run.sh
After the warm-up, we apply GRPO (Group Relative Policy Optimization), which enables further gains through self-improvement. The input data consists of only Q&A pairs:
[
{
"instruction": "You are an assistant for NBA basketball task. We will …..",
"input": "Below is provided win probabilities …. ",
"output": "c",
"key": "DEN_OKC_Jan 31, 2024.csv7095"
},
]
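GRPO optimizes the policy against a reward computed from these Q&A pairs. As an illustrative sketch only (not the exact reward implemented in our scripts), the example below shows an exact-match accuracy reward that extracts the final answer letter after the </think> tag, following the completion format shown in the warm-up example:

```python
import re

def accuracy_reward(completion: str, label: str) -> float:
    """Toy exact-match reward: 1.0 if the final answer matches the label.

    Assumes the completion ends with the answer after a </think> tag,
    e.g. "<think> ... </think> **d**", as in the warm-up example above.
    """
    answer_part = completion.split("</think>")[-1]
    # Strip markdown emphasis/whitespace; keep the first letter-like token.
    match = re.search(r"[a-zA-Z]", answer_part)
    predicted = match.group(0).lower() if match else ""
    return 1.0 if predicted == label.strip().lower() else 0.0

# Example with the completion format from the warm-up data:
print(accuracy_reward("<think>\nreasoning ...</think>\n**c**", "c"))  # 1.0
print(accuracy_reward("<think>\nreasoning ...</think>\n**a**", "c"))  # 0.0
```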
To start the GRPO (Slurm), use the following command:
sbatch scripts-sports-nba/nba-GRPO.sh
sbatch scripts-sports-nfl/nfl-GRPO.sh
Our two-stage post-training enables a 1.5B model without domain-specific reasoning ability to outperform its distillation source (DeepSeek-R1-Distill-Qwen-32B) and, on some tasks, even approach OpenAI's o1.
(Reward curves over training steps can be found in Figure 6 of the paper.)
In short, the two stages are: (1) fill knowledge gaps and (2) self-improvement.
Run ./script/build.sh and change the conditions in the shell script for different experiments. The generated prompts will be saved in "./prompt/".
./batch.sh (or ./slow.sh) is the shell script used to submit the tasks in ./script/. Our main configuration files for LLMs and experiments are located in "./tsllm/config/".
The setup for the different LLMs is in ./tsllm/models/.
We curate a dataset and propose our benchmark "GAMETime: Generating And Modeling Events from TIME series". This dataset contains a real-valued time series of 1.7 million timestamps along with corresponding event sequences.
Please email the author directly (e.g., [email protected]). To simplify the process, just send the following:
"Hello, GAMETime."
[Your Name]
We will provide you with a download link, aiming to complete the process within 1 minute.
Alternatively, you can use the script in the resource files to download the HTML and extract the data yourself.
If you find our work helpful, please kindly cite as follows:
@article{tan2025inferring,
title={Inferring Events from Time Series using Language Models},
author={Tan, Mingtian and Merrill, Mike A and Gottesman, Zack and Althoff, Tim and Evans, David and Hartvigsen, Tom},
journal={arXiv preprint arXiv:2503.14190},
year={2025}
}