What's Changed
New Features 🎉
- enable together models and reasoning models as judges. by @JoelNiklaus in #537
- Propagate vLLM batch size controls by @alvin319 in #588
- Integrate huggingface_hub inference support for LLM as Judge by @alozowski in #651
- add cot_prompt in vllm by @HERIUN in #654
- Unify modelargs and use Pydantic for model configs by @NathanHB in #609
- Improve test by @qubvel in #674
- adds wandb logging of metrics by @NathanHB in #676
- Adds wandb logging by @NathanHB in #685
- Added custom model inference. by @JoelNiklaus in #437
- Update split iteration for DynamicBatchingDataset by @qubvel in #684
Documentation 📚
- Add --use-chat-template to the broken litellm example by @eldarkurtic in #614
- Lighteval math by @HERIUN in #630
- Update quicktour command by @qubvel in #679
- fix wrong 'custom_task_directory' in python api doc by @xgwang in #671
- docs: improve consistency in punctuation of metric list by @mariagrandury in #605
New Tasks 📈
- add arc agi 2 by @NathanHB in #642
- Add G-Pass@k Metric by @jnanliu in #589
- adds simpleqa by @NathanHB in #680
Task and Metrics changes 🛠️
- Pass At K Math by @clefourrier in #647
- Use n=16 samples to estimate pass@1 for AIME benchmarks by @lewtun in #661
- adding uzbek literals by @shopulatov in #664
- Align AIME pass@1 with literature by @lewtun in #666
- Update LCB prompt & fix newlines by @rawsh in #645
- fix gsm8k metric by @NathanHB in #688
- Add pass@1 for GPQA-D and MATH-500 by @lewtun in #698
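For context on the pass@k entries above, here is a minimal sketch of the commonly used unbiased pass@k estimator computed from n sampled generations per problem. It is an illustration only and is not taken from the lighteval code base; the function name `pass_at_k` and its arguments are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, c of which are correct.

    Hypothetical helper for illustration; not lighteval's API.
    Uses the standard estimator 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With n=16 samples and 4 correct, pass@1 reduces to the fraction correct.
score = pass_at_k(n=16, c=4, k=1)
print(score)  # 0.25
```

For k=1 the estimator is simply c/n, so drawing more samples per problem (for example n=16) leaves the expected score unchanged and only reduces its variance on small benchmarks.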
Bug Fixes 🐛
- Use bfloat16 as default for vllm models. by @NathanHB in #638
- Fix passing of generation config to main_accelerate by @LoserCheems in #659
- Parse seed for vLLM by @eldarkurtic in #602
- Parse string values for add_special_tokens in vLLM by @eldarkurtic in #598
- hardcode configs to not make lighteval crash if lcb repo unavailable by @NathanHB in #677
- tokenizer 'padding' param is not correct. by @xgwang in #669
- Fix TransformersModel.from_model() method by @Vectorrent in #691
- Inference providers by @clefourrier in #701
New Contributors
- @DerekLiu35 made their first contribution in #620
- @AnikiFan made their first contribution in #610
- @alvin319 made their first contribution in #588
- @alozowski made their first contribution in #643
- @Laz4rz made their first contribution in #613
- @shopulatov made their first contribution in #664
- @HERIUN made their first contribution in #654
- @rawsh made their first contribution in #645
- @qubvel made their first contribution in #674
- @xgwang made their first contribution in #669
- @jnanliu made their first contribution in #589
- @Vectorrent made their first contribution in #683
- @omahs made their first contribution in #702
Full Changelog: v0.8.0...v0.9.0