This is an open-source version of the representation engineering framework for stopping harmful outputs or hallucinations at the level of activations. It is 100% free and self-hosted.
Updated May 29, 2025 - Python
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
Agentic Safety Framework
An exploration of issues in international social development policy and its operationalization