safeguards

Here are 4 public repositories matching this topic...

wisent-ai / wisent-guard

This is an open-source version of the representation engineering framework for stopping harmful outputs or hallucinations at the level of activations. 100% free, self-hosted, and open-source.

security ai safeguards explainability llms

Updated May 29, 2025
Python

rishub-tamirisa / tamper-resistance

[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"

meta-learning safeguards tamper-resistance llm open-weight

Updated Feb 28, 2025
Python

cirbuk / safeguards

Agentic Safety Framework

python agents safeguards budgets guardrails llm

Updated Apr 23, 2025
Python

aaronkyle / social-development

An exploration of issues in international social development policy and its operationalization

policy international-social-policy social-development social-policy safeguards

Updated Feb 28, 2023
HTML
