Semcache is a semantic caching layer for your LLM applications.
Run the Semcache Docker image:
docker run -p 8080:8080 semcache/semcache:latest
Configure your application, e.g. with the OpenAI Python SDK:
from openai import OpenAI
# Point to your Semcache host instead of OpenAI
client = OpenAI(base_url="http://localhost:8080", api_key="your-key")
# Cache miss - continues to OpenAI
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
# Cache hit - returns instantly
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me France's capital city"}]
)
Node.js follows the same pattern; change the base URL to point to your Semcache host:
const OpenAI = require('openai');
// Point to your Semcache host instead of OpenAI
const openai = new OpenAI({baseURL: 'http://localhost:8080', apiKey: 'your-key'});
- 🧠 Completely in-memory - Prompts, responses and the vector database are stored in-memory
- 🎯 Flexible by design - Can work with your custom or private LLM APIs
- 🔌 Support for major LLM APIs - OpenAI, Anthropic, Gemini, and more
- ⚡ HTTP proxy mode - Drop-in replacement that reduces costs and latency
- 📈 Prometheus metrics - Full observability out of the box
- 📊 Built-in dashboard - Monitor cache performance at /admin
- 📤 Smart eviction - LRU cache eviction policy
Semcache is still in beta and being actively developed.
Semcache accelerates LLM applications by caching responses based on semantic similarity.
When you make a request, Semcache first searches for previously cached answers to similar prompts and delivers them immediately. This eliminates redundant API calls, reducing both latency and costs.
Semcache also operates in a "cache-aside" mode, allowing you to load prompts and responses yourself.
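Conceptually, the lookup works by embedding the incoming prompt and comparing it against the embeddings of prompts already in the cache; if the closest match clears a similarity threshold, the stored response is returned, otherwise the request falls through to the LLM. The Python sketch below is purely illustrative (Semcache itself is written in Rust and this is not its internal code); the embed function and the 0.9 threshold are assumptions:
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class ToySemanticCache:
    """Illustrative in-memory semantic cache, not Semcache's actual implementation."""

    def __init__(self, embed, threshold: float = 0.9):
        self.embed = embed          # embed: str -> np.ndarray, supplied by the caller
        self.threshold = threshold  # minimum cosine similarity to count as a hit
        self.entries = []           # list of (prompt_embedding, cached_response)

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))

    def get(self, prompt: str):
        if not self.entries:
            return None
        query = self.embed(prompt)
        embedding, response = max(
            self.entries, key=lambda e: cosine_similarity(query, e[0])
        )
        if cosine_similarity(query, embedding) >= self.threshold:
            return response  # hit: a semantically similar prompt was cached
        return None          # miss: caller falls through to the real LLM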
For comprehensive provider configuration and detailed code examples, visit our LLM Providers & Tools documentation.
Point your existing SDK to Semcache instead of the provider's endpoint.
OpenAI
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080", api_key="your-key")
Anthropic
import anthropic
client = anthropic.Anthropic(
    base_url="http://localhost:8080",  # Semcache endpoint
    api_key="your-key"
)
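Requests through the Anthropic SDK then flow via Semcache exactly as they would to Anthropic directly; the model name below is only an example:
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model name
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)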
LangChain
from langchain.llms import OpenAI
llm = OpenAI(
    openai_api_base="http://localhost:8080",
    openai_api_key="your-key"
)
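The wrapped LLM is then used as normal; depending on your LangChain version the call is either llm(...) or llm.invoke(...), and the prompt below is only an example:
print(llm.invoke("What is the capital of France?"))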
LiteLLM
import litellm
litellm.api_base = "http://localhost:8080"
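With api_base pointing at Semcache, requests made through LiteLLM's standard completion API go via the cache; the model and prompt below are just examples:
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)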
To use Semcache in cache-aside mode, install the Python client:
pip install semcache
from semcache import Semcache
# Initialize the client
client = Semcache(base_url="http://localhost:8080")
# Store a key-data pair
client.put("What is the capital of France?", "Paris")
# Retrieve data by semantic similarity
response = client.get("Tell me France's capital city.")
print(response) # "Paris"
Configure via environment variables or config.yaml:
log_level: info
port: 8080
Environment variables (prefix with SEMCACHE_):
SEMCACHE_PORT=8080
SEMCACHE_LOG_LEVEL=debug
Semcache emits comprehensive Prometheus metrics for production monitoring.
Check out our /monitoring directory for our custom Grafana dashboard.
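If you want to inspect the metrics by hand, you can fetch them over HTTP; the /metrics path used below is the conventional Prometheus scrape path and is an assumption here, so adjust it to match your deployment:
import requests

# Fetch raw Prometheus metrics from Semcache (the /metrics path is assumed).
resp = requests.get("http://localhost:8080/metrics", timeout=5)
resp.raise_for_status()

# Print the first few lines to confirm metrics are being emitted.
for line in resp.text.splitlines()[:10]:
    print(line)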
Access the admin dashboard at /admin to monitor cache performance.
Our managed version of Semcache provides you with semantic caching as a service.
Features we offer:
- Custom text embedding models for your specific business
- Persistent storage allowing you to build application memory over time
- In-depth analysis of your LLM responses
- SLA support and dedicated engineering resources
Contact us at [email protected]
Contributions to Semcache are welcome! Feel free to open a PR.
Built with ❤️ in Rust • Documentation • GitHub Issues