Commit d961991

Add cache documentation

1 parent 02ea764 commit d961991

1 file changed: docs/modules/ROOT/pages/ai-services.adoc (169 additions, 4 deletions)
@@ -171,6 +171,174 @@ quarkus.langchain4j.openai.m1.api-key=sk-...
 quarkus.langchain4j.huggingface.m2.api-key=sk-...
 ----
 
+[#cache]
+== Configuring the Cache
+
+If necessary, a semantic cache can be enabled to store a fixed number of questions previously asked to the LLM together with their answers, reducing the number of API calls.
+
+The `@CacheResult` annotation enables semantic caching and can be used at the class or method level. When used at the class level, it indicates that all methods of the AiService will perform a cache lookup before calling the LLM. This is a convenient way to enable caching for every method of a `@RegisterAiService` interface.
+
+[source,java]
+----
+@RegisterAiService
+@CacheResult
+@SystemMessage("...")
+public interface LLMService {
+    // Cache is enabled for all methods
+    ...
+}
+----
+
+Using `@CacheResult` at the method level, on the other hand, allows fine-grained control over where the cache is enabled.
+
+[source,java]
+----
+@RegisterAiService
+@SystemMessage("...")
+public interface LLMService {
+
+    @CacheResult
+    @UserMessage("...")
+    public String method1(...); // Cache is enabled for this method
+
+    @UserMessage("...")
+    public String method2(...); // Cache is not enabled for this method
+}
+----
+
+[IMPORTANT]
+====
+Each method annotated with `@CacheResult` has its own cache, shared by all users.
+====
+
+=== Cache properties
+
+The following properties can be used to customize the cache configuration:
+
+- `quarkus.langchain4j.cache.threshold`: Specifies the similarity threshold used during semantic search: a cached result is returned only when the similarity between the new query and a cached entry reaches this value. (`default 1`)
+- `quarkus.langchain4j.cache.max-size`: Sets the maximum number of messages to cache. This property helps control memory usage by limiting the size of each cache. (`default 10`)
+- `quarkus.langchain4j.cache.ttl`: Defines the time-to-live for messages stored in the cache. Messages that exceed the TTL are automatically removed. (`default 5m`)
+- `quarkus.langchain4j.cache.embedding.name`: Specifies the name of the embedding model to use.
+- `quarkus.langchain4j.cache.embedding.query-prefix`: Adds a prefix to each "query" value before performing the embedding operation.
+- `quarkus.langchain4j.cache.embedding.response-prefix`: Adds a prefix to each "response" value before performing the embedding operation.
+
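+For example, to return cached answers only for queries that are very similar to a previous one, keep up to 100 entries per cache, and expire entries after ten minutes, one might set (illustrative values, not recommendations):
+
+[source,properties]
+----
+quarkus.langchain4j.cache.threshold=0.9
+quarkus.langchain4j.cache.max-size=100
+quarkus.langchain4j.cache.ttl=10m
+----
+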
+By default, the cache uses the default embedding model supplied by the configured LLM provider. If there are multiple embedding providers, the `quarkus.langchain4j.cache.embedding.name` property can be used to choose which one to use.
+
+In the following example, there are two different embedding providers:
+
+`pom.xml`:
+
+[source,xml,subs=attributes+]
+----
+...
+<dependencies>
+    <dependency>
+        <groupId>io.quarkiverse.langchain4j</groupId>
+        <artifactId>quarkus-langchain4j-openai</artifactId>
+        <version>{project-version}</version>
+    </dependency>
+    <dependency>
+        <groupId>io.quarkiverse.langchain4j</groupId>
+        <artifactId>quarkus-langchain4j-watsonx</artifactId>
+        <version>{project-version}</version>
+    </dependency>
+</dependencies>
+...
+----
+
+`application.properties`:
+
+[source,properties]
+----
+# OpenAI configuration
+quarkus.langchain4j.service1.chat-model.provider=openai
+quarkus.langchain4j.service1.embedding-model.provider=openai
+quarkus.langchain4j.openai.service1.api-key=sk-...
+
+# Watsonx configuration
+quarkus.langchain4j.service2.chat-model.provider=watsonx
+quarkus.langchain4j.service2.embedding-model.provider=watsonx
+quarkus.langchain4j.watsonx.service2.base-url=...
+quarkus.langchain4j.watsonx.service2.api-key=...
+quarkus.langchain4j.watsonx.service2.project-id=...
+quarkus.langchain4j.watsonx.service2.embedding-model.model-id=...
+
+# The cache will use the embedding model provided by watsonx
+quarkus.langchain4j.cache.embedding.name=service2
+----
+
+When an xref:in-process-embedding.adoc[in-process embedding model] must be used:
+
+`pom.xml`:
+
+[source,xml,subs=attributes+]
+----
+...
+<dependencies>
+    <dependency>
+        <groupId>io.quarkiverse.langchain4j</groupId>
+        <artifactId>quarkus-langchain4j-openai</artifactId>
+        <version>{project-version}</version>
+    </dependency>
+    <dependency>
+        <groupId>io.quarkiverse.langchain4j</groupId>
+        <artifactId>quarkus-langchain4j-watsonx</artifactId>
+        <version>{project-version}</version>
+    </dependency>
+    <dependency>
+        <groupId>dev.langchain4j</groupId>
+        <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
+        <version>0.31.0</version>
+        <exclusions>
+            <exclusion>
+                <groupId>dev.langchain4j</groupId>
+                <artifactId>langchain4j-core</artifactId>
+            </exclusion>
+        </exclusions>
+    </dependency>
+</dependencies>
+...
+----
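+
+NOTE: The `langchain4j-core` exclusion above is there to avoid pulling in a second copy of the core library next to the one already provided by the Quarkus extensions, which could otherwise lead to version conflicts.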
+
+`application.properties`:
+
+[source,properties]
+----
+# OpenAI configuration
+quarkus.langchain4j.service1.chat-model.provider=openai
+quarkus.langchain4j.service1.embedding-model.provider=openai
+quarkus.langchain4j.openai.service1.api-key=sk-...
+
+# Watsonx configuration
+quarkus.langchain4j.service2.chat-model.provider=watsonx
+quarkus.langchain4j.service2.embedding-model.provider=watsonx
+quarkus.langchain4j.watsonx.service2.base-url=...
+quarkus.langchain4j.watsonx.service2.api-key=...
+quarkus.langchain4j.watsonx.service2.project-id=...
+quarkus.langchain4j.watsonx.service2.embedding-model.model-id=...
+
+# The cache will use the in-process embedding model AllMiniLmL6V2EmbeddingModel
+quarkus.langchain4j.embedding-model.provider=dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel
+----
+
+=== Advanced usage
+
+The `cacheProviderSupplier` attribute of the `@RegisterAiService` annotation allows configuring the `AiCacheProvider`. The default value of this attribute is `RegisterAiService.BeanAiCacheProviderSupplier.class`, which means that the AiService will use whatever `AiCacheProvider` bean is configured by the application, or the default one provided by the extension.
+
+The extension provides a default implementation of `AiCacheProvider`, which does two things:
+
+* It uses whatever `AiCacheStore` bean is configured as the cache store. The default implementation is `InMemoryAiCacheStore`.
+** If the application provides its own `AiCacheStore` bean, it will be used instead of the default `InMemoryAiCacheStore`.
+
+* It leverages the available configuration options under `quarkus.langchain4j.cache` to construct the `AiCacheProvider`.
+** The default configuration values result in the use of `FixedAiCache` with a size of ten.
+
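+For example, to plug in a custom supplier:
+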
+[source,java]
+----
+@RegisterAiService(cacheProviderSupplier = CustomAiCacheProvider.class)
+----
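+
+A minimal sketch of what `CustomAiCacheProvider` could look like follows, assuming that `cacheProviderSupplier` accepts a `java.util.function.Supplier<AiCacheProvider>` implementation like the other supplier attributes of `@RegisterAiService`; the body of `get()` is illustrative only, not the extension's actual API surface.
+
+[source,java]
+----
+import java.util.function.Supplier;
+
+import jakarta.enterprise.inject.spi.CDI;
+
+// Sketch only: the import for AiCacheProvider is omitted, as it comes
+// from the quarkus-langchain4j extension.
+public class CustomAiCacheProvider implements Supplier<AiCacheProvider> {
+
+    @Override
+    public AiCacheProvider get() {
+        // Illustrative: resolve the AiCacheProvider bean to use. A real
+        // implementation could instead build a provider around a custom
+        // AiCacheStore.
+        return CDI.current().select(AiCacheProvider.class).get();
+    }
+}
+----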
+
 [#memory]
 == Configuring the Context (Memory)
 
@@ -288,10 +456,7 @@ This guidance aims to cover all crucial aspects of designing AI services with Quarkus
 By default, @RegisterAiService annotated interfaces don't moderate content. However, users can opt in to having the LLM moderate
 content by annotating the method with `@Moderate`.
 
-For moderation to work, the following criteria need to be met:
-
-* A CDI bean for `dev.langchain4j.model.moderation.ModerationModel` must be configured (the `quarkus-langchain4j-openai` and `quarkus-langchain4j-azure-openai` provide one out of the box)
-* The interface must be configured with `@RegisterAiService(moderationModelSupplier = RegisterAiService.BeanModerationModelSupplier.class)`
+For moderation to work, a CDI bean for `dev.langchain4j.model.moderation.ModerationModel` must be configured (the `quarkus-langchain4j-openai` and `quarkus-langchain4j-azure-openai` provide one out of the box).
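+
+For instance, a moderated method might look like the following (a minimal sketch; the service and method names are illustrative):
+
+[source,java]
+----
+@RegisterAiService
+public interface ChatService {
+
+    // The LLM's answer to this method is also passed to the
+    // configured ModerationModel.
+    @Moderate
+    @UserMessage("{message}")
+    String chat(String message);
+}
+----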
 
 === Advanced usage
 An alternative to providing a CDI bean is to configure the interface with `@RegisterAiService(moderationModelSupplier = MyCustomSupplier.class)`