If necessary, a semantic cache can be enabled to keep a fixed number of previously asked questions and their answers, thus reducing the number of API calls to the LLM.

The `@CacheResult` annotation enables semantic caching and can be used at the class or method level. When used at the class level, all methods of the AiService perform a cache lookup before making a call to the LLM. This is a convenient way to enable caching for all methods of a `@RegisterAiService` interface.

[source,java]
----
@RegisterAiService
@CacheResult
@SystemMessage("...")
public interface LLMService {
    // Cache is enabled for all methods
    ...
}
----
On the other hand, using `@CacheResult` at the method level allows fine-grained control over where the cache is enabled.

[source,java]
----
@RegisterAiService
@SystemMessage("...")
public interface LLMService {

    @CacheResult
    @UserMessage("...")
    String method1(...); // Cache is enabled for this method

    @UserMessage("...")
    String method2(...); // Cache is not enabled for this method
}
----
[IMPORTANT]
====
Each method annotated with `@CacheResult` has its own cache, shared by all users.
====

=== Cache properties
The following properties can be used to customize the cache configuration:

- `quarkus.langchain4j.cache.threshold`: Specifies the similarity threshold used during semantic search to decide whether a cached result should be returned. A cached answer is returned only when the similarity between the new query and a cached entry reaches this value. (default: `1`)
- `quarkus.langchain4j.cache.max-size`: Sets the maximum number of messages to cache. This property helps control memory usage by limiting the size of each cache. (default: `10`)
- `quarkus.langchain4j.cache.ttl`: Defines the time-to-live for messages stored in the cache. Messages that exceed the TTL are automatically removed. (default: `5m`)
- `quarkus.langchain4j.cache.embedding.name`: Specifies the name of the embedding model to use.
- `quarkus.langchain4j.cache.embedding.query-prefix`: Adds a prefix to each "query" value before performing the embedding operation.
- `quarkus.langchain4j.cache.embedding.response-prefix`: Adds a prefix to each "response" value before performing the embedding operation.
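For example, the following configuration (the values are illustrative) returns a cached answer only when the similarity is at least 0.9, keeps at most 100 entries per cache, and expires entries after one hour:

[source,properties]
----
quarkus.langchain4j.cache.threshold=0.9
quarkus.langchain4j.cache.max-size=100
quarkus.langchain4j.cache.ttl=1h
----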
By default, the cache uses the default embedding model provided by the LLM provider. If there are multiple embedding providers, the `quarkus.langchain4j.cache.embedding.name` property can be used to choose which one to use.

In the following example, there are two different embedding providers:
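The snippet below is a sketch: the configuration names `e1` and `e2` and the providers are placeholders, and the per-model property pattern `quarkus.langchain4j.<name>.embedding-model.provider` is assumed from the extension's multi-model configuration support.

[source,properties]
----
# Two named embedding model configurations (illustrative names and providers)
quarkus.langchain4j.e1.embedding-model.provider=openai
quarkus.langchain4j.e2.embedding-model.provider=ollama

# The semantic cache uses the embedding model of the "e2" configuration
quarkus.langchain4j.cache.embedding.name=e2
----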
The `cacheProviderSupplier` attribute of the `@RegisterAiService` annotation enables configuring the `AiCacheProvider`. The default value of this attribute is `RegisterAiService.BeanAiCacheProviderSupplier.class`, which means that the AiService uses whatever `AiCacheProvider` bean is configured by the application, or the default one provided by the extension.
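For example (a sketch: `MyAiCacheProviderSupplier` is a hypothetical application class, assumed to implement `Supplier<AiCacheProvider>` by analogy with the other supplier attributes of `@RegisterAiService`):

[source,java]
----
@RegisterAiService(cacheProviderSupplier = MyAiCacheProviderSupplier.class)
@CacheResult
@SystemMessage("...")
public interface LLMService {
    ...
}
----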
The extension provides a default implementation of `AiCacheProvider` which does two things:

* It uses whatever `AiCacheStore` bean is configured as the cache store. The default implementation is `InMemoryAiCacheStore`.
** If the application provides its own `AiCacheStore` bean, it is used instead of the default `InMemoryAiCacheStore`.
* It leverages the available configuration options under `quarkus.langchain4j.cache` to construct the `AiCacheProvider`.
** The default configuration values result in the usage of `FixedAiCache` with a size of ten.
By default, `@RegisterAiService` annotated interfaces don't moderate content. However, users can opt in to having the LLM moderate content by annotating the method with `@Moderate`.

For moderation to work, a CDI bean for `dev.langchain4j.model.moderation.ModerationModel` must be configured (the `quarkus-langchain4j-openai` and `quarkus-langchain4j-azure-openai` extensions provide one out of the box).
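For example (a sketch; the interface and messages are illustrative):

[source,java]
----
@RegisterAiService
public interface AssistantService {

    @Moderate
    @UserMessage("...")
    String chat(String question); // A flagged prompt raises langchain4j's ModerationException
}
----
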
=== Advanced usage
An alternative to providing a CDI bean is to configure the interface with `@RegisterAiService(moderationModelSupplier = MyCustomSupplier.class)`.
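A minimal sketch of such a supplier, assuming langchain4j's `OpenAiModerationModel` is on the classpath (any other `ModerationModel` implementation works the same way):

[source,java]
----
import java.util.function.Supplier;

import dev.langchain4j.model.moderation.ModerationModel;
import dev.langchain4j.model.openai.OpenAiModerationModel;

// Referenced from @RegisterAiService(moderationModelSupplier = MyCustomSupplier.class)
public class MyCustomSupplier implements Supplier<ModerationModel> {

    @Override
    public ModerationModel get() {
        // Builds the moderation model programmatically instead of relying on a CDI bean
        return OpenAiModerationModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .build();
    }
}
----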