AbstractJsonExtractorOutputGuardrail behavior #1417

Open
lpedrov opened this issue Apr 9, 2025 · 11 comments
@lpedrov

lpedrov commented Apr 9, 2025

The AbstractJsonExtractorOutputGuardrail class injects the JsonGuardrailsUtils class, which uses the ObjectMapper without any kind of configuration. As a result, any DTO will always be considered valid as long as the JSON itself is valid.
Is this the expected behavior?

Shouldn't the ObjectMapper be configured with FAIL_ON_UNKNOWN_PROPERTIES set to true?

new ObjectMapper()
    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, true)
    .readValue(responseFromLLM.text(), ExampleDto.class);
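
To make the question concrete, here is a minimal sketch of how Jackson behaves with the feature switched on and off. It sets the flag explicitly in both cases, so it makes no claim about which setting the guardrail's mapper actually uses. `ExampleDto` here is a hypothetical two-field class, and `UnknownPropertiesDemo`, `mapper` and `binds` are names invented for the illustration.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UnknownPropertiesDemo {

    // Hypothetical DTO, used only for this illustration.
    public static class ExampleDto {
        public String name;
        public int age;
    }

    // Builds a mapper with FAIL_ON_UNKNOWN_PROPERTIES set explicitly.
    static ObjectMapper mapper(boolean failOnUnknown) {
        return new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, failOnUnknown);
    }

    // Returns true if the given mapper can bind the JSON to ExampleDto.
    static boolean binds(ObjectMapper mapper, String json) {
        try {
            mapper.readValue(json, ExampleDto.class);
            return true;
        } catch (JsonProcessingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String extraField = "{\"name\":\"Ada\",\"age\":36,\"comment\":\"extra\"}";
        String unrelated = "{\"foo\":1}";

        System.out.println(binds(mapper(false), extraField)); // true
        System.out.println(binds(mapper(false), unrelated));  // true: fields stay at their defaults
        System.out.println(binds(mapper(true), extraField));  // false
        System.out.println(binds(mapper(true), unrelated));   // false
        System.out.println(binds(mapper(false), "not json")); // false: malformed input always fails
    }
}
```

With the feature off, a JSON object that shares no fields with the DTO still binds, which matches the behavior described above. With it on, a single extra field from the LLM is enough to fail.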

@geoand
Collaborator

geoand commented Apr 9, 2025

Good question. @cescoffier @mariofusco WDYT?

@cescoffier
Collaborator

Hmm, I would have used the Quarkus mapper. Why don't we do so?

Anyway, yes, I agree.

@geoand
Collaborator

geoand commented Apr 9, 2025

Yeah, probably just an oversight. But even so, I am not sure the problem here would be addressed (and TBH, I am still not convinced it should be, because do we honestly want to fail if the LLM returns more fields than we expected?)

@cescoffier
Collaborator

cescoffier commented Apr 9, 2025 via email

@geoand
Collaborator

geoand commented Apr 9, 2025

I personally think that what is proposed here should not be done (or if it is, should not be the default) as we have no way of knowing whether an LLM will decide to include additional fields that might be totally irrelevant.

@lpedrov
Author

lpedrov commented Apr 9, 2025

I understand. But then the purpose of the class might be misleading, because in reality it’s only validating the JSON. And it appears to be redundant since, as documented, this can be done in a simpler way.

@mariofusco
Contributor

> I personally think that what is proposed here should not be done (or if it is, should not be the default) as we have no way of knowing whether an LLM will decide to include additional fields that might be totally irrelevant.

I agree with this; I would at least keep the current behavior as the default. We could make this configurable, or allow plugging in your own ObjectMapper.
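
One way the "plug in your own ObjectMapper" idea could look is sketched below. This is purely illustrative and is not the quarkus-langchain4j API: `JsonBindingCheck`, `objectMapper()`, `tryBind`, and the two subclasses are hypothetical names, and the sketch assumes plain Jackson on the classpath. The default stays lenient, and a subclass opts into strict binding by overriding one method.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Optional;

public class PluggableMapperSketch {

    // Hypothetical DTO, used only for this illustration.
    public static class ExampleDto {
        public String name;
    }

    // Hypothetical base class; NOT the actual guardrail from the library.
    public abstract static class JsonBindingCheck<T> {
        private final Class<T> type;

        protected JsonBindingCheck(Class<T> type) {
            this.type = type;
        }

        // Default keeps lenient behavior: unknown fields are ignored.
        // A real implementation would likely cache or inject the mapper.
        protected ObjectMapper objectMapper() {
            return new ObjectMapper()
                    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        }

        // Empty result means the text could not be bound to the target type.
        public Optional<T> tryBind(String json) {
            try {
                return Optional.ofNullable(objectMapper().readValue(json, type));
            } catch (JsonProcessingException e) {
                return Optional.empty();
            }
        }
    }

    // Uses the lenient default.
    public static class LenientCheck extends JsonBindingCheck<ExampleDto> {
        public LenientCheck() {
            super(ExampleDto.class);
        }
    }

    // Opts into strict binding by supplying its own mapper.
    public static class StrictCheck extends JsonBindingCheck<ExampleDto> {
        public StrictCheck() {
            super(ExampleDto.class);
        }

        @Override
        protected ObjectMapper objectMapper() {
            return new ObjectMapper()
                    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, true);
        }
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"Ada\",\"reasoning\":\"extra field from the LLM\"}";
        System.out.println(new LenientCheck().tryBind(json).isPresent()); // true
        System.out.println(new StrictCheck().tryBind(json).isPresent());  // false
    }
}
```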

@mariofusco
Contributor

> I understand. But then the purpose of the class might be misleading, because in reality it’s only validating the JSON. And it appears to be redundant since, as documented, this can be done in a simpler way.

The AbstractJsonExtractorOutputGuardrail is a utility class. It's perfectly fine if you want to implement something similar in your own OutputGuardrail with fewer, more, or different features. I disagree that it is totally trivial or redundant, as you wrote, not least because it also programmatically tries to recover from a quite common hallucination where the LLM responds with valid JSON but prepends or appends some explanation of how it generated that JSON, thus breaking the parser.
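
The kind of recovery described here can be sketched with the standard library alone. The following is an assumption about the general idea, not the library's actual implementation: it keeps only the span from the first `{` to the last `}` and discards any surrounding explanation. `JsonTrimSketch` and `trimToJsonObject` are hypothetical names.

```java
public class JsonTrimSketch {

    // Returns the text between the first '{' and the last '}' (inclusive),
    // or null when the response contains no such span.
    static String trimToJsonObject(String llmResponse) {
        int start = llmResponse.indexOf('{');
        int end = llmResponse.lastIndexOf('}');
        if (start < 0 || end < start) {
            return null;
        }
        return llmResponse.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String response = "Here is the JSON you asked for: {\"name\":\"Ada\"} I built it from your prompt.";
        System.out.println(trimToJsonObject(response)); // {"name":"Ada"}
    }
}
```

The trimmed text would still need to be parsed afterwards. This simple heuristic does not handle top-level arrays, or braces that appear inside the surrounding explanation.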

@lpedrov
Author

lpedrov commented Apr 9, 2025

When I read the documentation, and without seeing the class source code, the first thing I thought was that it would validate both the JSON and the deserialization.
After running a couple of tests, I realized that wasn’t exactly the case.
But I agree - if a custom ObjectMapper could be used, that would be fantastic.

@geoand
Collaborator

geoand commented Apr 9, 2025

It definitely could be made so.

@mariofusco
Contributor

> When I read the documentation, and without seeing the class source code, the first thing I thought was that it would validate both the JSON and the deserialization. After running a couple of tests, I realized that wasn’t exactly the case. But I agree - if a custom ObjectMapper could be used, that would be fantastic.

At this point I'm afraid I don't understand what you mean by "validate the deserialization". The guardrail tries to deserialize the JSON into an instance of the target class and fails if it is not able to do so. Isn't this equivalent to validating the deserialization (and actually also performing it)?
