Skip to content

Feature/json schema spec2020 #134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Conversation

cs-util
Copy link
Collaborator

@cs-util cs-util commented Jul 27, 2025

This change is Reviewable

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @cs-util, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on modernizing and enhancing the JSON Schema generation and consumption capabilities within the codebase. It introduces significant updates to align the internal JSON Schema representation with newer specifications, improves the fidelity of generated schemas from C# models, and strengthens the integration with the OpenAI API for structured outputs. Additionally, it includes improvements to background task management for better synchronization and minor utility enhancements.

Highlights

  • JSON Schema Specification Alignment: I've refactored the core JsonSchema class and related components (JsonModelExtensions.cs, BaseJsonSchemaToView.cs, ModelToJsonSchema.cs) to better support the anyOf keyword for array items. This change aligns our JSON Schema representation with more recent specifications, such as Draft 2020-12, allowing for more flexible and accurate schema definitions for arrays.
  • Enhanced JSON Schema Generation: The ModelToJsonSchema.cs class now generates more accurate and descriptive JSON schemas from C# models. This includes respecting JsonPropertyAttribute for field names, incorporating content from DescriptionAttribute into schema descriptions, and representing enums as strings rather than integers in the generated schema, improving readability and compatibility.
  • Improved OpenAI Structured Output Integration: The OpenAI API client (OpenAi.cs) has been enhanced to support stricter JSON Schema response formats. I've added validation checks for schema names and types, and introduced a convenient overload for SetResponseFormatToJsonSchema<T> that automatically generates the schema name from the C# type, streamlining the process of requesting structured outputs from OpenAI.
  • Reliable Background Task Management: I've introduced WaitTillDone methods to ExpensiveRegularAsyncTaskInBackground.cs and TemplatesIO.cs. These methods allow for explicit synchronization, enabling callers to await the completion of all queued and running background tasks. This significantly improves the reliability of operations, especially in test environments where precise timing is crucial.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the JSON schema implementation to a more recent specification, primarily by changing how array items are handled with anyOf. These changes are consistently applied across model generation, UI components, and tests. The OpenAI API client also sees significant improvements, including support for strict JSON schema responses and better input validation.

My review highlights a few areas for improvement, focusing on preventing potential runtime exceptions, enhancing performance, and ensuring API consistency. Overall, this is a strong contribution that modernizes a key part of the codebase.

@@ -210,7 +210,7 @@ public static async Task LoadModelList(this ListFieldView self, JObject root, Js

private static void SetupButtons(ListFieldView listView, JObject root, JsonSchemaToView viewGenerator, JArray modelArray, Dictionary<FieldView, JToken> map) {
listView.add.SetOnClickAction(async delegate {
JToken entry = listView.field.items.First().NewDefaultJInstance();
JToken entry = listView.field.items.anyOf.First().NewDefaultJInstance();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The code calls listView.field.items.anyOf.First() without checking if items or anyOf are null or empty. This could lead to a NullReferenceException or InvalidOperationException at runtime. Add checks to handle these cases gracefully.

@@ -256,7 +256,7 @@ private static IEnumerable<ListEntryView> GetSelectedViews(ListFieldView self) {

private static async Task<FieldView> CreateChildEntryView(
ListFieldView self, JObject root, JsonSchemaToView viewGenerator, JToken modelEntry, string fieldName) {
JsonSchema newEntryVm = GetMatchingSchema(modelEntry, self.field.items);
JsonSchema newEntryVm = GetMatchingSchema(modelEntry, self.field.items.anyOf);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The code accesses self.field.items.anyOf without checking if items or anyOf are null. This could lead to a NullReferenceException if either is null, causing the GetMatchingSchema method to fail. Add a null check to prevent this.

Comment on lines 213 to -217
public double? temperature { get; set; }

/// <summary> The maximum number of tokens to generate in the completion.
/// The token count of your prompt plus max_tokens cannot exceed the model's context length.
/// Most models have a context length of 2048 tokens (except for the newest models, which support 4096). </summary>
public int max_tokens { get; set; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The max_tokens property was removed from ChatGpt.Request. This parameter is still supported by the OpenAI Chat Completions API and is important for controlling response length and managing token usage. Removing it could be a feature regression. Was this removal intentional? If so, the reasoning should be documented. If not, it should be restored. The VisionGpt.Request class still includes it, which makes its removal here seem inconsistent.

}
}
}

private static bool onBlacklist(Type enumType) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The method onBlacklist should be renamed to IsBlacklisted to adhere to C# naming conventions.

        private static bool IsBlacklisted(Type enumType) {

Comment on lines +282 to +287
if (!Regex.IsMatch(name, "^[a-zA-Z0-9_-]+$")) { // name must match the pattern '^[a-zA-Z0-9_-]+$' so verify that here:
throw new ArgumentException($"Invalid 'response_format.json_schema.name': {name} does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.");
}
if (schema.type != "object") {
throw new ArgumentException($"Invalid schema.type: {schema.type} is not supported, expected 'object'.");
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using Regex.IsMatch in a constructor that might be called frequently can be inefficient as it compiles the regex on each call. For better performance, create a static readonly Regex instance with RegexOptions.Compiled and reuse it.

private static readonly Regex NameValidationRegex = new Regex("^[a-zA-Z0-9_-]+$", RegexOptions.Compiled);

public JsonSchemaResponse(string name, JsonSchema schema, bool strict) {
    this.name = name;
    this.schema = schema;
    this.strict = strict;
    if (!NameValidationRegex.IsMatch(name)) { // name must match the pattern '^[a-zA-Z0-9_-]+$' so verify that here:

if (tasksToWaitFor.IsEmpty()) {
// If there are no running tasks but still waiting tasks,
// a short delay is needed to allow the system to schedule the next task.
await Task.Delay(10);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using Task.Delay with a magic number like 10 to handle synchronization can be brittle and may lead to race conditions under different system loads. A more robust approach would be to use synchronization primitives like SemaphoreSlim or TaskCompletionSource to signal when all tasks are truly complete, rather than relying on a timed delay.

Comment on lines 86 to 90
var words = RegexUtil.SplitCamelCaseString(varName).Split(' ');
var words2 = words.SelectMany(x => x.Split('_'));
words2 = words2.Where(x => !x.IsNullOrEmpty());
return string.Join(" ", words2.Select(x => x.First().ToString().ToUpper() + x.Substring(1)));
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The implementation of ToTitle can be made more concise and slightly more performant by using a single LINQ chain and char.ToUpper instead of ToString().ToUpper().

public static string ToTitle(string varName) {
    return string.Join(" ", RegexUtil.SplitCamelCaseString(varName)
        .Split(' ')
        .SelectMany(x => x.Split('_'))
        .Where(x => !x.IsNullOrEmpty())
        .Select(x => char.ToUpper(x[0]) + x.Substring(1)));
}

Comment on lines +100 to +103
if (modelType != null && property.Value is JValue jv && jv.Parent is JProperty jp) {
member = modelType.GetMember(jp.Name).FirstOrDefault();
}
schema.properties.Add(property.Key, NewField(property.Key, modelType, model, property.Value, member));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic to get the MemberInfo here is fragile. It only works if property.Value is a JValue. If it's a JObject or JArray (for nested objects or arrays), jv will be null, and member will not be found. This could lead to incorrect schema generation for complex objects when processing from a JSON model.

You can make this more robust by getting the property name from property.Key, which is available for any JToken type.

if (modelType != null) {
    member = modelType.GetMember(property.Key).FirstOrDefault();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant