Incorrect data serialization #8684

@MikeAlhayek

Elastic.Clients.Elasticsearch version: 8.19.3

Elasticsearch version: 8.18.4

.NET runtime version: .NET 9

Operating system version: Windows

Description of the problem including expected versus actual behavior:
I am trying to store a collection of custom objects in a nested property. The type is IList<Phrase>, where Phrase is defined as follows:

public class Phrase
{
    public int Channel { get; set; }
    public string SpeakerId { get; set; }
    public long OffsetMilliseconds { get; set; }
    public long DurationMilliseconds { get; set; }
    public string Text { get; set; }
    public float[] TextEmbedding { get; set; }
    public string Locale { get; set; }
    public double Confidence { get; set; }
}

I attempted to send this data using the bulk API (via this method: ElasticsearchDocumentIndexManager.cs#L121-L165).

The issue is that the internal serializer writes the property names of each Phrase in the IList<Phrase> in camelCase instead of preserving the names as declared on the class. This leads to the following error:

mapper [Phrases.textEmbedding] cannot be changed from type [dense_vector] to [float]

Specifically, TextEmbedding is serialized as textEmbedding, which conflicts with the mapping definition. Example request generated by the SDK:

{
  "TranscriptId": "466hddd11c3kbv5qfvmsbsbwcj",
  "FileName": "466hddd11c3kbv5qfvmsbsbwcj.opus",
  "Phrases": [
    {
      "channel": 0,
      "speakerId": "1",
      "offsetMilliseconds": 200,
      "durationMilliseconds": 3640,
      "text": "Thank you for",
      "textEmbedding": [
        0.0294216201,
        -0.00876519084,
        -0.0213851593
        // ...
      ]
    }
  ],
  "Transcript": "Thank you for calling"
}

Expected behavior
The request should preserve the original property casing exactly as defined in the class. For example:

{
  "TranscriptId": "466hddd11c3kbv5qfvmsbsbwcj",
  "FileName": "466hddd11c3kbv5qfvmsbsbwcj.opus",
  "Phrases": [
    {
      "Channel": 0,
      "SpeakerId": "1",
      "OffsetMilliseconds": 200,
      "DurationMilliseconds": 3640,
      "Text": "Thank you for",
      "TextEmbedding": [
        0.0294216201,
        -0.00876519084,
        -0.0213851593
        // ...
      ]
    }
  ],
  "Transcript": "Thank you for calling"
}

Notice that each property inside Phrases starts with an uppercase letter, matching the class definition and the index mapping.
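
For reference, the casing difference between the two payloads above can be reproduced with System.Text.Json alone. The following is a minimal sketch, not the client's actual code path: `PhraseSample` and `NamingDemo` are hypothetical names introduced for illustration, and the sample keeps only two of the `Phrase` properties.

```csharp
using System;
using System.Text.Json;

// Hypothetical, trimmed-down stand-in for the Phrase class in this issue.
public class PhraseSample
{
    public int Channel { get; set; }
    public float[] TextEmbedding { get; set; } = Array.Empty<float>();
}

public static class NamingDemo
{
    // A camelCase naming policy rewrites property names, as seen in the actual request.
    public static string SerializeCamelCase(PhraseSample phrase) =>
        JsonSerializer.Serialize(phrase, new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase
        });

    // A null naming policy keeps the names exactly as declared on the class.
    public static string SerializeAsDeclared(PhraseSample phrase) =>
        JsonSerializer.Serialize(phrase, new JsonSerializerOptions
        {
            PropertyNamingPolicy = null
        });

    public static void Main()
    {
        var phrase = new PhraseSample { Channel = 0, TextEmbedding = new[] { 0.5f } };

        Console.WriteLine(SerializeCamelCase(phrase));  // {"channel":0,"textEmbedding":[0.5]}
        Console.WriteLine(SerializeAsDeclared(phrase)); // {"Channel":0,"TextEmbedding":[0.5]}
    }
}
```

Only the second form produces field names that match the `Phrases` mapping shown in this issue.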

Provide DebugInformation (if relevant):
Here is the index structure:

{
  "test_ww1_calls": {
    "mappings": {
      "_meta": {
        "last_task_id": 109
      },
      "properties": {
        ...
        "Phrases": {
          "type": "nested",
          "properties": {
            "Channel": { "type": "integer" },
            "Confidence": { "type": "float" },
            "DurationMilliseconds": { "type": "long" },
            "Locale": { "type": "text" },
            "OffsetMilliseconds": { "type": "long" },
            "SpeakerId": { "type": "text" },
            "Text": {
              "type": "text",
              "index_options": "offsets",
              "analyzer": "standard"
            },
            "TextEmbedding": {
              "type": "dense_vector",
              "dims": 1536,
              "index": true,
              "similarity": "cosine",
              "index_options": {
                "type": "int8_hnsw",
                "m": 16,
                "ef_construction": 100
              }
            },
            ...
          }
        },
        ...
      }
    }
  }
}

Workaround
My workaround is to decorate each property in the Phrase class with [JsonPropertyName] to prevent the serializer from altering the casing:

public class Phrase
{
    [JsonPropertyName(nameof(Channel))]
    public int Channel { get; set; }

    [JsonPropertyName(nameof(SpeakerId))]
    public string SpeakerId { get; set; }

    [JsonPropertyName(nameof(OffsetMilliseconds))]
    public long OffsetMilliseconds { get; set; }

    [JsonPropertyName(nameof(DurationMilliseconds))]
    public long DurationMilliseconds { get; set; }

    [JsonPropertyName(nameof(Text))]
    public string Text { get; set; }

    [JsonPropertyName(nameof(TextEmbedding))]
    public float[] TextEmbedding { get; set; }

    [JsonPropertyName(nameof(Locale))]
    public string Locale { get; set; }

    [JsonPropertyName(nameof(Confidence))]
    public double Confidence { get; set; }
}

Notes
Since Elasticsearch is case-sensitive when it comes to field names, I expected the serializer to preserve the property names exactly as defined. If this behavior is intentional, it would be helpful to have a simpler way to configure or register a custom serializer so that casing is not inadvertently changed.
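
As a possible alternative to annotating every property, the client documentation describes passing a source serializer factory to `ElasticsearchClientSettings`. The sketch below assumes that approach applies to this scenario; I have not verified it against the bulk code path linked above. The node URI is a placeholder and `ConfigureOptions` is a name chosen for illustration.

```csharp
using System;
using System.Text.Json;
using Elastic.Clients.Elasticsearch;
using Elastic.Clients.Elasticsearch.Serialization;
using Elastic.Transport;

// Disable the naming policy so document property names are written as declared.
static void ConfigureOptions(JsonSerializerOptions options) =>
    options.PropertyNamingPolicy = null;

// Placeholder address; replace with the real cluster endpoint.
var nodePool = new SingleNodePool(new Uri("http://localhost:9200"));

var settings = new ElasticsearchClientSettings(
    nodePool,
    // Wrap the default source serializer with the adjusted JsonSerializerOptions.
    sourceSerializer: (defaultSerializer, clientSettings) =>
        new DefaultSourceSerializer(clientSettings, ConfigureOptions));

var client = new ElasticsearchClient(settings);
```

If this works as described, it would change the casing for every document type serialized through the client, not just `Phrase`, so any index that relies on camelCase field names would need a different approach.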
