Description
Elastic.Clients.Elasticsearch version: 8.19.3
Elasticsearch version: 8.18.4
.NET runtime version: .NET 9
Operating system version: Windows
Description of the problem including expected versus actual behavior:
I am trying to store a collection of custom objects in a nested property. The type is `IList<Phrase>`, where `Phrase` is defined as follows:
```csharp
public class Phrase
{
    public int Channel { get; set; }
    public string SpeakerId { get; set; }
    public long OffsetMilliseconds { get; set; }
    public long DurationMilliseconds { get; set; }
    public string Text { get; set; }
    public float[] TextEmbedding { get; set; }
    public string Locale { get; set; }
    public double Confidence { get; set; }
}
```
I attempted to send this data using the bulk API (via this method: ElasticsearchDocumentIndexManager.cs#L121-L165).
The issue is that the internal serializer converts the property names of the objects in `IList<Phrase>` to camelCase instead of preserving the original names. This leads to the following error:

`mapper [Phrases.textEmbedding] cannot be changed from type [dense_vector] to [float]`
Specifically, `TextEmbedding` is serialized as `textEmbedding`, which conflicts with the mapping definition. Example request generated by the SDK:
```json
{
  "TranscriptId": "466hddd11c3kbv5qfvmsbsbwcj",
  "FileName": "466hddd11c3kbv5qfvmsbsbwcj.opus",
  "Phrases": [
    {
      "channel": 0,
      "speakerId": "1",
      "offsetMilliseconds": 200,
      "durationMilliseconds": 3640,
      "text": "Thank you for",
      "textEmbedding": [
        0.0294216201,
        -0.00876519084,
        -0.0213851593
        // ...
      ]
    }
  ],
  "Transcript": "Thank you for calling"
}
```
Expected behavior
The request should preserve the original property casing exactly as defined in the class. For example:
```json
{
  "TranscriptId": "466hddd11c3kbv5qfvmsbsbwcj",
  "FileName": "466hddd11c3kbv5qfvmsbsbwcj.opus",
  "Phrases": [
    {
      "Channel": 0,
      "SpeakerId": "1",
      "OffsetMilliseconds": 200,
      "DurationMilliseconds": 3640,
      "Text": "Thank you for",
      "TextEmbedding": [
        0.0294216201,
        -0.00876519084,
        -0.0213851593
        // ...
      ]
    }
  ],
  "Transcript": "Thank you for calling"
}
```
Notice that each property inside `Phrases` starts with an uppercase letter, matching the class definition and the index mapping.
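The mixed casing in the generated request (PascalCase at the top level, camelCase inside `Phrases`) can be reproduced with System.Text.Json alone. This is a minimal sketch that assumes, as the request suggests, that the client's default source serializer applies `JsonNamingPolicy.CamelCase` to property names while leaving dictionary keys untouched. The dictionary-shaped top-level document is a hypothetical stand-in, since the linked method is not reproduced here, and an anonymous object with the same property names replaces `Phrase` so the program needs no type declarations.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Assumption: the client's default behaviour corresponds to a CamelCase PropertyNamingPolicy
// with DictionaryKeyPolicy left unset (null).
var camelCase = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };

// No naming policy: CLR property names are written exactly as declared.
var preserve = new JsonSerializerOptions { PropertyNamingPolicy = null };

// Anonymous object carrying a subset of Phrase's property names.
var phrase = new { Channel = 0, SpeakerId = "1", Text = "Thank you for", TextEmbedding = new[] { 0.5f, -0.25f } };

// Hypothetical top-level document built as a dictionary. PropertyNamingPolicy does not
// apply to dictionary keys, so "TranscriptId" and "Phrases" keep their casing either way.
var document = new Dictionary<string, object>
{
    ["TranscriptId"] = "466hddd11c3kbv5qfvmsbsbwcj",
    ["Phrases"] = new[] { phrase },
};

string camelJson = JsonSerializer.Serialize(document, camelCase);
string preservedJson = JsonSerializer.Serialize(document, preserve);

Console.WriteLine(camelJson);     // nested names become "channel", "speakerId", "textEmbedding", ...
Console.WriteLine(preservedJson); // nested names stay "Channel", "SpeakerId", "TextEmbedding", ...
```

With `PropertyNamingPolicy = null`, the nested objects come out in the shape shown under Expected behavior.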
Provide DebugInformation (if relevant):
Here is the index structure:
```json
{
  "test_ww1_calls": {
    "mappings": {
      "_meta": {
        "last_task_id": 109
      },
      "properties": {
        ...
        "Phrases": {
          "type": "nested",
          "properties": {
            "Channel": { "type": "integer" },
            "Confidence": { "type": "float" },
            "DurationMilliseconds": { "type": "long" },
            "Locale": { "type": "text" },
            "OffsetMilliseconds": { "type": "long" },
            "SpeakerId": { "type": "text" },
            "Text": {
              "type": "text",
              "index_options": "offsets",
              "analyzer": "standard"
            },
            "TextEmbedding": {
              "type": "dense_vector",
              "dims": 1536,
              "index": true,
              "similarity": "cosine",
              "index_options": {
                "type": "int8_hnsw",
                "m": 16,
                "ef_construction": 100
              }
            },
            ...
          }
        },
        ...
      }
    }
  }
}
```
Workaround
My workaround is to decorate each property in the `Phrase` class with `[JsonPropertyName]` to prevent the serializer from altering the casing:
```csharp
using System.Text.Json.Serialization;

// An explicit [JsonPropertyName] takes precedence over the serializer's naming policy,
// so each property is written with exactly the name returned by nameof(...).
public class Phrase
{
    [JsonPropertyName(nameof(Channel))]
    public int Channel { get; set; }
    [JsonPropertyName(nameof(SpeakerId))]
    public string SpeakerId { get; set; }
    [JsonPropertyName(nameof(OffsetMilliseconds))]
    public long OffsetMilliseconds { get; set; }
    [JsonPropertyName(nameof(DurationMilliseconds))]
    public long DurationMilliseconds { get; set; }
    [JsonPropertyName(nameof(Text))]
    public string Text { get; set; }
    [JsonPropertyName(nameof(TextEmbedding))]
    public float[] TextEmbedding { get; set; }
    [JsonPropertyName(nameof(Locale))]
    public string Locale { get; set; }
    [JsonPropertyName(nameof(Confidence))]
    public double Confidence { get; set; }
}
```
Notes
Since Elasticsearch is case-sensitive when it comes to field names, I expected the serializer to preserve the property names exactly as defined. If this behavior is intentional, it would be helpful to have a simpler way to configure or register a custom serializer so that casing is not inadvertently changed.
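An alternative to per-property attributes might be to replace the client's source serializer once, when building the settings. This is a sketch assuming the 8.x client API: an `ElasticsearchClientSettings` constructor that accepts a `sourceSerializer` factory, a `DefaultSourceSerializer` that takes an `Action<JsonSerializerOptions>` callback (assumed to run after the default options are applied), and `DefaultFieldNameInferrer` on the settings. The endpoint URL and the `ConfigureSourceOptions` name are placeholders chosen for illustration.

```csharp
using System;
using System.Text.Json;
using Elastic.Clients.Elasticsearch;
using Elastic.Clients.Elasticsearch.Serialization;
using Elastic.Transport;

// Placeholder endpoint; creating the client does not contact the cluster.
var nodePool = new SingleNodePool(new Uri("http://localhost:9200"));

var settings = new ElasticsearchClientSettings(
        nodePool,
        // Assumed API: swap the built-in source serializer for one whose
        // JsonSerializerOptions keep CLR property names unchanged.
        sourceSerializer: (builtInSerializer, clientSettings) =>
            new DefaultSourceSerializer(clientSettings, ConfigureSourceOptions))
    // Assumed API: keep field names inferred from expressions (e.g. in queries) unchanged too.
    .DefaultFieldNameInferrer(name => name);

var client = new ElasticsearchClient(settings);

// Hypothetical helper: remove the naming policy so "TextEmbedding" stays "TextEmbedding".
static void ConfigureSourceOptions(JsonSerializerOptions options) =>
    options.PropertyNamingPolicy = null;
```

Note that this setting applies to every document the client serializes, so any index that currently relies on camelCase field names would be affected as well.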