Skip to content

tracing-subscriber: optimise span fields serialisation #3343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lbeschastny
Copy link

@lbeschastny lbeschastny commented Jul 22, 2025

Move formatted fields from Span to json log without storing them in an intermediate serde_json::Value by using a smart serde visitor.

Motivation

Current serde::ser::Serialize implementation for SerializableSpan parses the entire FormattedFields string into serde_json::Value to fix #391, but there is no need to create an intermediate object just to move json fields from one json string to another.

Solution

Instead this PR introduces a new SerializeMapVisitor struct implementing serde::de::Visitor trait to move all json fields from Deserializer to Serializer as json object is being parsed.

I assume here that FormattedFields should always contain a json object.

FormattedFields is still parsed using serde. But instead of deserialising it into one big serde_json::Value, I iterate over it's entriesusing a custom serde::de::Visitor.

FormattedFields keys are always parsed as &str. This could fail if any of the keys contains escape sequences, i.e. {"boo\\"bar":"baz"}. But I don't think this corner case is worth being supported.

FormattedFields values are parsed as serde_json::Value. I tried being smart there, but it didn't work as expected (see comments below).

@lbeschastny lbeschastny requested review from hawkw and a team as code owners July 22, 2025 14:51
Move formatted fields from Span to json log without storing them in an intermediate `serde_json::Value` by using a smart serde visitor.
@lbeschastny lbeschastny force-pushed the chore/optimise-span-serialisation branch from b2b0aa9 to 5f1bacc Compare July 22, 2025 15:22
@lbeschastny
Copy link
Author

lbeschastny commented Jul 22, 2025

I tried to measure effect on memory allocations from this change using stats_alloc crate.

diff --git a/tracing-subscriber/Cargo.toml b/tracing-subscriber/Cargo.toml
index 1ea08eb..91d83fe 100644
--- a/tracing-subscriber/Cargo.toml
+++ b/tracing-subscriber/Cargo.toml
@@ -68,6 +68,7 @@ chrono = { version = "0.4.26", default-features = false, features = ["clock", "s
 # registry
 sharded-slab = { version = "0.1.4", optional = true }
 thread_local = { version = "1.1.4", optional = true }
+stats_alloc = "0.1.10"

 [target.'cfg(tracing_unstable)'.dependencies]
 valuable_crate = { package = "valuable", version = "0.1.0", optional = true, default-features = false }

I used it with existing tests in fmt::format::json module:

diff --git a/tracing-subscriber/src/fmt/format/json.rs b/tracing-subscriber/src/fmt/format/json.rs
index 3ef0fcd..c332f58 100644
--- a/tracing-subscriber/src/fmt/format/json.rs
+++ b/tracing-subscriber/src/fmt/format/json.rs
@@ -548,6 +548,9 @@ impl field::Visit for JsonVisitor<'_> {
 }
 #[cfg(test)]
 mod test {
+    use stats_alloc::{self, Region, StatsAlloc, INSTRUMENTED_SYSTEM};
+    use std::alloc::System;
+
     use super::*;
     use crate::fmt::{format::FmtSpan, test::MockMakeWriter, time::FormatTime, SubscriberBuilder};
     use tracing::{self, subscriber::with_default};
@@ -555,6 +558,9 @@ mod test {
     use std::fmt;
     use std::path::Path;

+    #[global_allocator]
+    static GLOBAL: &StatsAlloc<System> = &INSTRUMENTED_SYSTEM;
+
     struct MockTime;
     impl FormatTime for MockTime {
         fn format_time(&self, w: &mut Writer<'_>) -> fmt::Result {
@@ -853,7 +859,12 @@ mod test {
             .with_timer(MockTime)
             .finish();

-        with_default(subscriber, producer);
+        with_default(subscriber, || {
+            let reg = Region::new(&GLOBAL);
+            producer();
+            let stat = reg.change();
+            println!("{stat:#?}");
+        });

         let buf = make_writer.buf();
         let actual = std::str::from_utf8(&buf[..]).unwrap();

Here is what I got from running

cargo test --package tracing-subscriber --lib --all-features --release -- fmt::format::json::test::json_nested_span --exact --show-output

on my branch:

---- fmt::format::json::test::json_nested_span stdout ----
Stats {
    allocations: 32,
    deallocations: 19,
    reallocations: 23,
    bytes_allocated: 7906,
    bytes_deallocated: 2065,
    bytes_reallocated: 637,
}

and on main:

---- fmt::format::json::test::json_nested_span stdout ----
Stats {
    allocations: 29,
    deallocations: 16,
    reallocations: 11,
    bytes_allocated: 8998,
    bytes_deallocated: 3157,
    bytes_reallocated: 616,
}

So, while my patch reduces bytes_allocated, it increases the number of memory allocations and reallocations.

And now I'm not sure if this actually improves anything.

@lbeschastny
Copy link
Author

lbeschastny commented Jul 22, 2025

Hm... I tried dropping SpanFieldsValue enum and using serde_json::Value instead and got perfect allocation stats:

Stats {
    allocations: 20,
    deallocations: 7,
    reallocations: 11,
    bytes_allocated: 7066,
    bytes_deallocated: 1225,
    bytes_reallocated: 616,
}

So, maybe this is the way to go.

I think it's Deserialize implementation for an untagged enum that creates those extra allocations since it just tries every enum variant.

@lbeschastny
Copy link
Author

Looks like I was right. I tried using only strings in span fields and SpanFieldsValue showed an expected improvement over serde_json::Value.

But we can't assume those would be strings, so I'm going to drop SpanFieldsValue.

lbeschastny added a commit to nentgroup/telemetry-rust that referenced this pull request Jul 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

subscriber: don't escape field values in JSON
1 participant