Skip to content

wip(vercel-ai): add vercel ai integration #5858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

sabrenner
Copy link
Collaborator

@sabrenner sabrenner commented Jun 9, 2025

What does this PR do?

Adds support for [email protected] and greater.

The Vercel AI SDK provides OTel tracing of their operations under the hood. This gives us a nice "in" to automatically pass our otel-compatible tracer in patching, and do whatever we want on span start and end. This is a better alternative to doing the patching ourselves, as the Vercel AI SDK does not nicely expose model provider functions through a base class, and certain functions, like tool executions, are not exported and are instead inlined.

import tracer, { TracerProvider } from 'dd-trace'
tracer.init()

const provider = new TracerProvider()
provider.register()

import { generateText } from "ai"
import { openai } from "@ai-sdk/openai"
const { text } = await generateText({
  model: openai("o3-mini"),
  prompt: "What is love?",
  experimental_telemetry: { // this can be done by users, or we can do it during patching
    isEnabled: true,
    tracer: provider.getTracer()
  }
})

To assist with this, we emit span start and end events for OpenTelemetry spans from our OTel tracer provider, and from there assert that the spans are Vercel AI spans in the corresponding APM and LLMObs integrations.

Motivation

Need to link FR when this PR is ready

Plugin Checklist

Additional Notes

Copy link

github-actions bot commented Jun 9, 2025

Overall package size

Self size: 9.66 MB
Deduped: 104.61 MB
No deduping: 105.12 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.6.0 | 30.47 MB | 30.47 MB | | @datadog/native-appsec | 8.5.2 | 19.33 MB | 19.34 MB | | @datadog/pprof | 5.8.0 | 12.55 MB | 12.92 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.3 | 2.95 MB | 5.6 MB | | @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | import-in-the-middle | 1.14.0 | 120.58 kB | 841.68 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | lru-cache | 7.18.3 | 133.92 kB | 133.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 5.3.2 | 53.63 kB | 53.63 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | dc-polyfill | 0.1.9 | 25.11 kB | 25.11 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.2 | 23.54 kB | 23.54 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Copy link

codecov bot commented Jun 9, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.75%. Comparing base (8045409) to head (407d6dc).
Report is 29 commits behind head on master.

Files with missing lines Patch % Lines
...ages/datadog-instrumentations/src/helpers/hooks.js 0.00% 1 Missing ⚠️
packages/dd-trace/src/llmobs/tagger.js 85.71% 1 Missing ⚠️
packages/dd-trace/src/plugins/index.js 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5858      +/-   ##
==========================================
+ Coverage   80.65%   80.75%   +0.10%     
==========================================
  Files         463      464       +1     
  Lines       19841    19924      +83     
==========================================
+ Hits        16002    16090      +88     
+ Misses       3839     3834       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link

pr-commenter bot commented Jun 9, 2025

Benchmarks

Benchmark execution time: 2025-06-11 19:06:23

Comparing candidate commit 407d6dc in PR branch sabrenner/vercel-ai-sdk-integration with baseline commit 8045409 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1269 metrics, 54 unstable metrics.

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jun 9, 2025

Datadog Report

Branch report: sabrenner/vercel-ai-sdk-integration
Commit report: 550f1a6
Test service: dd-trace-js-integration-tests

✅ 0 Failed, 1256 Passed, 0 Skipped, 21m 58.51s Total Time

return new Identifier(value, radix)
}
}, { Identifier })
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't necessary but is a helpful export for some instanceof checks we do in llmobs/tagger.js - seems all tests still pass as-is, if we're ok to do this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not expose this as we should replace the current implementation with BigInt at some point, and any exposed internals will make that more difficult.

isEnabled: true,
// TODO(sabrenner): need to figure out how a user can configure this tracer
// maybe we advise they do this manually?
tracer: provider.getTracer()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we wouldn't go through translation as that could have unintended side-effects and won't be as flexible. Is there a centralized place in the module that handles tracing? Maybe we could patch something there instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey do have a getTracer function, but as this whole module is bundled it seems into one dist/index.js (ref), it's not actually available on module exports for us to hook into.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use orchestrion for this? I think we're not ready to use it for APM yet, but for LLMObs I think it's already used? If not I guess passing in our OTel tracer might be the only option, although I'd probably prefer to pass a "fake" tracer that only works for this integration at that point to avoid having to go through the OTel SDK.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i thought about this too. aside from the node version problem (it's possible we have customers not above 20.6 yet in order to use orchestrion), it would just mean more on our side for maintaining patching for something they already instrument with the otel sdk themselves.

imho, i'm more in favor of just re-using their otel instrumentation and translating it as-is, although it wouldn't change the existing logic in the plugins too much, just our maintenance of patching in the instrumentation.

I'd probably prefer to pass a "fake" tracer that only works for this integration at that point to avoid having to go through the OTel SDK

could you clarify this a bit - i think i'm confused, i'm not seeing the harm in passing ours in as is 😅 it's only currently passed in if there's no tracer passed in which, by default, there isn't anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that for example if it expected to receive an object with startSpan on it, you could just pass an object created in the instrumentation that would just send a message on a channel like other integrations do instead of passing in a provider. Basically converting OTel to what the library should actually be doing itself 😅 Then the plugin can listen to that as normal without any hacks to make it work with OTel.

For example:

tracer: {
  startSpan ( ... ) {
    startCh.publish( ... )
  }
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i've tried this in a4014cc. it works (yeah there are no tests but it looks the same w the dummy app i'm using for now 😆), but my main concern is if someone wants to pass in their own dd-trace provider otel tracer (as noted is possible in the PR description), we'll need to still handle those events (or just always override the tracer option, which i don't think is a good solution).

at that point, instead of handling the case where they pass in their own dd-trace otel tracer, and us using this custom one in instrumentation, it would be better to just keep the implementation before this commit.

let me know what you think - i actually do agree it's easier for parent context setting/manipulation if we do it ourselves as i implemented in the commit, but i wonder if we should have consideration for folks passing in their own dd-trace otel tracer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to further that point a bit, the dummy 'span' passed into the function in the instrumentation for startActiveSpan doesn't have all of the otel methods, just the ones used by vercel-ai. should they use one not on there, we break. imho any solution there that's like a) subclass the span from a noop otel span kind or b) just have all the methods, points us back to just listening for otel spans in the first place 😅

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my main concern is if someone wants to pass in their own dd-trace provider otel tracer (as noted is possible in the PR description), we'll need to still handle those events (or just always override the tracer option, which i don't think is a good solution)

i've pushed up one more change in 407d6dc that addresses this concern, which seems a lot cleaner. but i guess i still have a question on reusability of otel tracer spans from integrations we want to have. if we add more integrations that have their own otel tracing under the hood, is this a pattern we want to do every time?

return new Identifier(value, radix)
}
}, { Identifier })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not expose this as we should replace the current implementation with BigInt at some point, and any exposed internals will make that more difficult.

span.context()._trace.tags[PROPAGATED_PARENT_ID_KEY] ??
ROOT_PARENT_ID
let parentId
if (parent instanceof Identifier) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is an identifier as the parent needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i suppose i was being silly with what i was doing here haha. i can change this back to what i had before

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah actually i had it this way because i don't think i have a reference to the parent span at the point of the vercel ai integration for llmobs parentage, just the current span, and on it, its parent id, which i opted to pass in here (but, correct me if i'm wrong about spans holding a reference to their entire parent span)


this.start(ctx, {
useLlmObsParent,
enterIntoLlmObsStorage: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these be replaced by a different class hierarchy or by runStores/bindStore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a bit tricky to me - because we're using the points of the otel spans, which are the start and end, it doesn't leave a lot of flexibility for running/binding stores over a context if the only points of the context we have are disjoint start and end events. but, i could be thinking about this wrong - lmk if so!

since the otel spans started by the vercel ai sdk are activated spans in the proper scope, i wanted to just add an option to use the APM span parentage as opposed to the LLMObs parentage, since these operations should be isolated from whatever manual LLMObs instrumentation users might do.

Comment on lines +22 to +36
const noopTracer = {
startActiveSpan () {
const fn = arguments[arguments.length - 1]

const span = {
end () {},
setAttributes () { return this },
addEvent () { return this },
recordException () { return this },
setStatus () { return this }
}

return fn(span)
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this could be extracted out into an otel noop tracer that could be shared

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants