ai) via a language model middleware. Wrap any model with wrapLanguageModel and Bitfab captures every generateText, streamText, generateObject, and streamObject call as a keyed llm span, no hand-written withSpan required. Streaming is captured without disturbing the live stream.
Canonical signatures: TypeScript reference
Supported Languages
| Language | Method | Status |
|---|---|---|
| TypeScript | getVercelAiMiddleware() | ✅ Supported |
| Python | — | The Vercel AI SDK is JavaScript/TypeScript only |
| Ruby | — | Not yet supported |
| Go | — | Not yet supported |
Quick Start
model records a chat-turn span. The same middleware works for non-streaming calls:
What Gets Captured
Each model call creates one llm span:
| Field | Captured Data |
|---|---|
| Input | The call parameters (the prompt/messages and settings), recorded as a single positional argument so the call replays by key |
| Output | A serializable summary: { text, toolCalls, usage, finishReason, model }, where model is { provider, modelId } — the provider/model that actually served the call |
| Type | llm |
The middleware hooks the model, so it captures one span per model call. A multi-step streamText run (tool calls that trigger follow-up model calls) records one span per step. To group those under a single root, wrap the whole call (see Nesting with core tracing).
TypeScript
Installation
ai v5 or v6) as a peer dependency.
Method Signature
traceFunctionKey (string, required): groups all traces from this middleware under one key in Bitfab
wrapLanguageModel. It implements wrapGenerate and wrapStream; the AI SDK reads only those, so it drops straight in.
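To make the shape concrete, here is a minimal sketch of a middleware-like object whose wrapGenerate hook records one llm span per call. It is not Bitfab's implementation: SketchParams, SketchResult, SketchSpan, and makeSketchMiddleware are hypothetical local stand-ins, and the real AI SDK hook receives more fields than shown here.

```typescript
// Hypothetical stand-ins for the AI SDK's call parameters and result.
interface SketchParams { prompt: string; temperature?: number }
interface SketchResult { text: string; finishReason: string }

// Assumed span shape, loosely following the "What Gets Captured" table.
interface SketchSpan {
  key: string;
  type: "llm";
  input: [SketchParams]; // a single positional argument
  output: SketchResult;
}

interface SketchMiddleware {
  wrapGenerate(opts: {
    doGenerate: () => Promise<SketchResult>;
    params: SketchParams;
  }): Promise<SketchResult>;
}

// Builds a middleware-like object that files every span under one key.
function makeSketchMiddleware(traceFunctionKey: string, sink: SketchSpan[]): SketchMiddleware {
  return {
    async wrapGenerate({ doGenerate, params }): Promise<SketchResult> {
      const output = await doGenerate(); // run the underlying model call
      sink.push({ key: traceFunctionKey, type: "llm", input: [params], output });
      return output; // hand the result back unchanged
    },
  };
}
```

The sink array stands in for whatever transport uploads spans. A wrapStream hook would follow the same pattern, but would finalize the span only after the stream ends.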
Usage
Multiple providers and fallback
Because the middleware wraps the model, it is provider-agnostic: wrap each model you use (Anthropic, OpenAI, anything that implements the AI SDK model interface) with the same middleware. The provider/model that served each call is recorded on the span, so a Claude-primary, GPT-4o-fallback setup shows exactly who answered.
deal-brief key; the model field on each span tells the two apart.
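The fallback idea can be illustrated without any SDK at all. The sketch below uses hypothetical stand-ins (SketchModel, traced, generateWithFallback, and placeholder provider names) to show how recording the serving model on each span makes it clear which model answered. It assumes, for simplicity, that a failed call records no span.

```typescript
// Hypothetical stand-ins; real AI SDK models and Bitfab spans are richer.
interface ModelRef { provider: string; modelId: string }
interface SketchModel {
  ref: ModelRef;
  generate(prompt: string): Promise<string>;
}
interface SketchSpan { key: string; model: ModelRef; text: string }

// Wraps a model so every successful call records which model served it.
function traced(model: SketchModel, key: string, spans: SketchSpan[]): SketchModel {
  return {
    ref: model.ref,
    async generate(prompt: string): Promise<string> {
      const text = await model.generate(prompt);
      spans.push({ key, model: model.ref, text });
      return text;
    },
  };
}

// Tries the primary model and falls back to the secondary on any error.
async function generateWithFallback(
  primary: SketchModel,
  fallback: SketchModel,
  prompt: string,
): Promise<string> {
  try {
    return await primary.generate(prompt);
  } catch {
    return fallback.generate(prompt);
  }
}
```

Both wrapped models share one key, so the only thing distinguishing their spans is the recorded model reference.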
Next.js App Router
In a route handler (or server action), wrap the model once and use it as normal. For streaming, return the AI SDK response directly; the middleware’s span finalizes as the stream drains, so it does not delay first byte:
after() so the span lands:
Vercel Workflow SDK
The middleware works inside Vercel Workflow SDK durable steps ("use step"). Each "use step" runs in its own invocation, so two things need care for durable, multi-step pipelines (extraction, research, deal-brief/memo generation):
- Flush before the step returns. Span uploads are fire-and-forget. A non-streaming model call inside a step that returns immediately can have its upload cut off when the function freezes, so await flushTraces() at the end of the step.
- Stitch the steps into one run. Steps run in separate invocations with no shared context, so each step records its own trace. Set a shared session id (the workflow run id) on every step, and Bitfab groups them as one run. Wrapping each step body in withSpan gives it a replayable root with the model-call llm spans nested underneath.
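The two points above can be modeled in a few lines. This is a self-contained sketch, not the Bitfab SDK: uploadSpan, flushPending, runStep, and groupBySession are hypothetical names, and the "upload" is just a promise that resolves a few microtasks later. It shows why an un-flushed fire-and-forget upload may not have landed when a step returns, and how a shared session id lets separate steps be grouped into one run.

```typescript
interface SketchSpan { sessionId: string; step: string }

const uploaded: SketchSpan[] = []; // what the "backend" has received so far
const pending = new Set<Promise<void>>(); // uploads still in flight

// Fire-and-forget: returns immediately, the upload lands a few ticks later.
function uploadSpan(span: SketchSpan): void {
  const upload: Promise<void> = Promise.resolve()
    .then(() => Promise.resolve())
    .then(() => { uploaded.push(span); });
  pending.add(upload);
  void upload.then(() => pending.delete(upload));
}

// Waits for every in-flight upload to settle.
async function flushPending(): Promise<void> {
  await Promise.all(Array.from(pending));
}

// A step body: do the traced work, then flush before returning.
async function runStep(runId: string, step: string): Promise<string> {
  uploadSpan({ sessionId: runId, step }); // stands in for a traced model call
  await flushPending(); // without this, the step could return before the upload lands
  return `${step} done`;
}

// Groups spans from separate steps by their shared session id.
function groupBySession(spans: SketchSpan[]): Map<string, string[]> {
  const runs = new Map<string, string[]>();
  for (const span of spans) {
    const steps = runs.get(span.sessionId) ?? [];
    steps.push(span.step);
    runs.set(span.sessionId, steps);
  }
  return runs;
}
```

In a real durable step the run id would come from the workflow runtime; here it is simply passed in as a string.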
"use step" functions (they have full Node.js access); the "use workflow" function only orchestrates.
runId session, so the whole pipeline reads as one run in Bitfab. (This pattern is verified end to end against the real Workflow runtime in the SDK’s test suite.)
For agents built with DurableAgent from @workflow/ai, the model still flows through wrapLanguageModel, so wrap the model the same way and pass it to the agent.
Nesting with Core Tracing
To group a multi-step run under one replayable root, wrap the AI SDK call in withSpan. The middleware spans nest underneath it, and the finalizers.aiSdk helper records a clean root output from the streaming result without consuming the live stream:
chat-turn span carries the messages as input and { text, usage, finishReason, toolCalls } as output, with each model call nested beneath it.
Streaming
Streaming is handled automatically. The middleware passes the model’s stream through a transform that accumulates the assembled output as the AI SDK reads it, so:
- The caller’s stream is returned unchanged; every part is enqueued in order.
- First-byte latency is unaffected.
- The span is finalized once the stream emits its finish part.
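The pass-through behavior described above can be sketched as a tap that forwards every part while assembling a summary on the side. This is an illustration under stated assumptions, not the middleware's code: it uses async iterables rather than the stream type the AI SDK actually passes, and SketchPart, SketchSummary, and tapStream are hypothetical names with simplified part shapes.

```typescript
// Hypothetical part shapes; the AI SDK's real stream parts are richer.
type SketchPart =
  | { type: "text-delta"; delta: string }
  | { type: "finish"; finishReason: string };

interface SketchSummary { text: string; finishReason?: string }

// Forwards every part in order while assembling a summary on the side.
async function* tapStream(
  source: AsyncIterable<SketchPart>,
  onFinish: (summary: SketchSummary) => void,
): AsyncGenerator<SketchPart> {
  const summary: SketchSummary = { text: "" };
  for await (const part of source) {
    if (part.type === "text-delta") {
      summary.text += part.delta; // cheap bookkeeping, no buffering of the stream
    }
    if (part.type === "finish") {
      summary.finishReason = part.finishReason;
      onFinish(summary); // finalize once the finish part is seen
    }
    yield part; // the caller receives the same part object, in the same order
  }
}
```

Because each part is yielded as soon as it is read, the consumer never waits for the full response; the summary only becomes complete at the finish part.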
finalizers.aiSdk for the per-call middleware spans; it is only for an optional outer withSpan root around the AI SDK call (see Nesting with core tracing).
Error Handling
- The middleware never throws; span capture is wrapped so a tracing failure cannot break the model call or the stream.
- If the client is disabled, the middleware is a transparent pass-through.
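One way to get these guarantees is to isolate span capture behind a guard, as in the sketch below. The names safely and tracedCall are hypothetical, and this is only an assumed structure: it shows a tracing failure being swallowed, a disabled client acting as a pass-through, and errors from the model call itself still reaching the caller.

```typescript
// Runs a capture callback and swallows anything it throws.
function safely(capture: () => void): void {
  try {
    capture();
  } catch {
    // A tracing failure must never surface to the caller.
  }
}

// Hypothetical wrapper: `call` is the model call, `record` is span capture.
async function tracedCall<T>(
  enabled: boolean,
  call: () => Promise<T>,
  record: (output: T) => void,
): Promise<T> {
  if (!enabled) return call(); // disabled client: transparent pass-through
  const output = await call(); // errors from the model itself still propagate
  safely(() => record(output));
  return output;
}
```

The guard wraps only the capture step, so the model call's own result and errors are never masked.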
Replay
Each model call records a keyed llm span carrying its parameters as input, so the call replays by key. With no outer withSpan, the span is the trace root and replay(key, fn) re-feeds each historical call’s parameters to a callable that re-issues the model call.
If you wrap the AI SDK call in a withSpan with the same key (see Nesting with core tracing), that outer span is the replayable root and the middleware spans nest under it. Full details: Replaying functions in the TypeScript SDK page.
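As a rough mental model of replay by key, the sketch below re-feeds recorded parameters to a callable. It is not the SDK's replay function: RecordedCall and replaySketch are hypothetical, and the history is a plain in-memory array rather than traces fetched from Bitfab.

```typescript
// Hypothetical record of one historical call: its key and its parameters.
interface RecordedCall<P> { key: string; params: P }

// Re-issues every recorded call under `key` by passing its params to `fn`.
async function replaySketch<P, R>(
  history: ReadonlyArray<RecordedCall<P>>,
  key: string,
  fn: (params: P) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const call of history) {
    if (call.key !== key) continue; // only calls recorded under this key
    results.push(await fn(call.params)); // re-feed the historical parameters
  }
  return results;
}
```

In practice fn would wrap the traced model call, so each replayed run produces output that can be compared with the original.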