Bitfab integrates with the Vercel AI SDK (ai) through a language model middleware. Wrap any model with wrapLanguageModel and Bitfab captures every generateText, streamText, generateObject, and streamObject call as a keyed llm span, with no hand-written withSpan required. Streaming is captured without disturbing the live stream. For canonical signatures, see the TypeScript reference.

Supported Languages

| Language | Method | Status |
| --- | --- | --- |
| TypeScript | getVercelAiMiddleware() | ✅ Supported |
| Python | - | The Vercel AI SDK is JavaScript/TypeScript only |
| Ruby | - | Not yet supported |
| Go | - | Not yet supported |

Quick Start

import { Bitfab } from "@bitfab/sdk"
import { openai } from "@ai-sdk/openai"
import { streamText, wrapLanguageModel } from "ai"

const bitfab = new Bitfab({ apiKey: process.env.BITFAB_API_KEY })

// Wrap the model once; reuse it everywhere you call the AI SDK.
const model = wrapLanguageModel({
  model: openai("gpt-4o"),
  middleware: bitfab.getVercelAiMiddleware("chat-turn"),
})

const result = streamText({ model, messages })
return result.toUIMessageStreamResponse() // live stream untouched
Every call through model records a chat-turn span. The same middleware works for non-streaming calls:
const { text } = await generateText({ model, prompt: "Summarize this." })

What Gets Captured

Each model call creates one llm span:
| Field | Captured Data |
| --- | --- |
| Input | The call parameters (the prompt/messages and settings), recorded as a single positional argument so the call replays by key |
| Output | A serializable summary: { text, toolCalls, usage, finishReason, model }, where model is { provider, modelId }, the provider/model that actually served the call |
| Type | llm |
Because the middleware hooks the model, it captures the resolved provider and model id on every call. That makes a multi-provider setup (one primary model, a fallback) fully observable: each span shows which provider answered.
For streaming calls, the assembled text, tool calls, and final usage are accumulated from the model’s stream as the caller consumes it. The live stream is handed back unchanged (first-byte latency is untouched) and the span is finalized once the stream completes.
The middleware hooks the model, so it captures one span per model call. A multi-step streamText run (tool calls that trigger follow-up model calls) records one span per step. To group those under a single root, wrap the whole call (see Nesting with core tracing).

TypeScript

Installation

npm install @bitfab/sdk
Requires the Vercel AI SDK (ai v5 or v6) as a peer dependency.

Method Signature

bitfab.getVercelAiMiddleware(traceFunctionKey: string): BitfabLanguageModelMiddleware
Parameters:
  • traceFunctionKey (string, required) — Groups all traces from this middleware under one key in Bitfab
Returns: A language model middleware object you pass to the AI SDK’s wrapLanguageModel. It implements wrapGenerate and wrapStream; the AI SDK reads only those, so it drops straight in.
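To make the shape of the returned object more concrete, here is a minimal sketch of a middleware that implements wrapGenerate. The types are simplified local stand-ins rather than the AI SDK's own, and makeTracingMiddleware and the in-memory recorded array are hypothetical names introduced for illustration. The real middleware uploads spans to Bitfab and also implements wrapStream.

```typescript
// Simplified stand-ins for the AI SDK's middleware types (assumptions for this sketch).
type GenerateResult = { text: string; finishReason: string }
interface WrapGenerateOptions {
  doGenerate: () => Promise<GenerateResult>
  params: unknown
  model: { provider: string; modelId: string }
}
interface RecordedSpan {
  key: string
  type: "llm"
  input: unknown
  output: GenerateResult & { model: { provider: string; modelId: string } }
}

// Hypothetical in-memory sink standing in for the upload to Bitfab.
const recorded: RecordedSpan[] = []

// Hypothetical factory; not the SDK's implementation.
function makeTracingMiddleware(key: string) {
  return {
    async wrapGenerate({ doGenerate, params, model }: WrapGenerateOptions): Promise<GenerateResult> {
      const result = await doGenerate() // run the underlying model call
      recorded.push({
        key,
        type: "llm",
        input: params, // the call parameters, kept as a single argument
        output: { ...result, model: { provider: model.provider, modelId: model.modelId } },
      })
      return result // the caller receives the model's result untouched
    },
  }
}
```

The point of the sketch is the order of operations: the model call runs first, the span is recorded from its parameters and result, and the same result object is returned to the caller.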

Usage

import { Bitfab } from "@bitfab/sdk"
import { anthropic } from "@ai-sdk/anthropic"
import { generateText, stepCountIs, tool, wrapLanguageModel } from "ai"
import { z } from "zod"

const bitfab = new Bitfab({ apiKey: process.env.BITFAB_API_KEY })

const model = wrapLanguageModel({
  model: anthropic("claude-sonnet-4-6"),
  middleware: bitfab.getVercelAiMiddleware("weather-agent"),
})

const { text } = await generateText({
  model,
  prompt: "What's the weather in San Francisco?",
  tools: {
    getWeather: tool({
      description: "Get the weather for a city",
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => `Foggy in ${city}`,
    }),
  },
  // The AI SDK stops after one step by default; allow a follow-up call after the tool result.
  stopWhen: stepCountIs(2),
})
// Each model call (initial + post-tool) is traced under "weather-agent"

Multiple providers and fallback

Because the middleware wraps the model, it is provider-agnostic: wrap each model you use (Anthropic, OpenAI, anything that implements the AI SDK model interface) with the same middleware. The provider/model that served each call is recorded on the span, so a Claude-primary, GPT-4o-fallback setup shows exactly who answered.
import { anthropic } from "@ai-sdk/anthropic"
import { openai } from "@ai-sdk/openai"
import { generateText, wrapLanguageModel } from "ai"

const middleware = bitfab.getVercelAiMiddleware("deal-brief")

const primary = wrapLanguageModel({
  model: anthropic("claude-sonnet-4-6"),
  middleware,
})
const fallback = wrapLanguageModel({
  model: openai("gpt-4o"),
  middleware,
})

async function generateBrief(prompt: string) {
  try {
    return await generateText({ model: primary, prompt })
  } catch {
    // The fallback call is traced too, with model = { provider: "openai", ... }.
    return await generateText({ model: fallback, prompt })
  }
}
Every call (primary or fallback) lands under the deal-brief key; the model field on each span tells the two apart.
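As a rough illustration of how the model field separates primary from fallback calls, the sketch below tallies span outputs per provider/model pair. The SpanOutput type only mirrors the documented output summary and is not imported from @bitfab/sdk; countByModel and the sample data are hypothetical.

```typescript
// Hypothetical shape mirroring the documented span output; not imported from @bitfab/sdk.
interface SpanOutput {
  text: string
  finishReason: string
  model: { provider: string; modelId: string }
}

// Count how many calls each provider/model pair served.
function countByModel(outputs: SpanOutput[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const { model } of outputs) {
    const id = `${model.provider}/${model.modelId}`
    counts.set(id, (counts.get(id) ?? 0) + 1)
  }
  return counts
}

// Made-up sample: two primary calls and one fallback call under the same key.
const sample: SpanOutput[] = [
  { text: "Brief A", finishReason: "stop", model: { provider: "anthropic", modelId: "claude-sonnet-4-6" } },
  { text: "Brief B", finishReason: "stop", model: { provider: "openai", modelId: "gpt-4o" } },
  { text: "Brief C", finishReason: "stop", model: { provider: "anthropic", modelId: "claude-sonnet-4-6" } },
]
const counts = countByModel(sample)
```

A tally like this gives a quick view of how often the fallback answered for a given key.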

Next.js App Router

In a route handler (or server action), wrap the model once and use it as normal. For streaming, return the AI SDK response directly — the middleware’s span finalizes as the stream drains, so it does not delay first byte:
// app/api/chat/route.ts
import { streamText } from "ai"
import { model } from "@/lib/ai" // wrapLanguageModel(...) with the middleware

export async function POST(req: Request) {
  const { messages } = await req.json()
  const result = streamText({ model, messages })
  return result.toUIMessageStreamResponse()
}
For non-streaming calls in serverless, the span upload is fire-and-forget; if the function returns immediately the upload may be cut off. Keep it alive with after() so the span lands:
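import { after } from "next/server"
import { generateText } from "ai"
import { flushTraces } from "@bitfab/sdk"
import { model } from "@/lib/ai" // wrapLanguageModel(...) with the middleware

export async function POST(req: Request) {
  // The "prompt" body field is an assumption for this example.
  const { prompt } = await req.json()
  const result = await generateText({ model, prompt })
  after(() => flushTraces()) // keep the function alive until the span upload finishes
  return Response.json({ text: result.text })
}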
(Streaming responses keep the function alive while the stream is consumed, so this is only needed for non-streaming calls.)

Vercel Workflow SDK

The middleware works inside Vercel Workflow SDK durable steps ("use step"). Each "use step" runs in its own invocation, so two things need care for durable, multi-step pipelines (extraction, research, deal-brief/memo generation):
  1. Flush before the step returns. Span uploads are fire-and-forget. A non-streaming model call inside a step that returns immediately can have its upload cut off when the function freezes, so await flushTraces() at the end of the step.
  2. Stitch the steps into one run. Steps run in separate invocations with no shared context, so each step records its own trace. Set a shared session id (the workflow run id) on every step, and Bitfab groups them as one run. Wrapping each step body in withSpan gives it a replayable root with the model-call llm spans nested underneath.
Keep the AI logic in "use step" functions (they have full Node.js access); the "use workflow" function only orchestrates.
import { generateText, wrapLanguageModel } from "ai"
import { anthropic } from "@ai-sdk/anthropic"
import { Bitfab, flushTraces, getCurrentTrace } from "@bitfab/sdk"

const bitfab = new Bitfab({ apiKey: process.env.BITFAB_API_KEY })

async function extract(runId: string, doc: string): Promise<string> {
  "use step"
  const model = wrapLanguageModel({
    model: anthropic("claude-sonnet-4-6"),
    middleware: bitfab.getVercelAiMiddleware("extract"),
  })
  const run = bitfab.withSpan("extract", { type: "agent" }, async (input: string) => {
    getCurrentTrace().setSessionId(runId) // group every step under the run
    const { text } = await generateText({ model, prompt: input })
    return text
  })
  const result = await run(doc)
  await flushTraces() // the step invocation ends here — make the span land
  return result
}

// ...research(runId, ...) and writeMemo(runId, ...) follow the same shape...

export async function dealBriefPipeline(doc: string) {
  "use workflow"
  const runId = crypto.randomUUID()
  const facts = await extract(runId, doc)
  const notes = await research(runId, facts)
  return await writeMemo(runId, notes)
}
Each step’s model calls are captured; all steps share the runId session, so the whole pipeline reads as one run in Bitfab. (This pattern is verified end to end against the real Workflow runtime in the SDK’s test suite.) For agents built with DurableAgent from @workflow/ai, the model still flows through wrapLanguageModel, so wrap the model the same way and pass it to the agent.

Nesting with Core Tracing

To group a multi-step run under one replayable root, wrap the AI SDK call in withSpan. The middleware spans nest underneath it, and the finalizers.aiSdk helper records a clean root output from the streaming result without consuming the live stream:
import { finalizers } from "@bitfab/sdk"

const runChatTurn = bitfab.withSpan(
  "chat-turn",
  { type: "agent", finalize: finalizers.aiSdk },
  (messages) => streamText({ model, messages }),
)

const result = runChatTurn(messages) // caller still gets the live stream
return result.toUIMessageStreamResponse()
Here the root chat-turn span carries the messages as input and { text, usage, finishReason, toolCalls } as output, with each model call nested beneath it.

Streaming

Streaming is handled automatically. The middleware passes the model’s stream through a transform that accumulates the assembled output as the AI SDK reads it, so:
  • The caller’s stream is returned unchanged — every part is enqueued in order.
  • First-byte latency is unaffected.
  • The span is finalized once the stream emits its finish part.
You do not need finalizers.aiSdk for the per-call middleware spans; it is only for an optional outer withSpan root around the AI SDK call (see Nesting with core tracing).
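The pass-through behavior described above can be sketched as follows. This is a simplified illustration, not the SDK's code: it uses an async generator instead of a web stream transform to stay dependency-free, and the StreamPart type and passThrough function are hypothetical stand-ins for the AI SDK's stream parts.

```typescript
// Simplified stream parts (an assumption; the AI SDK's real parts carry more fields).
type StreamPart =
  | { type: "text-delta"; delta: string }
  | { type: "finish"; finishReason: string }

interface StreamSummary {
  text: string
  finishReason: string
}

// Forward every part unchanged while accumulating the assembled text on the side.
async function* passThrough(
  source: AsyncIterable<StreamPart>,
  onFinish: (summary: StreamSummary) => void,
): AsyncGenerator<StreamPart> {
  let text = ""
  for await (const part of source) {
    if (part.type === "text-delta") text += part.delta
    // Finalize the span summary when the finish part arrives.
    if (part.type === "finish") onFinish({ text, finishReason: part.finishReason })
    yield part // same object, same order: the caller's stream is not altered
  }
}
```

Because accumulation happens only as the consumer pulls parts, nothing is buffered ahead of the caller, which is why first-byte latency stays the same in this model.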

Error Handling

  • The middleware never throws; span capture is wrapped so a tracing failure cannot break the model call or the stream.
  • If the client is disabled, the middleware is a transparent pass-through.
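The two guarantees above can be pictured with a small wrapper. This is a sketch under assumptions, not the SDK's implementation: tracedCall, its enabled flag, and the record callback are hypothetical names.

```typescript
// Hypothetical wrapper illustrating the two guarantees; not the SDK's implementation.
async function tracedCall<T>(
  enabled: boolean,
  call: () => Promise<T>,
  record: (result: T) => void,
): Promise<T> {
  const result = await call() // errors from the model call itself still propagate
  if (!enabled) return result // disabled client: transparent pass-through
  try {
    record(result) // span capture
  } catch {
    // A tracing failure is swallowed so it cannot break the caller.
  }
  return result
}
```

Note that only the capture step is guarded. An error thrown by the model call is not a tracing failure, so in this sketch it reaches the caller as usual.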

Replay

Each model call records a keyed llm span carrying its parameters as input, so the call replays by key. With no outer withSpan, the span is the trace root and replay(key, fn) re-feeds each historical call’s parameters to a callable that re-issues the model call. If you wrap the AI SDK call in a withSpan with the same key (see Nesting with core tracing), that outer span is the replayable root and the middleware spans nest under it. Full details: Replaying functions in the TypeScript SDK page.
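Conceptually, replaying by key means re-feeding each recorded input to a callable and comparing the new output with the recorded one. The sketch below illustrates that idea with an in-memory history. It is not the SDK's replay(key, fn): RecordedCall and replayLocally are hypothetical, and the real function reads historical calls from Bitfab.

```typescript
// Hypothetical recorded-call shape for this illustration.
interface RecordedCall<I, O> {
  key: string
  input: I
  output: O
}

// Re-run fn on every historical input recorded under the given key.
async function replayLocally<I, O>(
  history: RecordedCall<I, O>[],
  key: string,
  fn: (input: I) => Promise<O>,
): Promise<{ input: I; recorded: O; replayed: O }[]> {
  const results: { input: I; recorded: O; replayed: O }[] = []
  for (const call of history) {
    if (call.key !== key) continue // other keys are ignored
    results.push({ input: call.input, recorded: call.output, replayed: await fn(call.input) })
  }
  return results
}
```

Pairing the recorded and replayed outputs is what makes a replay useful for comparing a changed prompt or model against past behavior.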