TypeScript SDK

The Bitfab TypeScript SDK captures your AI function calls to automatically generate evaluations. Re-run your prompts with different models, parameters, and inputs to iterate faster.

Installation

# npm
npm install bitfab

# pnpm
pnpm add bitfab

# yarn
yarn add bitfab

Quick Start

import { Bitfab } from "bitfab"

const bitfab = new Bitfab({ apiKey: process.env.BITFAB_API_KEY })

Need an API key? Get one from the Bitfab dashboard or see the API Keys guide for detailed setup instructions.

Coding Agent Prompt (Cursor, Claude Code)

Copy this prompt into your coding agent (tested with Cursor and Claude Code using Sonnet 4.5):

Modify existing TypeScript code to add Bitfab tracing.
Do NOT browse or web search. Use ONLY the API described below.

Bitfab TypeScript SDK (authoritative excerpt):
- Install: `npm install bitfab` or `pnpm add bitfab`
- Init:
  import { Bitfab } from "bitfab"
  const bitfab = new Bitfab({ apiKey: process.env.BITFAB_API_KEY })
- Instrumentation (ONLY allowed form - use getFunction):
  // Declare trace function key once
  const myService = bitfab.getFunction("<trace_function_key>")

  // Wrap functions with withSpan
  const tracedFn = myService.withSpan(originalFunction)

  // Or with options:
  const tracedFn = myService.withSpan({ name: "DisplayName", type: "function" }, originalFunction)

  // Span types: "llm", "agent", "function", "guardrail", "handoff", "custom"
- DO NOT modify the original function.
- DO NOT extract helper methods.

Task:
1) Ensure bitfab is installed and initialization exists.
2) Read the codebase and identify ALL AI workflows (LLM calls, agent runs, AI-driven decisions).
3) Present me with a numbered list of workflows you found. For each, describe:
   - What it does
   - Why it's worth instrumenting — what visibility tracing gives you into each step
4) After I choose which workflow(s) to instrument:
   - Create a function wrapper with `bitfab.getFunction("<trace_function_key>")`
   - Wrap the functions with `myService.withSpan(originalFunction)`
   - Instrument intermediate steps (not just the final output) so each trace has enough context to diagnose issues
   - Replace usages of the original functions with the traced versions
5) Do not change function signature, behavior, or return value. Minimal diff.

Output:
- First: your numbered list of workflows with why each is worth instrumenting
- After my selection: minimal diffs for dependencies, initialization, and the function wrapping

Basic Configuration

new Bitfab({ apiKey: string })

// Disable tracing (functions still execute, but no spans are sent)
new Bitfab({ apiKey: string, enabled: false })

A missing API key does not crash your app. If the API key is missing, empty, or whitespace-only, the SDK automatically disables tracing and logs a warning. All wrapped functions still execute normally: no spans are sent and no errors are thrown. You don’t need any conditional logic around the API key.

Tracing

Custom (Recommended)

Using `getFunction()` to Link Spans

Declare the trace function key once and wrap multiple functions:

const orderService = bitfab.getFunction("order-processing")

async function processOrder(orderId: string) {
  return { orderId }
}

async function validateOrder(orderId: string) {
  return { valid: true }
}

// Wrap functions - all share the same trace function key
const tracedProcessOrder = orderService.withSpan(processOrder)
const tracedValidateOrder = orderService.withSpan(validateOrder)

Multi-File Projects

For projects with instrumented functions spread across multiple files, create a dedicated file that initializes Bitfab and exports the function. Import it wherever you need to instrument.

// lib/bitfab.ts — single source of truth
import { Bitfab } from "bitfab"
export const bitfab = new Bitfab({ apiKey: process.env.BITFAB_API_KEY })
export const orderService = bitfab.getFunction("order-processing")

// services/processOrder.ts
import { orderService } from "../lib/bitfab"

async function processOrder(orderId: string) {
  return { orderId }
}

export const tracedProcessOrder = orderService.withSpan(processOrder)

// services/validateOrder.ts
import { orderService } from "../lib/bitfab"

async function validateOrder(orderId: string) {
  return { valid: true }
}

export const tracedValidateOrder = orderService.withSpan(validateOrder)

Spans from different files are automatically linked as parent-child when one wrapped function calls another.

Wrapping Existing Functions Inline

When wrapping a function you didn’t define (e.g. an SDK or library call), pass it directly to withSpan and call the result immediately. This ensures the arguments are captured as span input.

// ✅ GOOD — pass function directly, arguments are captured as span input
const result = await orderService.withSpan(
  { name: "ProcessOrder", type: "function" },
  processOrder,
)(orderId)

// ❌ BAD — anonymous wrapper loses all input capture (span has no input)
const result = await orderService.withSpan(
  { name: "ProcessOrder", type: "function" },
  async () => processOrder(orderId),
)()

Never wrap functions in an anonymous function like async () => fn(args). The SDK captures the wrapper function’s arguments as span input — an anonymous wrapper has no arguments, so the span records nothing.
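To see why the anonymous wrapper records nothing, it can help to model the wrapper with a toy stand-in. The sketch below is not the Bitfab implementation; `toyWithSpan` and `recorded` are hypothetical names used only to show that a wrapper can capture just the arguments it is actually called with.

```typescript
// Toy stand-in for a tracing wrapper (hypothetical, not the Bitfab SDK).
// It records only the arguments passed to the wrapper itself.
type RecordedSpan = { name: string; input: unknown[] }

const recorded: RecordedSpan[] = []

function toyWithSpan<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    recorded.push({ name, input: args }) // input = the wrapper's own arguments
    return fn(...args)
  }
}

function processOrder(orderId: string): { orderId: string } {
  return { orderId }
}

// Direct wrapping: the wrapper is called with "order-42", so it is recorded.
toyWithSpan("direct", processOrder)("order-42")

// Anonymous wrapper: the wrapper is called with no arguments, so the
// recorded input is empty even though processOrder still ran.
toyWithSpan("anonymous", () => processOrder("order-42"))()

for (const span of recorded) {
  console.log(span.name, JSON.stringify(span.input))
}
// direct ["order-42"]
// anonymous []
```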

Using `withSpan()` Directly

For a single span without linking to a function group:

const standaloneTask = bitfab.withSpan("one-off-operation", () => {
  return "done"
})

Automatic Nesting

Spans nest automatically based on call stack:

const outer = bitfab.withSpan("outer", { type: "agent" }, async () => {
  await inner()  // Becomes a child of "outer"
})

const inner = bitfab.withSpan("inner", { type: "function" }, async () => {
  // ...
})
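One way to picture call-stack nesting is a stack of active spans: a span's parent is whichever span is on top of the stack when it starts. The sketch below is a simplified, synchronous model with hypothetical names (`toySpan`, `activeStack`, `finished`). It is not the SDK's mechanism, which also has to track async calls.

```typescript
// Simplified model of call-stack nesting (hypothetical, synchronous only).
type SpanRecord = { name: string; parent: string | null }

const activeStack: string[] = [] // names of spans currently executing
const finished: SpanRecord[] = []

function toySpan<R>(name: string, fn: () => R): () => R {
  return (): R => {
    // The parent is the span that was already running when this one started.
    const parent: string | null = activeStack[activeStack.length - 1] ?? null
    activeStack.push(name)
    try {
      return fn()
    } finally {
      activeStack.pop()
      finished.push({ name, parent })
    }
  }
}

const inner = toySpan("inner", () => "done")
const outer = toySpan("outer", () => inner())

outer()

console.log(JSON.stringify(finished))
// [{"name":"inner","parent":"outer"},{"name":"outer","parent":null}]
```

The inner span finishes first, so it appears first in `finished`, and its `parent` field points at the outer span.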

Span Options

Parameters:

traceFunctionKey (required): String identifier for grouping spans
name (optional): Display name. Defaults to function name, then trace function key
type (optional): Span type. Defaults to "custom"
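The default order for the span name (explicit name, then function name, then trace function key) can be sketched as a small resolver. `resolveSpanName` is a hypothetical helper written for illustration, not an SDK export.

```typescript
// Hypothetical helper modelling the documented default order:
// explicit name, then the function's own name, then the trace function key.
function resolveSpanName(
  traceFunctionKey: string,
  options: { name?: string },
  fn: (...args: never[]) => unknown,
): string {
  if (options.name !== undefined) return options.name
  // Anonymous functions passed inline have an empty name.
  return fn.name !== "" ? fn.name : traceFunctionKey
}

function processOrder(orderId: string) {
  return { orderId }
}

console.log(resolveSpanName("order-processing", { name: "OrderProcessor" }, processOrder))
// OrderProcessor
console.log(resolveSpanName("order-processing", {}, processOrder))
// processOrder
console.log(resolveSpanName("order-processing", {}, () => 1))
// order-processing
```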

Span Types:

type SpanType =
  | "llm"        // LLM calls
  | "agent"      // Agent workflows
  | "function"   // Function calls
  | "guardrail"  // Safety checks
  | "handoff"    // Human handoffs
  | "custom"     // Default

Examples:

// Function name is automatically captured as span name
async function processOrder(orderId: string) {
  return { orderId }
}
const tracedDefault = bitfab.withSpan("order-processing", processOrder)
// Span name: "processOrder"

// Override with name option
const tracedNamed = bitfab.withSpan(
  "order-processing",
  { name: "OrderProcessor" },
  processOrder
)
// Span name: "OrderProcessor"

// Set span type
const checkSafety = bitfab.withSpan(
  "safety-check",
  { type: "guardrail" },
  async (content: string) => ({ safe: true })
)

// With getFunction()
const service = bitfab.getFunction("order-processing")
const tracedViaService = service.withSpan({ name: "CustomName", type: "function" }, processOrder)

Span Context

Use getCurrentSpan() to get a handle to the active span, then call .addContext() to attach contextual key-value pairs from inside a traced function — useful for runtime values like request IDs, computed scores, or dynamic context:

import { getCurrentSpan } from "bitfab"

async function processOrder(orderId: string) {
  const userId = await getCurrentUser()
  getCurrentSpan()?.addContext({ user_id: userId, order_id: orderId })
  return { orderId, status: "completed" }
}

const traced = bitfab.withSpan("order-processing", { type: "function" }, processOrder)

Each addContext call pushes the entire object as one entry. Multiple calls accumulate entries:

getCurrentSpan()?.addContext({ user_id: "u-123" })
getCurrentSpan()?.addContext({ request_id: "req-789" })
// Result: contexts: [{ user_id: "u-123" }, { request_id: "req-789" }]

Span Trace ID

Access the current trace ID from within a span using getCurrentSpan().traceId. This is useful for capturing trace IDs to use with replay or for logging:

import { getCurrentSpan } from "bitfab"

const traced = bitfab.withSpan("my-function", async () => {
  const traceId = getCurrentSpan().traceId  // UUID string
  console.log("Current trace:", traceId)
  return { traceId }
})

Outside a span context, getCurrentSpan().traceId returns an empty string.

Span Prompt

Use getCurrentSpan() to set the prompt string on the current span. This is stored in span_data.prompt and is useful for capturing the exact prompt text sent to an LLM:

import { getCurrentSpan } from "bitfab"

async function classifyText(text: string) {
  const prompt = `Classify the following text: ${text}`
  getCurrentSpan()?.setPrompt(prompt)
  const result = await llm.complete(prompt)
  return result
}

const traced = bitfab.withSpan("classification", { type: "llm" }, classifyText)

The last setPrompt call wins — it overwrites any previously set prompt on the span. Calling setPrompt outside a span context is a no-op (it never crashes).

Supported Frameworks

Bitfab provides automatic tracing for popular AI frameworks. See the dedicated guides for full API references:

LangGraph / LangChain: Callback handler for graph nodes, LLM calls, and tools
OpenAI Agents SDK: Trace processor for agent runs
BAML: Auto-capture prompts and LLM metadata
Claude Agent SDK: Capture LLM turns, tool calls, and subagents

Trace Context

Use getCurrentTrace() to set context that applies to the entire trace (all spans within a single execution). This is useful for grouping traces by session or attaching trace-level metadata:

import { getCurrentTrace } from "bitfab"

const traced = bitfab.withSpan("order-processing", { type: "function" }, async () => {
  const trace = getCurrentTrace()

  // Set session ID (stored as database column, filterable in dashboard)
  trace?.setSessionId("session-123")

  // Set trace metadata (stored in raw trace data)
  trace?.setMetadata({ region: "us-west-2", environment: "production" })

  // Add context entries (stored as key-value pairs, accumulates across calls)
  trace?.addContext({ workflow: "checkout-flow", batch_id: "batch-2024-01" })

  return { status: "completed" }
})

setSessionId(id) — Groups traces by user session. Stored as a database column for efficient filtering.
setMetadata(obj) — Arbitrary key-value metadata on the trace. Merges with existing metadata.
addContext(obj) — Key-value context entries. Accumulates across multiple calls.
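The difference between merging (setMetadata) and accumulating (addContext) can be shown with a toy model. `ToyTrace` below is a hypothetical class, not the SDK's internal structure, and it assumes the merge is shallow with later keys overriding earlier ones, which the text above does not spell out.

```typescript
// Toy model of the documented trace-level semantics (hypothetical).
class ToyTrace {
  sessionId: string | null = null
  metadata: Record<string, unknown> = {}
  contexts: Record<string, unknown>[] = []

  setSessionId(id: string): void {
    this.sessionId = id
  }

  setMetadata(obj: Record<string, unknown>): void {
    // Assumption: shallow merge, later keys override earlier ones.
    this.metadata = { ...this.metadata, ...obj }
  }

  addContext(obj: Record<string, unknown>): void {
    // Each call appends one entry; nothing is merged.
    this.contexts.push(obj)
  }
}

const trace = new ToyTrace()
trace.setMetadata({ region: "us-west-2" })
trace.setMetadata({ environment: "production" })
trace.addContext({ workflow: "checkout-flow" })
trace.addContext({ batch_id: "batch-2024-01" })

console.log(JSON.stringify(trace.metadata))
// {"region":"us-west-2","environment":"production"}
console.log(JSON.stringify(trace.contexts))
// [{"workflow":"checkout-flow"},{"batch_id":"batch-2024-01"}]
```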

Error Handling

Errors thrown inside a wrapped function are recorded on the span and then rethrown to the caller:

const risky = bitfab.withSpan("risky-service", () => {
  throw new Error("error")
})

try {
  risky()
} catch (e) {
  // Span records error and timing
}
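The record-then-rethrow pattern can be sketched with a toy wrapper. This is an illustration under assumptions, not the SDK's code: `toyWithSpan`, `ToySpan`, and `spans` are hypothetical names, and only synchronous functions are handled.

```typescript
// Toy wrapper showing "record the error, then rethrow" (hypothetical).
type ToySpan = { name: string; error: string | null; durationMs: number }

const spans: ToySpan[] = []

function toyWithSpan<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    const start = Date.now()
    let error: string | null = null
    try {
      return fn(...args)
    } catch (e) {
      error = e instanceof Error ? e.message : String(e)
      throw e // rethrow so the caller still sees the failure
    } finally {
      // Runs on both success and failure, so timing is always recorded.
      spans.push({ name, error, durationMs: Date.now() - start })
    }
  }
}

const risky = toyWithSpan("risky-service", (): string => {
  throw new Error("boom")
})

let caught = ""
try {
  risky()
} catch (e) {
  caught = e instanceof Error ? e.message : String(e)
}

console.log(caught) // boom
console.log(spans.map((s) => s.error).join(",")) // boom
```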

Advanced Configuration

new Bitfab({
  apiKey: string,                    // Required
  serviceUrl?: string,               // Default: https://bitfab.ai
  timeout?: number,                  // Request timeout in ms (default: 120000)
  envVars?: { OPENAI_API_KEY: string },  // For native function execution
  enabled?: boolean,                 // Default: true
  bamlClient?: unknown               // Generated BAML client (for wrapBAML)
})

timeout: Request timeout in milliseconds for API calls. Defaults to 120000 (2 minutes).
envVars: Pass LLM provider API keys for native function execution via call().
enabled: When false, all tracing is disabled. Wrapped functions still execute normally but no spans are sent.
bamlClient: The generated BAML client instance (e.g., b from @baml). See BAML framework guide for full usage.

Replay

Replay historical traces through an updated function version to compare outputs:

const pipeline = bitfab.getFunction("my-function")

const updatedFn = pipeline.withSpan(
  { name: "UpdatedPipeline", type: "function" },
  async (input: string) => {
    // New implementation to test
    return { result: input.toUpperCase() }
  },
)

// Replay all traces (up to limit)
const result = await bitfab.replay("my-function", updatedFn, {
  limit: 10,
  maxConcurrency: 5,  // Default: 10
})

// Replay specific traces by ID
const result2 = await bitfab.replay("my-function", updatedFn, {
  traceIds: ["trace-id-1", "trace-id-2"],
})

// Result structure
console.log(result.testRunId)   // Test run identifier
console.log(result.testRunUrl)  // Dashboard URL
for (const item of result.items) {
  console.log(item.input)           // Original input
  console.log(item.result)          // New output
  console.log(item.originalOutput)  // Original output
  console.log(item.error)           // Error if any
  console.log(item.durationMs)      // Original trace duration in ms (or null)
  console.log(item.tokens)          // { input, output, cached, total } or null
  console.log(item.model)           // Original model name, or null
}
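The per-item durationMs and tokens fields can be aggregated to estimate how the historical runs performed. The sketch below uses a local `ReplayItemMetrics` interface and made-up sample values; the field names follow the result structure above, but the interface itself and the helper names are assumptions for illustration.

```typescript
// Local shape for illustration; not imported from the SDK.
interface ReplayItemMetrics {
  durationMs: number | null
  tokens: { input: number; output: number; cached: number; total: number } | null
}

function average(values: number[]): number | null {
  if (values.length === 0) return null
  return values.reduce((sum, v) => sum + v, 0) / values.length
}

// Averages only over items where the metric was captured (non-null).
function summarizeMetrics(items: ReplayItemMetrics[]) {
  const durations = items
    .map((item) => item.durationMs)
    .filter((d): d is number => d !== null)
  const totals = items
    .map((item) => (item.tokens === null ? null : item.tokens.total))
    .filter((t): t is number => t !== null)
  return { avgDurationMs: average(durations), avgTotalTokens: average(totals) }
}

// Made-up sample data.
const sample: ReplayItemMetrics[] = [
  { durationMs: 1200, tokens: { input: 300, output: 100, cached: 0, total: 400 } },
  { durationMs: null, tokens: null },
  { durationMs: 800, tokens: { input: 150, output: 50, cached: 0, total: 200 } },
]

const metrics = summarizeMetrics(sample)
console.log(JSON.stringify(metrics))
// {"avgDurationMs":1000,"avgTotalTokens":300}
```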

Per-item metrics (durationMs, tokens, model) come from the historical trace that fed this replay item. Use them to reason about latency and cost: average tokens.total tells you how much the old runs cost, and durationMs tells you how long they took. Each field is null when the underlying trace didn’t capture it.

Options:

limit — Maximum number of traces to replay (default: all)
traceIds — Specific trace IDs to replay
maxConcurrency — Number of traces to replay in parallel (default: 10)
codeChangeDescription — Optional rationale for the code change being tested in this replay (stored on the experiment)
codeChangeFiles — Optional list of edited files, each as { path, before, after } (use "" for newly created or deleted files)
mock — Mock strategy for child spans during replay: "none" (default, run real code), "all" (return historical output for every child), or "marked" (only return historical output for child spans declared with mockOnReplay: true). See Mocking child spans during replay below.

Mocking child spans during replay

When iterating on a root function, child spans sometimes fail in your local environment for reasons unrelated to the code under test: a paid API key is missing, an external service is flaky, or a production-only DB row isn’t seeded locally. The mock option lets the child return its recorded output so the root function can still run.

Three strategies on replay():

"none" (default): every child span runs real code. Use when your local environment can faithfully reproduce the trace.
"all": every descendant withSpan returns its historical output. The root function still runs real, but every child is short-circuited. Useful for a quick sanity-check against recorded data; not the recommended iteration strategy because changes to descendants won’t actually execute.
"marked": only descendants declared with mockOnReplay: true are short-circuited; everything else runs real. This is the iteration-friendly mode.

Per-span opt-in via SpanOptions.mockOnReplay:

const fetchArticle = bitfab.withSpan(
  "fetch-article-from-db",
  { mockOnReplay: true },
  async (id: string) => db.articles.findById(id),
)

const summarize = bitfab.withSpan(
  "summarize-article",
  async (article: Article) => {
    // real summarization goes here (no mockOnReplay flag)
  },
)

const processArticle = bitfab.withSpan(
  "process-article",
  async (id: string) => summarize(await fetchArticle(id)),
)

// During replay, fetch-article-from-db returns its recorded output;
// summarize-article runs real so you can iterate on it.
const result = await bitfab.replay("process-article", processArticle, {
  limit: 10,
  mock: "marked",
})

mockOnReplay is a per-span tag at definition time — it has no effect outside replay, and it’s only read under mock: "marked". The root function always runs real code; only descendants can be mocked. When no historical span matches a child call (e.g. the recorded trace didn’t reach that branch), execution falls through to the real function — never silent omission.
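The rules above reduce to a small decision function. The sketch below models only the documented behavior; `usesRecordedOutput` and `SpanCall` are hypothetical names, and the SDK's real matching logic is not shown here.

```typescript
// Decision model for the documented mock strategies (hypothetical helper).
type MockStrategy = "none" | "all" | "marked"

interface SpanCall {
  isRoot: boolean             // the function passed to replay()
  mockOnReplay: boolean       // flag set when the span was declared
  hasHistoricalMatch: boolean // a recorded span exists for this call
}

function usesRecordedOutput(strategy: MockStrategy, call: SpanCall): boolean {
  if (call.isRoot) return false              // the root always runs real code
  if (!call.hasHistoricalMatch) return false // no recording: fall through to real code
  if (strategy === "all") return true
  if (strategy === "marked") return call.mockOnReplay
  return false                               // "none"
}

const flaggedChild: SpanCall = { isRoot: false, mockOnReplay: true, hasHistoricalMatch: true }
const plainChild: SpanCall = { isRoot: false, mockOnReplay: false, hasHistoricalMatch: true }

console.log(usesRecordedOutput("marked", flaggedChild)) // true
console.log(usesRecordedOutput("marked", plainChild))   // false
console.log(usesRecordedOutput("all", plainChild))      // true
console.log(usesRecordedOutput("none", flaggedChild))   // false
```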

Attaching a Code Change

Each replay creates an experiment (test run). When you’re iterating on a function and replaying after every edit, attach the change so the dashboard can show exactly what was edited alongside the results. The agent reads each file before editing, edits, then reads it again — the two strings go straight into codeChangeFiles. There’s no diff format to construct.

import { readFileSync } from "node:fs"

const before = readFileSync("src/foo.ts", "utf8")
// ...edit src/foo.ts...
const after = readFileSync("src/foo.ts", "utf8")

const result = await bitfab.replay("my-function", updatedFn, {
  codeChangeDescription: "fix off-by-one in retry logic",
  codeChangeFiles: [{ path: "src/foo.ts", before, after }],
})

Both options are optional and independent: you can pass just codeChangeDescription for a quick rationale-only annotation, or just codeChangeFiles to record the literal edits.

Notes:

Use a single Bitfab client across instrumentation and replay. If your instrumented module constructs new Bitfab() at import and your replay script constructs another, they do not share registered trace functions — import the client from the instrumented module (or a shared singleton) rather than constructing a new one in the replay script.
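The reason two clients do not see each other's trace functions can be pictured as per-instance state. `ToyClient` below is a hypothetical model with an assumed internal registry; it is not the Bitfab class and says nothing about how the SDK actually stores registrations.

```typescript
// Toy model: each client instance keeps its own registry (assumption).
class ToyClient {
  private registered = new Set<string>()

  register(traceFunctionKey: string): void {
    this.registered.add(traceFunctionKey)
  }

  has(traceFunctionKey: string): boolean {
    return this.registered.has(traceFunctionKey)
  }
}

// Client created by the instrumented module at import time.
const appClient = new ToyClient()
appClient.register("my-function")

// A second client created separately in a replay script.
const replayClient = new ToyClient()

console.log(appClient.has("my-function"))    // true
console.log(replayClient.has("my-function")) // false
```

Importing the one shared client in the replay script avoids the second, empty registry.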

Replay Output Contract

Replay results are typically consumed by automation (CI logs, code reviewers, and coding agents reading stdout). Emit the full ReplayResult as a single JSON block so a consumer can JSON.parse it and reason about every field, including the new per-item durationMs, tokens, and model. Never print only lengths, counts, hashes, or truncated previews, and never replace the JSON block with ad-hoc per-field log lines.

Recommended script tail (TypeScript):

const result = await bitfab.replay("my-function", updatedFn, { limit })

// Optional: human-readable summary first.
console.log(`Test run: ${result.testRunUrl}`)
console.log(`Items:    ${result.items.length}`)

// Then: full structured dump, ready for JSON.parse.
console.log(JSON.stringify(result, null, 2))

The dumped object includes every item’s input, result, originalOutput, error, durationMs, tokens, and model, plus testRunId and testRunUrl. Writing the same JSON to scripts/replay-result.json in parallel is optional but useful for later analysis.

Per-item errors are part of the contract. If the wrapped function throws on a given trace, bitfab.replay catches it, sets item.error, leaves item.result undefined, and continues. Treat items with item.error set as unreplayable, not as failing outputs: compute pass/fail only over items where it’s unset. This matters most for DB reads/writes: a stale FK, missing record, or rejected write is infra failure, not a regression.

Don’t swallow per-item errors in the script. A custom try/catch that returns a placeholder turns infra failures into fake successes. Let the SDK record them. The only allowed top-level catch is a fatal handler around main() that exits non-zero, so callers can tell a whole-replay crash from a clean run with some unreplayable items.

Environment. Replay executes in the app’s own process: the instrumented function is imported as a library, and its DB clients, env vars, config loaders, and model IDs resolve from whatever environment the replay script is run under. The script must bootstrap the same environment the app uses (e.g. import "dotenv/config" at the top, or run via pnpm with-env tsx scripts/replay.ts). Do not mock these; they’re the same dependencies the app resolves in production. For replay to see the same DB rows the trace was captured against, point the script at the trace’s source environment (the environment field on the trace: production / staging / development).

Input serialization caveat. Replay deserializes historical span inputs and passes them back to your function. This works for strings, numbers, and plain objects. If your span wraps a function that takes hydrated domain objects (ORM models, class instances, DB records), they won’t round-trip through serialization. Move the span to where inputs are IDs or plain data and let the function fetch objects internally, or reshape arguments in the wrapper.
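The serialization caveat is easy to reproduce. The sketch below assumes inputs are stored in a JSON-like form, which this page does not state explicitly, and uses a hypothetical `Order` class to show what a class instance loses on the way through.

```typescript
// Hypothetical domain class used only for this demonstration.
class Order {
  id: string
  totalCents: number

  constructor(id: string, totalCents: number) {
    this.id = id
    this.totalCents = totalCents
  }

  totalDollars(): number {
    return this.totalCents / 100
  }
}

const original = new Order("order-42", 1999)

// Assumption: the stored input behaves like a JSON round trip.
const roundTripped = JSON.parse(JSON.stringify(original)) as {
  id: string
  totalCents: number
  totalDollars?: unknown
}

const stillAnOrder = roundTripped instanceof Order

console.log(stillAnOrder)                     // false
console.log(typeof roundTripped.totalDollars) // undefined
console.log(roundTripped.id)                  // order-42
```

Plain data fields survive, but the prototype and its methods do not, which is why wrapping a function that takes an ID and loads the object itself replays more reliably.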

Replay Script

Create a standalone script to regression-test your trace functions against production data with one command. The script maps pipeline names to their replay functions, accepts CLI flags, and prints a side-by-side comparison with delta summaries.

/**
 * Replay production traces through instrumented functions.
 *
 * Uses bitfab.replay() to fetch real traces and re-run them through
 * the current code, creating a test run for side-by-side comparison.
 *
 * Usage:
 *   npx tsx scripts/replay.ts <pipeline>
 *   npx tsx scripts/replay.ts <pipeline> --limit 20
 *   npx tsx scripts/replay.ts <pipeline> --trace-ids id1,id2
 */
import "dotenv/config"
import { bitfab } from "../lib/bitfab"
import { extractMemories } from "../services/extraction"
import { searchDocuments } from "../services/search"

const FUNCTIONS = {
  extraction: "my-extraction-pipeline",
  search: "my-search-pipeline",
} as const

type Pipeline = keyof typeof FUNCTIONS

const pipeline = process.argv[2] as Pipeline | undefined
const args = process.argv.slice(3)

if (!pipeline || !FUNCTIONS[pipeline]) {
  console.error(
    `Usage: npx tsx scripts/replay.ts <${Object.keys(FUNCTIONS).join("|")}> [--limit N] [--trace-ids id1,id2]`,
  )
  process.exit(1)
}

let limit = 10
let traceIds: string[] | undefined

for (let i = 0; i < args.length; i++) {
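  if (args[i] === "--limit" && args[i + 1]) {
    // Reject non-numeric or non-positive values instead of passing NaN to replay
    const parsed = Number.parseInt(args[i + 1], 10)
    if (Number.isNaN(parsed) || parsed <= 0) {
      console.error(`Invalid --limit value: ${args[i + 1]}`)
      process.exit(1)
    }
    limit = parsed
    i++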
  } else if (args[i] === "--trace-ids" && args[i + 1]) {
    traceIds = args[i + 1].split(",").map((id) => id.trim())
    i++
  }
}

// Each pipeline gets its own replay function — replay deserializes
// historical inputs, so if the function signature doesn't match the
// raw input shape, reshape the arguments in a thin wrapper here.

async function replayExtraction() {
  const fn = async (conversation: string, existingItems: unknown[]) => {
    return extractMemories(conversation, existingItems)
  }
  return bitfab.replay(FUNCTIONS.extraction, fn, {
    limit,
    ...(traceIds ? { traceIds } : {}),
  })
}

async function replaySearch() {
  const fn = async (query: string, opts: Record<string, unknown>) => {
    return searchDocuments(query, {
      userId: opts.userId as string,
      limit: (opts.limit as number) ?? 10,
    })
  }
  return bitfab.replay(FUNCTIONS.search, fn, {
    limit,
    ...(traceIds ? { traceIds } : {}),
  })
}

const REPLAY_FNS: Record<Pipeline, () => ReturnType<typeof bitfab.replay>> = {
  extraction: replayExtraction,
  search: replaySearch,
}

async function main() {
  const functionKey = FUNCTIONS[pipeline!]
  console.log(`[replay] Replaying ${traceIds?.length ?? limit} traces from "${functionKey}"...\n`)

  const result = await REPLAY_FNS[pipeline!]()
  console.log(`Test run: ${result.testRunUrl}\n`)

  let changed = 0, same = 0, errors = 0

  for (const item of result.items) {
    const input = item.input as unknown[] | undefined
    const label = typeof input?.[0] === "string"
      ? input[0].slice(0, 80)
      : JSON.stringify(input?.[0] ?? "unknown").slice(0, 80)

    if (item.error) {
      console.log(`  ✗ "${label}"`)
      console.log(`    Error: ${item.error}`)
      errors++
    } else {
      const origStr =
        typeof item.originalOutput === "string"
          ? item.originalOutput
          : JSON.stringify(item.originalOutput)
      const newStr =
        typeof item.result === "string"
          ? item.result
          : JSON.stringify(item.result)
      const isSame = origStr === newStr
      const marker = isSame ? "=" : "Δ"

      console.log(`  ${marker} "${label}"`)
      console.log(`    Original: ${origStr}`)
      console.log(`    New:      ${newStr}`)

      isSame ? same++ : changed++
    }
  }

  console.log(`\n─── Summary ───`)
  console.log(`  Pipeline: ${pipeline}`)
  console.log(`  Replayed: ${result.items.length}`)
  console.log(`  Same:     ${same}`)
  console.log(`  Changed:  ${changed}`)
  if (errors > 0) console.log(`  Errors:   ${errors}`)
  console.log(`\n  ${result.testRunUrl}`)
}

main().catch((err) => {
  console.error("[replay] Fatal error:", err)
  process.exit(1)
})

Adapt the imports, pipeline names, and per-pipeline replay functions to match your project’s instrumented workflows.
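The same/changed/error counting in the script can also be factored into a pure function, which makes it easier to keep unreplayable items out of pass/fail numbers. The sketch below is one possible shape: `ReplayItemLike`, `classify`, and `tally` are hypothetical names, and the sample items are made up.

```typescript
// Local shape for illustration; field names follow the replay result items.
interface ReplayItemLike {
  result?: unknown
  originalOutput?: unknown
  error?: unknown
}

type Verdict = "same" | "changed" | "unreplayable"

function toComparable(value: unknown): string | undefined {
  return typeof value === "string" ? value : JSON.stringify(value)
}

function classify(item: ReplayItemLike): Verdict {
  // Items with an error are unreplayable, not failing outputs.
  if (item.error) return "unreplayable"
  return toComparable(item.originalOutput) === toComparable(item.result) ? "same" : "changed"
}

function tally(items: ReplayItemLike[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { same: 0, changed: 0, unreplayable: 0 }
  for (const item of items) {
    counts[classify(item)] += 1
  }
  return counts
}

// Made-up sample items.
const sampleItems: ReplayItemLike[] = [
  { originalOutput: { a: 1 }, result: { a: 1 } },
  { originalOutput: "x", result: "y" },
  { error: "missing record" },
]

const summary = tally(sampleItems)
console.log(JSON.stringify(summary))
// {"same":1,"changed":1,"unreplayable":1}
```

Comparing serialized strings is sensitive to key order, so treat "changed" as a prompt to inspect the item rather than as proof of a regression.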
