Ruby SDK - Bitfab

The Bitfab Ruby SDK captures your AI function calls to automatically generate evaluations. Re-run your prompts with different models, parameters, and inputs to iterate faster.

Framework-native adapters (LangGraph, OpenAI Agents, BAML, Claude Agent SDK) are not yet available for Ruby. See Frameworks overview for current coverage. Instrument Ruby code manually via Bitfab::Traceable or Bitfab.span.

Installation

# Bundler
bundle add bitfab

# Gem
gem install bitfab

Quick Start

require "bitfab"

Bitfab.configure(api_key: ENV.fetch("BITFAB_API_KEY"))

Need an API key? Get one from the Bitfab dashboard or see the API Keys guide for detailed setup instructions.

Coding Agent Prompt (Cursor, Claude Code)

Copy this prompt into your coding agent (tested with Cursor and Claude Code using Sonnet 4.5):

Modify existing Ruby code to add Bitfab tracing.
Do NOT browse or web search. Use ONLY the API described below.

Bitfab Ruby SDK (authoritative excerpt):
- Install: `gem install bitfab` or `bundle add bitfab`
- Init:
  require "bitfab"
  Bitfab.configure(api_key: ENV.fetch("BITFAB_API_KEY"))
- Instrumentation (ONLY allowed form):
  class MyService
    include Bitfab::Traceable
    bitfab_function "<trace_function_key>"

    bitfab_span :method_name, type: "function"
    def method_name
      # ...
    end
  end
  (bitfab_span must be placed immediately ABOVE the `def` it instruments.)
- Span types: "llm", "agent", "function", "guardrail", "handoff", "custom"
- DO NOT use a block form of bitfab_span.
- DO NOT extract helper methods.

Task:
1) Ensure the bitfab gem is added and initialization exists (Gemfile + initializer).
2) Read the codebase and identify ALL AI workflows (LLM calls, agent runs, AI-driven decisions).
3) Present me with a numbered list of workflows you found. For each, describe:
   - What it does
   - Why it's worth instrumenting — what visibility tracing gives you into each step
4) After I choose which workflow(s) to instrument:
   - Add `bitfab_span` directly ABOVE each method's `def`
   - Instrument intermediate steps (not just the final output) so each trace has enough context to diagnose issues
   - Ensure each class includes `Bitfab::Traceable` and has `bitfab_function` set
5) Do not change method signature, behavior, or return value. Minimal diff.

Output:
- First: your numbered list of workflows with why each is worth instrumenting
- After my selection: minimal diffs for Gemfile, initializer, and the method changes

Basic Configuration

Bitfab.configure(api_key: String)

# Disable tracing (functions still execute, but no spans are sent)
Bitfab.configure(api_key: String, enabled: false)

Missing API key doesn’t crash. If the API key is missing, empty, or whitespace-only, the SDK automatically disables tracing and logs a warning. All instrumented methods still execute normally — no spans are sent, no errors are thrown. You don’t need any conditional logic around the API key.

Tracing

Custom (Recommended)

Using `Bitfab::Traceable` to Link Spans

Include Bitfab::Traceable in a class, declare the trace function key once with bitfab_function, then use bitfab_span to wrap methods. Three declaration styles are supported:

class OrderService
  include Bitfab::Traceable
  bitfab_function "order-processing"

  # Style 1: Before-def (recommended) — declare bitfab_span above the method
  bitfab_span :process_order, type: "function"
  def process_order(order_id)
    { order_id: order_id }
  end

  # Style 2: Inline — wrap the def directly
  bitfab_span def validate_order(order_id)
    { valid: true }
  end, type: "guardrail"

  # Style 3: After-def — declare bitfab_span after the method definition
  def enrich_order(order_id)
    { enriched: true }
  end
  bitfab_span :enrich_order, type: "function"
end

All three styles are equivalent. The before-def style is recommended for readability.
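To see why all three placements can behave the same, here is a minimal plain-Ruby sketch of a class macro that accepts each of them. It is a toy stand-in, not Bitfab's implementation: `ToyTraceable`, `toy_span`, and `ToyOrderService` are hypothetical names, and the sketch only records calls instead of sending spans.

```ruby
# Toy stand-in for a span macro. Illustrative only; not Bitfab's implementation.
module ToyTraceable
  CALLS = [] # records [method_name, type] for each wrapped call

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Works before the def (deferred until the method exists), inline
    # (def returns the method name as a Symbol), or after the def.
    def toy_span(method_name, type: "custom")
      if method_defined?(method_name)
        wrap_with_toy_span(method_name, type)
      else
        pending_toy_spans[method_name] = type
      end
      method_name
    end

    # Ruby calls this hook each time an instance method is defined.
    def method_added(method_name)
      super
      type = pending_toy_spans.delete(method_name)
      wrap_with_toy_span(method_name, type) if type
    end

    private

    def pending_toy_spans
      @pending_toy_spans ||= {}
    end

    # Prepend an anonymous module so `super` reaches the original method.
    def wrap_with_toy_span(method_name, type)
      wrapper = Module.new do
        define_method(method_name) do |*args, **kwargs, &block|
          ToyTraceable::CALLS << [method_name, type]
          super(*args, **kwargs, &block)
        end
      end
      prepend(wrapper)
    end
  end
end

class ToyOrderService
  include ToyTraceable

  toy_span :process_order, type: "function" # Style 1: before-def
  def process_order(order_id)
    { order_id: order_id }
  end

  toy_span def validate_order(order_id) # Style 2: inline
    { valid: true }
  end, type: "guardrail"

  def enrich_order(order_id) # Style 3: after-def
    { enriched: true }
  end
  toy_span :enrich_order, type: "function"
end

service = ToyOrderService.new
service.process_order(1)
service.validate_order(1)
service.enrich_order(1)
p ToyTraceable::CALLS
```

In this sketch the before-def form relies on Ruby's `method_added` hook, while the inline and after-def forms wrap a method that already exists.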

Multi-File Projects

For projects with instrumented methods spread across multiple files, create an initializer that configures Bitfab, then include Bitfab::Traceable in any class that needs tracing.

# config/initializers/bitfab.rb — single source of truth
require "bitfab"
Bitfab.configure(api_key: ENV.fetch("BITFAB_API_KEY"))

# app/services/process_order_service.rb
class ProcessOrderService
  include Bitfab::Traceable
  bitfab_function "order-processing"

  bitfab_span :process_order, type: "function"
  def process_order(order_id)
    { order_id: order_id }
  end
end

# app/services/validate_order_service.rb
class ValidateOrderService
  include Bitfab::Traceable
  bitfab_function "order-processing"

  bitfab_span :validate_order, type: "guardrail"
  def validate_order(order_id)
    { valid: true }
  end
end

Classes sharing the same bitfab_function key are grouped together. Spans from different classes are automatically linked as parent-child when one instrumented method calls another.

Using `bitfab_span` with Explicit Key

For a single span with an explicit trace function key:

class StandaloneService
  include Bitfab::Traceable

  bitfab_span :standalone_task, trace_function_key: "one-off-operation"
  def standalone_task
    "done"
  end
end

Automatic Nesting

Spans nest automatically based on call stack:

class Pipeline
  include Bitfab::Traceable
  bitfab_function "pipeline"

  bitfab_span :outer, type: "agent"
  def outer
    inner  # Becomes a child of "outer"
  end

  bitfab_span :inner, type: "function"
  def inner
    # ...
  end
end
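
The nesting rule can be pictured as a stack of open spans: whatever span is open when a new one starts becomes its parent. The sketch below is a toy model in plain Ruby under that assumption; `ToyTracer` and `ToySpan` are hypothetical names and say nothing about how Bitfab stores spans internally.

```ruby
# Toy model of call-stack nesting. Illustrative only; not Bitfab's internals.
ToySpan = Struct.new(:name, :parent_name)

module ToyTracer
  RECORDED = []

  # Stack of open span names, stored via Thread.current[] (fiber-local in Ruby).
  def self.stack
    Thread.current[:toy_span_stack] ||= []
  end

  def self.span(name)
    parent = stack.last # whatever span is open right now, or nil at the root
    stack.push(name)
    RECORDED << ToySpan.new(name, parent)
    yield
  ensure
    stack.pop # close the span even if the block raised
  end
end

def inner
  ToyTracer.span("inner") { :done }
end

def outer
  ToyTracer.span("outer") { inner }
end

outer
ToyTracer::RECORDED.each do |span|
  puts "#{span.name} (parent: #{span.parent_name.inspect})"
end
```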

Span Options

Parameters:

method_name (required): Symbol of the method to wrap
trace_function_key (optional): Override class-level bitfab_function
name (optional): Display name. Defaults to method name
type (optional): Span type. Defaults to "custom"

Span Types:

# %w[] literals cannot contain comments, so a plain array is used here
SPAN_TYPES = [
  "llm",        # LLM calls
  "agent",      # Agent workflows
  "function",   # Function calls
  "guardrail",  # Safety checks
  "handoff",    # Human handoffs
  "custom"      # Default
].freeze

Examples:

class SafetyService
  include Bitfab::Traceable
  bitfab_function "safety-service"

  # Method name is automatically captured as span name
  bitfab_span :check_safety, type: "guardrail"
  def check_safety(content)
    { safe: !content.include?("unsafe") }
  end

  # Override with name option
  bitfab_span :validate_input, name: "InputValidator", type: "guardrail"
  def validate_input(input)
    { valid: !input.empty? }
  end
end

Span Context

Use Bitfab.current_span to get a handle to the active span, then call .add_context() to attach contextual key-value pairs from inside a traced method — useful for runtime values like request IDs, computed scores, or dynamic context:

class OrderService
  include Bitfab::Traceable
  bitfab_function "order-processing"

  bitfab_span :process_order, type: "function"
  def process_order(order_id)
    user_id = current_user_id
    Bitfab.current_span.add_context("user_id" => user_id, "order_id" => order_id)
    { order_id: order_id, status: "completed" }
  end
end

Each add_context call pushes the entire hash as one entry. Multiple calls accumulate entries:

Bitfab.current_span.add_context("user_id" => "u-123")
Bitfab.current_span.add_context("request_id" => "req-789")
# Result: contexts: [{ "user_id" => "u-123" }, { "request_id" => "req-789" }]

You can also access the current trace ID via Bitfab.current_span.trace_id (returns an empty string outside a span):

trace_id = Bitfab.current_span.trace_id

Span Prompt

Call set_prompt on Bitfab.current_span to set the prompt string on the current span. The value is stored in span_data.prompt and is useful for capturing the exact prompt text sent to an LLM:

class ClassificationService
  include Bitfab::Traceable
  bitfab_function "classification"

  bitfab_span :classify_text, type: "llm"
  def classify_text(text)
    prompt = "Classify the following text: #{text}"
    Bitfab.current_span.set_prompt(prompt)
    llm.complete(prompt)
  end
end

The last set_prompt call wins — it overwrites any previously set prompt on the span. Calling set_prompt outside a span context is a no-op (it never crashes).

Trace Context

Use Bitfab.current_trace to set context that applies to the entire trace (all spans within a single execution). This is useful for grouping traces by session or attaching trace-level metadata:

class OrderService
  include Bitfab::Traceable
  bitfab_function "order-processing"

  bitfab_span :process_order, type: "function"
  def process_order(order_id)
    trace = Bitfab.current_trace

    # Set session ID (stored as database column, filterable in dashboard)
    trace.set_session_id("session-123")

    # Set trace metadata (stored in raw trace data)
    trace.set_metadata("region" => "us-west-2", "environment" => "production")

    # Add context entries (stored as key-value pairs, accumulates across calls)
    trace.add_context("workflow" => "checkout-flow", "batch_id" => "batch-2024-01")

    { order_id: order_id, status: "completed" }
  end
end

set_session_id(id) — Groups traces by user session. Stored as a database column for efficient filtering.
set_metadata(hash) — Arbitrary key-value metadata on the trace. Merges with existing metadata.
add_context(hash) — Key-value context entries. Accumulates across multiple calls.
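
The difference between "merges" and "accumulates" is easy to miss, so here is a toy model of the two update semantics. `ToyTrace` is a hypothetical class, and the sketch assumes that merging behaves like `Hash#merge`, with later keys overwriting earlier ones; the text above does not state which value wins on a key collision.

```ruby
# Toy model of the two update semantics. Not one of Bitfab's classes.
class ToyTrace
  attr_reader :metadata, :contexts

  def initialize
    @metadata = {}
    @contexts = []
  end

  # Merge: keys from the new hash are folded into the existing one.
  # Assumption: later values win on a key collision, as with Hash#merge.
  def set_metadata(hash)
    @metadata = @metadata.merge(hash)
  end

  # Accumulate: each call appends the whole hash as one entry.
  def add_context(hash)
    @contexts << hash
  end
end

trace = ToyTrace.new
trace.set_metadata("region" => "us-west-2")
trace.set_metadata("region" => "eu-west-1", "environment" => "production")
trace.add_context("workflow" => "checkout-flow")
trace.add_context("batch_id" => "batch-2024-01")

p trace.metadata # one hash, two keys
p trace.contexts # two separate entries
```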

Error Handling

Errors are captured in the span and re-raised:

class RiskyService
  include Bitfab::Traceable
  bitfab_function "risky-service"

  bitfab_span :risky
  def risky
    raise "error"
  end
end

begin
  RiskyService.new.risky
rescue => e
  # Span records error and timing
end
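
The capture-then-re-raise pattern can be sketched in plain Ruby as follows. This is a toy wrapper, not Bitfab's implementation: `toy_traced` and `TOY_SPANS` are hypothetical names, and the recorded fields are chosen for illustration only.

```ruby
# Toy wrapper showing capture-then-re-raise. Not Bitfab's implementation.
TOY_SPANS = []

def toy_traced(name)
  record = { name: name, error: nil, duration_ms: nil }
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
rescue StandardError => e
  record[:error] = "#{e.class}: #{e.message}"
  raise # re-raise the same exception so the caller's rescue still fires
ensure
  # Timing is recorded whether the block succeeded or raised.
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  record[:duration_ms] = (elapsed * 1000).round(2)
  TOY_SPANS << record
end

begin
  toy_traced("risky") { raise "error" }
rescue => e
  puts "caller still sees: #{e.message}"
end

p TOY_SPANS.first[:error]
```

The point of the sketch is that instrumentation does not change control flow: the caller's own `rescue` runs exactly as it would without the wrapper.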

Flushing Traces

Bitfab.flush_traces(timeout: 30)  # Default: 30s

Traces flush automatically on process exit via an at_exit hook.

Wrapping Third-Party Methods

Use Bitfab::Traceable.wrap to trace methods on external classes:

require "openai"

Bitfab::Traceable.wrap(
  OpenAI::Client, :chat,
  trace_function_key: "openai",
  name: "Chat",
  type: "llm"
)

# Now all calls to client.chat are traced
client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
client.chat(parameters: { model: "gpt-4", messages: [...] })

Replay

Replay historical traces through a method to create test runs. This re-runs past inputs through your updated code and compares the results.

client = Bitfab.client

# Pass an instance for instance methods
service = OrderService.new(api_key: "...", db: db)
result = client.replay(
  service, :process_order,
  trace_function_key: "order-processing",
  limit: 5
)

# Or pass a Class for class methods
result = client.replay(
  OrderService, :process_order,
  trace_function_key: "order-processing",
  limit: 5
)

puts "Test Run URL: #{result[:test_run_url]}"
puts "Test Run ID:  #{result[:test_run_id]}"

result[:items].each do |item|
  puts "Input: #{item[:input]}, Result: #{item[:result]}, Error: #{item[:error]}"
  puts "Duration (ms): #{item[:duration_ms]}"
  puts "Tokens: #{item[:tokens]}"    # { input:, output:, cached:, total: } or nil
  puts "Model: #{item[:model]}"
end

Per-item :duration_ms, :tokens, and :model come from the historical trace that fed the item, so you can reason about the cost and latency of the old runs. Each field is nil when the underlying trace didn’t capture it.

Parameters:

receiver (required): An instance for instance methods, or a Class for class methods
method_name (required): Symbol of the method to replay
trace_function_key (required): The trace function key
limit (optional): Max traces to replay (default: 5)
trace_ids (optional): Array of specific trace IDs to replay
max_concurrency (optional): Max threads for parallel replay (default: 10)
code_change_description (optional): Rationale for the code change being tested in this replay (stored on the experiment)
code_change_files (optional): Array of edited files, each as { path:, before:, after: } (use "" for newly created or deleted files)
mock (optional): Mock strategy for child spans during replay. One of "none" (default, every child runs real code), "all" (every child returns its historical output), or "marked" (only spans tagged with mock_on_replay: true return historical output)

Notes:

receiver + method_name must resolve to the method that carries the traceable decoration — passing a plain wrapper around it will not resolve the trace function key.
trace_function_key is passed explicitly, so instance methods and class methods are disambiguated by the receiver.
Use a single Bitfab::Client across instrumentation and replay. If your instrumented module constructs a client at load time and your replay script constructs another, the two do not share registered trace functions. Reuse the client from the instrumented module (or a shared singleton) rather than constructing a new one in the replay script.

Replay specific traces:

service = OrderService.new(api_key: "...")
result = client.replay(
  service, :process_order,
  trace_function_key: "order-processing",
  trace_ids: ["trace-id-1", "trace-id-2"]
)

Attaching a code change: Each replay creates an experiment (test run). When you’re iterating on a method and replaying after every edit, attach the change so the dashboard can show exactly what was edited alongside the results. Read each file before editing, edit, then read it again — the two strings go straight into code_change_files. There’s no diff format to construct.

before = File.read("lib/order_service.rb")
# ...edit lib/order_service.rb...
after = File.read("lib/order_service.rb")

result = client.replay(
  service, :process_order,
  trace_function_key: "order-processing",
  code_change_description: "fix off-by-one in retry logic",
  code_change_files: [
    {path: "lib/order_service.rb", before:, after:}
  ]
)

Both options are optional and independent: pass just code_change_description for a quick rationale-only annotation, or just code_change_files to record the literal edits.

Mock child spans during replay: By default replay runs every child span’s real code. That’s fine for cheap, deterministic children, but expensive when children make paid LLM/API calls. Three mock strategies let you skip those child invocations and return their historical outputs instead:

# "none" (default) — every child span runs real code
client.replay(service, :process_order, trace_function_key: "order-processing", mock: "none")

# "all" — every child span returns its historical output, real code never runs
client.replay(service, :process_order, trace_function_key: "order-processing", mock: "all")

# "marked" — only spans declared with mock_on_replay: true return historical output
client.replay(service, :process_order, trace_function_key: "order-processing", mock: "marked")

Tag the child spans you want mocked at definition time:

class OrderService
  include Bitfab::Traceable
  bitfab_function "order-processing"

  # Paid LLM call — skip during replay under mock: "marked"
  bitfab_span :classify_intent, type: "llm", mock_on_replay: true
  def classify_intent(prompt); end

  # Cheap, deterministic — keep running real
  bitfab_span :persist, type: "function"
  def persist(order); end

  bitfab_span :process_order, type: "agent"
  def process_order(order)
    classify_intent(order.description)
    persist(order)
  end
end

Use mock: "marked" when you want to iterate on process_order’s logic without paying for the LLM call on each replay. Use mock: "all" when the goal is the cheapest possible replay (every child span returns its recorded output; only the root function executes real code).

Repeated calls to the same trace_function_key are distinguished by call order, so step:0, step:1, step:2 correspond to the first, second, and third invocations. Unmarked spans still advance the counter, so a marked sibling that runs after an unmarked one lines up with the right historical entry.
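
The call-order matching described above can be modeled with a per-name counter. The sketch below is one interpretation of that description under mock: "marked", not Bitfab's implementation; `ToyReplaySession` and the `"name:index"` lookup keys are hypothetical.

```ruby
# Toy model of call-order matching under mock: "marked".
# An interpretation of the documented behavior, not Bitfab's implementation.
class ToyReplaySession
  def initialize(historical_outputs)
    @historical = historical_outputs # e.g. { "step:0" => ..., "step:1" => ... }
    @counters = Hash.new(0)          # next call index per span name
  end

  def call(name, mock_on_replay: false)
    key = "#{name}:#{@counters[name]}"
    @counters[name] += 1 # advances for marked and unmarked calls alike
    return @historical.fetch(key) if mock_on_replay

    yield # unmarked: run the real code
  end
end

session = ToyReplaySession.new(
  { "step:0" => "old-0", "step:1" => "old-1", "step:2" => "old-2" }
)

results = [
  session.call("step") { "real-0" },                       # unmarked, runs real code
  session.call("step", mock_on_replay: true) { "real-1" }, # marked, uses step:1
  session.call("step", mock_on_replay: true) { "real-2" }  # marked, uses step:2
]
p results
```

Because the unmarked first call still consumes index 0, the two marked calls pick up the historical entries for indexes 1 and 2 rather than 0 and 1.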

Fluent API: `client.get_function`

Bind a trace_function_key once and wrap multiple classes or methods against it. Mirrors client.get_function in the Python SDK and client.getFunction in TypeScript.

fn = Bitfab.client.get_function("openai")

fn.wrap(OpenAI::Client, :chat, name: "Chat", type: "llm")
fn.wrap(OpenAI::Client, :embeddings, name: "Embed", type: "llm")

#wrap accepts the same options as Bitfab::Traceable.wrap (name, type, mock_on_replay), but the trace_function_key is fixed to the one bound on the returned Bitfab::BitfabFunction.

Replay Output Contract

Replay results are typically consumed by automation (CI logs, code reviewers, and coding agents reading stdout). Emit the full replay result hash as a single JSON block so a consumer can JSON.parse it and reason about every field, including the new per-item :duration_ms, :tokens, and :model. Never print only lengths, counts, hashes, or truncated previews, and never replace the JSON block with ad-hoc per-field log lines.

Recommended script tail:

require "json"

result = Bitfab.client.replay(service, :process_order, trace_function_key: "order-processing", limit: 5)

# Optional: human-readable summary first.
puts "Test run: #{result[:test_run_url]}"
puts "Items:    #{result[:items].length}"

# Then: full structured dump, ready for JSON.parse.
puts JSON.pretty_generate(result)

The dumped object includes every item’s :input, :result, :original_output, :error, :duration_ms, :tokens, and :model, plus :test_run_id and :test_run_url. Writing the same JSON to scripts/replay-result.json in parallel is optional but useful for later analysis.

Per-item errors are part of the contract. If the wrapped method raises on a given trace, client.replay rescues it, sets item[:error], leaves item[:result] as nil, and continues. Treat items with item[:error] set as unreplayable, not as failing outputs: compute pass/fail only over items where it is nil. This matters most for DB reads/writes, where a stale FK, missing record, or rejected write is an infra failure, not a regression.

Don’t swallow per-item errors in the script. A custom begin/rescue that returns a placeholder turns infra failures into fake successes. Let the SDK record them. The only allowed top-level rescue is a fatal handler around main that exits non-zero, so callers can tell a whole-replay crash from a clean run with some unreplayable items.

Environment. Replay executes in the app’s own process: the instrumented method is loaded as a library, and its DB clients, env vars, config loaders, and model IDs resolve from whatever environment the replay script is run under. The script must bootstrap the same environment the app uses (e.g. require "dotenv/load" at the top, or run via bundle exec dotenv ruby scripts/replay.rb). Do not mock these; they’re the same dependencies the app resolves in production. For replay to see the same DB rows the trace was captured against, point the script at the trace’s source environment (the :environment field on the trace: production / staging / development).

Input serialization caveat. Replay deserializes historical span inputs and passes them back to your method. This works for strings, numbers, and plain hashes. If your span wraps a method that takes hydrated domain objects (ActiveRecord models, class instances, DB records), they won’t round-trip through serialization. Move the span to where inputs are IDs or plain data and let the method fetch objects internally, or reshape arguments in the wrapper.
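
The serialization caveat can be reproduced with Ruby's standard JSON library. This sketch assumes inputs are stored as JSON, which the text above does not confirm; `ToyOrder` is a hypothetical domain object used only to show what a generic serializer does with a non-plain value.

```ruby
require "json"

# Assumption: span inputs are stored as JSON. ToyOrder is a hypothetical
# domain object standing in for an ActiveRecord model or similar.
ToyOrder = Struct.new(:id, :total)

plain_input  = [42, { "currency" => "USD" }]
object_input = [ToyOrder.new(42, 19.99)]

plain_back  = JSON.parse(JSON.generate(plain_input))
object_back = JSON.parse(JSON.generate(object_input))

puts plain_back == plain_input # plain data survives the round trip
puts object_back.first.class   # the Struct does not come back as a ToyOrder
```

Passing the order ID instead of the object, and loading the record inside the traced method, avoids the problem because only plain data crosses the serialization boundary.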

Replay Script

Create a standalone script to regression-test your trace functions against production data with one command. The script maps pipeline names to their replay functions, accepts CLI flags, and prints a side-by-side comparison with delta summaries.

#!/usr/bin/env ruby
# Replay production traces through instrumented functions.
#
# Uses Bitfab.client.replay to fetch real traces and re-run them
# through the current code, creating a test run for side-by-side comparison.
#
# Usage:
#   ruby scripts/replay.rb <pipeline>
#   ruby scripts/replay.rb <pipeline> --limit 20
#   ruby scripts/replay.rb <pipeline> --trace-ids id1,id2

require "dotenv/load"
require "json"
require_relative "../config/initializers/bitfab"
require_relative "../services/extraction_pipeline"
require_relative "../services/search_pipeline"

FUNCTIONS = {
  "extraction" => "my-extraction-pipeline",
  "search" => "my-search-pipeline",
}.freeze

pipeline = ARGV[0]

unless pipeline && FUNCTIONS.key?(pipeline)
  warn "Usage: ruby scripts/replay.rb <#{FUNCTIONS.keys.join('|')}> [--limit N] [--trace-ids id1,id2]"
  exit 1
end

limit = 10
trace_ids = nil

ARGV[1..].each_with_index do |arg, i|
  case arg
  when "--limit"
    limit = ARGV[i + 2].to_i
  when "--trace-ids"
    trace_ids = ARGV[i + 2].split(",").map(&:strip)
  end
end

# Each pipeline gets its own replay method — replay deserializes
# historical inputs, so if the method signature doesn't match the
# raw input shape, reshape the arguments in a thin wrapper here.

def replay_extraction(limit:, trace_ids:)
  service = ExtractionPipeline.new
  Bitfab.client.replay(
    service, :extract_memories,
    trace_function_key: FUNCTIONS["extraction"],
    limit:, trace_ids:
  )
end

def replay_search(limit:, trace_ids:)
  service = SearchPipeline.new
  Bitfab.client.replay(
    service, :search_documents,
    trace_function_key: FUNCTIONS["search"],
    limit:, trace_ids:
  )
end

REPLAY_FNS = {
  "extraction" => method(:replay_extraction),
  "search" => method(:replay_search),
}.freeze

function_key = FUNCTIONS[pipeline]
puts "[replay] Replaying #{trace_ids&.length || limit} traces from \"#{function_key}\"...\n"

result = REPLAY_FNS[pipeline].call(limit:, trace_ids:)
puts "Test run: #{result[:test_run_url]}\n"

changed = same = errors = 0
result[:items].each do |item|
  raw_input = item[:input] || []
  label = (raw_input.first || "unknown").to_s[0, 80]

  if item[:error]
    puts "  ✗ \"#{label}\""
    puts "    Error: #{item[:error]}"
    errors += 1
  else
    orig = item[:original_output]
    new_val = item[:result]
    orig_str = orig.is_a?(String) ? orig : orig.to_json
    new_str = new_val.is_a?(String) ? new_val : new_val.to_json
    is_same = orig_str == new_str
    marker = is_same ? "=" : "Δ"

    puts "  #{marker} \"#{label}\""
    puts "    Original: #{orig_str}"
    puts "    New:      #{new_str}"

    is_same ? same += 1 : changed += 1
  end
end

puts "\n─── Summary ───"
puts "  Pipeline: #{pipeline}"
puts "  Replayed: #{result[:items].length}"
puts "  Same:     #{same}"
puts "  Changed:  #{changed}"
puts "  Errors:   #{errors}" if errors > 0
puts "\n  #{result[:test_run_url]}"

Adapt the imports, pipeline names, and per-pipeline replay methods to match your project’s instrumented workflows.
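
The script above prints a human-readable comparison. To also follow the Replay Output Contract (full JSON dump, pass/fail computed only over replayable items, non-zero exit only on a whole-replay crash), its tail could be shaped like the sketch below. The sample hash is hand-written and stands in for the value returned by client.replay; its IDs, URL, and values are illustrative, and `summarize` and `main` are hypothetical helper names.

```ruby
require "json"

# Hand-written sample standing in for the hash returned by client.replay.
# Values are illustrative, not real trace data.
SAMPLE_RESULT = {
  test_run_id: "sample-run",
  test_run_url: "https://example.invalid/runs/sample-run",
  items: [
    { input: ["a"], result: "x", original_output: "x", error: nil },
    { input: ["b"], result: "y", original_output: "z", error: nil },
    { input: ["c"], result: nil, original_output: "w", error: "record not found" }
  ]
}.freeze

# Pass/fail is computed only over items without an error; items with an
# error are counted separately as unreplayable.
def summarize(result)
  replayable, unreplayable = result[:items].partition { |item| item[:error].nil? }
  same = replayable.count { |item| item[:result] == item[:original_output] }
  { same: same, changed: replayable.length - same, unreplayable: unreplayable.length }
end

def main(result)
  # Summary goes to stderr here so stdout holds only the JSON block
  # (a choice made for this sketch).
  warn summarize(result).inspect
  puts JSON.pretty_generate(result)
end

begin
  main(SAMPLE_RESULT)
rescue StandardError => e
  # Fatal handler: a whole-replay crash exits non-zero, which is distinct
  # from a clean run that contains some unreplayable items.
  warn "fatal: #{e.class}: #{e.message}"
  exit 1
end
```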
