The Bitfab Ruby SDK captures your AI function calls to automatically generate evaluations. Re-run your prompts with different models, parameters, and inputs to iterate faster.
Framework-native adapters (LangGraph, OpenAI Agents, BAML, Claude Agent SDK) are not yet available for Ruby. See Frameworks overview for current coverage. Instrument Ruby code manually via Bitfab::Traceable or Bitfab.span.
Copy this prompt into your coding agent (tested with Cursor and Claude Code using Sonnet 4.5):
Modify existing Ruby code to add Bitfab tracing.

Do NOT browse or web search. Use ONLY the API described below.

Bitfab Ruby SDK (authoritative excerpt):
- Install: `gem install bitfab` or `bundle add bitfab`
- Init:
    require "bitfab"
    Bitfab.configure(api_key: ENV.fetch("BITFAB_API_KEY"))
- Instrumentation (ONLY allowed form):
    class MyService
      include Bitfab::Traceable
      bitfab_function "<trace_function_key>"

      bitfab_span :method_name, type: "function"
      def method_name
        # ...
      end
    end
  (bitfab_span must be placed immediately ABOVE the `def` it instruments.)
- Span types: "llm", "agent", "function", "guardrail", "handoff", "custom"
- DO NOT use a block form of bitfab_span.
- DO NOT extract helper methods.

Task:
1) Ensure the bitfab gem is added and initialization exists (Gemfile + initializer).
2) Read the codebase and identify ALL AI workflows (LLM calls, agent runs, AI-driven decisions).
3) Present me with a numbered list of workflows you found. For each, describe:
   - What it does
   - Why it's worth instrumenting: what visibility tracing gives you into each step
4) After I choose which workflow(s) to instrument:
   - Add `bitfab_span` directly ABOVE each method's `def`
   - Instrument intermediate steps (not just the final output) so each trace has enough context to diagnose issues
   - Ensure each class includes `Bitfab::Traceable` and has `bitfab_function` set
5) Do not change method signature, behavior, or return value. Minimal diff.

Output:
- First: your numbered list of workflows with why each is worth instrumenting
- After my selection: minimal diffs for Gemfile, initializer, and the method changes
    Bitfab.configure(api_key: String)

    # Disable tracing (functions still execute, but no spans are sent)
    Bitfab.configure(api_key: String, enabled: false)
Missing API key doesn’t crash. If the API key is missing, empty, or whitespace-only, the SDK automatically disables tracing and logs a warning. All instrumented methods still execute normally — no spans are sent, no errors are thrown. You don’t need any conditional logic around the API key.
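One caveat concerns how the key is read before it reaches the SDK. In plain Ruby, ENV.fetch("BITFAB_API_KEY") with no default raises KeyError when the variable is unset, so the graceful fallback described above can only apply if the lookup itself does not raise. The sketch below uses a plain Hash in place of ENV and a hypothetical helper, blank_key?, that mirrors the missing/empty/whitespace rule; it illustrates the condition and is not the SDK's internal check.

```ruby
# Hypothetical helper: mirrors the "missing, empty, or whitespace-only" rule.
# The SDK's real check is internal; this only illustrates the condition.
def blank_key?(key)
  key.nil? || key.strip.empty?
end

env = {} # stands in for ENV when BITFAB_API_KEY is unset

# fetch without a default raises before the SDK ever sees a value
strict_lookup_raised =
  begin
    env.fetch("BITFAB_API_KEY")
    false
  rescue KeyError
    true
  end

# fetch with a nil default (or env["BITFAB_API_KEY"]) returns nil instead
lenient_key = env.fetch("BITFAB_API_KEY", nil)

puts "strict lookup raised: #{strict_lookup_raised}"
puts "key treated as missing: #{blank_key?(lenient_key)}"
```

If you want an unset variable to disable tracing rather than abort boot, reading the key with ENV["BITFAB_API_KEY"] or ENV.fetch("BITFAB_API_KEY", nil) before passing it to Bitfab.configure should give that behavior.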
Include Bitfab::Traceable in a class, declare the trace function key once with bitfab_function, then use bitfab_span to wrap methods. Three declaration styles are supported:
    class OrderService
      include Bitfab::Traceable
      bitfab_function "order-processing"

      # Style 1: Before-def (recommended). Declare bitfab_span above the method.
      bitfab_span :process_order, type: "function"
      def process_order(order_id)
        { order_id: order_id }
      end

      # Style 2: Inline. Wrap the def directly.
      bitfab_span def validate_order(order_id)
        { valid: true }
      end, type: "guardrail"

      # Style 3: After-def. Declare bitfab_span after the method definition.
      def enrich_order(order_id)
        { enriched: true }
      end
      bitfab_span :enrich_order, type: "function"
    end
All three styles are equivalent. The before-def style is recommended for readability.
For projects with instrumented methods spread across multiple files, create an initializer that configures Bitfab, then include Bitfab::Traceable in any class that needs tracing.
    # config/initializers/bitfab.rb: single source of truth
    require "bitfab"

    Bitfab.configure(api_key: ENV.fetch("BITFAB_API_KEY"))
Classes sharing the same bitfab_function key are grouped together. Spans from different classes are automatically linked as parent-child when one instrumented method calls another.
name (optional): Display name. Defaults to method name
type (optional): Span type. Defaults to "custom"
Span Types:
    SPAN_TYPES = [
      "llm",       # LLM calls
      "agent",     # Agent workflows
      "function",  # Function calls
      "guardrail", # Safety checks
      "handoff",   # Human handoffs
      "custom"     # Default
    ]
Examples:
    class SafetyService
      include Bitfab::Traceable
      bitfab_function "safety-service"

      # Method name is automatically captured as span name
      bitfab_span :check_safety, type: "guardrail"
      def check_safety(content)
        { safe: !content.include?("unsafe") }
      end

      # Override with name option
      bitfab_span :validate_input, name: "InputValidator", type: "guardrail"
      def validate_input(input)
        { valid: !input.empty? }
      end
    end
Use Bitfab.current_span to get a handle to the active span, then call .add_context() to attach contextual key-value pairs from inside a traced method. This is useful for runtime values like request IDs, computed scores, or dynamic context.
Call set_prompt on Bitfab.current_span to record the prompt string on the current span. The value is stored in span_data.prompt and is useful for capturing the exact prompt text sent to an LLM:
    class ClassificationService
      include Bitfab::Traceable
      bitfab_function "classification"

      bitfab_span :classify_text, type: "llm"
      def classify_text(text)
        prompt = "Classify the following text: #{text}"
        Bitfab.current_span.set_prompt(prompt)
        # llm stands for your application's own LLM client
        llm.complete(prompt)
      end
    end
The last set_prompt call wins — it overwrites any previously set prompt on the span. Calling set_prompt outside a span context is a no-op (it never crashes).
Use Bitfab.current_trace to set context that applies to the entire trace (all spans within a single execution). This is useful for grouping traces by session or attaching trace-level metadata:
    class OrderService
      include Bitfab::Traceable
      bitfab_function "order-processing"

      bitfab_span :process_order, type: "function"
      def process_order(order_id)
        trace = Bitfab.current_trace

        # Set session ID (stored as database column, filterable in dashboard)
        trace.set_session_id("session-123")

        # Set trace metadata (stored in raw trace data)
        trace.set_metadata("region" => "us-west-2", "environment" => "production")

        # Add context entries (stored as key-value pairs, accumulates across calls)
        trace.add_context("workflow" => "checkout-flow", "batch_id" => "batch-2024-01")

        { order_id: order_id, status: "completed" }
      end
    end
set_session_id(id) — Groups traces by user session. Stored as a database column for efficient filtering.
set_metadata(hash) — Arbitrary key-value metadata on the trace. Merges with existing metadata.
add_context(hash) — Key-value context entries. Accumulates across multiple calls.
    class RiskyService
      include Bitfab::Traceable
      bitfab_function "risky-service"

      bitfab_span :risky
      def risky
        raise "error"
      end
    end

    begin
      RiskyService.new.risky
    rescue => e
      # Span records error and timing
    end
Replay historical traces through a method to create test runs. This re-runs past inputs through your updated code and compares the results.
    client = Bitfab.client

    # Pass an instance for instance methods
    service = OrderService.new(api_key: "...", db: db)
    result = client.replay(
      service, :process_order,
      trace_function_key: "order-processing",
      limit: 5
    )

    # Or pass a Class for class methods
    result = client.replay(
      OrderService, :process_order,
      trace_function_key: "order-processing",
      limit: 5
    )

    puts "Test Run URL: #{result[:test_run_url]}"
    puts "Test Run ID: #{result[:test_run_id]}"
    result[:items].each do |item|
      puts "Input: #{item[:input]}, Result: #{item[:result]}, Error: #{item[:error]}"
      puts "Duration (ms): #{item[:duration_ms]}"
      puts "Tokens: #{item[:tokens]}" # { input:, output:, cached:, total: } or nil
      puts "Model: #{item[:model]}"
    end

Per-item :duration_ms, :tokens, and :model come from the historical trace that fed the item, so you can reason about the cost and latency of the old runs. Each field is nil when the underlying trace didn’t capture it.

Parameters:
receiver (required): An instance for instance methods, or a Class for class methods
method_name (required): Symbol of the method to replay
trace_function_key (required): The trace function key
limit (optional): Max traces to replay (default: 5)
trace_ids (optional): Array of specific trace IDs to replay
max_concurrency (optional): Max threads for parallel replay (default: 10)
code_change_description (optional): Rationale for the code change being tested in this replay (stored on the experiment)
code_change_files (optional): Array of edited files, each as { path:, before:, after: } (use "" for newly created or deleted files)
mock (optional): Mock strategy for child spans during replay. One of "none" (default, every child runs real code), "all" (every child returns its historical output), or "marked" (only spans tagged with mock_on_replay: true return historical output)
Notes:
receiver + method_name must resolve to the method that carries the traceable decoration — passing a plain wrapper around it will not resolve the trace function key.
trace_function_key is passed explicitly, so instance methods and class methods are disambiguated by the receiver.
Use a single Bitfab::Client across instrumentation and replay. If your instrumented module constructs a client at load and your replay script constructs another, they do not share registered trace functions — import the client from the instrumented module (or a shared singleton) rather than constructing a new one in the replay script.
Attaching a code change:

Each replay creates an experiment (test run). When you’re iterating on a method and replaying after every edit, attach the change so the dashboard can show exactly what was edited alongside the results. Read each file before editing, edit, then read it again; the two strings go straight into code_change_files. There’s no diff format to construct.
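The read, edit, read-again routine can be wrapped in a small helper. A minimal sketch follows: capture_change is a hypothetical name (not part of the SDK), and the temporary file stands in for a real source file. Only the { path:, before:, after: } shape and the option names in the trailing comment come from the parameter list above.

```ruby
require "tmpdir"

# Hypothetical helper: snapshot a file before and after the block edits it.
# A file that does not exist on either side is recorded as "".
def capture_change(path)
  before = File.exist?(path) ? File.read(path) : ""
  yield
  after = File.exist?(path) ? File.read(path) : ""
  { path: path, before: before, after: after }
end

# Demo against a throwaway file so the sketch runs anywhere.
change = Dir.mktmpdir do |dir|
  path = File.join(dir, "order_service.rb")
  File.write(path, "TAX_RATE = 0.07\n")
  capture_change(path) { File.write(path, "TAX_RATE = 0.08\n") }
end

puts change[:before]
puts change[:after]

# The resulting hash would then be passed along with the replay, e.g.:
#   client.replay(service, :process_order,
#     trace_function_key: "order-processing",
#     code_change_description: "Raise tax rate to 8%",
#     code_change_files: [change])
```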
Both options are optional and independent: pass just code_change_description for a quick rationale-only annotation, or just code_change_files to record the literal edits.

Mock child spans during replay:

By default replay runs every child span’s real code. That’s fine for cheap, deterministic children, but expensive when children make paid LLM/API calls. Three mock strategies let you skip those child invocations and return their historical outputs instead:
    # "none" (default): every child span runs real code
    client.replay(service, :process_order, trace_function_key: "order-processing", mock: "none")

    # "all": every child span returns its historical output, real code never runs
    client.replay(service, :process_order, trace_function_key: "order-processing", mock: "all")

    # "marked": only spans declared with mock_on_replay: true return historical output
    client.replay(service, :process_order, trace_function_key: "order-processing", mock: "marked")
Tag the child spans you want mocked at definition time:
    class OrderService
      include Bitfab::Traceable
      bitfab_function "order-processing"

      # Paid LLM call: skip during replay under mock: "marked"
      bitfab_span :classify_intent, type: "llm", mock_on_replay: true
      def classify_intent(prompt); end

      # Cheap, deterministic: keep running real
      bitfab_span :persist, type: "function"
      def persist(order); end

      bitfab_span :process_order, type: "agent"
      def process_order(order)
        classify_intent(order.description)
        persist(order)
      end
    end

Use mock: "marked" when you want to iterate on process_order’s logic without paying for the LLM call on each replay. Use mock: "all" when the goal is the cheapest possible replay (every child span returns its recorded output; only the root function executes real code).

Repeated calls to the same trace_function_key are distinguished by call order, so step:0, step:1, step:2 correspond to the first, second, and third invocations. Unmarked spans still advance the counter, so a marked sibling that runs after an unmarked one lines up with the right historical entry.
Bind a trace_function_key once and wrap multiple classes or methods against it. Mirrors client.get_function in the Python SDK and client.getFunction in TypeScript.
#wrap accepts the same options as Bitfab::Traceable.wrap (name, type, mock_on_replay), but the trace_function_key is fixed to the one bound on the returned Bitfab::BitfabFunction.
Replay results are typically consumed by automation (CI logs, code reviewers, and coding agents reading stdout). Emit the full replay result hash as a single JSON block so a consumer can JSON.parse it and reason about every field, including the new per-item :duration_ms, :tokens, and :model. Never print only lengths, counts, hashes, or truncated previews, and never replace the JSON block with ad-hoc per-field log lines.

Recommended script tail:

    require "json" # needed for JSON.pretty_generate; normally at the top of the script

    # `limit` is a local variable set earlier in the script
    result = Bitfab.client.replay(service, :process_order, trace_function_key: "order-processing", limit:)

    # Optional: human-readable summary first.
    puts "Test run: #{result[:test_run_url]}"
    puts "Items: #{result[:items].length}"

    # Then: full structured dump, ready for JSON.parse.
    puts JSON.pretty_generate(result)
The dumped object includes every item’s :input, :result, :original_output, :error, :duration_ms, :tokens, and :model, plus :test_run_id and :test_run_url. Writing the same JSON to scripts/replay-result.json in parallel is optional but useful for later analysis.

Per-item errors are part of the contract. If the wrapped method raises on a given trace, Bitfab.replay rescues it, sets item[:error], leaves item[:result] as nil, and continues. Treat items with item[:error] set as unreplayable, not as failing outputs, and compute pass/fail only over items where it’s nil. This matters most for DB reads/writes: a stale FK, missing record, or rejected write is infra failure, not a regression.

Don’t swallow per-item errors in the script. A custom begin/rescue that returns a placeholder turns infra failures into fake successes. Let the SDK record them. The only allowed top-level rescue is a fatal handler around main that exits non-zero, so callers can tell a whole-replay crash from a clean run with some unreplayable items.

Environment. Replay executes in the app’s own process: the instrumented method is loaded as a library, and its DB clients, env vars, config loaders, and model IDs resolve from whatever environment the replay script is run under. The script must bootstrap the same environment the app uses (e.g. require "dotenv/load" at the top, or run via bundle exec dotenv ruby scripts/replay.rb). Do not mock these; they’re the same dependencies the app resolves in production. For replay to see the same DB rows the trace was captured against, point the script at the trace’s source environment (the :environment field on the trace: production, staging, or development).

Input serialization caveat. Replay deserializes historical span inputs and passes them back to your method. This works for strings, numbers, and plain hashes. If your span wraps a method that takes hydrated domain objects (ActiveRecord models, class instances, DB records), they won’t round-trip through serialization. Move the span to where inputs are IDs or plain data and let the method fetch objects internally, or reshape arguments in the wrapper.
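To make the serialization caveat concrete, the sketch below round-trips two argument shapes through JSON. It assumes the stored inputs behave like JSON; the SDK's actual wire format is not specified here. Order is a hypothetical stand-in for a domain object such as an ActiveRecord model.

```ruby
require "json"

# Hypothetical domain object standing in for an ActiveRecord model.
Order = Struct.new(:id, :total)

# Serialize and deserialize, as a replay of stored inputs would have to.
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

plain = round_trip({ "order_id" => 42, "tags" => ["rush"] })
hydrated = round_trip({ "order" => Order.new(42, 19.99) })

puts plain.inspect           # plain data comes back with the same structure
puts hydrated["order"].class # the object comes back as a String, not an Order
```

Under that assumption, a method traced as process_order(order_id) can be replayed as recorded, while one traced as process_order(order) would receive a string it cannot call .total on.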
Create a standalone script to regression-test your trace functions against production data with one command. The script maps pipeline names to their replay functions, accepts CLI flags, and prints a side-by-side comparison with delta summaries.
    #!/usr/bin/env ruby
    # Replay production traces through instrumented functions.
    #
    # Uses Bitfab.client.replay to fetch real traces and re-run them
    # through the current code, creating a test run for side-by-side comparison.
    #
    # Usage:
    #   ruby scripts/replay.rb <pipeline>
    #   ruby scripts/replay.rb <pipeline> --limit 20
    #   ruby scripts/replay.rb <pipeline> --trace-ids id1,id2

    require "dotenv/load"
    require "json"
    require_relative "../config/initializers/bitfab"
    require_relative "../services/extraction_pipeline"
    require_relative "../services/search_pipeline"

    FUNCTIONS = {
      "extraction" => "my-extraction-pipeline",
      "search" => "my-search-pipeline",
    }.freeze

    pipeline = ARGV[0]
    unless pipeline && FUNCTIONS.key?(pipeline)
      warn "Usage: ruby scripts/replay.rb <#{FUNCTIONS.keys.join('|')}> [--limit N] [--trace-ids id1,id2]"
      exit 1
    end

    limit = 10
    trace_ids = nil
    ARGV[1..].each_with_index do |arg, i|
      case arg
      when "--limit"
        limit = ARGV[i + 2].to_i
      when "--trace-ids"
        trace_ids = ARGV[i + 2].split(",").map(&:strip)
      end
    end

    # Each pipeline gets its own replay method. Replay deserializes
    # historical inputs, so if the method signature doesn't match the
    # raw input shape, reshape the arguments in a thin wrapper here.
    def replay_extraction(limit:, trace_ids:)
      service = ExtractionPipeline.new
      Bitfab.client.replay(
        service, :extract_memories,
        trace_function_key: FUNCTIONS["extraction"],
        limit:, trace_ids:
      )
    end

    def replay_search(limit:, trace_ids:)
      service = SearchPipeline.new
      Bitfab.client.replay(
        service, :search_documents,
        trace_function_key: FUNCTIONS["search"],
        limit:, trace_ids:
      )
    end

    REPLAY_FNS = {
      "extraction" => method(:replay_extraction),
      "search" => method(:replay_search),
    }.freeze

    function_key = FUNCTIONS[pipeline]
    puts "[replay] Replaying #{trace_ids&.length || limit} traces from \"#{function_key}\"...\n"
    result = REPLAY_FNS[pipeline].call(limit:, trace_ids:)
    puts "Test run: #{result[:test_run_url]}\n"

    changed = same = errors = 0
    result[:items].each do |item|
      raw_input = item[:input] || []
      label = (raw_input.first || "unknown").to_s[0, 80]
      if item[:error]
        puts "  ✗ \"#{label}\""
        puts "    Error: #{item[:error]}"
        errors += 1
      else
        orig = item[:original_output]
        new_val = item[:result]
        orig_str = orig.is_a?(String) ? orig : orig.to_json
        new_str = new_val.is_a?(String) ? new_val : new_val.to_json
        is_same = orig_str == new_str
        marker = is_same ? "=" : "Δ"
        puts "  #{marker} \"#{label}\""
        puts "    Original: #{orig_str}"
        puts "    New:      #{new_str}"
        is_same ? same += 1 : changed += 1
      end
    end

    puts "\n─── Summary ───"
    puts "  Pipeline: #{pipeline}"
    puts "  Replayed: #{result[:items].length}"
    puts "  Same:     #{same}"
    puts "  Changed:  #{changed}"
    puts "  Errors:   #{errors}" if errors > 0
    puts "\n  #{result[:test_run_url]}"
Adapt the imports, pipeline names, and per-pipeline replay methods to match your project’s instrumented workflows.
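When adapting the script, the per-item error contract described earlier still applies: items with :error set are unreplayable and stay out of the pass/fail tally, and the only top-level rescue is a fatal handler that yields a non-zero exit status. A minimal sketch of both pieces follows. summarize and run_main are hypothetical helper names, the sample items are made up, and the item keys follow the result shape documented above.

```ruby
# Hypothetical helper: tally results, excluding items that could not be replayed.
def summarize(items)
  replayable = items.select { |item| item[:error].nil? }
  changed = replayable.count { |item| item[:result] != item[:original_output] }
  {
    replayed: items.length,
    unreplayable: items.length - replayable.length,
    same: replayable.length - changed,
    changed: changed
  }
end

# Hypothetical entry point: the block stands in for the replay call.
# Returns the process exit status instead of exiting, so it is easy to test.
def run_main
  yield
  0
rescue StandardError => e
  warn "[replay] fatal: #{e.class}: #{e.message}"
  1
end

# Made-up items in the documented shape.
sample_items = [
  { input: ["a"], result: "x", original_output: "x", error: nil },
  { input: ["b"], result: "y", original_output: "z", error: nil },
  { input: ["c"], result: nil, original_output: "w", error: "RecordNotFound" }
]

status = run_main { puts summarize(sample_items) }
# A real script would finish with: exit status
```

Returning the status from run_main keeps a whole-replay crash (status 1) distinct from a clean run that merely contains unreplayable items (status 0 with a non-zero :unreplayable count).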