Generate evaluations and iterate on your AI applications
The Simforge Ruby SDK captures your AI function calls to automatically generate evaluations. Re-run your prompts with different models, parameters, and inputs to iterate faster.
Copy this prompt into your coding agent (tested with Cursor and Claude Code using Sonnet 4.5):
Modify existing Ruby code to add Simforge tracing.
Do NOT browse or web search. Use ONLY the API described below.

Simforge Ruby SDK (authoritative excerpt):
- Install: `gem install simforge` or `bundle add simforge`
- Init:
    require "simforge"
    Simforge.configure(api_key: ENV.fetch("SIMFORGE_API_KEY"))
- Instrumentation (ONLY allowed form):
    class MyService
      include Simforge::Traceable
      simforge_function "<trace_function_key>"

      simforge_span :method_name, type: "function"
      def method_name
        # ...
      end
    end
  (simforge_span must be placed immediately ABOVE the `def` it instruments.)
- Span types: "llm", "agent", "function", "guardrail", "handoff", "custom"
- DO NOT use a block form of simforge_span.
- DO NOT extract helper methods.

Task:
1) Ensure the simforge gem is added and initialization exists (Gemfile + initializer).
2) Read the codebase and identify ALL AI workflows (LLM calls, agent runs, AI-driven decisions).
3) Present me with a numbered list of workflows you found. For each, describe:
   - What it does
   - Why it's worth instrumenting — what visibility tracing gives you into each step
4) After I choose which workflow(s) to instrument:
   - Add `simforge_span` directly ABOVE each method's `def`
   - Instrument intermediate steps (not just the final output) so each trace has enough context to diagnose issues
   - Ensure each class includes `Simforge::Traceable` and has `simforge_function` set
5) Do not change method signature, behavior, or return value. Minimal diff.

Output:
- First: your numbered list of workflows with why each is worth instrumenting
- After my selection: minimal diffs for Gemfile, initializer, and the method changes
Simforge.configure(api_key: String)

# Disable tracing (functions still execute, but no spans are sent)
Simforge.configure(api_key: String, enabled: false)
Missing API key doesn’t crash. If the API key is missing, empty, or whitespace-only, the SDK automatically disables tracing and logs a warning. All instrumented methods still execute normally — no spans are sent, no errors are thrown. You don’t need any conditional logic around the API key.
Include Simforge::Traceable in a class, declare the trace function key once with simforge_function, then use simforge_span to wrap methods. Three declaration styles are supported:
class OrderService
  include Simforge::Traceable
  simforge_function "order-processing"

  # Style 1: Before-def (recommended) — declare simforge_span above the method
  simforge_span :process_order, type: "function"
  def process_order(order_id)
    { order_id: order_id }
  end

  # Style 2: Inline — wrap the def directly
  simforge_span def validate_order(order_id)
    { valid: true }
  end, type: "guardrail"

  # Style 3: After-def — declare simforge_span after the method definition
  def enrich_order(order_id)
    { enriched: true }
  end
  simforge_span :enrich_order, type: "function"
end
All three styles are equivalent. The before-def style is recommended for readability.
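For readers curious how a single macro can accept all three styles, here is a toy sketch in plain Ruby. It is not Simforge's implementation; it only illustrates two standard Ruby behaviors such a DSL can build on: `def` returns the method name as a Symbol (which makes the inline style possible), and the `method_added` hook fires whenever an instance method is defined (which makes the before-def style possible). `ToyTraceable`, `toy_span`, and `toy_calls` are hypothetical names invented for this illustration.

```ruby
# Toy stand-in for a tracing mixin. NOT the Simforge implementation.
module ToyTraceable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Log of [method_name, span_type] pairs, kept only for this demo.
    def toy_calls
      @toy_calls ||= []
    end

    def toy_span(method_name, type: "custom")
      if method_defined?(method_name) || private_method_defined?(method_name)
        # Inline and after-def styles: the method already exists, wrap it now.
        wrap_with_span(method_name, type)
      else
        # Before-def style: remember the request until the method is defined.
        (@pending_spans ||= {})[method_name] = type
      end
      method_name
    end

    # Ruby invokes this hook each time an instance method is defined.
    def method_added(method_name)
      super
      type = (@pending_spans ||= {}).delete(method_name)
      wrap_with_span(method_name, type) if type
    end

    private

    # Prepend an anonymous module whose method logs the call, then
    # delegates to the original method via super.
    def wrap_with_span(method_name, type)
      calls = toy_calls
      wrapper = Module.new do
        define_method(method_name) do |*args, **kwargs, &block|
          calls << [method_name, type]
          super(*args, **kwargs, &block)
        end
      end
      prepend(wrapper)
    end
  end
end

class ToyOrderService
  include ToyTraceable

  toy_span :process_order, type: "function"   # before-def
  def process_order(order_id)
    { order_id: order_id }
  end

  toy_span def validate_order(order_id)       # inline
    { valid: true }
  end, type: "guardrail"

  def enrich_order(order_id)                  # after-def
    { enriched: true }
  end
  toy_span :enrich_order, type: "function"
end

service = ToyOrderService.new
service.process_order(1)
service.validate_order(1)
service.enrich_order(1)
p ToyOrderService.toy_calls
```

In this sketch all three declarations end up going through the same wrapping step, which is one plausible reason the styles can behave identically.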
For projects with instrumented methods spread across multiple files, create an initializer that configures Simforge, then include Simforge::Traceable in any class that needs tracing.
# config/initializers/simforge.rb — single source of truth
require "simforge"
Simforge.configure(api_key: ENV.fetch("SIMFORGE_API_KEY"))
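One interaction worth keeping in mind comes from standard Ruby behavior rather than anything Simforge-specific: `ENV.fetch` without a default raises `KeyError` when the variable is unset, so the initializer above fails before `Simforge.configure` ever sees the key. If you would rather rely on the SDK's missing-key fallback, `ENV[...]` or `ENV.fetch(name, nil)` returns `nil` instead. The short sketch below shows the difference, using a hypothetical variable name that is assumed to be unset.

```ruby
# Hypothetical variable name, deleted first so the demo is deterministic.
name = "SIMFORGE_API_KEY_UNSET_DEMO"
ENV.delete(name)

# Strict lookup: raises KeyError when the variable is missing.
strict_result =
  begin
    ENV.fetch(name)
  rescue KeyError => e
    e.class
  end

# Lenient lookups: both return nil when the variable is missing.
lenient_index = ENV[name]
lenient_fetch = ENV.fetch(name, nil)

p strict_result
p lenient_index
p lenient_fetch
```

Which form to use is a design choice: the strict form fails fast in environments where tracing is mandatory, while the lenient forms let the application boot with tracing disabled.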
Classes sharing the same simforge_function key are grouped together. Spans from different classes are automatically linked as parent-child when one instrumented method calls another.
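To make the parent-child linking concrete, here is a toy sketch of one common way tracers achieve it: a per-thread stack of active spans, where a new span takes the top of the stack as its parent. This is an assumption about the general technique, not a description of Simforge internals, and `ToyTracer`, `ToySpan`, `ToyRetriever`, and `ToyAnswerer` are hypothetical names.

```ruby
require "securerandom"

# Minimal span record used only in this illustration.
ToySpan = Struct.new(:id, :name, :parent_id)

module ToyTracer
  FINISHED = [] # completed spans, in the order they finished

  # Stack of currently active spans for the current thread.
  def self.stack
    Thread.current[:toy_span_stack] ||= []
  end

  def self.with_span(name)
    # The innermost active span, if any, becomes the parent.
    span = ToySpan.new(SecureRandom.hex(4), name, stack.last&.id)
    stack.push(span)
    begin
      yield
    ensure
      stack.pop
      FINISHED << span
    end
  end
end

class ToyRetriever
  def fetch(query)
    ToyTracer.with_span("fetch") { ["doc about #{query}"] }
  end
end

class ToyAnswerer
  def answer(query)
    ToyTracer.with_span("answer") do
      docs = ToyRetriever.new.fetch(query)
      "answer based on #{docs.length} document(s)"
    end
  end
end

ToyAnswerer.new.answer("refunds")

# The inner span finishes first, so it appears first in FINISHED.
fetch_span, answer_span = ToyTracer::FINISHED
puts "fetch parent is answer: #{fetch_span.parent_id == answer_span.id}"
puts "answer is a root span: #{answer_span.parent_id.nil?}"
```

Neither class knows about the other's span; the link falls out of the call order, which matches the behavior described above for instrumented methods in different classes.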
name (optional): Display name. Defaults to the method name.
type (optional): Span type. Defaults to "custom".
Span Types:
SPAN_TYPES = [
  "llm",       # LLM calls
  "agent",     # Agent workflows
  "function",  # Function calls
  "guardrail", # Safety checks
  "handoff",   # Human handoffs
  "custom"     # Default
].freeze
Examples:
class SafetyService
  include Simforge::Traceable
  simforge_function "safety-service"

  # Method name is automatically captured as span name
  simforge_span :check_safety, type: "guardrail"
  def check_safety(content)
    { safe: !content.include?("unsafe") }
  end

  # Override with name option
  simforge_span :validate_input, name: "InputValidator", type: "guardrail"
  def validate_input(input)
    { valid: !input.empty? }
  end
end
Use Simforge.current_span to get a handle to the active span, then call .add_context() to attach key-value pairs from inside a traced method. This is useful for runtime values such as request IDs, computed scores, or other dynamic context.
Call set_prompt on Simforge.current_span to set the prompt string on the current span. The value is stored in span_data.prompt and is useful for capturing the exact prompt text sent to an LLM:
class ClassificationService
  include Simforge::Traceable
  simforge_function "classification"

  simforge_span :classify_text, type: "llm"
  def classify_text(text)
    prompt = "Classify the following text: #{text}"
    Simforge.current_span.set_prompt(prompt)
    llm.complete(prompt)
  end
end
The last set_prompt call wins — it overwrites any previously set prompt on the span. Calling set_prompt outside a span context is a no-op (it never crashes).
Use Simforge.current_trace to set context that applies to the entire trace (all spans within a single execution). This is useful for grouping traces by session or attaching trace-level metadata:
class OrderService
  include Simforge::Traceable
  simforge_function "order-processing"

  simforge_span :process_order, type: "function"
  def process_order(order_id)
    trace = Simforge.current_trace

    # Set session ID (stored as database column, filterable in dashboard)
    trace.set_session_id("session-123")

    # Set trace metadata (stored in raw trace data)
    trace.set_metadata("region" => "us-west-2", "environment" => "production")

    # Add context entries (stored as key-value pairs, accumulates across calls)
    trace.add_context("workflow" => "checkout-flow", "batch_id" => "batch-2024-01")

    { order_id: order_id, status: "completed" }
  end
end
set_session_id(id) — Groups traces by user session. Stored as a database column for efficient filtering.
set_metadata(hash) — Arbitrary key-value metadata on the trace. Merges with existing metadata.
add_context(hash) — Key-value context entries. Accumulates across multiple calls.
class RiskyService
  include Simforge::Traceable
  simforge_function "risky-service"

  simforge_span :risky
  def risky
    raise "error"
  end
end

begin
  RiskyService.new.risky
rescue => e
  # Span records error and timing
end
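The example above implies two things: the exception still reaches the caller, and the span captures both the error and the elapsed time. A minimal sketch of how a wrapper can do both in plain Ruby follows. It is an illustration of the general rescue-and-re-raise pattern, not Simforge's code, and `toy_traced` and `ToyErrorSpan` are hypothetical names.

```ruby
# Minimal record of what a span might capture in this illustration.
ToyErrorSpan = Struct.new(:name, :duration_ms, :error)

# Runs the block, records timing and any error into `sink`,
# and re-raises so the caller sees the original exception.
def toy_traced(name, sink)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  error = nil
  begin
    yield
  rescue StandardError => e
    error = e
    raise
  ensure
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    sink << ToyErrorSpan.new(name, elapsed_ms, error&.message)
  end
end

spans = []
caught = nil

begin
  toy_traced("risky", spans) { raise "error" }
rescue RuntimeError => e
  caught = e.message
end

ok_result = toy_traced("safe", spans) { 42 }

puts "caller saw: #{caught}"
puts "span error: #{spans.first.error}"
puts "safe call returned: #{ok_result}"
```

Because the recording happens in `ensure`, the span is written on both the success and failure paths, and the block's return value passes through unchanged.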
Create a standalone script to regression-test your trace functions against production data with one command. The script maps pipeline names to their replay functions, accepts CLI flags, and prints a side-by-side comparison with delta summaries.
#!/usr/bin/env ruby
# Replay production traces through instrumented functions.
#
# Uses Simforge.client.replay to fetch real traces and re-run them
# through the current code, creating a test run for side-by-side comparison.
#
# Usage:
#   ruby scripts/replay.rb <pipeline>
#   ruby scripts/replay.rb <pipeline> --limit 20
#   ruby scripts/replay.rb <pipeline> --trace-ids id1,id2

require "dotenv/load"
require_relative "../config/initializers/simforge"
require_relative "../services/extraction_pipeline"
require_relative "../services/search_pipeline"

FUNCTIONS = {
  "extraction" => "my-extraction-pipeline",
  "search" => "my-search-pipeline",
}.freeze

pipeline = ARGV[0]
unless pipeline && FUNCTIONS.key?(pipeline)
  warn "Usage: ruby scripts/replay.rb <#{FUNCTIONS.keys.join('|')}> [--limit N] [--trace-ids id1,id2]"
  exit 1
end

limit = 10
trace_ids = nil
ARGV[1..].each_with_index do |arg, i|
  case arg
  when "--limit"
    limit = ARGV[i + 2].to_i
  when "--trace-ids"
    trace_ids = ARGV[i + 2].split(",").map(&:strip)
  end
end

# Each pipeline gets its own replay method — replay deserializes
# historical inputs, so if the method signature doesn't match the
# raw input shape, reshape the arguments in a thin wrapper here.
def replay_extraction(limit:, trace_ids:)
  service = ExtractionPipeline.new
  Simforge.client.replay(
    service, :extract_memories,
    trace_function_key: FUNCTIONS["extraction"],
    limit:, trace_ids:
  )
end

def replay_search(limit:, trace_ids:)
  service = SearchPipeline.new
  Simforge.client.replay(
    service, :search_documents,
    trace_function_key: FUNCTIONS["search"],
    limit:, trace_ids:
  )
end

REPLAY_FNS = {
  "extraction" => method(:replay_extraction),
  "search" => method(:replay_search),
}.freeze

function_key = FUNCTIONS[pipeline]
puts "[replay] Replaying #{trace_ids&.length || limit} traces from \"#{function_key}\"...\n"
result = REPLAY_FNS[pipeline].call(limit:, trace_ids:)
puts "Test run: #{result[:test_run_url]}\n"

changed = same = errors = 0
result[:items].each do |item|
  raw_input = item[:input] || []
  label = (raw_input.first || "unknown").to_s[0, 80]
  if item[:error]
    puts "  ✗ \"#{label}\""
    puts "    Error: #{item[:error]}"
    errors += 1
  else
    orig_count = item[:original_output].is_a?(Array) ? item[:original_output].length : 0
    new_count = item[:result].is_a?(Array) ? item[:result].length : 0
    delta = new_count - orig_count
    delta_str = delta == 0 ? "=" : delta > 0 ? "+#{delta}" : delta.to_s
    puts "  #{delta == 0 ? '=' : 'Δ'} \"#{label}\""
    puts "    #{orig_count} → #{new_count} (#{delta_str})"
    delta == 0 ? same += 1 : changed += 1
  end
end

puts "\n─── Summary ───"
puts "  Pipeline: #{pipeline}"
puts "  Replayed: #{result[:items].length}"
puts "  Same: #{same}"
puts "  Changed: #{changed}"
puts "  Errors: #{errors}" if errors > 0
puts "\n  #{result[:test_run_url]}"
Adapt the imports, pipeline names, and per-pipeline replay methods to match your project’s instrumented workflows.
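If you adapt the flag handling, one option is Ruby's standard `OptionParser` instead of walking `ARGV` by index. The sketch below is an alternative to the script's hand-rolled loop, not part of the Simforge SDK; `parse_replay_args` is a hypothetical helper name. Unlike `.to_i`, which silently turns a non-numeric `--limit` into 0, the `Integer` coercion raises `OptionParser::InvalidArgument`.

```ruby
require "optparse"

# Parses the same arguments as the replay script:
#   <pipeline> [--limit N] [--trace-ids id1,id2]
# Returns a hash with :pipeline, :limit, and :trace_ids.
def parse_replay_args(argv)
  options = { limit: 10, trace_ids: nil }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ruby scripts/replay.rb <pipeline> [--limit N] [--trace-ids id1,id2]"
    # Integer coercion rejects non-numeric values instead of yielding 0.
    opts.on("--limit N", Integer) { |n| options[:limit] = n }
    # Array coercion splits the value on commas.
    opts.on("--trace-ids IDS", Array) { |ids| options[:trace_ids] = ids.map(&:strip) }
  end

  # permute returns the non-option arguments and leaves argv untouched.
  positional = parser.permute(argv)
  options.merge(pipeline: positional.first)
end

p parse_replay_args(%w[extraction --limit 20])
p parse_replay_args(["search", "--trace-ids", "id1,id2"])
```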