Generate evaluations and iterate on your AI applications
The Simforge TypeScript SDK captures your AI function calls to automatically generate evaluations. Re-run your prompts with different models, parameters, and inputs to iterate faster.
Copy this prompt into your coding agent (tested with Cursor and Claude Code using Sonnet 4.5):
```text
Modify existing TypeScript code to add Simforge tracing.

Do NOT browse or web search. Use ONLY the API described below.

Simforge TypeScript SDK (authoritative excerpt):
- Install: `npm install @goharvest/simforge` or `pnpm add @goharvest/simforge`
- Init:
    import { Simforge } from "@goharvest/simforge"
    const simforge = new Simforge({ apiKey: process.env.SIMFORGE_API_KEY })
- Instrumentation (ONLY allowed form - use getFunction):
    // Declare trace function key once
    const myService = simforge.getFunction("<trace_function_key>")
    // Wrap functions with withSpan
    const tracedFn = myService.withSpan(originalFunction)
    // Or with options:
    const tracedFn = myService.withSpan({ name: "DisplayName", type: "function" }, originalFunction)
    // Span types: "llm", "agent", "function", "guardrail", "handoff", "custom"
- DO NOT modify the original function.
- DO NOT extract helper methods.

Task:
1) Ensure @goharvest/simforge is installed and initialization exists.
2) Read the codebase and identify ALL AI workflows (LLM calls, agent runs, AI-driven decisions).
3) Present me with a numbered list of workflows you found. For each, describe:
   - What it does
   - Why it's worth instrumenting — what visibility tracing gives you into each step
4) After I choose which workflow(s) to instrument:
   - Create a function wrapper with `simforge.getFunction("<trace_function_key>")`
   - Wrap the functions with `myService.withSpan(originalFunction)`
   - Instrument intermediate steps (not just the final output) so each trace has enough context to diagnose issues
   - Replace usages of the original functions with the traced versions
5) Do not change function signature, behavior, or return value. Minimal diff.

Output:
- First: your numbered list of workflows with why each is worth instrumenting
- After my selection: minimal diffs for dependencies, initialization, and the function wrapping
```
```typescript
new Simforge({ apiKey: string })

// Disable tracing (functions still execute, but no spans are sent)
new Simforge({ apiKey: string, enabled: false })
```
Missing API key doesn’t crash. If the API key is missing, empty, or whitespace-only, the SDK automatically disables tracing and logs a warning. All wrapped functions still execute normally — no spans are sent, no errors are thrown. You don’t need any conditional logic around the API key.
For projects with instrumented functions spread across multiple files, create a dedicated file that initializes Simforge and exports the shared `getFunction` service. Import it wherever you need to instrument.
```typescript
// lib/simforge.ts — single source of truth
import { Simforge } from "@goharvest/simforge"

const simforge = new Simforge({ apiKey: process.env.SIMFORGE_API_KEY })

export const orderService = simforge.getFunction("order-processing")
```
```typescript
// services/processOrder.ts
import { orderService } from "../lib/simforge"

async function processOrder(orderId: string) {
  return { orderId }
}

export const tracedProcessOrder = orderService.withSpan(processOrder)
```
```typescript
// services/validateOrder.ts
import { orderService } from "../lib/simforge"

async function validateOrder(orderId: string) {
  return { valid: true }
}

export const tracedValidateOrder = orderService.withSpan(validateOrder)
```
Spans from different files are automatically linked as parent-child when one wrapped function calls another.
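For example, a variant of `processOrder.ts` from above that calls the traced validator would nest the validator's span under the order span automatically (a sketch based on the files above):

```typescript
// services/processOrder.ts — calling a traced function from inside another
// traced function records the callee's span as a child of the caller's span
import { orderService } from "../lib/simforge"
import { tracedValidateOrder } from "./validateOrder"

async function processOrder(orderId: string) {
  const validation = await tracedValidateOrder(orderId) // child span
  return { orderId, valid: validation.valid }
}

export const tracedProcessOrder = orderService.withSpan(processOrder) // parent span
```

No explicit parent-child wiring is needed; the SDK tracks the active span across async calls.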
When wrapping a function you didn’t define (e.g. an SDK or library call), pass it directly to withSpan and call the result immediately. This ensures the arguments are captured as span input.
```typescript
// ✅ GOOD — pass function directly, arguments are captured as span input
const result = await orderService.withSpan(
  { name: "ProcessOrder", type: "function" },
  processOrder,
)(orderId)

// ❌ BAD — anonymous wrapper loses all input capture (span has no input)
const result = await orderService.withSpan(
  { name: "ProcessOrder", type: "function" },
  async () => processOrder(orderId),
)()
```
Never wrap functions in an anonymous function like async () => fn(args). The SDK captures the wrapper function’s arguments as span input — an anonymous wrapper has no arguments, so the span records nothing.
Use getCurrentSpan() to get a handle to the active span, then call .addContext() to attach contextual key-value pairs from inside a traced function — useful for runtime values like request IDs, computed scores, or dynamic context:
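A minimal sketch (the `"ranking"` function key, `requestId`, and score computation are illustrative, not part of the SDK):

```typescript
import { getCurrentSpan } from "@goharvest/simforge"

const rankingService = simforge.getFunction("ranking")

async function rankResults(query: string, requestId: string) {
  const score = query.length % 10 // hypothetical computed score

  // Attach runtime key-value pairs to the active span
  getCurrentSpan()?.addContext({ requestId, score })

  return score
}

export const tracedRankResults = rankingService.withSpan(rankResults)
```

The optional chaining (`?.`) keeps the call safe if the function is ever invoked outside a span context.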
Access the current trace ID from within a span using getCurrentSpan().traceId. This is useful for capturing trace IDs to use with replay or for logging:
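For instance, a sketch that logs the trace ID of the current execution (the function body is illustrative):

```typescript
import { getCurrentSpan } from "@goharvest/simforge"

async function handleRequest(input: string) {
  // Capture the trace ID for logging or later use with replay
  const traceId = getCurrentSpan()?.traceId
  console.log(`[trace ${traceId}] processing input`)
  return input.toUpperCase()
}
```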
Use getCurrentSpan() to set the prompt string on the current span. This is stored in span_data.prompt and is useful for capturing the exact prompt text sent to an LLM:
```typescript
import { getCurrentSpan } from "@goharvest/simforge"

async function classifyText(text: string) {
  const prompt = `Classify the following text: ${text}`
  getCurrentSpan()?.setPrompt(prompt)
  const result = await llm.complete(prompt)
  return result
}

const traced = simforge.withSpan("classification", { type: "llm" }, classifyText)
```
The last setPrompt call wins — it overwrites any previously set prompt on the span. Calling setPrompt outside a span context is a no-op (it never crashes).
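A sketch of both behaviors (the `llm.complete` call stands in for your LLM client):

```typescript
import { getCurrentSpan } from "@goharvest/simforge"

async function summarize(text: string) {
  getCurrentSpan()?.setPrompt("draft prompt") // overwritten below
  const prompt = `Summarize: ${text}`
  getCurrentSpan()?.setPrompt(prompt) // last call wins — this is what the span stores
  return llm.complete(prompt)
}

// Outside any span context, this is a safe no-op:
getCurrentSpan()?.setPrompt("ignored")
```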
If you use BAML for your LLM calls, wrapBAML automatically captures the rendered prompt and LLM metadata (model, provider, token counts, duration) on the current span — no manual setPrompt or addContext calls needed.
npm install @boundaryml/baml
```typescript
import { b } from "./baml_client"

// Pass your BAML client to the constructor
const simforge = new Simforge({
  apiKey: process.env.SIMFORGE_API_KEY,
  bamlClient: b,
})

// Wrap a BAML method — prompt and metadata are captured automatically
const tracedClassify = simforge.withSpan(
  "classify",
  { type: "llm" },
  simforge.wrapBAML(b.ClassifyText),
)

const result = await tracedClassify("Hello world")
```
wrapBAML works by creating a BAML Collector, running the method through a tracked client, then extracting:
Prompt → setPrompt() with the rendered messages (system + user)
If @boundaryml/baml is not installed, the method is called directly without instrumentation.
Accessing the BAML Collector
The wrapped function exposes a .collector property containing the BAML Collector instance from the most recent call. This is useful for accessing raw token counts, timing data, or other metadata that BAML captures:
```typescript
const tracedClassify = simforge.withSpan(
  "classify",
  { type: "llm" },
  simforge.wrapBAML(b.ClassifyText),
)

await tracedClassify("Hello world")

// Access the collector from the last call
const collector = tracedClassify.collector
```
The .collector is null before the first call or if @boundaryml/baml is not installed.
onCollector Callback
For more control, pass an onCollector callback via WrapBAMLOptions. It fires after each BAML invocation with the Collector instance:
```typescript
const tracedClassify = simforge.withSpan(
  "classify",
  { type: "llm" },
  simforge.wrapBAML(b.ClassifyText, {
    onCollector: (collector) => {
      // Process token usage, timing, or other collector data
      console.log("BAML collector:", collector)
    },
  }),
)
```
When using the two-argument form (passing bamlClient explicitly), options go in the third argument:
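Based on that description, the explicit form would look like this (a sketch inferred from the sentence above — check the SDK reference for the exact signature):

```typescript
const tracedClassify = simforge.withSpan(
  "classify",
  { type: "llm" },
  // Two-argument form: BAML client first, method second, options third
  simforge.wrapBAML(b, b.ClassifyText, {
    onCollector: (collector) => {
      console.log("BAML collector:", collector)
    },
  }),
)
```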
Use getCurrentTrace() to set context that applies to the entire trace (all spans within a single execution). This is useful for grouping traces by session or attaching trace-level metadata:
```typescript
import { getCurrentTrace } from "@goharvest/simforge"

const traced = simforge.withSpan("order-processing", { type: "function" }, async () => {
  const trace = getCurrentTrace()

  // Set session ID (stored as database column, filterable in dashboard)
  trace?.setSessionId("session-123")

  // Set trace metadata (stored in raw trace data)
  trace?.setMetadata({ region: "us-west-2", environment: "production" })

  // Add context entries (stored as key-value pairs, accumulates across calls)
  trace?.addContext({ workflow: "checkout-flow", batch_id: "batch-2024-01" })

  return { status: "completed" }
})
```
setSessionId(id) — Groups traces by user session. Stored as a database column for efficient filtering.
setMetadata(obj) — Arbitrary key-value metadata on the trace. Merges with existing metadata.
addContext(obj) — Key-value context entries. Accumulates across multiple calls.
Simforge's native functions make prompt tuning more effective: the auto-tuning engine has full access to your prompts. Other agent SDKs often nest user instructions inside system prompts, which hides those prompts from tracing.
```typescript
const result = await simforge.call("ExtractName", { text: "My name is John Doe" })
// Returns typed result
```
Create a standalone script to regression-test your trace functions against production data with one command. The script maps pipeline names to their replay functions, accepts CLI flags, and prints a side-by-side comparison with delta summaries.
```typescript
/**
 * Replay production traces through instrumented functions.
 *
 * Uses simforge.replay() to fetch real traces and re-run them through
 * the current code, creating a test run for side-by-side comparison.
 *
 * Usage:
 *   npx tsx scripts/replay.ts <pipeline>
 *   npx tsx scripts/replay.ts <pipeline> --limit 20
 *   npx tsx scripts/replay.ts <pipeline> --trace-ids id1,id2
 */
import "dotenv/config"
import { simforge } from "../lib/simforge"
import { extractMemories } from "../services/extraction"
import { searchDocuments } from "../services/search"

const FUNCTIONS = {
  extraction: "my-extraction-pipeline",
  search: "my-search-pipeline",
} as const

type Pipeline = keyof typeof FUNCTIONS

const pipeline = process.argv[2] as Pipeline | undefined
const args = process.argv.slice(3)

if (!pipeline || !FUNCTIONS[pipeline]) {
  console.error(
    `Usage: npx tsx scripts/replay.ts <${Object.keys(FUNCTIONS).join("|")}> [--limit N] [--trace-ids id1,id2]`,
  )
  process.exit(1)
}

let limit = 10
let traceIds: string[] | undefined

for (let i = 0; i < args.length; i++) {
  if (args[i] === "--limit" && args[i + 1]) {
    limit = Number.parseInt(args[i + 1], 10)
    i++
  } else if (args[i] === "--trace-ids" && args[i + 1]) {
    traceIds = args[i + 1].split(",").map((id) => id.trim())
    i++
  }
}

// Each pipeline gets its own replay function — replay deserializes
// historical inputs, so if the function signature doesn't match the
// raw input shape, reshape the arguments in a thin wrapper here.
async function replayExtraction() {
  const fn = async (conversation: string, existingItems: unknown[]) => {
    return extractMemories(conversation, existingItems)
  }
  return simforge.replay(FUNCTIONS.extraction, fn, {
    limit,
    ...(traceIds ? { traceIds } : {}),
  })
}

async function replaySearch() {
  const fn = async (query: string, opts: Record<string, unknown>) => {
    return searchDocuments(query, {
      userId: opts.userId as string,
      limit: (opts.limit as number) ?? 10,
    })
  }
  return simforge.replay(FUNCTIONS.search, fn, {
    limit,
    ...(traceIds ? { traceIds } : {}),
  })
}

const REPLAY_FNS: Record<Pipeline, () => ReturnType<typeof simforge.replay>> = {
  extraction: replayExtraction,
  search: replaySearch,
}

async function main() {
  const functionKey = FUNCTIONS[pipeline!]
  console.log(`[replay] Replaying ${traceIds?.length ?? limit} traces from "${functionKey}"...\n`)

  const result = await REPLAY_FNS[pipeline!]()
  console.log(`Test run: ${result.testRunUrl}\n`)

  let changed = 0,
    same = 0,
    errors = 0

  for (const item of result.items) {
    const input = item.input as unknown[] | undefined
    const label =
      typeof input?.[0] === "string"
        ? input[0].slice(0, 80)
        : JSON.stringify(input?.[0] ?? "unknown").slice(0, 80)

    if (item.error) {
      console.log(`  ✗ "${label}"`)
      console.log(`    Error: ${item.error}`)
      errors++
    } else {
      const origCount = Array.isArray(item.originalOutput) ? item.originalOutput.length : 0
      const newCount = Array.isArray(item.result) ? (item.result as unknown[]).length : 0
      const delta = newCount - origCount
      const deltaStr = delta === 0 ? "=" : delta > 0 ? `+${delta}` : `${delta}`
      console.log(`  ${delta === 0 ? "=" : "Δ"} "${label}"`)
      console.log(`    ${origCount} → ${newCount} (${deltaStr})`)
      delta !== 0 ? changed++ : same++
    }
  }

  console.log(`\n─── Summary ───`)
  console.log(`  Pipeline: ${pipeline}`)
  console.log(`  Replayed: ${result.items.length}`)
  console.log(`  Same: ${same}`)
  console.log(`  Changed: ${changed}`)
  if (errors > 0) console.log(`  Errors: ${errors}`)
  console.log(`\n  ${result.testRunUrl}`)
}

main().catch((err) => {
  console.error("[replay] Fatal error:", err)
  process.exit(1)
})
```
Adapt the imports, pipeline names, and per-pipeline replay functions to match your project’s instrumented workflows.