Changelog - Bitfab

Dashboard

Complete grader set on experiment panels

The graders panel on an experiment now lists every grader scoring that run, including the ones inherited from its dataset, so it matches the pass and fail results shown on each trace. Graders that come from the dataset are marked with a small icon you can hover to confirm, and re-runs stay limited to the experiment’s own graders.

Dashboard

Reliable grader reruns

You can now see live progress when re-running graders on a dataset or experiment, and that progress resumes if you refresh or return while the run is still active. Starting the same rerun again reconnects to the existing work instead of creating a duplicate, while a different selection is blocked until the current run finishes. Completion updates recover even if a live update is missed, so results do not stay stuck in a running state.
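
The reconnect-or-block rule can be pictured as a small registry that allows one active rerun per dataset or experiment. This is only an illustrative sketch: `RerunRegistry`, `start`, and `finish` are hypothetical names rather than Bitfab APIs, and treating selections as order-insensitive is an assumption.

```typescript
// Hypothetical sketch: at most one active rerun per dataset or experiment.
type StartResult =
  | { kind: "started"; runId: string }
  | { kind: "reconnected"; runId: string }
  | { kind: "blocked"; activeRunId: string };

class RerunRegistry {
  private active = new Map<string, { runId: string; selection: string }>();
  private nextId = 1;

  // Assumption: ["a", "b"] and ["b", "a"] count as the same selection.
  private static selectionKey(graderIds: string[]): string {
    return Array.from(new Set(graderIds)).sort().join(",");
  }

  start(targetId: string, graderIds: string[]): StartResult {
    const selection = RerunRegistry.selectionKey(graderIds);
    const current = this.active.get(targetId);
    if (current === undefined) {
      const runId = `run-${this.nextId++}`;
      this.active.set(targetId, { runId, selection });
      return { kind: "started", runId };
    }
    // Same selection: reconnect to the existing work. Different: block.
    return current.selection === selection
      ? { kind: "reconnected", runId: current.runId }
      : { kind: "blocked", activeRunId: current.runId };
  }

  // Called when the active run completes, which unblocks new selections.
  finish(targetId: string): void {
    this.active.delete(targetId);
  }
}
```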

Live grading progress on trace rows

When graders re-score a dataset or experiment, each trace row now shows its grading as it happens instead of going blank. A loading ring appears the moment scoring starts and the pass rate fills in live as each grader reports, so you can watch results arrive. Finished rows keep their green or red verdict, with the pill’s border showing the pass/fail split.

Dashboard

Per-grader labeling on the trace list

When you open a trace that has graders from the Traces page, the labeling panel now shows each grader’s verdict so you can review and label them one by one, the same per-grader view you get inside a dataset. Traces without graders keep the standard pass/fail labeling panel.

Dashboard

Each dataset now shows the experiments run against it, right in the dataset view. See each run’s pass rate at a glance and jump straight to a single experiment, or open the full list filtered to that dataset, without leaving your workflow.

Dashboard

Graders now have their own section

Every trace function has a Graders section alongside Traces, Datasets, and Experiments. It lists your graders with their pass rate and the number of datasets and experiments each one is attached to, and you can sort by status, pass rate, evaluation volume, or recency, and archive or unarchive a grader without leaving the list.

Grader overview with dataset and experiment footprint

Opening a grader now shows an overview of what it does and where it runs: its overall pass rate, the criteria it evaluates, and every dataset and experiment it is attached to with that grader’s pass rate on each. Pass rates combine the grader’s automated results with your own human labels.

TypeScript SDK, Python SDK, Ruby SDK

TypeScript SDK v0.30.1, Python SDK v0.30.1, Ruby SDK v0.30.1

Automatic code-change capture on replay

When you replay traces after editing your code, the SDK now attaches the diff to the resulting experiment automatically, so you can see exactly what changed alongside the results with no extra arguments. If you don’t pass codeChangeFiles to replay(), it captures your working-tree changes against your trunk and includes them; passing an explicit code change still takes precedence when you want a precise per-edit before/after. Available in the TypeScript, Python, and Ruby SDKs. Opt out with BITFAB_DISABLE_CODE_CHANGE_CAPTURE.

TypeScript SDK, Python SDK, Ruby SDK, Dashboard

TypeScript SDK v0.30.0, Python SDK v0.30.0, Ruby SDK v0.30.0

Attach graders to a single replay

replay() now takes a graderIds / grader_ids option so you can grade one experiment with specific graders without permanently adding them to the dataset. The run is graded by the union of the graders you pass and the dataset’s own graders, so a one-off check runs alongside your standard ones.

await client.replay("my-function", processInput, {
  datasetId: "<dataset-id>",
  graderIds: ["<grader-id>"],
})

Available in the TypeScript, Python (grader_ids), and Ruby (grader_ids) SDKs.
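
To make the union rule concrete, here is a minimal sketch of how the graders for one run could be resolved. `effectiveGraderIds` is a hypothetical helper written for illustration; it is not part of the SDK.

```typescript
// Hypothetical helper: the graders that score a single replay run are the
// dataset's own graders plus any passed for this run, without duplicates.
function effectiveGraderIds(
  datasetGraderIds: string[],
  replayGraderIds: string[] = [],
): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const id of datasetGraderIds.concat(replayGraderIds)) {
    if (!seen.has(id)) {
      seen.add(id);
      result.push(id);
    }
  }
  return result;
}
```

A grader that is both attached to the dataset and passed to the replay appears once, so it scores the run a single time.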

Dashboard

Accurate experiment pass rates

Completed experiments without assigned graders now show a pass percentage from their trace labels instead of --%, so the summary reflects the results already available below it. Experiment rows use grader results when graders are assigned and label results otherwise, while runs with labels still outstanding remain pending.
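
The selection rule above can be sketched as follows. The types and the `summarize` function are hypothetical, and the exact aggregation (share of passing grader verdicts, rounded to a whole percent) is an assumption made for illustration.

```typescript
type Verdict = "pass" | "fail";

interface TraceResult {
  graderVerdicts: Verdict[]; // one entry per grader that has reported
  label: Verdict | null; // trace-level label; null while still outstanding
}

type Summary = { status: "pending" } | { status: "scored"; passPercent: number };

function summarize(traces: TraceResult[], hasGraders: boolean): Summary {
  if (hasGraders) {
    // Graders assigned: score from grader results.
    let passed = 0;
    let total = 0;
    for (const trace of traces) {
      for (const verdict of trace.graderVerdicts) {
        total++;
        if (verdict === "pass") passed++;
      }
    }
    return total === 0
      ? { status: "pending" }
      : { status: "scored", passPercent: Math.round((100 * passed) / total) };
  }
  // No graders: score from labels, but stay pending while any are outstanding.
  if (traces.length === 0 || traces.some((t) => t.label === null)) {
    return { status: "pending" };
  }
  const passes = traces.filter((t) => t.label === "pass").length;
  return { status: "scored", passPercent: Math.round((100 * passes) / traces.length) };
}
```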

Honest experiment comparisons

Experiment trace rows now omit the before-to-after grader badge when the original trace has no compatible grader results. The current grader score remains visible, so you can see the replay outcome without a misleading missing baseline.

Graders on the experiment page

Experiments now show the graders attached to each run right beside the traces, with a per-grader pass/fail breakdown for the run. Open a grader to read its criteria, use Manage to attach or detach graders, and Re-run to re-grade the experiment’s traces. The pass and fail bars update live as grading runs.

Dashboard, Plugins, CLI

Plugins v0.9.5, CLI v0.2.212

Group and grade experiments from your coding agent

You can now organize existing experiment runs into a shared group and attach a grader across every current experiment using the new create_experiment_group and add_grader_to_experiment_group MCP tools. Completed experiments queue only missing grader results immediately, while pending experiments use the grader when they finish. Grader assignments stay on the experiments already in the group, so experiments added later do not inherit them automatically.

Dashboard

Replays for up to 5,000 recent traces

Replay runs can now select up to 5,000 recent traces at once, up from 100, so larger evaluations no longer need to be split into manual batches. Starting a replay now avoids loading every child span up front, while dataset-backed replays continue to use the dataset’s full trace list regardless of the recent-trace limit.

Dashboard

Guidance for creating datasets and experiments

Datasets and experiments are created by asking your coding agent, and the dashboard now shows you how. When a datasets or experiments page has none yet, it explains what the primitive is and gives you a ready-to-copy prompt to paste into your coding agent. The “New” button on those pages opens the same guidance with an example, so you are never left on a blank screen wondering how to start.

Dashboard, Plugins, CLI

Plugins v0.9.4, CLI v0.2.211

Paginated grader lists in coding agents

list_graders now returns manageable pages for functions with large grader collections, with name search and a cursor for fetching the next page. It returns active graders by default, can include archived definitions when requested, and leaves grader-training pipeline entries out of the results.
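
For readers unfamiliar with cursor paging, the sketch below models the behavior in memory. It is not the tool's implementation: the option names (`search`, `cursor`, `limit`, `includeArchived`), the page size, and the use of the last returned id as the cursor are all assumptions.

```typescript
interface Grader { id: string; name: string; archived: boolean }
interface ListOptions { search?: string; cursor?: string; limit?: number; includeArchived?: boolean }
interface Page { graders: Grader[]; nextCursor: string | null }

function listGradersPage(all: Grader[], opts: ListOptions = {}): Page {
  const limit = opts.limit ?? 2;
  const needle = (opts.search ?? "").toLowerCase();
  const cursor = opts.cursor;
  const matches = all
    .filter((g) => opts.includeArchived === true || !g.archived) // active only by default
    .filter((g) => g.name.toLowerCase().indexOf(needle) !== -1)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  // The cursor is the id of the last grader already returned.
  const rest = cursor === undefined ? matches : matches.filter((g) => g.id > cursor);
  const graders = rest.slice(0, limit);
  const last = graders[graders.length - 1];
  const nextCursor = rest.length > limit && last !== undefined ? last.id : null;
  return { graders, nextCursor };
}

// Follow the cursor until the listing is exhausted.
function listAllGraders(all: Grader[], search?: string): Grader[] {
  const out: Grader[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: Page = listGradersPage(all, { search, cursor });
    out.push(...page.graders);
    cursor = page.nextCursor ?? undefined;
  } while (cursor !== undefined);
  return out;
}
```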

Plugins, CLI

Plugins v0.9.3, CLI v0.2.210

Assistant runs your plan without pausing to ask

The Bitfab assistant no longer stops after each experiment to ask whether to keep going or to revert a fix. It now reports the results and automatically continues through the experiments you already approved, wrapping up on its own once the plan is done. A multi-experiment run finishes in one pass instead of prompting between rounds.

Dashboard, Plugins, CLI

Plugins v0.9.2, CLI v0.2.209

Attach graders to experiments from your coding agent

You can now attach graders directly to an experiment so they run against its replay traces, using two new Bitfab MCP tools, add_graders_to_experiment and remove_graders_from_experiment. Your coding agent can manage an experiment’s grading scope without leaving its workflow, the same way it already manages dataset graders. At completion an experiment is scored against the union of these direct attachments and its dataset’s current graders.

Dashboard

Experiments grouped by run

Your experiments page now organizes a trace function’s runs into groups, so each iteration reads as one unit rather than a flat list of runs. Groups are ordered by their most recent run, each header links back to the dataset it was run against, and you can collapse a group to focus. Runs launched without a group are gathered under an “Ungrouped” heading.

Dashboard

Live dataset grader pass rates

Dataset grader pass-rate pills now stay current as new grader results arrive or grader assignments change. The datasets list refreshes automatically, so you can monitor evaluation progress without reloading the page.

Dashboard

Experiment labeling shows only the run’s graders

When you open a trace inside an experiment, the labeling panel now shows only the graders that experiment was scored against: its dataset graders plus any graders attached to the run, rather than every grader defined for the function. Experiments with no graders open straight to the pass/fail labeling panel.

Dashboard, Plugins, CLI

Plugins v0.9.1, CLI v0.2.208

Grader definitions after saves

After you create or update an automated grader through Bitfab MCP tools, your coding agent now presents the full saved definition instead of a generic success message. The response includes the grader function, status, evaluation focus, and any passing or failing criteria so you can immediately verify what will be evaluated.

Dashboard

Reliable Studio links

Links that open Studio now preserve all handoff parameters, including repeated values, across sign-in, datasets, experiments, trace plans, and template previews. Dataset-linked experiments also switch to the dataset organization automatically, so shared links open in the correct context.

Dashboard

Manage graders from a dataset

You can now attach and remove graders directly from a dataset. Open Manage in the Graders panel on a dataset page to move graders between Available and Attached, with search and sorting to find the right one. Attached graders score every trace in the dataset, and changes take effect immediately.

Dashboard

Experiment graders run across every replay

You can now attach graders directly to an experiment, and Bitfab runs them together with the dataset’s assigned graders across every trace in the completed replay. The finalized grader set stays with the experiment, so late-arriving replay traces are evaluated consistently and completed experiment results retain the exact grader coverage that ran.

Dashboard

Framework labels on trace plans

Trace plan headers now show the instrumentation framework Bitfab detected for your workflow, next to the function name and language. Plans that span multiple frameworks list each one, so you can see at a glance how a workflow is instrumented.

Dashboard

Consistent labels across trace lists

Workflow, dataset, and experiment trace lists now use the same current label state, so pass, fail, and skipped results stay consistent across views. List loading also avoids fetching unused trace metadata, making these pages more efficient without changing how traces are managed.

Dashboard

Keep your place when switching functions

Switching trace functions from the sidebar now keeps you in the current section, such as Traces, Datasets, Graders, or Experiments, instead of sending you back to Traces. When you switch from a specific trace, dataset, or grader, Bitfab opens the matching section list for the new function so resource IDs are not carried across functions.

Dashboard

Live grader results across review views

Dataset reviews, experiment cards, experiment trace rows, and trace lists now update immediately as grader results and dataset grader assignments change. Pass rates and per-grader verdicts stay current during grading and review without a manual refresh.

Dashboard

More reliable dataset grader labeling

Dataset grader labels now stay consistent when automated evaluations and human reviews overlap, so one source no longer overwrites the other. Newly added completed traces are automatically graded by their assigned dataset graders, and the labeling panel preserves in-progress selections when a save or refresh fails.

Dashboard

Grader pass rates on trace lists

When a dataset has graders assigned, trace rows now show how many graders passed as a pass-rate bar (for example, 3 of 4) instead of a single Pass or Fail. It appears on the dataset review page, on experiment rows, and in the experiment header, so you can see at a glance how each trace and each run scored across all of its graders. The main traces list shows the same pass rate for grader-evaluated traces.

Dashboard

Run graders without tuning

Active graders now evaluate traces directly from their saved criteria, so dataset grader runs work immediately without tuning. If a tuned prompt exists, Bitfab still uses it; empty or stale prompts fall back to the grader criteria.

Dashboard

Live dataset review updates

Dataset pages, trace lists, and labeling panels now stay in sync as graders are assigned or removed and labels are added. Reviewers see the current dataset grader order immediately, while removed graders disappear from active labeling workflows without deleting their historical labels.

Dashboard

Label traces while reviewing experiments

You can now label traces without leaving the experiments view. Open the before/after comparison for any trace and use the new Label toggle, next to the Diff / Original / Replayed switch, to score it. When the dataset has graders, the panel shows each grader for per-grader approval or override; otherwise it is a simple pass/fail verdict with notes.

Dashboard

Smoother dataset grader reruns

Dataset grader reruns now keep traces in a grading state through transient evaluation failures, so temporary issues no longer appear as permanent errors. If every retry fails, the trace still moves to an error state instead of remaining stuck in grading.

Dashboard

Compare individual spans in the Diff view

When you compare a replayed trace against its original, the experiment Diff view now lets you open any span that changed, not just the whole trace. The span tree greys out spans whose input and output stayed the same and keeps the changed ones selectable, so you can jump straight to what your code change actually affected.
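
As a rough illustration of how changed spans can be told apart from unchanged ones, the sketch below compares each span's input and output between the two runs. `SpanIO`, `sameIO`, and `changedSpans` are hypothetical names; pairing spans by position and comparing with JSON serialization are simplifying assumptions, not a description of the dashboard's logic.

```typescript
interface SpanIO { name: string; input: unknown; output: unknown }

// Two spans match when both input and output serialize identically.
function sameIO(a: SpanIO, b: SpanIO): boolean {
  return JSON.stringify([a.input, a.output]) === JSON.stringify([b.input, b.output]);
}

// Pairs spans by position and returns the names of those that changed.
// A replayed span with no original counterpart counts as changed.
function changedSpans(original: SpanIO[], replayed: SpanIO[]): string[] {
  const changed: string[] = [];
  replayed.forEach((span, i) => {
    const before = original[i];
    if (before === undefined || !sameIO(before, span)) changed.push(span.name);
  });
  return changed;
}
```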

Clearer replay diffs

The Diff view now shows the input side fully expanded, so you can read exactly what the model received, and hovering a highlighted line tells you whether the replayed run excluded or included it.

Dashboard

Trace details open in a side panel

Clicking a trace on the traces page now opens a slide-in detail panel next to the list, matching how dataset and experiment review work, instead of navigating to a separate page. You can label the trace Pass, Fail, or Skip right from the panel and move through traces with Save & Next. Links to individual traces still work: sharing or reloading a trace URL opens the same list with the panel already open, and your active filters stay in the URL.

Redesigned trace list rows

Trace rows across the traces, dataset, and experiment lists now share one design that separates how the run went from how it was judged. A leading icon shows the run state (running, completed, errored, or a replay), while the verdict pill shows Pass, Fail, Skip, or an agent suggestion awaiting review, with a robot glyph marking machine verdicts. Replayed traces are tinted indigo so re-runs stand out, unlabeled rows show an input and output preview, and reviewer notes appear inline on labeled rows.

Re-run graders on a dataset

When a dataset has graders attached, you can now re-run them across every trace in the dataset directly from the dataset page. Open the Graders panel, choose which graders to run, and follow the run from running to finished. You can also click any grader to view its evaluation criteria and prompt.

Dashboard

Label datasets grader-by-grader

When a dataset has graders assigned, reviewing a trace now shows a labeling panel with one row per grader instead of a single pass/fail. Approve an automated grader’s suggested verdict in one click or override it, add a note, and move through the dataset trace by trace. Human labels always take precedence over the automated suggestions.

Dashboard

Reliable experiment history pagination

Experiment histories now load every run when several experiments start at nearly the same time. Infinite scrolling no longer skips or repeats experiments that share the same timestamp.
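
Skipped or repeated rows at a shared timestamp are a common symptom of paging on the timestamp alone. One standard remedy, sketched here under the assumption of a compound cursor (this entry does not say how Bitfab fixed it), is to break ties with a unique id so the sort order is total. All names below are hypothetical.

```typescript
interface Experiment { id: string; startedAt: number }
interface Cursor { id: string; startedAt: number }

// Newest first; the id breaks ties so no two rows compare as equal.
function compareNewestFirst(a: Cursor, b: Cursor): number {
  if (a.startedAt !== b.startedAt) return b.startedAt - a.startedAt;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// Returns the next page strictly after the cursor in the total order.
function nextPage(all: Experiment[], limit: number, after?: Cursor): Experiment[] {
  const cursor = after;
  const sorted = all.slice().sort(compareNewestFirst);
  const rest =
    cursor === undefined
      ? sorted
      : sorted.filter((e) => compareNewestFirst(cursor, e) < 0);
  return rest.slice(0, limit);
}
```

With a timestamp-only cursor, a page boundary that falls inside a group of equal timestamps has no way to tell which of those rows were already returned; the `(startedAt, id)` pair removes that ambiguity.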

Dashboard

Scroll through complete experiment histories

Experiment histories now keep loading as you scroll, so older runs remain available instead of stopping after the newest 50. This works across function, dataset, and experiment-group views. The loading footer stays visible while more runs remain, with a spinner while the next page arrives.

Dashboard

Dataset graders now grade automatically

The graders you assign to a dataset now run automatically on that dataset’s traces, and on the replayed traces when you run an experiment against the dataset. You get grades on your dataset and experiment results without kicking off anything by hand.

Dashboard, Plugins, CLI

Plugins v0.9.0, CLI v0.2.207

Create and edit graders from your coding assistant

Claude Code, Cursor, and Codex can now create and edit automated graders for a traced function with the new save_grader and list_graders tools. Ask for a check like “the reply never invents order numbers” and your coding agent defines the grader, then renames, updates, archives, or restores it on request without leaving your workflow. Graders are saved as definitions for now; nothing runs them against new traces automatically yet.

Dashboard

Diff view for replay comparisons

When you replay a trace to test a code change, the trace comparison drawer now opens on a new Diff view that shows the replayed run against the original side by side, input on the left and output on the right, with changed lines highlighted. You can tell whether the change helped without flipping between the Original and Replayed panes and holding both in your head. Use the Diff, Original, and Replayed toggle at the top of the drawer to switch views; your choice sticks as you step through the run.

Dashboard, Plugins, CLI

Plugins v0.8.161, CLI v0.2.206

Assign graders to datasets from your coding assistant

Claude Code, Cursor, and Codex can now assign graders to datasets with add_graders_to_dataset and remove_graders_from_dataset. list_datasets now includes assigned graders, so you can inspect evaluation coverage and update it without leaving your coding workflow.

Dashboard

See experiment annotations in trace comparisons

Experiment trace comparisons now show the replay annotation beside the pass/fail transition, so you can see the labeler’s reasoning without returning to the trace list. Hover over a truncated annotation to read its full text in a tooltip.

You can now move through traces and spans with the keyboard while labeling a dataset or comparing experiment results. Use the arrow keys or Vim keys (h, j, k, l) to step between traces and their spans, and press Esc to close a trace’s detail view.

CLI

CLI v0.2.205

Start setup with a specific request

You can now pass --prompt (or -p) to bitfab init and bitfab setup to tell the setup agent what you want instrumented from the start. The prompt is forwarded into setup in Claude Code, Codex, and Cursor, so onboarding can begin with the workflow you already have in mind.

Python SDK

Python SDK v0.29.2

Lower tracing overhead in the Python SDK

Traced functions in the Python SDK now return as soon as their trace data is captured, instead of waiting for spans to finish uploading to Bitfab. For latency-sensitive code, this takes a network round-trip out of your own request path while your traces keep uploading in the background.

Dashboard

Browse past experiments from the dashboard

Every trace function now has an Experiments tab next to Traces and Datasets, listing the experiments that have run against it, newest first. Each run shows its pass rate and how many traces were fixed, regressed, still passing, or still failing versus the original, so you can tell at a glance whether a change helped. Expand a run to inspect its traces, compare the original and updated output, and view the code change that produced it.

TypeScript SDK, Python SDK, Ruby SDK, Dashboard, Plugins, CLI

TypeScript SDK v0.29.1, Python SDK v0.29.1, Ruby SDK v0.29.1, Plugins v0.8.158, CLI v0.2.203

Replay verdicts persist by the original trace

When your coding agent evaluates a replay run, its pass/fail verdicts now persist against the trace each item was replayed from: pass testRunId plus the item’s originalTraceId to update_agent_labels and Bitfab resolves them onto that run’s replay traces. Verdicts reliably reach the experiments page without the agent ever needing a server-generated replay trace id.

Replay items rename source to original

Replay items, progress events, and adapt-inputs context now name the replayed-from trace originalTraceId and originalSpanId (snake_case in Python and Ruby); the previous sourceTraceId/sourceSpanId names keep working everywhere as deprecated aliases, so existing scripts are unaffected. An item’s traceId is now null while the run streams and is filled in with the server replay id when the run completes.
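
If a script has to run against both older and newer SDK versions, it can read the new names first and fall back to the deprecated ones. The sketch below assumes a simplified item shape (`ReplayItemLike`) and a hypothetical `originalIds` helper; neither is an SDK type.

```typescript
// Assumed, simplified shape of a replay item for illustration only.
interface ReplayItemLike {
  originalTraceId?: string;
  originalSpanId?: string;
  sourceTraceId?: string; // deprecated alias
  sourceSpanId?: string; // deprecated alias
  traceId: string | null; // null while the run streams
}

// Prefer the new names and fall back to the deprecated aliases.
function originalIds(item: ReplayItemLike): {
  traceId: string | undefined;
  spanId: string | undefined;
} {
  return {
    traceId: item.originalTraceId ?? item.sourceTraceId,
    spanId: item.originalSpanId ?? item.sourceSpanId,
  };
}
```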

Plugins, CLI

Plugins v0.8.157, CLI v0.2.202

Replay verdicts are saved automatically

When you replay a single trace to check whether a fix worked, the assistant now saves its pass/fail verdict onto that replay trace instead of only showing it in chat, so your conclusion sticks and appears alongside the trace. The replay path stays lightweight, with no Studio, dataset, or experiment setup. If your SDK is too old to return a replay trace ID, the verdict stays in chat with a prompt to upgrade the SDK.

TypeScript SDK, Python SDK, Ruby SDK, Dashboard, Plugins, CLI

TypeScript SDK v0.29.0, Python SDK v0.29.0, Ruby SDK v0.29.0, Plugins v0.8.156, CLI v0.2.201

Inject custom values into specific spans during replay

When you replay a trace, you can now override the output of a chosen span instead of running its real code or replaying its recorded output. Match a span by its name, function key, or type, then return a fixed value or one computed from the span’s live inputs and its original recorded output. Available in the TypeScript, Python, and Ruby SDKs.

await client.replay("my-workflow", runWorkflow, {
  mockOverride: {
    match: (node) => node.spanName === "Summarizer",
    value: async (ctx) => ({
      ...(await ctx.getOriginalOutput()),
      score: 1,
    }),
  },
})

Dashboard

Long trace results stay visible

Large arrays in trace Input and Output views now open automatically, so you can see long result lists without an extra click. When those lists contain nested objects or arrays, each entry stays collapsed to keep the trace readable; compact scalar values remain visible.

Dashboard, Plugins, TypeScript SDK, Python SDK, Ruby SDK, Go SDK, CLI

Plugins v0.8.155, TypeScript SDK v0.28.11, Python SDK v0.27.8, Ruby SDK v0.23.8, Go SDK v0.12.5, CLI v0.2.200

Read one span without loading the full trace

Fetch a single persisted span directly from a trace in the TypeScript, Python, Ruby, and Go SDKs. Select it by canonical id or by name; repeated names return the last occurrence by default, with options for the first or a zero-based occurrence.

const span = await bitfab.getTraceSpan(traceId, {
  name: "generate",
})
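
The occurrence rule for repeated names can be sketched in a few lines. This models the selection logic only: `pickSpan` and its `occurrence` parameter are hypothetical, since the actual option names for getTraceSpan are not shown in this entry.

```typescript
interface SpanLike { id: string; name: string }
type Occurrence = "first" | "last" | number; // a number is a zero-based index

function pickSpan(
  spans: SpanLike[],
  name: string,
  occurrence: Occurrence = "last", // repeated names resolve to the last match by default
): SpanLike | undefined {
  const matches = spans.filter((s) => s.name === name);
  if (occurrence === "first") return matches[0];
  if (occurrence === "last") return matches[matches.length - 1];
  return matches[occurrence];
}
```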

Dashboard, Plugins, TypeScript SDK, CLI

Plugins v0.8.154, TypeScript SDK v0.28.10, CLI v0.2.199

Reliable nested traces from the first Node.js call

The TypeScript SDK now preserves nested span context from the first traced call in Node.js, including ESM and CommonJS applications. For async-generator streams, wrap the controller that owns the iteration in an outer withSpan call so service-side production and consumer-side work appear under one trace.

Dashboard, Plugins

Plugins v0.8.153

Trace plan warnings for spans that won’t replay

Trace plans now flag spans that won’t replay cleanly before you confirm. If the entry point’s input can’t be serialized, the plan shows a “Root not replayable” warning; spans mocked despite output that can’t be serialized are called out the same way. When a plan carries any of these warnings, Confirm opens a confirmation step so you accept them deliberately rather than by accident. The warnings show on the trace plan review page, and your coding agent applies the same rules when it drafts a plan.

Dashboard

Structured Map and Set trace output

Trace outputs containing JavaScript Maps and Sets now render as navigable structured data instead of collapsed string representations. Maps preserve distinct keys even when their string forms collide, and nested Maps and Sets remain expandable in the trace viewer.
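
Key collisions happen when a Map is flattened into a plain object, because the number 1 and the string "1" become the same property name. A minimal sketch of an encoding that avoids this, using a hypothetical `Encoded` shape and `encode` function that are not Bitfab's wire format, keeps entries as pairs:

```typescript
type Encoded =
  | { kind: "map"; entries: [Encoded, Encoded][] }
  | { kind: "set"; values: Encoded[] }
  | { kind: "value"; value: unknown };

// Encodes Maps as lists of [key, value] pairs so keys keep their own type,
// and recurses so nested Maps and Sets stay structured.
function encode(input: unknown): Encoded {
  if (input instanceof Map) {
    const entries: [Encoded, Encoded][] = [];
    input.forEach((value, key) => {
      entries.push([encode(key), encode(value)]);
    });
    return { kind: "map", entries };
  }
  if (input instanceof Set) {
    const values: Encoded[] = [];
    input.forEach((value) => {
      values.push(encode(value));
    });
    return { kind: "set", values };
  }
  return { kind: "value", value: input };
}
```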

Dashboard

More reliable dashboard startup

The Bitfab dashboard now avoids analytics initialization crashes in browsers where cookie access is unavailable or blocked. Analytics stays inactive until it is ready, so affected sessions can load the dashboard normally.

Plugins

Plugins v0.8.152

Edit span templates without leaving the trace you’re viewing

When you ask your coding agent to change how spans render, Bitfab no longer pulls you onto a separate template-preview page. If you’re already looking at a trace of that function, it now offers to edit the templates in place, and your open trace re-renders live with each change you make. The dedicated preview page, with click-to-target editing on the function’s most recent trace, is still one option away when you want it. Available in the Claude, Cursor, and Codex plugins.

Plugins

Plugins v0.8.150

Fix flow reverts changes that cause regressions

When you use the Bitfab assistant’s fix flow and re-run your full dataset to lock in a fix, it now checks whether your change broke traces that were passing before. If it finds real regressions, the assistant recommends reverting the fix and starting a fresh attempt, so you never ship a change that trades one fixed trace for several broken ones. The trace you were fixing stays saved in your dataset as a regression test to revisit. Available in the Claude, Cursor, and Codex plugins.

Plugins

Plugins v0.8.149

Choose where fixed traces get saved

When you fix a failing trace with the Bitfab assistant (/bitfab:assistant fix), it now asks which dataset to save the fixed scenario to, instead of silently adding it to whichever dataset already existed. Pick an existing dataset, create a new one, or continue without saving. When you have several datasets it recommends the most recently used one and keeps the list short, so the choice stays quick.

Dashboard

Trace plans stay available longer

Trace plans now remain available for 365 days, giving you much more time to return to an instrumentation plan before confirming it. The longer review window applies to newly created trace plans.

Session length in chat summaries

Chat session summaries now show how long each coding session lasted, so you can put the product feedback in context at a glance. Slack notifications show the readable duration, while generic webhooks include the exact start time and duration in milliseconds.
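
If you consume the generic webhook and want a readable duration like the one in Slack, a small formatter is enough. The output format below is an assumption, since this entry does not specify how Slack renders the duration, and `formatDuration` is a hypothetical helper.

```typescript
// Turns a duration in milliseconds into a short readable string.
function formatDuration(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```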

Dashboard

Redesigned trace input and output view

Trace spans now show Input, Output, Context, and Error as distinct, color-coded zones with sticky headers, so you always know which part of a span you’re reading as you scroll. Input and Output sit side by side and stack automatically when the panel is narrow. When a span has an error, the Output header shows a control that jumps straight to the error details.

Collapsible JSON for trace payloads

Trace input and output now render as an interactive tree you can expand and collapse, instead of a flat text dump. Deeply nested objects, large arrays, and long embedding vectors stay collapsed by default, so you can drill into just the parts you care about.

Dashboard

A clearer trace plan review

The trace plan review page now speaks the same visual language as its “What is a replay?” explainer. Each span shows a type-colored dot with an icon for what happens on replay (re-runs live, mocked from the recording, or skipped inside a mock), its classification written out beside the name, and a Re-run | Mock toggle on the right. Every control has a tooltip, and the tree navigates with the arrow keys.

The replay entry point always re-runs

A replay starts by re-running the top traced span, so its Mock control is now disabled with an explanation instead of silently having no effect. If you untrace spans above a mocked one, the newly promoted entry point switches to re-run and its children are no longer marked as skipped. Run/mock choices also survive untracing and re-tracing a span, and closing the replay explainer with Escape no longer cancels the whole plan.

Dashboard, TypeScript SDK, Python SDK, Ruby SDK

TypeScript SDK v0.28.9, Python SDK v0.27.7, Ruby SDK v0.23.7

See which spans were mocked on a replay

When you replay a trace, spans set to mock on replay are served from the original trace’s recorded output instead of re-executing. The trace view now marks those spans with a badge, both in the span tree and on the span header, so you can tell at a glance which nodes were replayed from history and which re-ran live. Recording that disposition requires the latest TypeScript, Python, or Ruby SDK.

Plugins, CLI

Plugins v0.8.146, CLI v0.2.192

A replayability check before you accept a trace plan

When you instrument an AI workflow, Bitfab now verifies the traced function’s root can be replayed before it proposes the trace plan, instead of letting you accept a plan and only then discover the root cannot be replayed. If the root’s inputs are not serializable, setup resolves it up front by moving the trace boundary inward, using a framework handler, or refactoring, so the plan you confirm is one you can actually replay against later. Available in Claude Code, Cursor, and Codex.

Dashboard, Plugins, CLI

Plugins v0.8.144, CLI v0.2.190

A faster path through instrumentation setup

Bitfab setup now moves directly from finishing one instrumented workflow to choosing the next workflow, selecting another target, or finishing setup. It shows how to exercise the workflow and run its generated replay command, while replay coverage checks remain available as an explicit action. When you choose the next workflow, Bitfab refreshes its workflow scan so targeted setup runs do not miss other candidates. Available in Claude Code, Cursor, and Codex.

Plugins, CLI

Plugins v0.8.143, CLI v0.2.189

Studio opens in your normal browser

Bitfab Studio now opens in your usual browser as a regular tab, so you keep the address bar, tab controls, and the rest of your browser workflow. When a Studio session ends on macOS, the plugin closes only its matching tab and leaves your other browser tabs untouched.

Plugins, CLI

Plugins v0.8.142, CLI v0.2.188

Honest Studio launch reporting

When a plugin command opens Bitfab Studio, it now always surfaces a clickable link in chat, and the message no longer claims a window opened before one actually did. If a browser could not be launched at all (common on remote or SSH sessions, or when no supported browser is available), the command reports why and the surfaced link still connects the session when you click it.

Automatic recovery when a Studio window never appears

Previously, if a Studio window failed to surface, commands could wait indefinitely and later opens kept pointing at the dead session until it was cleared by hand. The plugin now detects a window that never connected, ends the wait with a clear reason so your coding agent offers a retry instead of treating it as a cancel, and clears the session automatically so the next open starts a fresh window. Stale or unresponsive Studio background processes are also detected and restarted on their own, so Studio commands always run the version of the plugin you have installed.

Dashboard

Animated replay explainer on the trace plan page

The trace plan’s “What is a replay?” modal now teaches by showing. A side-by-side animation plays the original run recording each call’s input and output, then a replay re-running it: injecting the recorded input, answering mocked calls straight from the recording, and skipping everything nested under a mock. Each call is annotated with its replay plan (runs live, from recording, or skipped) so you can see why the replay behaves the way it does before confirming your plan. With reduced motion enabled, the modal shows the final annotated diagram as a static picture instead.

Plugins, CLI

Plugins v0.8.139, CLI v0.2.187

If bitfab login reported that a Studio window was recorded as open but was not responding, there was previously no way to clear it from the command line. Running bitfab login --force now clears the stale Studio session before opening a fresh window, so you can get straight back to signing in. The error message also points you to the flag whenever you hit that state.

Dashboard

More reliable trace ingestion

Bitfab now handles traces and spans that contain Postgres-incompatible text in their raw payloads, so ingestion can continue instead of failing the write. This improves reliability for SDK uploads that include null bytes or malformed Unicode from upstream tools.
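As a rough illustration of the kind of cleanup this implies, the sketch below drops null bytes and replaces unpaired UTF-16 surrogates with U+FFFD before text is written. The helper name sanitizeForPostgres and the replacement policy are assumptions for illustration; the changelog does not describe Bitfab's actual implementation.

```typescript
// Hypothetical helper, not Bitfab's code: make a string safe to store in a
// Postgres text or jsonb column.
function sanitizeForPostgres(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === 0x0000) {
      // Postgres text values cannot contain the NUL character, so drop it.
      continue;
    }
    const isHigh = code >= 0xd800 && code <= 0xdbff;
    const isLow = code >= 0xdc00 && code <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed surrogate pair: keep both code units.
        out += text.charAt(i) + text.charAt(i + 1);
        i++;
      } else {
        // Unpaired high surrogate: substitute the replacement character.
        out += "\ufffd";
      }
      continue;
    }
    if (isLow) {
      // Low surrogate with no preceding high surrogate.
      out += "\ufffd";
      continue;
    }
    out += text.charAt(i);
  }
  return out;
}
```

Replacing rather than rejecting keeps the rest of the payload intact, which matches the stated goal of letting ingestion continue.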

Plugins, CLI

Plugins v0.8.138, CLI v0.2.186

Readable analyze-repo summaries

bitfab analyze-repo now prints a compact terminal report after uploading draft trace plans, so you can see the selected workflows, frameworks, instrumentation effort, suggested capture methods, replay mocks, and real-data value without opening plan links. The same summary output works across Claude Code, Codex, and Cursor runs, with long lines wrapped for terminal readability and skipped candidates kept in the report.

More reliable trace plan setup

Asking Bitfab to create a trace plan now reliably opens it in Bitfab Studio for review, instead of occasionally rendering the plan inline in the chat. Requests like “create a trace plan” or “instrument the next function” route directly into the setup flow, and following up to instrument another function reopens the Studio confirmation UI automatically.

Dashboard

More reliable trace search indexing

Bitfab now recovers from temporary upstream interruptions while preparing trace search summaries, reducing cases where newly ingested traces fail to become searchable. The retry behavior covers rate limits, provider-side failures, and network transport failures while still stopping on deterministic request errors.
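The retry policy described above can be sketched roughly as follows. This is a minimal illustration, not Bitfab's implementation: the UpstreamFailure shape, the status-code mapping, the attempt limit, and the backoff numbers are all assumptions.

```typescript
// Hypothetical failure shape for an upstream call; not a Bitfab type.
type UpstreamFailure =
  | { kind: "http"; status: number }
  | { kind: "network" };

// Decide whether a failure is transient and worth retrying.
function isRetryable(failure: UpstreamFailure): boolean {
  if (failure.kind === "network") return true; // network transport failure
  if (failure.status === 429) return true; // rate limit
  if (failure.status >= 500) return true; // provider-side failure
  return false; // deterministic request error (for example 400 or 422)
}

// Exponential backoff in milliseconds, capped (assumed values).
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry `call` while the classified failure is transient.
async function withRetry<T>(
  call: () => Promise<T>,
  classify: (err: unknown) => UpstreamFailure | null,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const failure = classify(err);
      const lastAttempt = attempt + 1 >= maxAttempts;
      if (failure === null || !isRetryable(failure) || lastAttempt) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffMs(attempt)));
    }
  }
}
```

Stopping immediately on deterministic errors avoids spending retries on a request that would fail the same way every time.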

Plugins, CLI

Plugins v0.8.135, CLI v0.2.184

Steer `analyze-repo` with a prompt

bitfab analyze-repo now takes free-text guidance so you can point it at the parts of your codebase you care about. Pass --prompt (short form -p, or just a trailing quoted argument) with something like “focus on the billing and checkout flows” and the scan biases toward those areas when picking which AI workflows to draft trace plans for, topping up any remaining slots from the rest of the repo. Combine it with --limit to cap how many plans it uploads.

Go SDK

Go SDK v0.12.4

Go SDK: `drop()` is safe to call across goroutines

Calling drop() on a trace from one goroutine while another goroutine finishes a span on the same trace no longer races on the trace’s dropped state. If your Go service drops traces from a different goroutine than the one running the traced work, that path is now safe.

Dashboard

Bitfab now waits for the browser sign-in handoff to confirm that the CLI received its credentials before showing success. The Studio close page stays in a finishing state while sign-in completes, and if delivery fails it leaves the page open with a retry option instead of making the terminal wait silently.

TypeScript SDK, Python SDK, Ruby SDK, Go SDK

TypeScript SDK v0.28.8, Python SDK v0.27.6, Ruby SDK v0.23.6, Go SDK v0.12.3

`drop()` now stops later spans from being sent

Calling drop() on the current trace now prevents any span that finishes afterward from being uploaded at all, so dropping a run that carries sensitive data keeps that data local instead of sending it and clearing it server-side. Spans already sent before the drop() call are still cleared, and the trace is still marked dropped. Available in the TypeScript, Python, Ruby, and Go SDKs.
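To make the new ordering guarantee concrete, here is a toy in-memory model of a trace. It is a sketch of the behavior described above, not the SDK's code: ToyTrace, finishSpan, and the uploaded array are hypothetical names, and the array stands in for data that has reached the server.

```typescript
// Toy model only; none of these names come from the Bitfab SDKs.
class ToyTrace {
  private dropped = false;
  // Span names that have been "sent" to the server.
  readonly uploaded: string[] = [];

  // Called when a span finishes.
  finishSpan(name: string): void {
    if (this.dropped) return; // after drop(), the span never leaves the process
    this.uploaded.push(name);
  }

  // Mark the trace dropped and clear anything already sent.
  drop(): void {
    this.dropped = true;
    this.uploaded.length = 0; // stands in for clearing spans server-side
  }

  get isDropped(): boolean {
    return this.dropped;
  }
}

const trace = new ToyTrace();
trace.finishSpan("load-user"); // sent before the drop, cleared by it
trace.drop();
trace.finishSpan("call-model"); // finishes after the drop, never sent
```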

TypeScript SDK, Python SDK, Ruby SDK, Go SDK

TypeScript SDK v0.28.7, Python SDK v0.27.5, Ruby SDK v0.23.5, Go SDK v0.12.2

Discard an in-flight trace with `drop()`

You can now discard the current trace at runtime from your own code, when you decide it shouldn’t be recorded (a health check, a cache hit, any path with no useful signal). Call drop() on the current trace and Bitfab skips it: its inputs, outputs, and spans are never stored, and anything already uploaded for that trace is cleared.

import { getCurrentTrace } from "@bitfab/sdk"

// inside a traced function, when you decide this trace isn't worth keeping
getCurrentTrace().drop()

Available in the TypeScript, Python, Ruby, and Go SDKs (getCurrentTrace().drop(), get_current_trace().drop(), and GetCurrentTrace(ctx).Drop() in Go). The call is always safe: if there’s no active trace it does nothing, and it never throws or interrupts your code.

CLI

CLI v0.2.180

More reliable analyze-repo from the CLI

bitfab analyze-repo now runs non-interactively through Claude Code, Codex, and Cursor Agent, so you can scan a repository and upload draft trace plans without opening an editor UI. Use --editor to choose the agent and --limit to cap how many plans are drafted.

The CLI also gives these headless runs clearer outcomes: uploaded plans share a Bitfab run identity, logs redact API keys safely, and terminated agent processes fail with a clear error instead of looking successful.

Dashboard, Plugins, CLI

Plugins v0.8.131, CLI v0.2.179

Analyze-repo runs link plans and session logs

The bitfab analyze-repo command now gives each scan a shared run identity, so the draft trace plans it uploads and the optional captured Claude Code session are tied together in Bitfab. This makes it easier to audit what the agent found, which plans it created, and the conversation that produced them.

Plugins, CLI

Plugins v0.8.129, CLI v0.2.175

Codex can run analyze-repo headlessly

bitfab analyze-repo --editor codex now runs through codex exec without opening the Codex TUI, using the same non-interactive scan and draft trace-plan upload flow that was already available through Claude Code. Cursor Agent is now supported too through its --print headless mode.

The bitfab analyze-repo command now opens the Bitfab sign-in flow when an interactive run is not authenticated, then continues the scan after login succeeds. Non-interactive runs still stop with clear instructions, and if a stale environment API key blocks verification the CLI explains how to fix it.

Plugin login now keeps the Studio sign-in URL visible and reports success as its own status line, which makes local and dev sign-ins easier to follow. Logout now clears project-local credentials before falling back to global credentials, so worktree-specific logins can be reset without affecting other projects.

CLI

CLI v0.2.172

The bitfab CLI now checks whether Claude Code, Codex, or Cursor is signed in before it launches setup, assistant, SDK update, or analyze-repo. If the editor agent is logged out or the editor CLI is missing, Bitfab stops early with the login command to run instead of opening an agent session that fails later.

CLI

CLI v0.2.171

Command-specific help in the Bitfab CLI

The bitfab CLI now shows help for individual commands, so you can check the right flags and usage without triggering the command itself. Use bitfab help <command>, <command> --help, or <command> -h to inspect commands before running onboarding, login, install, or other workflows.

Plugins

Plugins v0.8.126

More accurate natural-language routing in the plugin skills

The setup and assistant skills now route free-form requests to the right mode more reliably. Phrases like “trace a new workflow,” “why aren’t my traces showing up,” or “did my fix work on this trace” land in the correct mode without you having to name it. Each skill also lists its modes and what they do up front, so the full set of things it can do is visible at a glance.

Plugins

Plugins v0.8.125

Clearer plugin setup status

Bitfab plugin setup now makes local and dev authentication easier to verify. Login success messages include the non-production endpoint when the plugin is pointed away from production, and plugin update checks show the version that is already installed or was just updated.

Dashboard, Plugins

Plugins v0.8.124

Trace plans isolate external calls by default

Setup-generated trace plans now mark mockable external reads and side effects, such as database queries, HTTP calls, and writes, to return their recorded output during replay. LLM calls and local code stay live by default, so replay still tests the model behavior you are trying to improve. If an external parent span would skip live child spans, the server now keeps that parent live and expects the smaller external boundary to be mocked instead.
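The defaulting rule above can be sketched as a small function. This is an illustration under assumptions, not the server's logic: the PlanSpan shape and the span kinds ("code", "llm", "read", "write") are hypothetical stand-ins for however a trace plan actually represents its nodes.

```typescript
// Hypothetical plan node; not Bitfab's trace plan schema.
type SpanKind = "code" | "llm" | "read" | "write";
type ReplayMode = "live" | "mock";

interface PlanSpan {
  name: string;
  kind: SpanKind;
  children: PlanSpan[];
}

// External reads and side effects are the mockable boundary.
function isExternal(kind: SpanKind): boolean {
  return kind === "read" || kind === "write";
}

// True if any descendant is local code or an LLM call that should run live.
function containsLiveWork(span: PlanSpan): boolean {
  return span.children.some((c) => !isExternal(c.kind) || containsLiveWork(c));
}

// Default replay mode for a span in a setup-generated plan.
function defaultReplayMode(span: PlanSpan): ReplayMode {
  if (!isExternal(span.kind)) return "live"; // LLM calls and local code stay live
  // Mocking an external parent would skip its live children, so keep it live
  // and leave the smaller external boundary to be mocked instead.
  return containsLiveWork(span) ? "live" : "mock";
}

const dbQuery: PlanSpan = { name: "load-user", kind: "read", children: [] };
const modelCall: PlanSpan = { name: "draft-reply", kind: "llm", children: [] };
const httpWithModel: PlanSpan = { name: "proxy", kind: "read", children: [modelCall] };
```

Here dbQuery defaults to "mock", modelCall to "live", and httpWithModel to "live" because mocking it would skip the model call nested inside it.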

Plugins

Plugins v0.8.123

Replay scripts stay aligned with production roots

Bitfab setup and assistant workflows now require generated replay scripts to call the same production root wrapper for traced functions, instead of introducing a replay-only helper that can drift from runtime behavior. For handler-based integrations, replay guidance now points back to the same production framework entrypoint and includes a root-parity checklist before setup finishes.

Plugins, TypeScript SDK, Python SDK, Ruby SDK

Plugins v0.8.122, TypeScript SDK v0.28.6, Python SDK v0.27.4, Ruby SDK v0.23.4

More reliable replay results

Replays run through the Bitfab plugin no longer fail to report their result when the replay script prints extra output (framework logs, env-loader noise). The TypeScript, Python, and Ruby SDKs now write the full replay result to a file the plugin reads directly, so a passing replay is never mistaken for a failed one.

Plugins

Plugins v0.8.121

Scan a repo for AI workflows from the terminal

Run npx bitfab-cli analyze-repo to headlessly scan a repository for its AI workflows and upload a draft trace plan for each of the top candidates, with no prompts and no code changes. Cap how many plans it uploads with --limit (default 5), then review and confirm the drafts in Studio. Available for Claude Code.

Plugins

Plugins v0.8.120

Setup asks before rewriting your code

When /bitfab:setup instruments a function, it now pauses and asks for approval before restructuring an existing framework or SDK call to attach a trace, instead of rewriting it silently. Purely additive instrumentation (wrapping an unchanged call) proceeds as before; only a change that would modify existing code stops for your confirmation, and any rewrite you approve preserves the original behavior exactly.

Dashboard

Trace plans stay valid longer

Trace plans no longer expire after 30 minutes. When you set up tracing and step away before confirming a plan, it now stays valid for 7 days, so you can pick up where you left off instead of recreating it.

Plugins

Plugins v0.8.119

More reliable Studio sessions

Studio now stays put across restarts. If Studio’s background process reloads, your active session is restored and reopens on the page you were last viewing instead of being lost or opening a duplicate window. Switching between pages (a dataset, a trace plan, the experiments view) now reuses the open Studio tab instead of closing and reopening it.

Dashboard, Plugins

Plugins v0.8.118

Verified replay labels in assistant runs

Bitfab plugins now verify replay labels immediately after persisting them, so benchmark scorecards only continue once the server reports the expected effective PASS, FAIL, or skipped state. During assistant and replay workflows, persistReplayLabels parses the update_agent_labels response and stops with verification-failed if any trace label is missing or mismatched, giving the agent a clear retry path instead of reporting partial results.
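The verification step amounts to comparing the labels that were sent against the labels the server reports back. The sketch below shows that comparison under assumptions: the Labels shape and verifyLabels helper are hypothetical, and the real update_agent_labels response format is not documented here.

```typescript
// Hypothetical shapes; the real update_agent_labels response may differ.
type Verdict = "PASS" | "FAIL" | "skipped";
type Labels = Record<string, Verdict>; // keyed by trace ID

type Verification =
  | { status: "verified" }
  | { status: "verification-failed"; missing: string[]; mismatched: string[] };

// Compare what was persisted against what the server reports as effective.
function verifyLabels(expected: Labels, reported: Labels): Verification {
  const missing: string[] = [];
  const mismatched: string[] = [];
  for (const traceId of Object.keys(expected)) {
    const actual = reported[traceId];
    if (actual === undefined) missing.push(traceId);
    else if (actual !== expected[traceId]) mismatched.push(traceId);
  }
  if (missing.length === 0 && mismatched.length === 0) {
    return { status: "verified" };
  }
  return { status: "verification-failed", missing, mismatched };
}
```

Returning the missing and mismatched IDs separately gives the caller enough detail to retry only the labels that did not land.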

Dashboard, Plugins

Plugins v0.8.117

Framework-aware replay mocking

The trace planner now decides which spans can be mocked on replay based on how your framework captures them. Spans your code wraps directly can be mocked; spans a framework observes from the outside (LangChain, LangGraph, and similar callback-based integrations) re-run live on replay instead, while Vercel AI SDK model calls stay mockable. This keeps model calls from being dropped or wrongly mocked when you set up tracing on a framework app with /bitfab:setup.

Mock a span the planner flagged

When the trace planner marks a span as not mockable, you can now override it and mock it anyway from the trace plan page in Studio. The choice is kept as a warning rather than blocked, and the plan footer shows a warning count so you can review these before confirming.

Plugins

Plugins v0.8.116

Batch-analyze a repo for what to trace

The Bitfab setup plugin can now scan a whole codebase and draft trace plans in one non-interactive pass. Run /bitfab:setup analyze-repo and it finds your AI workflows, picks the top few worth tracing, and uploads a draft trace plan for each, without prompting or changing any code. Review the drafts in Studio, then run /bitfab:setup instrument on the ones you want to wire up.

Dashboard, Plugins

Plugins v0.8.115

Replay status is available as an MCP tool

The Claude, Cursor, and Codex plugins now expose get_replay_status directly through the local Bitfab MCP server. During replay-based assistant runs, agents can map local replay trace IDs to server trace IDs while the run is still in progress, so they can persist per-trace verdicts incrementally without relying on a separate command wrapper.

The plugin MCP tool list has also dropped the deprecated grader tools, keeping trace inspection, datasets, labeling, experiments, setup, and replay status aligned across the editor plugins and Bitfab MCP endpoint.

Dashboard, Plugins

Plugins v0.8.114

Clearer replay label errors

Replay label persistence now fails loudly when a verdict batch includes trace IDs that do not exist in the active organization, instead of saving only part of the batch and treating the rest as skipped. The Claude, Cursor, and Codex plugins now guide agents to remap replay results to server replay trace IDs and retry, so experiment labels are less likely to disappear behind a misleading success message.

Dashboard, Plugins

Plugins v0.8.113

Trace plans analyze context nodes before review

Bitfab setup now asks Claude, Cursor, and Codex to classify replay behavior for every trace-plan node, including surrounding context nodes that are not initially captured. When you toggle those context nodes into capture from the trace-plan review, they already have a replay decision instead of needing a second classification pass.

Modify flows now backfill missing analysis on older trace plans before opening review, so expanding an existing plan uses the same per-node replay guidance.

Dashboard, Plugins

Plugins v0.8.112

Clearer dataset wording in assistant fixes

The Bitfab assistant fix flow now describes the final step as adding the trace to a dataset with a validated failing label, instead of using capture language for dataset membership. In Claude, Cursor, and Codex, the flow still replays the target trace first, then saves it to a dataset only after the replay passes and lets you choose Studio, a full dataset rerun, another iteration, or stop.

Plugins

Plugins v0.8.111

Assistant fixes ask before guessing

The Bitfab assistant fix flow now confirms why a trace is wrong before changing code when the trace or conversation does not already provide a clear failure reason. In Claude, Cursor, and Codex, targeted trace fixes reuse an existing failing label, a user-stated defect, or obvious trace evidence; otherwise the agent asks what correct behavior should be before replaying or saving the trace as a regression test.

Dashboard

See when experiment results are still settling

When you open the code-change summary for an experiment that is still running, it now shows how many replays are still running or awaiting agent labels, with a note that the breakdown is provisional and will keep updating as they finish. Before, the summary presented partial results as if they were final.

More accurate unpaired counts

Replays that are still running, awaiting labels, or errored are no longer counted as “unpaired” in the experiment breakdown, and each now appears as its own segment in the run progress bar. The unpaired count now reflects only replays that genuinely could not be matched to an original.
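One way to picture the new breakdown is as a count per replay state, where in-flight and errored replays get their own segments rather than falling into the unpaired bucket. The state names and the countSegments helper below are assumptions for illustration, not the dashboard's actual data model.

```typescript
// Hypothetical state names; the dashboard's real model is not documented here.
type ReplayState = "paired" | "running" | "awaiting-labels" | "errored" | "unpaired";

// Count replays per state so each state can be drawn as its own segment.
function countSegments(states: ReplayState[]): Record<ReplayState, number> {
  const counts: Record<ReplayState, number> = {
    paired: 0,
    running: 0,
    "awaiting-labels": 0,
    errored: 0,
    unpaired: 0,
  };
  for (const state of states) counts[state] += 1;
  return counts;
}

const segments = countSegments([
  "paired",
  "paired",
  "running",
  "awaiting-labels",
  "errored",
  "unpaired",
]);
```

In this example only one replay counts as unpaired; the running, awaiting-labels, and errored replays are reported separately.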

Dashboard

Experiment replays update as traces finish

Experiment pages now refresh replay progress as each trace finishes, so results appear while the replay is still running instead of waiting for later labeling work. Runs outside an experiment group no longer trigger experiment updates, keeping grouped experiment views focused on the runs they are showing.

Plugins

Plugins v0.8.110

Trace descriptions route to assistant fixes

The Bitfab assistant now treats requests like “find and fix the trace where…” as targeted fixes, even when you describe the bad output instead of pasting a trace ID. In Claude, Cursor, and Codex, the fix flow first verifies that the request is trace-backed, then uses local instrumentation and trace search to find the matching failing trace before making changes.

Dashboard

Expired trace plans show their final state

Trace plan review now shows when an awaiting plan has expired instead of letting you try to confirm it and then showing an error. The review bar switches to a disabled Plan expired action as soon as the plan is stale, including when an already-open review page crosses its expiry time.

Dashboard, Plugins, TypeScript SDK

Plugins v0.8.108, TypeScript SDK v0.28.5

More reliable BAML client execution

BAML client execution is more reliable across the Bitfab dashboard and TypeScript SDK. Generated OpenAI clients no longer send unsupported temperature options to GPT-5 and o-series models, while supported OpenAI, Claude, and Gemini clients keep deterministic sampling where the provider accepts it.

Shared Bitfab BAML fallbacks now use current tested Gemini and Claude models, so duplicate detection, entity extraction, trace summaries, and custom BAML prompts avoid stale model endpoints that could fail before the prompt ran.

Dashboard, Plugins

Plugins v0.8.107

Trace plans show when spans cannot be mocked

Trace plan review now distinguishes spans that can return recorded output during replay from spans that must re-run live. When Bitfab cannot mock a span because the recorded output is not serializable or the span comes from library instrumentation, the plan keeps it on re-run and shows the reason in the replay control tooltip, so reviewers know why the mock toggle is unavailable.

The Claude, Cursor, and Codex plugins now pass that mockability metadata through create_trace_plan, so generated plans can mark those spans before you confirm instrumentation.

Plugins, TypeScript SDK, Python SDK, Ruby SDK

Plugins v0.8.105, TypeScript SDK v0.28.4, Python SDK v0.27.3, Ruby SDK v0.23.3

Replay mocks are marked by default

Replay now uses mock: "marked" by default in the TypeScript, Python, and Ruby SDKs. A span tagged with mockOnReplay: true / mock_on_replay: true now returns its recorded output during replay without also passing a mock option. Pass mock: "none" when you explicitly want every child span to run real code, or mock: "all" when you want every child span to return historical output.

The docs now treat replay mocking as a first-class workflow, with a dedicated Replay Mocking guide linked from the introduction and SDK pages.
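The three mock modes reduce to a small decision per child span. The sketch below models that decision only; usesRecordedOutput and the span shape are hypothetical helpers for illustration, not SDK functions, although the mode names and the mockOnReplay flag come from the entry above.

```typescript
type MockMode = "none" | "marked" | "all";

// Hypothetical helper: does this child span return its recorded output?
function usesRecordedOutput(
  span: { mockOnReplay?: boolean },
  mode: MockMode = "marked", // the new default
): boolean {
  switch (mode) {
    case "none":
      return false; // every child span runs real code
    case "all":
      return true; // every child span returns historical output
    case "marked":
      return span.mockOnReplay === true; // only spans tagged for mocking
  }
}

const tagged = { mockOnReplay: true };
const untagged = {};
```

With no mode passed, usesRecordedOutput(tagged) is true and usesRecordedOutput(untagged) is false, which is the behavior the new default gives you.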

Dashboard, Plugins

Plugins v0.8.104

Redesigned trace plan review

The trace plan you confirm when setting up tracing has a cleaner, more informative review screen. Each captured span is now labeled with what it does (code, llm call, read, or write) and whether replay re-runs it live or serves its recorded output, so you can see at a glance how your workflow will replay. The span tree is easier to scan: you can toggle spans in or out of the capture set and mark which ones to mock on replay, and a “How to review” guide walks you through it the first time.

Plugins

Plugins v0.8.103

Manual Studio links in plugin flows

Bitfab plugins now print a copyable Studio link whenever they open a fresh Studio page, so you can still get to login, setup, datasets, experiments, and trace-plan flows if your editor hides the browser launch. The agent instructions for Claude, Cursor, and Codex now tell the agent to surface that link in chat as Studio opened: <url>, while the command output also includes Studio opened at: <url> for terminal visibility.

Plugins

Plugins v0.8.102

Setup signs you in before instrumenting

Running /bitfab:setup instrument now signs you into Bitfab before it starts analyzing your code, so instrumenting a new AI workflow no longer stalls partway through when you aren’t logged in. Previously the flow could begin and then fail later at the trace-plan step; it now authenticates up front, matching how the other setup modes already work.

Plugins

Plugins v0.8.101

Safer failed-fix regression capture

The Bitfab assistant now only marks an unresolved fix as saved after the failing trace is actually attached to the selected dataset. If a dataset attach is skipped, the Claude, Cursor, and Codex plugins keep the fix unsaved and guide the agent to choose a function-scoped dataset instead, so a failed fix cannot disappear without a real dataset entry.

Codex Studio navigation also returns control to the conversation once the Studio page is open and ready to report events, instead of waiting in the foreground while you inspect the page.

Plugins

Plugins v0.8.100

More reliable Codex session capture

The Codex plugin now captures active Bitfab sessions more reliably while you work, so chat-session history is less likely to miss turns during tool-heavy workflows. Final session uploads still complete when Codex stops, and late background capture work can no longer revive stale local progress after a session has closed.

Dashboard

Coding-agent session summaries

Bitfab now turns ended coding-agent chat sessions into structured product-feedback summaries, so teams can see what the agent tried, where Bitfab helped, where the plugin or service got in the way, and what should be improved next. Idle sessions are closed automatically, reopened sessions are summarized again after new work, and configured webhooks can send the summary to Slack with a link back to the chat session.

Dashboard, Plugins

Plugins v0.8.98

Trace environment in `read_traces`

read_traces now shows each trace’s environment alongside the trace id, function, status, and timing details, so agents can distinguish production, staging, local, and unset traces during investigations. When a trace has no stored environment tag, the tool prints unset instead of hiding the field.

Plugins, Dashboard

Plugins v0.8.96

Test-first fixes with the assistant’s `fix` command

The assistant’s fix command now proves a fix before it saves anything. It diagnoses the failing trace, makes the change, and replays just that one trace. Only once the replay passes does it add the trace to a dataset with a validated failing label and offer to re-run the whole dataset (in Studio or your terminal). If the fix does not land, you can save the trace as a failing test to revisit later. When you open the fixed trace in Studio, the experiments page now lands directly on that trace’s before/after comparison instead of a list you have to click into. Available in the Claude, Cursor, and Codex plugins.

Plugins

Plugins v0.8.95

Replay your first captured trace during setup

When you instrument a workflow with /bitfab:setup, Bitfab now waits for your first trace to land and then offers to replay that exact trace right away, so you can confirm your replay script works before moving on. Available in the Claude, Cursor, and Codex plugins.

Plugins

Plugins v0.8.93

Codex uses project-local Bitfab credentials from cached launches

The Codex plugin now resolves its active worktree even when the MCP server starts from Codex’s plugin cache without a recorded session id. That keeps setup, login, and plugin actions pointed at the project’s local Bitfab connection instead of falling back to stale global credentials.

Dashboard, Plugins

Plugins v0.8.92

More dependable coding-agent session capture

Bitfab now keeps coding-agent chat sessions in order across transcript compaction and retry recovery, so older turns stay attached to the same session instead of shifting sequence positions. Sessions that stop without new turns also preserve their real activity and end times, and the dashboard now recognizes Bitfab slash-command activity in the same session activity view.

Plugins, TypeScript SDK, Python SDK, Ruby SDK

Plugins v0.8.91, TypeScript SDK v0.28.2

More reliable replay mocks

Replay mocking in the TypeScript SDK now matches older historical spans even when the recorded span tree does not include a span name, falling back to the trace function key for the mock lookup. This keeps mock: "all" and mock: "marked" replay runs working on older traces instead of rerunning child spans that were meant to use their historical outputs.

The TypeScript and Python replay examples now show both mock modes with mockOnReplay and mock_on_replay, so you can verify that expensive child steps are skipped before running a full experiment.
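The span-name fallback in this entry can be pictured as a one-line key choice. This is only a sketch of the idea: RecordedSpan and mockLookupKey are hypothetical names, and the SDK's internal lookup may be structured differently.

```typescript
// Hypothetical recorded-span shape; older traces may lack a name.
interface RecordedSpan {
  name?: string;
  output: unknown;
}

// Choose the key used to find a recorded output for a child span.
function mockLookupKey(span: RecordedSpan, traceFunctionKey: string): string {
  // Fall back to the trace function key when no span name was recorded.
  return span.name !== undefined && span.name !== "" ? span.name : traceFunctionKey;
}

const named: RecordedSpan = { name: "fetch-order", output: { id: 1 } };
const legacy: RecordedSpan = { output: "cached answer" };
```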

Plugins

Plugins v0.8.90

Cleaner final replay results

The Bitfab assistant now keeps the final replay event log focused on one server-backed item reference per replayed trace, even when progress events already wrote per-trace payload files. This prevents assistant evaluations from scoring the same replayed trace twice in mixed-payload runs, while live progress rows still support incremental evaluation as traces finish.

Plugins, TypeScript SDK, Python SDK, Ruby SDK

Plugins v0.8.89, TypeScript SDK v0.28.1, Python SDK v0.27.1, Ruby SDK v0.23.1

Live per-item replay evaluation

The Bitfab assistant now scores and records each replayed trace as soon as it finishes, instead of waiting for the entire replay run to complete. On long replays, Studio’s experiment view fills in pass/fail verdicts trace by trace while the run is still going, so you see results as they land instead of all at once at the end.

The replay onProgress callback in the TypeScript, Python, and Ruby SDKs now reports per-item detail for each trace as it settles, including the deserialized inputs, the replayed output, the original output, and token usage, so tools watching a running replay can evaluate results mid-run rather than waiting for the final ReplayResult.

TypeScript SDK, Python SDK, Ruby SDK, Go SDK

TypeScript SDK v0.28.0, Python SDK v0.27.0, Ruby SDK v0.23.0, Go SDK v0.12.0

Reliable tracing regardless of API key load order

The Bitfab SDKs now resolve your API key the first time a traced function runs, instead of when the client is constructed. If your key (or .env file) loads after the client is created, tracing still activates, so a client built before your environment is ready no longer silently drops every trace.

All four SDKs (TypeScript, Python, Ruby, Go) also gain a BITFAB_API_KEY environment fallback when you don’t pass a key explicitly, a callable key form for deferred resolution (e.g. new Bitfab({ apiKey: () => process.env.BITFAB_API_KEY })), and an opt-in strict mode that fails loud on a missing key instead of disabling tracing quietly.
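The resolution order described above can be sketched without the SDK. Everything here is a stand-in under assumptions: resolveApiKey, KeyOptions, and the strict option name are hypothetical, and the environment is passed in as a plain object so the example stays self-contained.

```typescript
// Hypothetical option shape; the SDK's real option names may differ.
type KeySource = string | (() => string | undefined);

interface KeyOptions {
  apiKey?: KeySource;
  strict?: boolean; // stand-in for the opt-in strict mode
}

// Resolve the key at first use: explicit value or callable first, then the
// BITFAB_API_KEY environment fallback.
function resolveApiKey(
  options: KeyOptions,
  env: Record<string, string | undefined>,
): string | undefined {
  const explicit =
    typeof options.apiKey === "function" ? options.apiKey() : options.apiKey;
  const key = explicit || env["BITFAB_API_KEY"];
  if (!key && options.strict) {
    throw new Error("Bitfab API key is missing");
  }
  return key || undefined; // undefined means tracing stays disabled
}

// The "client" is configured before the environment is ready...
const env: Record<string, string | undefined> = {};
const options: KeyOptions = { apiKey: () => env["BITFAB_API_KEY"] };
// ...then the key loads late, for example from a .env file.
env["BITFAB_API_KEY"] = "example-key";
// Resolution happens at the first traced call, so the late key is seen.
const keyAtFirstTrace = resolveApiKey(options, env);
```

Deferring the lookup is what makes construction order irrelevant: the callable is only invoked once a traced function actually runs.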

Plugins

Plugins v0.8.86

Trace-first assistant investigations

The Bitfab assistant now grounds traced-function investigations in actual trace evidence before recommending a fix or revert. When it investigates an AI workflow failure, it first derives the function key from local instrumentation, searches the matching traces, and separates what the trace proves from code-based inference, so debugging starts from the run that failed instead of only from static code.

Plugins

Plugins v0.8.82

Targeted trace fixes before dataset runs

The assistant fix flow now replays only the target failing trace first, then asks whether to inspect the before/after in Studio, run the full dataset, keep iterating, or stop. This makes /bitfab:assistant fix <trace-id>, /bitfab-assistant fix <trace-id>, and $bitfab:assistant fix <trace-id> faster to validate and keeps a full dataset run from obscuring whether the original failing trace now passes. If you choose the full dataset run, Bitfab opens Studio as an experiment so you can catch regressions after the target fix is proven.

Dashboard, Plugins

Plugins v0.8.81

Clearer LangChain setup guidance

Bitfab setup guidance now recognizes LangGraph and LangChain projects earlier and recommends the callback handler instead of manually wrapping graph nodes, tools, retrievers, or model calls. The docs and setup prompts also clarify that the handler already records a replayable framework root, so you only need a same-key outer span when there is meaningful application work around the graph or chain invocation.

Dashboard, Plugins, TypeScript SDK, Python SDK

Plugins v0.8.80, TypeScript SDK v0.27.2, Python SDK v0.26.2

In-progress LangGraph and LangChain traces

Bitfab now shows LangGraph and plain LangChain runs as in progress as soon as the framework callback root starts, so long-running agents appear in the dashboard before their final output is available. The TypeScript and Python callback handlers also keep configured LangChain run names on chain, model, tool, and retriever spans, making trace trees easier to line up with your graph or chain config.

const handler = bitfab.getLangGraphCallbackHandler("support-agent")
await agent.invoke(input, { callbacks: [handler] })

Cleaner framework replay inputs

Python tool spans now preserve empty structured tool inputs as {} instead of falling back to raw text, and TypeScript chain callbacks handle both LangChain callback argument orders for run names and parent IDs. This keeps handler-captured traces more accurate for replay and avoids nested framework callbacks marking the outer trace complete too early.

Plugins

Plugins v0.8.79

Project-local plugin credentials from cached launches

Bitfab plugins now keep using your project-local connection settings when Claude Code, Cursor, or Codex launches the plugin from a cache or nested workspace. Local config.local.json and credentials.local.json files are resolved together from the same project search path, so agents are less likely to fall back to the wrong workspace or global credentials.

Plugins

Plugins v0.8.78

Studio navigation commands now work consistently in the Claude Code, Cursor, and Codex plugins when a Bitfab skill opens a specific Studio page. The plugins now ship matching command wrappers, so agent workflows can rely on the same Studio actions regardless of which editor plugin you use.

Dashboard, Plugins, TypeScript SDK, Python SDK, Ruby SDK

TypeScript SDK v0.27.1, Python SDK v0.26.1, Ruby SDK v0.22.1, Plugins v0.8.77

Name replay experiments

You can now give Bitfab experiments a readable name when you create a dashboard test run or start a replay. Experiment cards show the name next to the short run ID, making it easier to compare baselines, prompt edits, and code-change runs without opening each result.

Replay names in SDKs and plugins

TypeScript, Python, and Ruby replay calls now accept a name option, and replay scripts can forward it with --name. The Bitfab plugins detect whether the installed SDK and script support experiment names before using the flag, and recommend an upgrade when a project is too old.

TypeScript SDK, Python SDK, Ruby SDK, Plugins, Dashboard

TypeScript SDK v0.27.0, Python SDK v0.26.0, Ruby SDK v0.22.0, Plugins v0.8.76

Live replay progress in setup and assistant runs

Bitfab replay runs can now stream per-trace progress in Claude, Cursor, and Codex while the replay keeps running in the background. The plugins show each trace as it settles, write a per-run event log under .bitfab/replays/<run-id>/events.jsonl, and store full per-item outputs under .bitfab/replays/<run-id>/items/ so agents can review complete outputs without mixing progress logs into the result JSON.
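If you want to inspect an event log yourself, a JSON Lines file can be read one object per line. The sketch below parses JSONL text generically; the event fields in the sample (traceId, status) are invented for illustration, since the changelog does not document the actual event schema.

```typescript
// Parse JSON Lines text: one JSON value per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}

// Sample content with hypothetical fields; the real events.jsonl schema
// is not documented in this changelog.
const sample = [
  '{"traceId":"t1","status":"settled"}',
  "",
  '{"traceId":"t2","status":"settled"}',
].join("\n");

const events = parseJsonl(sample);
```

In a real script you would read the file contents from .bitfab/replays/<run-id>/events.jsonl and pass them to the same function.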

Replay progress reporters for SDK scripts

TypeScript, Python, and Ruby SDKs now include ready-made replay progress callbacks: reportReplayProgress, report_replay_progress, and Bitfab.report_replay_progress. Pass them into replay scripts to emit plugin-readable progress events while stdout stays reserved for the final ReplayResult JSON.

TypeScript SDK, Python SDK, Ruby SDK, Go SDK, Plugins, Dashboard

TypeScript SDK v0.26.2, Python SDK v0.25.2, Ruby SDK v0.21.2, Go SDK v0.11.2, Plugins v0.8.75

Capture warnings stay out of execution errors

SDK spans that have to degrade serialization now mark those payload warnings as SDK-sourced capture notices. The dashboard shows them as a bordered “Capture incomplete” tag on the span instead of mixing them into execution errors, so real runtime failures stay distinct from lossy capture metadata.

Plugins

Plugins v0.8.72

Studio opens in an app-style window from Codex

The Bitfab Codex plugin now opens Studio in the same focused app-style browser window used by the other editor plugins when Chromium supports it. If the app-window launch cannot start, the plugin falls back to your normal browser, and BITFAB_DISABLE_CHROME_APP_WINDOWS=1 still forces the tabbed fallback.

Dashboard

Chat-session capture handles unsupported text characters

Bitfab now keeps coding-agent chat-session capture working when terminal output includes characters that databases cannot store in JSON fields. Unsupported characters are safely replaced so the rest of the transcript, tool calls, and usage details remain available in the dashboard.
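
The entry does not name the database or the characters involved. As a minimal sketch, assuming the unsupported characters are NUL and unpaired UTF-16 surrogates (both of which PostgreSQL's jsonb type rejects), a sanitizer could replace them with U+FFFD before storage. sanitize_text and sanitize_json are hypothetical helpers, not Bitfab functions.

```python
import re

# Assumption: NUL and lone surrogates are the characters that cannot be stored.
_UNSUPPORTED = re.compile(r"[\x00\ud800-\udfff]")


def sanitize_text(value, replacement="\ufffd"):
    """Replace each unsupported character with the Unicode replacement character."""
    return _UNSUPPORTED.sub(replacement, value)


def sanitize_json(value):
    """Walk a JSON-like structure and sanitize every string, including dict keys."""
    if isinstance(value, str):
        return sanitize_text(value)
    if isinstance(value, list):
        return [sanitize_json(item) for item in value]
    if isinstance(value, dict):
        return {sanitize_json(key): sanitize_json(item) for key, item in value.items()}
    return value
```

Characters outside the Basic Multilingual Plane, such as emoji, are single code points in a Python str and pass through untouched.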

Dashboard

More reliable chat session capture

Bitfab now captures coding-agent chat sessions more reliably when turns include token usage details. This keeps session histories and usage metadata flowing into the dashboard instead of dropping affected turns during ingestion.

Plugins, Dashboard

Plugins v0.8.67

Fix a failing trace end to end

The Bitfab assistant has a new fix mode that takes one failing trace and drives it to green. Point it at a trace (/bitfab:assistant fix <trace-id> in Claude Code, /bitfab-assistant fix <trace-id> in Cursor) and it adds the trace to a dataset with a validated failing label, edits the code, then replays until the trace passes, flagging any regressions and offering to fix the other failing traces too. It only engages when you point it at a real Bitfab trace, so an ordinary “fix this bug” on untraced code still goes through normal coding. Works across the Claude, Cursor, and Codex plugins.

Plugins

Plugins v0.8.66

Signing in to the Bitfab plugin no longer opens duplicate Studio windows when a login is retried before it finishes. The plugin now reattaches to the window that’s already waiting for you, so a sign-in that looks slow won’t stack up extra browser windows.

Plugins, Dashboard

Plugins v0.8.64

Filter trace search by database snapshot

The search_traces tool now accepts a hasDbSnapshot filter, so you can scope a search to traces that captured a database snapshot, or only those that didn’t. Use it to quickly find the traces that can be replayed against their historical database state. Works across the Claude, Cursor, and Codex plugins.

Dashboard

Signing in to the Bitfab plugin with Google now completes reliably and connects your coding agent. Previously a Google sign-in could leave the plugin waiting on a login that never registered.

Dashboard

Faster usage page

The usage page loads noticeably faster, especially for organizations with high trace volumes, and switching the function filter now applies instantly instead of pausing while the page catches up.

Dashboard

Shareable links for usage filters

The usage page now keeps your function filter in the page URL, so a filtered view can be bookmarked or shared as a link and it stays put as you move around. Your browser’s back and forward buttons also step through filter changes.

Dashboard

Accurate “Snapshot captured” badge on traces

The “Snapshot captured” badge now appears only on traces that actually captured a database snapshot, instead of on every trace once your organization connected a source database. The trace list and trace view now give an accurate at-a-glance signal of which traces can be replayed against their database state from when they ran.

Dashboard

Break down usage by function

The usage page can now be scoped to a single traced function: pick a function and the totals, chart, periods table, and CSV export all narrow to just that one. A new “By function” chart mode plots one line per function so you can compare volume across functions over time. The function picker and chart mode appear once your organization has more than one traced function.

TypeScript SDK, Python SDK, Plugins

TypeScript SDK v0.26.0, Python SDK v0.25.0, Plugins v0.8.60

Bind framework handlers to your trace key once

The getFunction() handle now hands you framework handlers and middleware already bound to its trace key, so an outer span and the handler share one key without repeating the string. Available for the Claude Agent SDK, LangGraph/LangChain, and (TypeScript only) the Vercel AI SDK.

const pipeline = bitfab.getFunction("my-agent")

const runAgent = pipeline.withSpan({ type: "agent" }, async (prompt) => {
  const handler = pipeline.getClaudeAgentHandler() // same key, no retyping
  // the handler-traced agent run nests under this span
})

Python exposes the same handlers on the get_function() handle: pipeline.get_claude_agent_handler() and pipeline.get_langgraph_callback_handler().

TypeScript SDK, Python SDK, Ruby SDK

TypeScript SDK v0.25.0, Python SDK v0.24.0, Ruby SDK v0.21.0

Live progress callbacks for replay

Replay now takes an optional progress callback, so you can show live progress while a run is in flight instead of waiting for it to finish. It fires once per trace as each one settles, with running totals you can render however you like.

await bitfab.replay("my-function", fn, {
  onProgress: ({ completed, total, errored }) => {
    process.stderr.write(`\rReplaying ${completed}/${total} (${errored} errored)`)
  },
})

It’s onProgress in the TypeScript SDK and on_progress in Python and Ruby. The totals report how many traces have finished, how many ran without error, and how many threw; pass/fail verdicts are assigned after the run, so the live totals split ran-ok versus errored. Upgrade to @bitfab/sdk v0.25.0, the bitfab Python package v0.24.0, or the Ruby gem v0.21.0.
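
The running totals can be pictured with a small stand-in loop. This is a sketch under stated assumptions, not the SDK's replay implementation: Progress and replay_sketch are hypothetical names, and the field names mirror the completed, total, and errored values shown in the TypeScript example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Progress:
    completed: int  # traces that have finished, whether or not they threw
    total: int      # traces in the run
    errored: int    # traces whose replayed function raised

    @property
    def ran_ok(self):
        """Traces that finished without raising."""
        return self.completed - self.errored


def replay_sketch(inputs, fn, on_progress):
    """Call fn on each input and report running totals once per settled item."""
    items = list(inputs)
    errored = 0
    for completed, item in enumerate(items, start=1):
        try:
            fn(item)
        except Exception:
            errored += 1
        on_progress(Progress(completed, len(items), errored))
```

Because verdicts are assigned after the run, nothing here knows about pass or fail; the callback can only distinguish items that ran from items that threw.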

Dashboard

Clearer experiment run summaries

Experiment run rows now report accurate counts: a replay that finished with an error shows as “errored” instead of “awaiting labels”, and a trace still awaiting an agent label is no longer also counted as “unpaired”. Hover any status pill to see the full breakdown of passing, failing, awaiting, and errored traces. The row also stays readable at any panel width, collapsing to the essentials when space is tight.

TypeScript SDK, Python SDK, Ruby SDK

TypeScript SDK v0.24.1, Python SDK v0.23.2, Ruby SDK v0.20.2

Replay results report the replayed run’s token usage

When you replay traces, each result item’s tokens now reflects the token usage of the replayed run instead of the original trace. Compare it against the original trace’s recorded usage to see how a code change moved cost. durationMs and model stay as the original trace’s reference values, and the token counts match what the experiments view shows. Upgrade to @bitfab/sdk v0.24.1, the bitfab Python package v0.23.2, or the Ruby gem v0.20.2.

Dashboard

See database snapshot status on your traces

Traces now show whether they can be replayed against the database state from when they ran, and whether a replay actually used that historical state. A “Snapshot captured” badge marks an original trace you can replay against its past database; on replays, “Snapshot replayed” means your code read the historical database branch and “Snapshot unused” means it fell back to the live database. The indicator appears on the trace list, the trace and span detail headers, and in dataset and experiment rows, and only shows for organizations with a connected source database.

Watch experiment results appear live during replay

When you start a replay, its experiment now appears right away and fills in trace by trace as results come back, instead of staying on “Waiting for experiments to start…” until the whole replay finished. You can watch the pass and fail counts climb in real time as each trace completes.

TypeScript SDK

TypeScript SDK v0.24.0

The TypeScript SDK never crashes or hangs your app

Tracing is now fully fail-open. If anything in the SDK’s instrumentation goes wrong (a runtime without a usable crypto implementation, an oversized payload, a serialization edge case), your traced function still runs and returns its real value, and your Node process still exits cleanly instead of being held open by a pending timer. When tracing has to degrade, the SDK logs a one-time [bitfab] warning so a dropped span is visible rather than silent. Upgrade to @bitfab/sdk v0.24.0.
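
The fail-open idea can be sketched in a few lines. The example below is in Python rather than TypeScript and is not the SDK's code: fail_open, record_span, and warning_state are hypothetical names chosen for illustration.

```python
import functools
import logging

logger = logging.getLogger("tracing-sketch")
warning_state = {"warned": False}


def _warn_once(exc):
    """Log the first instrumentation failure and stay quiet afterwards."""
    if not warning_state["warned"]:
        warning_state["warned"] = True
        logger.warning("[tracing-sketch] span dropped: %s", exc)


def fail_open(record_span):
    """Decorate a function so that a failing recorder never affects its result."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)  # errors from user code propagate unchanged
            try:
                record_span(fn.__name__, args, kwargs, result)
            except Exception as exc:
                _warn_once(exc)  # instrumentation errors are swallowed
            return result
        return wrapper
    return decorator
```

Only the instrumentation call sits inside the try block, so exceptions raised by the traced function itself are never masked.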

Type-checking no longer needs the OpenAI Agents package

The SDK’s published type definitions no longer reference @openai/agents, so projects that don’t use the OpenAI Agents integration type-check cleanly without installing that package (previously tsc could fail with “Cannot find module ‘@openai/agents’” under skipLibCheck: false). If you use getOpenAiAgentHandler(key).wrapRun(...), its result is now typed structurally: read finalOutput and cast it to your agent’s output type.

Dashboard

More reliable trace ingestion under load

Traces now ingest reliably during high-volume bursts. We fixed a case where ingesting a trace could time out and fail when its search index was being built inside the same database operation that saved the trace. Search indexing now runs after the trace is safely stored, so a slow index can never block or fail ingestion.

TypeScript SDK, Plugins

TypeScript SDK v0.23.3, Plugins v0.8.54

TypeScript SDK: builds no longer fail on unused optional integrations

If you used the TypeScript SDK with one integration (like the Vercel AI SDK) but not others, your bundler could fail at build time with Module not found: Can't resolve '@openai/agents', even though you never used the OpenAI Agents integration. The SDK no longer references its optional peer dependencies (@openai/agents, @boundaryml/baml) in any way a bundler tries to resolve up front, so your app only needs to install the integrations it actually uses. Upgrade to @bitfab/sdk v0.23.3.

TypeScript SDK, Plugins

TypeScript SDK v0.23.2, Plugins v0.8.52

Vercel AI SDK tracing

Bitfab now traces the Vercel AI SDK out of the box. Wrap your model with getVercelAiMiddleware and every generateText, streamText, generateObject, and streamObject call is captured as a traced LLM span, including streaming responses and which provider answered each call (handy when you fall back between models). The Bitfab plugin also detects and instruments Vercel AI SDK projects automatically during setup.

import { openai } from "@ai-sdk/openai"
import { streamText, wrapLanguageModel } from "ai"

const model = wrapLanguageModel({
  model: openai("gpt-4o"),
  middleware: bitfab.getVercelAiMiddleware("chat-turn"),
})

streamText({ model, messages }) // traced; your live stream is untouched

Ruby SDK

Ruby SDK v0.20.1

Ruby SDK guards against replay key mismatches

replay() in the Ruby SDK now raises an ArgumentError when the trace_function_key: you pass does not match the key the method is actually traced under, instead of silently fetching one function’s history and recording the run under another. This brings Ruby in line with the TypeScript and Python SDKs.
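
The guard can be illustrated with a short sketch, written in Python and using stand-in names rather than the Ruby SDK's API: traced and replay_guarded are hypothetical, and ValueError plays the role that ArgumentError plays in Ruby.

```python
def traced(key):
    """Record the trace function key on the function it decorates."""
    def decorator(fn):
        fn.trace_function_key = key
        return fn
    return decorator


def replay_guarded(trace_function_key, fn, inputs):
    """Refuse to replay when the requested key differs from the traced key."""
    actual = getattr(fn, "trace_function_key", None)
    if actual != trace_function_key:
        raise ValueError(
            f"replay key {trace_function_key!r} does not match traced key {actual!r}"
        )
    return [fn(item) for item in inputs]
```

Failing before any history is fetched is what prevents one function's traces from being recorded under another function's key.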

TypeScript SDK, Python SDK

TypeScript SDK v0.23.1, Python SDK v0.23.1

Replayed traces match the original trace structure

Replaying a handler-instrumented run (OpenAI Agents, Claude Agent SDK) now produces a root span named after your trace function key, matching the original production trace instead of the replayed function’s name. For OpenAI Agents, a replayed run also nests under a single root span instead of an extra duplicated one, so a replay’s span tree lines up with the trace it replays.

Plugins

Plugins v0.8.50

Closed Studio windows stay closed

When you close a Studio window, it now stays closed. Previously the plugin could reopen a Studio window on its own shortly after you dismissed it; that no longer happens. Studio windows open only when you or your coding agent explicitly ask for one.

Plugins

Plugins v0.8.48

Recovering a stalled Studio session

If a Studio window crashed, was closed, or your machine went to sleep, reopening Studio could stay stuck reporting that the existing session was unreachable, even after you cleared it. Clearing a stalled session now reliably reopens a fresh Studio window.

Plugins

Plugins v0.8.47

More reliable Studio sessions

Fixed an issue where leftover Studio background helpers from earlier plugin versions could pile up and cause a Studio session to hang or fail to open. The plugin now clears out stale helpers before starting a session, so opening Studio stays reliable across plugin updates.

TypeScript SDK, Python SDK

TypeScript SDK v0.23.0, Python SDK v0.23.0

LangGraph and LangChain retriever steps are now traced

Retriever calls in your LangGraph or LangChain graphs now appear as spans, capturing the query and the documents returned, so retrieval shows up alongside the rest of your agent’s work. No code change is needed beyond the callback handler you already pass.

Trace streamed OpenAI Agents runs

You can now trace streamed agent runs. In TypeScript, pass { stream: true } to wrapRun; in Python, use the new wrap_run_streamed async generator as a drop-in for Runner.run_streamed. The run’s input and final output are recorded on the root span once the stream finishes, so streaming runs are traced without changing how you consume the events.

async for event in handler.wrap_run_streamed(agent, "What's the weather?"):
    ...  # handle each streamed event

Read the BAML Collector after a call

wrapBAML / wrap_baml now expose the BAML Collector from the most recent call through a .collector attribute, plus an onCollector / on_collector callback that runs after each invocation. Use it to inspect the prompt, model, and token usage yourself, in addition to the metadata Bitfab already captures.

classify = bitfab.wrap_baml(b.ClassifyText, on_collector=lambda c: log(c))

Dashboard

Faster trace lists for high-volume functions

The traces list now loads quickly for functions with very large trace volumes, where it could previously take several seconds or fail to load. The speedup is automatic, with nothing to configure.

TypeScript SDK, Python SDK, Plugins

TypeScript SDK v0.22.0, Python SDK v0.22.0, Plugins v0.8.45

Replayable OpenAI Agents runs with one line

If you trace the OpenAI Agents SDK with Bitfab, you can now make agent runs replayable with a drop-in change. The new run wrapper records the run’s input as a replayable root, with the tracing processor’s spans nested underneath, so the trace replays by key with no hand-written wrapper. Keep registering the processor for the internals and swap your run call for the wrapper.

const handler = bitfab.getOpenAiAgentHandler("my-agent")
const result = await handler.wrapRun(agent, "user input")

In Python, use get_openai_agent_handler("my-agent").wrap_run(agent, "user input").

Dashboard

More reliable organization switching

Switching between organizations now reliably loads the selected organization’s data. Previously, changing orgs (including opening a Studio page for a resource that belongs to another organization you’re a member of) could leave the view stuck on your previous organization and show missing or “not found” data until you refreshed. The switch now finishes applying before the page reloads, so the correct organization’s data loads the first time.

Plugins

Plugins v0.8.44

Replayable instrumentation for OpenAI Agents

When you set up Bitfab tracing, it now makes sure each traced workflow can be replayed, not just observed. For OpenAI Agents, that means wrapping the agent run in a root span that captures its input, so you can re-run the trace against your current code. The setup health check also flags any instrumented function whose root can’t be replayed and points you to the fix.

TypeScript SDK, Python SDK

TypeScript SDK v0.21.2, Python SDK v0.21.3

Full Claude Agent SDK tracing in TypeScript

Tracing for the Claude Agent SDK now captures complete agent runs in TypeScript: every LLM turn, tool call, and subagent becomes a span, with token usage included. Get a handler with getClaudeAgentHandler, inject the hooks with instrumentOptions, and wrap the query() stream:

const handler = bitfab.getClaudeAgentHandler("my-agent")
const options = handler.instrumentOptions({ model: "claude-sonnet-4-5" })
for await (const message of handler.wrapQuery(query({ prompt, options }), { input: prompt })) {
  // your normal message handling
}

Replay handler-traced agent runs

Agent runs captured by the Claude Agent SDK handler are now replayable without wrapping your code in an extra span. Pass the prompt as input to the wrap call and the handler records it as the run’s root, so replay() can re-run each historical prompt against your current code. Works in both the TypeScript and Python SDKs.

Dashboard

Trace and dataset labeling panels adapt to narrow widths

The dataset review and trace detail views now reflow to fit tight spaces, so labeling traces in the Studio side panel stays comfortable even when the panel is narrow. The labeling toolbar collapses onto a compact top row, and a trace’s input, output, and context sections stack cleanly instead of overflowing or leaving uneven gaps between them.

Compact span tree with hover details

When the trace panel is narrow, the span tree collapses to an icon rail that keeps the full call hierarchy while handing the freed space to the span content. Hover any span to see its name, position, duration, and token count, and the token figure follows your selected token view (all or uncached).

Plugins, Dashboard

Plugins v0.8.42

Cost-optimize mode for the assistant

The Bitfab assistant has a dedicated cost-optimize mode for cutting token spend. Run /bitfab:assistant cost-optimize <function-key> and it first profiles where a dataset’s tokens go (prompt size, redundant context, output shape, model choice), then edits and replays against your labeled dataset to lower cost while holding the pass rate. The token-cost view turns on automatically, reporting token deltas next to pass/fail. Available across the Claude, Cursor, and Codex plugins.

Before and after on every experiment trace

In the experiments view, any replay trace paired with an original now always shows the before and after comparison: the verdict change, the Original/Updated toggle, and the token trend. Values that aren’t known render as blanks instead of hiding the comparison, so still-passing and still-failing traces are as easy to inspect as fixed and regressed ones.

TypeScript SDK, Python SDK

TypeScript SDK v0.21.1, Python SDK v0.21.2

More accurate OpenAI Agents trace capture

Traces from the OpenAI Agents SDK integration now capture tool-call and generation inputs correctly. Previously the TypeScript SDK could drop these inputs and record them as empty, and both SDKs added empty input and response fields to agent and other non-LLM spans. Tool calls, agent steps, and other spans now show exactly the data the run produced. Update to TypeScript SDK 0.21.1 or Python SDK 0.21.2 to pick up the fix.

Plugins

Plugins v0.8.40

Faster labeling and evaluation on large datasets

When the assistant labels a fresh batch of traces or evaluates an experiment, it now judges the traces in parallel on larger datasets, so big batches finish noticeably faster. The speed-up kicks in automatically above roughly 20 traces; smaller batches are unchanged.
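
A threshold-gated fan-out like the one described can be sketched as follows. This is an illustration under assumptions, not the assistant's implementation: judge_all, the thread pool, and the worker count are hypothetical, and the threshold of 20 comes from the "roughly 20 traces" figure above.

```python
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 20  # "roughly 20 traces" per the entry; the exact cutoff is assumed


def judge_all(traces, judge, max_workers=8):
    """Judge small batches sequentially and larger ones in parallel."""
    items = list(traces)
    if len(items) <= PARALLEL_THRESHOLD:
        return [judge(trace) for trace in items]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so verdicts line up with traces.
        return list(pool.map(judge, items))
```

Small batches skip the pool entirely, which matches the note that they behave as before.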

Plugins, Dashboard

Plugins v0.8.39

Read a span field in full when a trace is truncated

When Bitfab shows you a trace, large span fields (a long input or a big tool output) are truncated so the response stays readable. The new read_span_field tool fetches the complete, untruncated value of a single field (input, output, reasoning, content, errors, or context) for one span, so your coding agent can pull the full text only when it actually needs it. read_traces now points you to it whenever a field is truncated. Available across the Claude, Cursor, and Codex plugins.

Dashboard

Experiments show when a replay is awaiting labels

In the experiments view, a replay that has finished running but hasn’t been scored yet is now marked “awaiting agent labels” instead of an indefinite loading spinner. The state appears on the run’s pass-rate pill, its trace rows, and the progress bar, and stays out of the pass rate until a verdict lands, so a run that is done replaying reads as settled rather than stuck.

Plugins, Dashboard

Plugins v0.8.38

Studio windows close more reliably at the end of a flow

Studio windows now close themselves when a flow finishes, with a fallback that closes the window if the browser does not. You will see fewer stray Studio windows left open after setup, assistant, and other flows wrap up.

Plugins

Plugins v0.8.35

Setting up Bitfab with npx bitfab-cli init (or signing in with bitfab login) now completes reliably. A recent CLI version could fail to open the sign-in window and report that you were not authenticated; the CLI now opens Studio and finishes login as expected.

Faster trace loading in the assistant

When the Bitfab assistant builds or labels a dataset, it now loads the dataset’s traces in parallel batches instead of one group at a time. Large datasets come into context faster and more reliably, with no change to how you run the assistant. This applies across the Claude, Cursor, and Codex plugins.

Plugins, Plugin Lib

Plugins v0.8.34

Studio session daemon for persistent browser management

Bitfab plugins now include a session daemon that manages Studio browser windows as a persistent background process. The daemon keeps Studio alive across coding agent restarts: if a browser window crashes, it re-spawns automatically; if you navigate to a new page, it reuses the existing window instead of opening a second one; and if a page refresh fires a transient close event, the daemon waits briefly before treating it as a real close.

Plugins

Plugins v0.8.32

Built-in help on every command

Every Bitfab plugin and CLI command now responds to -h / --help with a usage line, a short description, and per-argument details, then exits without running. You (and your coding agent) can discover how to call a command without leaving the terminal, and bitfab init -h now prints help instead of starting the full onboarding flow.

Dashboard

Studio sessions reconnect after expiring

Returning to Bitfab Studio after a session has been idle now reconnects automatically instead of showing a “Could not connect to this session” error. Previously an expired session left you stuck until you signed in again; the session is now re-established transparently when you come back.

Plugins

Plugins v0.8.26

No duplicate Studio window after signing in

Signing in to Bitfab from your coding agent no longer leaves a leftover Studio window or opens a duplicate. The next action reuses the window you signed in with instead of spawning a second one.

Python SDK, Plugins

Python SDK v0.21.1, Plugins v0.8.25

More reliable tracing for Python agent frameworks

The Python SDK no longer drops a span or trace when a value that can’t be JSON-serialized (such as a Pydantic model) appears outside a span’s input or response, like a tool’s output or a trace’s metadata. Spans and traces from the OpenAI Agents, LangGraph, and Claude Agent SDK integrations now always reach Bitfab, and any value that couldn’t be captured faithfully is flagged as non-replayable instead of silently lost.

Plugins, Dashboard

Plugins v0.8.22

Compact trace reads by default

The read_traces tool now returns bounded, truncated span details by default (scope: "summary"), so the Bitfab assistant can scan many traces in one pass without overflowing its context. Each span keeps the head and tail of its largest fields, so big inputs and outputs stay legible at a glance; pass scope: "full" when you need complete, untruncated detail on a handful of traces.
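
Head-and-tail truncation of a large field can be sketched like this. The function name, the default sizes, and the marker text are assumptions for illustration; the entry does not specify the limits read_traces actually uses.

```python
def truncate_middle(text, head=80, tail=80):
    """Keep the first `head` and last `tail` characters and note what was dropped."""
    if len(text) <= head + tail:
        return text
    omitted = len(text) - head - tail
    tail_part = text[len(text) - tail:]  # safe even when tail is 0
    return f"{text[:head]}... [{omitted} chars omitted] ...{tail_part}"
```

Keeping both ends preserves the opening of a prompt and the conclusion of an output, which are usually the most informative parts when scanning many traces.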

TypeScript SDK, Python SDK, Plugins

TypeScript SDK v0.21.0, Python SDK v0.21.0, Plugins v0.8.20

Trace streaming functions without a refactor

You can now instrument a streaming function (one that returns a live response stream) by adding a single finalize option, with no need to restructure your code. Your function still streams to users exactly as before, while Bitfab records a clean, replayable summary of the turn (text, token usage, tool calls) as the trace output.

In the TypeScript SDK, pass finalize to withSpan and use the built-in finalizers.aiSdk helper for the Vercel AI SDK:

import { finalizers } from "@bitfab/sdk"

const runChatTurn = bitfab.withSpan(
  "chat-turn",
  { type: "agent", finalize: finalizers.aiSdk },
  () => streamText({ model, messages }),
)

In the Python SDK, decorate an async generator that yields its chunks and pass finalizers.openai_chunks or finalizers.anthropic_events. And when you run setup on a streaming endpoint, Bitfab now guides you straight to this instead of asking you to refactor.
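
The pass-through-then-summarize pattern behind a finalizer can be sketched without the SDK. Everything named here (traced_stream, finalize, record) is a hypothetical stand-in, not the Bitfab decorator or its finalizers; the sketch only shows how chunks can reach the consumer unchanged while a summary is built once the stream ends.

```python
def traced_stream(finalize, record):
    """Wrap an async generator: yield chunks unchanged, then record a summary."""
    def decorator(gen_fn):
        async def wrapper(*args, **kwargs):
            chunks = []
            async for chunk in gen_fn(*args, **kwargs):
                chunks.append(chunk)
                yield chunk  # the consumer sees the live stream as before
            # Reached only when the stream is fully consumed.
            record(finalize(chunks))
        return wrapper
    return decorator
```

In this sketch a consumer that stops iterating early never triggers the summary; a production implementation would need to decide how to handle partial streams.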

Dashboard, TypeScript SDK, Python SDK, Ruby SDK, Plugins

SDKs v0.20.0, Plugins v0.8.19

Experiments stay linked to their dataset

When you benchmark or run an experiment against a dataset, the run is now durably attached to that dataset, so it always appears on the dataset’s experiments page, even when the underlying traces can’t be matched back by lineage. Before, the link was inferred from shared traces and could be missed.

datasetId option when replaying

The TypeScript, Python, and Ruby SDKs’ replay() now takes a datasetId / dataset_id option that attributes the run to a dataset. Passed on its own, it replays exactly that dataset’s traces, so you no longer need to look up and pass the trace IDs yourself.

await client.replay("my-function", processInput, { datasetId: "<dataset-id>" })

The Claude, Cursor, and Codex plugins use this automatically when you benchmark a dataset.

Plugins

Plugins v0.8.18

Point Bitfab setup at a specific workflow

When you set up Bitfab tracing, the plugin now asks how you want to find what to instrument: let it scan your codebase for AI workflows, or point it straight at a specific file, function, or directory. Choose the targeted option when you already know what you want traced and want to skip the full scan. Available in the Claude, Cursor, and Codex plugins.

Plugins

Plugins v0.8.14

Faster Bitfab assistant runs

The Bitfab assistant no longer pauses between steps while it reports progress to the Studio sidebar. Those activity updates are now sent in the background, so the assistant keeps moving instead of waiting on the network. The difference is most noticeable during longer flows like dataset building and experiments.

Dashboard

Studio follows your work across organizations

Opening a Studio link for a dataset, trace plan, trace, test run, or experiment that lives in a different organization than your session started in now switches you into the right organization and loads the page, instead of showing a not-found error. If you belong to only one organization, nothing changes. When you are viewing a page in an organization other than the one your coding agent is connected to, a small indicator in the Studio header makes that clear.

Following a link to a specific trace, trace plan, or other page while signed out now takes you to that exact page after you sign in or sign up, instead of dropping you on your default traces view. This works for both email and Google sign-in.

Dashboard, TypeScript SDK, Python SDK, Plugins

TypeScript SDK v0.19.1, Python SDK v0.19.1, Plugins v0.8.10

Uncached token basis in the experiments cost view

The experiments cost lens can now show “uncached” tokens (input minus cached reads, plus output), so you can see the tokens you actually pay full price for instead of the cheap cache reads. With the token lens on, switch between All and Uncached from the “Token count” control in the experiments header, and each trace’s cost trend recolors so the two views read apart at a glance. This surfaces cost regressions that prompt caching would otherwise hide behind a flat total.
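
A worked example makes the difference between the two views concrete. The formula is the one stated above (input minus cached reads, plus output); treating "All" as input plus output is an assumption, and the function names and token figures are invented for illustration.

```python
def all_tokens(input_tokens, output_tokens):
    """Assumed definition of the "All" basis: every input and output token."""
    return input_tokens + output_tokens


def uncached_tokens(input_tokens, cached_read_tokens, output_tokens):
    """Uncached basis from the entry: input minus cached reads, plus output."""
    return input_tokens - cached_read_tokens + output_tokens


# Two hypothetical runs with the same total but different cache behavior.
before = {"input": 10_000, "cached": 8_000, "output": 500}
after = {"input": 10_000, "cached": 2_000, "output": 500}

before_uncached = uncached_tokens(before["input"], before["cached"], before["output"])
after_uncached = uncached_tokens(after["input"], after["cached"], after["output"])
```

Both runs total 10,500 tokens, but the uncached figure rises from 2,500 to 8,500, which is the kind of regression a flat total hides.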

Claude Agent SDK reports the full prompt size in inputTokens

The Claude Agent SDK handler (TypeScript and Python) now folds cache reads and cache creation into inputTokens, so it reports the full prompt size, consistent with the LangGraph integration. The cached portion is still reported separately as cacheReadTokens. If you read inputTokens directly for cache-heavy calls, expect a larger value than before.
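
As a small arithmetic sketch of the change, assuming the underlying usage reports fresh input, cache reads, and cache creation as three separate counts (the function and parameter names below are hypothetical, and the figures are invented):

```python
def reported_input_tokens(fresh_input, cache_read, cache_creation):
    """Full prompt size: fresh input plus cache reads plus cache creation."""
    return fresh_input + cache_read + cache_creation


# A cache-heavy call: most of the prompt is served from cache.
usage = {
    "inputTokens": reported_input_tokens(fresh_input=120, cache_read=9_000, cache_creation=300),
    "cacheReadTokens": 9_000,  # still reported separately
}
```

Code that previously read only the fresh portion (120 here) would now see 9,420 for the same call.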

TypeScript SDK, Python SDK, Ruby SDK, Go SDK, Plugins

TypeScript SDK v0.19.0, Python SDK v0.19.0, Ruby SDK v0.18.0, Go SDK v0.11.0, Plugins v0.8.9

Spans never silently drop on non-serializable inputs

All four SDKs now keep a span even when an input or output can’t be JSON-encoded: the offending value is replaced with a placeholder and a warning flags that the trace may not be replayable, instead of the whole span being dropped. The LangGraph, OpenAI Agents, and Claude Agent SDK handlers, along with the span decorator, also capture nested values more completely, so handler-instrumented traces stay replayable.
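
The placeholder-and-warn behavior can be sketched with the standard json module. This is an illustration, not the SDKs' encoder: encode_span_payload, the placeholder format, and the warning text are all assumptions.

```python
import json


def encode_span_payload(payload):
    """JSON-encode a payload, swapping unencodable values for placeholders."""
    warnings = []

    def fallback(value):
        # json.dumps calls this for any object it cannot encode natively.
        type_name = type(value).__name__
        warnings.append(
            f"{type_name} could not be JSON-encoded; trace may not be replayable"
        )
        return f"<non-serializable {type_name}>"

    encoded = json.dumps(payload, default=fallback)
    return encoded, warnings
```

The rest of the payload survives intact, and the warnings list gives a caller something to attach to the span. Circular references and unsupported dict keys still raise in this sketch.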

Dashboard

Cleared the phantom “Grading” status on trace rows

Trace rows no longer show a spinning “Grading…” status when your project has no graders configured. Previously every incoming trace briefly displayed a grading indicator that never resolved. Traces that do have graders are unaffected and still show their pass and fail scores.

Dashboard, Plugins

Plugins v0.8.7

Cached tokens in the per-span token breakdown

When the experiments token view is on, each span’s token breakdown now shows cached input tokens alongside input and output, so you can see how much of a span’s prompt was read from cache rather than processed fresh.

Token usage in experiment trace results

The get_experiment_traces tool now reports token usage (input, output, cached, and total) for each replay trace and the original it is compared against. Your coding agent can use this to reason about cost and cache-read changes between runs, not just the pass or fail verdict.

Dashboard

Before/after comparison for replays with an unlabeled original

In the experiments view, a replay whose original trace was never labeled now shows a neutral “Unlabeled” badge alongside its own Pass or Fail result and opens the same before/after comparison as a scored trace, instead of hiding the comparison entirely. Open the replay to toggle between the original and updated runs and see the change in context, even while the replay is still being evaluated.

Dashboard, TypeScript SDK, Python SDK, Ruby SDK

TypeScript SDK v0.18.2, Python SDK v0.18.2, Ruby SDK v0.17.1

Replay tracks whether the database snapshot was actually used

When you replay traces against a historical database branch, each replayed trace now records whether a branch was provisioned for it and whether your code actually read the branch connection URL. A replay that checks env.active but never reads env.databaseUrl (env.database_url in Python and Ruby) is recorded as not having used the branch, which catches the common silent failure where a connection pool created at module import sends the replay to your live database instead. Tracking is automatic in the TypeScript (0.18.2), Python (0.18.2), and Ruby (0.17.1) SDKs with no code changes; the Database Branching docs describe the three states a replayed trace can record.
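
One way to picture the tracking is a property that notes when it is read. The class below is a hypothetical stand-in, not the SDK's ReplayEnvironment; the attribute names active and database_url follow the entry, while ReplayEnvSketch, url_was_read, and branch_usage are invented for illustration.

```python
class ReplayEnvSketch:
    """Stand-in environment that remembers whether the branch URL was read."""

    def __init__(self, database_url, active=True):
        self.active = active
        self._database_url = database_url
        self.url_was_read = False

    @property
    def database_url(self):
        # Reading the URL is the only signal that code could reach the branch.
        self.url_was_read = True
        return self._database_url


def branch_usage(env):
    """Summarize what a replayed trace could record about its branch."""
    return {
        "branch_provisioned": env.active,
        "branch_url_read": env.active and env.url_was_read,
    }
```

Code that checks env.active but connects through a pool created at import time never touches database_url, so the sketch reports the branch as provisioned but unread.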

Dashboard

See your organization’s usage

A new Usage page (in the user menu) shows how many traces your organization has ingested and how much data they carry over time. Switch between daily, weekly, and monthly views, pick a date range, and read totals with trend indicators at the top; a usage-by-period table breaks it down further and exports to CSV. Dates and time buckets follow your local time zone.

Plugins

Plugins v0.8.4

Ask Bitfab to cut token usage

The Bitfab assistant now responds to token and cost reduction requests. Tell your coding agent something like “use Bitfab to reduce token usage on one of my datasets” and it enters the experiment flow: it changes prompts or code, replays your labeled dataset, and checks that token usage drops without hurting the pass rate. The experiments view shows original vs. replay token totals next to each verdict, so you can see the savings as results stream in.

Dashboard

Skipped replay traces are marked done, not left spinning

When you skip a replay trace during review, the experiments view now shows it with a “Skipped” badge instead of leaving the row spinning as if it were still being evaluated. The run summary reads “9/11 passing · 1 skipped” so a skipped trace is clearly counted as intentionally unscored, rather than dropped from the totals or mistaken for a result that is still loading.
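
As a rough illustration of how such a summary line can be derived, the sketch below counts skipped traces in the total but reports them separately. The verdict labels and the function name are assumptions made for the example.

```python
def run_summary(verdicts):
    """verdicts: list of "pass", "fail", or "skipped" (labels assumed)."""
    passing = sum(1 for v in verdicts if v == "pass")
    skipped = sum(1 for v in verdicts if v == "skipped")
    # Skipped traces stay in the denominator instead of being dropped.
    summary = f"{passing}/{len(verdicts)} passing"
    if skipped:
        summary += f" · {skipped} skipped"
    return summary
```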

Dashboard

Replay environments branch your organization’s own database copy

When replaying with a ReplayEnvironment, each replayed item now branches your organization’s own managed database copy (the one provisioned when you connect a database in the dashboard) at the trace’s capture time, so replays read your data as it was, never anyone else’s. Replay branches are cleaned up automatically after each run, with a background sweep catching anything an interrupted run leaves behind.

Dashboard, Plugins

Plugins v0.8.3

Token cost in experiments and the trace viewer

When an experiment is about reducing token usage, you can now see per-trace and per-span token counts alongside pass/fail, so you can tell whether a change actually cut cost as the results stream in. It shows up in the experiments comparison and the trace viewer, with the token figures styled distinctly from the green/red verdict so cost reads as a measurement, not another pass/fail signal. The Bitfab assistant turns it on automatically when an iteration is framed around cutting tokens or cost.

Plugins

Plugins v0.8.2

Guided setup for database branching in replay

A new /bitfab:setup db-branching flow walks you through replaying traces against your database as it was at trace time, instead of today’s data (TypeScript, Python, and Ruby). Connect a Postgres database once in the dashboard, and the flow polls until the branchable copy is ready, then wires a ReplayEnvironment into your replay scripts so each replayed item reads from a per-trace branch. A new connection-status check lets the flow tell you exactly when your database is connected and provisioned.

Dashboard

Reliable database connection setup

Connecting a source database for trace replay now completes reliably regardless of database size. Setup runs in the background in resumable stages and the dashboard’s connection status tracks it as it progresses, instead of timing out partway through on large databases. Failures (an unreachable database, an invalid connection string) now surface promptly as a failed connection with the underlying reason, rather than leaving the status stuck on checking.

TypeScript SDK, Python SDK

TypeScript SDK v0.18.1, Python SDK v0.18.1, Plugins v0.8.1

Full token-usage capture for LangChain and LangGraph

LLM spans traced through the LangChain/LangGraph callback handler now record token usage from modern LangChain releases, which put counts on each message’s usage_metadata rather than the legacy llm_output.token_usage location. Streaming agents are covered too: usage is read from the final aggregated chunk. Provider-native shapes from OpenAI, Anthropic, and Google are recognized as fallbacks, and the legacy location keeps working.

Cached prompt tokens are now reported as cachedInputTokens on the span, and Anthropic input counts include cache reads so they reflect the true prompt size (heavily cached agents will see input counts go up; that is the previous under-count being fixed). Only provider-reported numbers are recorded; nothing is estimated. For OpenAI streaming, enable stream_usage=True (Python) or stream_options: {"include_usage": true} (TypeScript) so usage arrives on the stream.
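
The lookup order described above (message usage_metadata first, legacy llm_output.token_usage second, no estimation) can be sketched with plain dictionaries. The usage_metadata and token_usage keys follow LangChain's documented shapes; the extract_usage function and the inputTokens/outputTokens output names are assumptions for this example, and provider-native fallbacks are left out.

```python
def extract_usage(usage_metadata=None, llm_output=None):
    """Return provider-reported token counts, or None when nothing was reported."""
    if usage_metadata:
        # Modern LangChain: counts live on the message's usage_metadata.
        details = usage_metadata.get("input_token_details") or {}
        return {
            "inputTokens": usage_metadata.get("input_tokens"),
            "outputTokens": usage_metadata.get("output_tokens"),
            "cachedInputTokens": details.get("cache_read"),
        }
    legacy = (llm_output or {}).get("token_usage")
    if legacy:
        # Legacy location: llm_output["token_usage"].
        return {
            "inputTokens": legacy.get("prompt_tokens"),
            "outputTokens": legacy.get("completion_tokens"),
            "cachedInputTokens": None,
        }
    return None  # nothing reported, so nothing is estimated
```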

Plugins, TypeScript SDK, Python SDK, Dashboard

Plugins v0.8.0, TypeScript SDK v0.18.0, Python SDK v0.18.0

Replay for handler-instrumented workflows

Workflows traced through a framework handler (LangGraph, LangChain, Claude Agent SDK, OpenAI Agents) can now be replayed, even though they have no decorated root function in your code. Pass the handler’s trace function key plus any plain callable to replay(), and the SDK wraps it internally so every replayed run records a trace tied to the experiment:

handler = bitfab.get_langgraph_callback_handler("my-agent")

def replay_my_agent(state):
    return graph.invoke(state, config={"callbacks": [handler]})

result = bitfab.replay("my-agent", replay_my_agent, limit=10)

The same works in TypeScript with bitfab.replay("my-agent", fn, options), where plain callables are wrapped automatically. The setup and assistant plugin skills now offer handler instrumentation as a first-class option for workflows whose entry points take live objects (database handles, billing callbacks), and write the matching replay script for you. See “Replaying handler-instrumented functions” in the Python and TypeScript SDK docs.

Dashboard

Fixed database connector provisioning in production

Fixed a bug where connecting a database for trace snapshots failed in production before provisioning could start. Database connector setup from the dashboard now completes as expected.

Plugins

Plugins v0.7.18

More reliable Studio sessions

Studio now keeps a single, durable connection open for the whole assistant session. Moving between pages (dataset review, experiments, trace plans) reuses the open Studio window instead of opening extra background connections, so signals like marking a dataset done or ending a session are no longer missed. The live Studio updates in the assistant flow are dependable from start to finish, even when a session is reused from an earlier run.

TypeScript SDK, Python SDK, Ruby SDK

TypeScript SDK v0.17.0, Python SDK v0.17.0, Ruby SDK v0.17.0

Replay a trace against its historical database state

The Python and Ruby SDKs can now replay a recorded trace against the database as it was when the trace ran. Pass a ReplayEnvironment to replay() and read its database URL inside your function; Bitfab resolves a per-trace database branch for each item and releases it when the item finishes. This brings Python and Ruby to parity with the TypeScript SDK.

env = ReplayEnvironment()

@client.span("lookup-user")
def lookup_user(user_id):
    db_url = env.database_url if env.active else LIVE_DATABASE_URL
    ...

client.replay(lookup_user, environment=env)

Every trace is now automatically pinned to its capture-time snapshot, so any trace can be replayed this way later, with no extra configuration.

TypeScript SDK, Python SDK, Plugins

TypeScript SDK v0.16.2, Python SDK v0.16.2, Plugins v0.7.16

Trace LangChain by its own name

The Bitfab callback handler has always traced plain LangChain chains as well as LangGraph graphs (both share the same callback system), but every entry point was named after LangGraph. The SDKs now expose LangChain-named aliases that return the identical handler: getLangChainCallbackHandler() in TypeScript and get_langchain_callback_handler() in Python, with the class also exported as BitfabLangChainCallbackHandler.

handler = bitfab.get_langchain_callback_handler("summarize-doc")
result = chain.invoke({"document": doc_text}, config={"callbacks": [handler]})

The plugin’s setup flow now recommends the LangChain-named methods when instrumenting a project that uses LangChain without LangGraph, and the framework docs include a plain-LangChain quick start.

Plugins

Plugins v0.7.15

Keep your app’s API key in sync when you switch organizations

When you switch organizations with the Bitfab plugin, it now offers to update your project’s local BITFAB_API_KEY as well. Previously the switch only repointed the plugin, so traces your own code sent kept landing in the old organization until you updated the key by hand. After you run /bitfab:setup switch-org (or /bitfab-setup switch-org in Cursor, $bitfab:setup switch-org in Codex), the agent finds every .env file that defines the key and, with your go-ahead, updates them in place.
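
For a sense of what the in-place update involves, here is a small sketch that rewrites the key in every .env-style file under a directory. It is not the plugin's code: the function name, the file pattern, and the absence of a confirmation step are all simplifications made for illustration.

```python
import re
from pathlib import Path

# Matches a BITFAB_API_KEY assignment, with or without a leading `export`.
KEY_LINE = re.compile(r"^([ \t]*(?:export[ \t]+)?BITFAB_API_KEY[ \t]*=).*$", re.MULTILINE)


def update_env_files(root, new_key):
    """Rewrite BITFAB_API_KEY in every .env* file under root; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob(".env*")):
        if not path.is_file():
            continue
        text = path.read_text()
        # A function replacement avoids backslash handling in the new key.
        updated = KEY_LINE.sub(lambda m: m.group(1) + new_key, text)
        if updated != text:
            path.write_text(updated)
            changed.append(path)
    return changed
```

Files that do not define the key are left untouched, so only the returned paths need review.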

Dashboard

Connection fields on the Integrations page no longer clear while you type

Entering a database connection string or integration secret on the Integrations page now stays put as you type. Browser password managers were treating these masked fields as login passwords and overwriting them mid-entry; they’re now opted out, so your input holds.

Plugins, Dashboard

Plugins v0.7.14

Switch organizations from your coding agent

You can now switch which Bitfab organization a plugin reads and writes without leaving your editor. Run /bitfab:setup switch-org (or /bitfab-setup switch-org in Cursor, $bitfab:setup switch-org in Codex): the agent lists the organizations you belong to, switches to the one you pick, and swaps the plugin’s API key to the new org. The agent can also call the new list_organizations tool on its own to check which org it is currently pointed at. Your already-open browser tabs keep showing their current org until the next time the plugin opens Studio.

Dashboard

Faster traces list

The traces list now loads significantly faster for functions with a large number of traces. Opening a function’s traces no longer waits on a full count of every matching trace before showing results, so the page appears as soon as the traces are ready.

Plugins, Dashboard

Plugins v0.7.13

When you run a Bitfab plugin command while signed out and a Studio window is already open, the plugin now reuses that window to sign you in instead of opening a second one. There’s only ever one Studio window now, where before a fresh window could appear and leave the original orphaned. Running login while you’re already signed in is also a no-op, and you can re-authenticate on demand by running login with --force.

Plugins, Dashboard

Plugins v0.7.12

See a dataset’s experiment history in Studio

You can now open the experiments page scoped to a single dataset. Ask the Bitfab assistant to “show experiments for a dataset” and Studio lists every experiment that replayed one of the dataset’s traces, giving you that dataset’s full run history in one place. Previously the experiments page opened only for specific test runs or an experiment group.

Plugins

Plugins v0.7.11

More reliable Studio session cleanup

Studio sessions now close themselves cleanly when an assistant or setup run finishes: the browser tab closes, background processes stop, and nothing is left running even if the window refuses to close. Recovering from an unreachable Studio window (“Open a new Studio”) also cleans up the old window and its background process instead of leaving them behind.

Dashboard

Accurate error indicators in the trace viewer

Spans with no recorded errors no longer show a false “Error detected” badge or an “Execution Error” section in the trace viewer. Error detection now correctly ignores empty error data, and a related fix ensures a real execution error can no longer be hidden by an empty error list.

Live updates arrive reliably on the hosted dashboard

Pages that update in real time, such as dataset review, trace lists, and experiment results, could miss updates on the hosted dashboard and only show new data after a manual refresh. Event delivery now completes reliably, so new traces, labels, template changes, and experiment results appear the moment they happen.

Plugins

Plugins v0.7.10

Replay a single trace to check your fix

The assistant has a new replay mode for the quickest version of the improvement loop: you already made a fix and just want to know whether one specific trace passes now. Run /bitfab:assistant replay <function-key> <trace-id> (or simply ask “did my fix work on <trace-id>?”) and the agent finds your replay script, re-runs that one trace through your current code, and reports a pass/fail verdict in chat. It skips everything heavier: no browser, no dataset, no labeling, and nothing is persisted, so it’s safe to run as often as you like while iterating.

TypeScript SDK, Python SDK, Ruby SDK, Plugins

TypeScript SDK v0.16.1, Python SDK v0.16.1, Ruby SDK v0.16.1, Plugins v0.7.9

Replay accepts limit and trace IDs together

Passing both limit and traceIds (trace_ids in Python and Ruby) to replay() no longer throws. The SDK now logs a warning and ignores limit, since an explicit trace ID list already determines how many traces replay. This applies to the TypeScript, Python, and Ruby SDKs, so replay scripts that forward both flags keep working instead of crashing.
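
The resolution rule is simple enough to show in a few lines. The sketch below is an illustration of the described behavior, not the SDK's source; resolve_selection and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("replay-sketch")


def resolve_selection(limit=None, trace_ids=None):
    """Return (limit, trace_ids) as a replay would use them."""
    if trace_ids is not None and limit is not None:
        # An explicit ID list already fixes how many traces replay.
        logger.warning("limit=%s ignored because trace_ids was provided", limit)
        limit = None
    return limit, trace_ids
```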

TypeScript SDK, Python SDK, Ruby SDK, Plugins

TypeScript SDK v0.16.0, Python SDK v0.16.0, Ruby SDK v0.16.0, Plugins v0.7.7

Replay traces whose function signature changed

When you rename, reorder, or restructure a traced function’s arguments, replay can no longer feed it the inputs recorded against the old shape. replay() in the TypeScript, Python, and Ruby SDKs now takes an adaptInputs (adapt_inputs in Python and Ruby) hook that reshapes each trace’s recorded inputs onto the current signature, so older traces keep running.

await bitfab.replay("my-function", updatedFn, {
  adaptInputs: (inputs, ctx) => [{ userId: inputs[0], limit: inputs[1] }],
})

The hook runs per trace and is isolated per item: if one trace can’t be reshaped, that item alone reports the error and the rest of the run continues.
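
To make the per-item isolation concrete, here is a self-contained Python sketch of a replay loop with an input adapter. It does not call the Bitfab SDK: replay_with_adapter, the shape of ctx, and the example functions are hypothetical, and the real adapt_inputs signature in Python may differ.

```python
def replay_with_adapter(recorded, fn, adapt_inputs=None):
    """recorded: list of (trace_id, inputs). Returns one result dict per trace."""
    results = []
    for trace_id, inputs in recorded:
        try:
            ctx = {"trace_id": trace_id}  # assumed context shape
            args = adapt_inputs(inputs, ctx) if adapt_inputs else inputs
            results.append({"trace_id": trace_id, "output": fn(*args), "error": None})
        except Exception as exc:
            # A failure in the adapter or the function stays on this item only.
            results.append({"trace_id": trace_id, "output": None, "error": repr(exc)})
    return results


def get_user(params):
    # New signature: one dict instead of two positional arguments.
    return f"{params['user_id']}:{params['limit']}"


def to_new_shape(inputs, ctx):
    # Old recorded shape was [user_id, limit].
    return [{"user_id": inputs[0], "limit": inputs[1]}]
```

Here a trace recorded with only one input would fail inside to_new_shape, and that item alone would carry the error while the others still produce output.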

The assistant recovers replays broken by signature changes

When you iterate on a traced function in your coding agent and a replay fails because the signature drifted since the traces were captured, the Bitfab assistant now recognizes the mismatch instead of treating it as an environment error, and helps you write a small committed input adapter so those traces rejoin the run.

Dashboard, Plugins

Plugins v0.7.6

Spot traces that can’t be replayed

The trace viewer now flags traces that won’t replay against your current code, either because they captured no top-level span or because their recorded inputs no longer fit the function’s current signature. A “Can’t replay” badge appears on dataset rows and in the trace detail view, so you can see at a glance which traces a replay will actually cover before you run it.

When your coding agent opens a dataset, the Bitfab plugins pass along your function’s current input shape, and the check runs live against it. Nothing is stored, so the badge can’t go stale.

TypeScript SDK, Python SDK, Ruby SDK, Dashboard, Plugins

TypeScript SDK v0.15.0, Python SDK v0.15.0, Ruby SDK v0.15.0, Plugins v0.7.5

Replay results now persist reliably

replay() in the TypeScript, Python, and Ruby SDKs now waits for each replayed item’s trace to be fully persisted before completing the test run. item.traceId (trace_id in Python and Ruby) is a real server trace ID you can use immediately. Previously, a race could leave every trace ID null and the experiments page empty, even though the replay appeared to succeed.

Failures are no longer silent. If none of the replayed items’ traces reached the server (for example, the replayed function isn’t instrumented), replay() raises an error explaining why. If only some items fail to persist, those items return a null trace ID with a logged error and the rest of the run comes back intact, so one bad trace costs you one data point instead of the whole run.
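
The two failure cases (nothing persisted versus some items missing) can be pictured with the sketch below. The error class, function name, and result shape are hypothetical and only mirror the behavior described in this entry.

```python
class ReplayPersistenceError(RuntimeError):
    """Hypothetical error for a run where no trace reached the server."""


def finalize_run(items):
    """items: list of dicts whose "trace_id" is a string or None."""
    persisted = [item for item in items if item["trace_id"] is not None]
    if items and not persisted:
        # Total failure: surface it instead of returning an empty-looking run.
        raise ReplayPersistenceError(
            "no replayed trace was persisted; is the replayed function instrumented?"
        )
    # Partial failure: keep the run, leaving None on the items that failed.
    return {
        "items": items,
        "persisted": len(persisted),
        "failed": len(items) - len(persisted),
    }
```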

Plugins

Plugins v0.7.4

Clearer skill routing in plugin flows

The setup, assistant, and update skills now state exactly where each choice leads (“Update all → step 7”, “Skip → stop”), so coding agents follow multi-step flows more reliably instead of inferring the wiring from prose. The update skill on Claude Code also runs as chained sub-skills: each phase hands off directly to the next with the invocation mode attached, removing a class of lost-context routing mistakes in long sessions.

Dashboard, Plugins

Plugins v0.7.3

Live dataset review in Studio

Dataset review now always happens on the dataset’s own page in Studio, which updates in real time as your coding agent adds traces and applies labels. Previously the agent could leave you on a function-level review page that only showed new activity after a manual refresh.

That older function-level page now redirects to the function’s most recent dataset, so existing links and older plugin versions keep working.

TypeScript SDK, Python SDK, Ruby SDK, Dashboard, Plugins

TypeScript SDK v0.14.0, Python SDK v0.14.0, Ruby SDK v0.14.0, Plugins v0.7.2

Replay by trace IDs no longer truncates

Replaying specific traces by ID previously capped the list at the default limit, silently dropping the rest of your selection: 12 IDs in could mean only 5 replayed, skewing experiment results without warning. An explicit ID list now always replays every trace in it (up to 100).

limit and traceIds are now mutually exclusive: limit means “replay my last N traces”, and an ID list speaks for itself. Passing both raises a clear error instead of guessing.

// Replay your last 10 traces
await bitfab.replay("my-function", fn, { limit: 10 })

// Replay exactly these traces (up to 100)
await bitfab.replay("my-function", fn, { traceIds: ["id-1", "id-2"] })

Available in the TypeScript, Python, and Ruby SDKs v0.14.0, with matching trace_ids semantics in Python and Ruby. Older SDK versions get the core fix server-side: explicit ID lists are no longer truncated by a default limit.

Plugins, Dashboard

Plugins v0.7.1

No more duplicate Studio windows

Your coding agent now keeps exact track of its Studio window. Ending a session no longer makes the next Studio command open a second window while the old one lingers: the existing window is reused, and the agent only forgets a window once the browser confirms it actually closed. Refreshing the Studio page mid-session is also safe; the connection re-establishes itself instead of being mistaken for a close.

Commands that reconnect to an already-open Studio window now react only to what happens after they connect, so a previously ended session or an earlier run’s activity can no longer end a new command prematurely. If a Studio window disappears without a trace (for example the browser quit entirely), the agent detects that it is unreachable and offers to reopen instead of guessing.

Plugins

Plugins v0.7.0

Watch benchmark runs live in Studio

The assistant skill’s benchmark mode is terminal-only by default, but you can now add the studio keyword (for example, benchmark <function> studio, or just ask in natural language to “open studio”) to open Studio’s experiments page and watch each trace’s pass/fail verdict stream in as the replay runs. The default stays terminal-only, so existing benchmark runs are unchanged unless you opt in.

Dashboard

Traces are flagged errored only when your code fails

A trace is now marked as errored only when your traced code throws, not when the Bitfab SDK hits a serialization or ingestion error while recording the trace. The error indicator in the trace list now reflects failures in your own functions, so SDK-side noise no longer surfaces as a failed trace.

Dashboard

Errored traces open on the failing span

When you open a trace that recorded an error, the trace viewer now jumps straight to the first span that failed instead of starting on the trace root. You land on the error and its message right away, without scanning the span tree for the red marker. Traces without errors open exactly as before, and any span you deep-link to still takes precedence.

Dashboard

Errored spans highlighted in the trace viewer

Spans that recorded an error are now flagged directly in the trace viewer. The span tree marks failed spans in red, and the span header shows an Error tag with the error message on hover, so you can spot failures in a trace without opening each span.

Python SDK, Ruby SDK

Python SDK v0.13.5, Ruby SDK v0.12.5

Replay keeps going when an individual trace fails

The Python and Ruby SDKs now isolate per-trace errors during replay(): if one historical trace fails to load or its function raises, that result is marked with an error and the rest of the run still completes, instead of the whole replay aborting. This matches the TypeScript SDK’s behavior, so a single bad trace no longer costs you the entire run.

Dashboard

Copy buttons in custom span templates

Custom span templates can now drop in a clipboard icon button next to any field value with {{ value | copyButton | safe }}. Pass a string for the tooltip and accessible label, e.g. {{ span.id | copyButton("Copy span id") | safe }}. The button copies the value (objects and arrays are pretty-printed as JSON), flashes “Copied”, and works inside the template’s isolated shadow DOM without any extra JavaScript or CSS in the template.

Plugins, Dashboard

Plugins v0.6.75

When your coding agent opens Bitfab Studio and you sign in, the Studio tab now connects reliably on the first try. Previously a timing issue could leave a freshly opened session showing “Could not connect to this session” until you re-ran the command; signing in now hands off cleanly to the live session every time.

TypeScript SDK, Python SDK, Ruby SDK, Go SDK, Plugins, Dashboard

TypeScript SDK v0.13.8, Python SDK v0.13.4, Ruby SDK v0.12.4, Go SDK v0.10.2, Plugins v0.6.74

Error source classification on spans

Span errors now carry an explicit source tag so you can tell whether an error came from your code or from the SDK itself. When your traced function throws, the error is recorded with source: "code". SDK-internal failures (like serialization errors) are tagged source: "sdk". Both types appear in the unified errors field on the span, replacing the previous split between span_data.error and the errors column.

All four SDKs support this and set error_source: "code" automatically when a traced function fails.

const result = await bitfab.span("my-function", async () => {
  throw new Error("something went wrong")
  // error_source: "code" is set automatically
})

When the same span is sent multiple times (for example, during retries), errors from all payloads are merged and deduplicated rather than replaced.
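
Merging with deduplication could work along these lines. This is a sketch only: merge_errors is a made-up name, and treating two errors as duplicates when their source and message match is an assumption, since the entry does not say which fields are compared.

```python
def merge_errors(*payloads):
    """Merge the errors lists of several payloads for one span, dropping duplicates."""
    merged, seen = [], set()
    for payload in payloads:
        for err in payload.get("errors") or []:
            key = (err.get("source"), err.get("message"))  # assumed identity
            if key not in seen:
                seen.add(key)
                merged.append(err)
    return merged
```

With this rule, a retry that resends the same code error adds nothing, while a new SDK-side error from the second payload is appended after it.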

Dashboard

Connect a database for per-trace snapshots

You can now connect your Postgres database from the new Database page so Bitfab can take an isolated snapshot per trace at replay time. Paste a connection string and activate: replays run against a fresh branch of your database, so they never touch or slow down production. The page shows live status (activating, active, or failed, with a support contact if it can’t reach your database), and you can deactivate at any time from a confirmation dialog.

Quickly add a trace to a dataset

The /bitfab:assistant command has a new lightweight add-trace mode that attaches one or more existing traces to a dataset and stops, without the full label-and-iterate flow. Run /bitfab:assistant add-trace <trace-id> (the function key is inferred from the trace) or just ask your coding agent to “add this trace to a dataset”. It picks or creates the right dataset for you, and if you point it at several traces it makes sure they all belong to the same function before attaching.

Plugins

Plugins v0.6.72

Diagnose your tracing setup with /bitfab:setup inspect

A new inspect mode checks whether your Bitfab tracing is healthy: whether you’re authenticated, what’s instrumented in this repo, whether the plugin and SDK are up to date, whether your replay scripts cover every trace function, and whether traces are actually arriving. It then walks the available fixes one at a time, asking before each change. Run /bitfab:setup inspect, or just ask your coding agent something like “why aren’t my traces showing up?”.

Get oriented with /bitfab:setup explain

A new read-only explain mode prints a quick overview of what Bitfab is and what each setup mode does, without authenticating or scanning your code. Run /bitfab:setup explain or ask “what is Bitfab?”.

Plugins

Plugins v0.6.71

Smarter Studio window reuse

The Bitfab plugin now reuses your existing Studio window whenever it’s still open, instead of risking a duplicate window or a stale “not responding” prompt. Close the Studio tab and the next action opens a fresh window right away; only a window that’s genuinely unreachable, like after a crash or your machine sleeping, will ask whether to retry or open a new one.

Plugins, Dashboard

Plugins v0.6.70

Clearer Studio connection status

Studio’s connection indicator now tells you exactly what’s happening: “Studio connected” when your agent is live, “Awaiting agent” while it’s away, and “Studio disconnected” if the browser loses its live connection to the session (it reconnects automatically). Reconnecting to a session you already have open now reuses that Studio tab instead of opening a second window.

Plugins

Plugins v0.6.69

Benchmark a dataset against your current code

Run /bitfab:assistant benchmark <key> to replay a labeled dataset against your current code without changing anything, then read a pass/fail scorecard that shows which traces still pass, still fail, regressed, or were fixed. Use it to measure where your function stands right now, as a regression baseline or a quick check after unrelated work, instead of starting an experiment loop. You can also just say “benchmark my dataset” in plain language and the assistant routes there.
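
The four scorecard buckets follow directly from comparing each trace's label with its replay verdict. The sketch below shows that comparison; the function name, bucket keys, and boolean input format are illustrative choices, not the assistant's actual output format.

```python
def scorecard(labels, replay):
    """labels and replay map trace_id -> True (pass) or False (fail)."""
    buckets = {"still_passing": [], "still_failing": [], "regressed": [], "fixed": []}
    for trace_id, was_passing in labels.items():
        now_passing = replay[trace_id]
        if was_passing:
            bucket = "still_passing" if now_passing else "regressed"
        else:
            bucket = "fixed" if now_passing else "still_failing"
        buckets[bucket].append(trace_id)
    return buckets
```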

Plugins, Dashboard

Plugins v0.6.67

More reliable Studio sessions

Studio no longer shows a stray “agent disconnected” popup on the session-complete page. When your agent disconnects mid-session, the reconnect prompt now rejoins your existing Studio session instead of starting a new one. And if you run more than one coding agent in the same project, each keeps its own Studio session instead of overwriting the other’s.

Plugins

Plugins v0.6.66

When you sign in to Bitfab from your coding agent and the browser doesn’t open automatically, the login flow now prints the sign-in URL so you can open it manually. Previously it told you to visit the URL without showing one.

Plugins

Plugins v0.6.65

The setup login flow now uses a single authentication method that works everywhere, including SSH sessions, containers, and cloud IDEs. The separate login headless mode has been removed since the standard login already handles these environments automatically via its server-polled channel.

Plugins

Plugins v0.6.63

Running /bitfab:assistant without being authenticated now logs you in inline through Studio instead of stopping the flow. Previously, unauthenticated users were told to run a separate login command first, breaking the workflow. The assistant now opens Studio’s sign-in page directly and continues automatically once you’ve signed in.

Plugins, Dashboard

Plugins v0.6.60

The bitfab init login flow now completes reliably instead of hanging after sign-in. Previously, the published CLI used a query parameter the close page didn’t recognize, so the authentication callback never fired and the CLI waited indefinitely. The close page also now displays “Login complete” instead of the generic “Session Complete” message.

Plugins, Dashboard

Plugins v0.6.58

Experiment results show label annotations

The experiments page now displays the label annotation for each trace instead of the raw function output. This matches how the dataset page already renders traces and makes it easier to scan experiment results for what passed, what failed, and why. Annotations from both human reviewers and the agent’s automated labeling are shown.

Replay script upgrades no longer interrupt the assistant flow

When the assistant detects that your replay script needs an upgrade (missing code-change or experiment-group support), it now edits the script directly instead of launching a separate setup flow. Previously, this would break the assistant’s continuity and drop you to an empty prompt. The upgrade happens inline and the experiment flow continues automatically.

Dashboard

Faster disconnect detection in Studio

When an agent closes its Studio session, the browser now shows “disconnected” within milliseconds instead of up to 60 seconds. Previously, the connection status indicator could display “Agent connected” long after the agent process had exited because the server-side heartbeat lingered in cache.

Opening a Studio page without being signed in now redirects you to the sign-in page instead of showing a “Could not connect to this session” error. After signing in, you’re returned to the page you originally requested with the session intact.

Plugins, Dashboard

Plugins v0.6.56

Plugin commands that open Studio pages (experiments, trace plans, datasets, template previews) no longer fail with “Not authenticated” when you haven’t logged in yet. Instead, they open Studio directly and redirect you to the sign-in page. After you sign in, you land on the page the command originally requested. For interactive commands like trace plan confirmation and dataset review, the plugin saves your credentials automatically so the bidirectional event channel works normally after login.

Plugins

Plugins v0.6.52

Automatic replay script capability detection

The assistant now checks whether your replay script supports the latest experiment features before running experiments. If your script is missing support for code diffs, experiment groups, or trace ID tracking, the assistant offers to upgrade your SDK and regenerate the script in place. You can also choose to continue without the missing features. This replaces the previous behavior where outdated scripts would silently skip features or produce incomplete experiment results.

Plugins

Plugins v0.6.51

Dataset mode now continues through failure diagnosis and experiments

Fixed a bug where the assistant’s dataset mode could stop after building the dataset instead of continuing to diagnose failures and run experiments. The flow’s internal instructions contradicted its routing in three places, which could cause the agent to exit early. Dataset mode now reliably progresses through the full pipeline: build dataset, diagnose failures, iterate with experiments, and wrap up.

Dashboard

Clear error messages when Studio can’t connect to a session

When Studio fails to connect to an agent session or switch to the correct organization, it now shows a clear error message instead of silently loading in the wrong context. This prevents the confusing state where traces appear missing because Studio was looking in a different organization than the one your plugin authenticated against.

Dashboard, Plugins, TypeScript SDK, Python SDK, Ruby SDK

Plugins v0.6.50, TypeScript SDK v0.13.6, Python SDK v0.13.3, Ruby SDK v0.12.3

Live experiment streaming

The experiments page now streams results in real time as replays complete. When the assistant runs experiments, it opens the experiments viewer before the first replay starts and new results appear automatically via server-sent events as each test run finishes. Previously, the experiments page only opened after all replays completed.

All three SDKs now accept an experimentGroupId parameter on replay() that groups multiple test runs into a single experiment batch:

// TypeScript
await client.replay("my-function", traceIds, {
  experimentGroupId: "550e8400-e29b-41d4-a716-446655440000",
})

# Python
await client.replay("my-function", trace_ids,
    experiment_group_id="550e8400-e29b-41d4-a716-446655440000")

# Ruby
client.replay("my-function", trace_ids,
  experiment_group_id: "550e8400-e29b-41d4-a716-446655440000")

The experiments page accepts ?experimentGroupId= as a query parameter, and falls back to ?testRunIds= for replay scripts that haven’t been updated yet.

Plugins

Plugins v0.6.49

Inline template editing during labeling

You can now edit trace view templates in chat while labeling traces, without leaving the dataset review page. When you ask the assistant to change how a span type renders (e.g. “edit the LLM template”), it reads and updates the template inline using MCP tools, and the dataset page re-renders automatically. Previously, template editing required invoking the setup flow, which navigated Studio away from the dataset and broke the labeling session.

Dashboard

Auto-update trace plan in Studio

When you modify your span capture setup and create a new trace plan, Studio now automatically navigates to the updated plan. Previously, you had to manually refresh or re-navigate to see the latest version after making changes to the capture configuration.

Plugins

Plugins v0.6.48

Investigate mode continues through the full pipeline

When you run the assistant in investigate mode, the flow now continues through diagnosis and experiments after building a dataset, matching how dataset mode already works. Previously, investigate mode stopped after dataset building, requiring you to restart in experiment mode to iterate on fixes. All assistant modes now serve as entry points into the same pipeline, converging at wrap-up regardless of where they start.

Plugins

Plugins v0.6.47

Graceful Studio close on flow exit

When the assistant finishes a flow (wrap-up, early stop, or sub-mode completion like dataset-only), the Studio tab now closes gracefully instead of lingering in a disconnected state. The agent navigates to a close route that lets the Studio clean up before the background process is terminated.

Better error messages when agents pass wrong CLI arguments

Plugin commands now validate arguments against a declarative schema before executing, catching common agent mistakes like invented --flag syntax, missing arguments, or invalid UUIDs. When a command receives bad input, it prints a usage string showing the expected arguments and a clear error message, so the agent can self-correct on the next attempt.
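
To make the idea of schema-driven validation concrete, here is a minimal sketch. The schema shape, the function names (validateArgs, usageString), and the error wording are all hypothetical; the plugin's real schema format is not documented in this entry.

```typescript
// Hypothetical sketch of validating CLI arguments against a declarative schema.
type ArgKind = "string" | "uuid";

interface ArgSpec {
  name: string;
  kind: ArgKind;
  required: boolean;
}

interface CommandSchema {
  command: string;
  args: ArgSpec[];
}

type ValidationResult =
  | { ok: true; values: Record<string, string> }
  | { ok: false; error: string; usage: string };

const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Build a usage line such as "Usage: demo <traceId> [label]".
function usageString(schema: CommandSchema): string {
  const parts = schema.args.map((a) =>
    a.required ? `<${a.name}>` : `[${a.name}]`,
  );
  return `Usage: ${schema.command} ${parts.join(" ")}`.trim();
}

function validateArgs(schema: CommandSchema, argv: string[]): ValidationResult {
  const fail = (error: string): ValidationResult => ({
    ok: false,
    error,
    usage: usageString(schema),
  });

  // This sketch declares positional arguments only, so any --flag is invented.
  const flag = argv.find((a) => a.startsWith("--"));
  if (flag !== undefined) return fail(`Unknown flag: ${flag}`);
  if (argv.length > schema.args.length) return fail("Too many arguments");

  const values: Record<string, string> = {};
  for (let i = 0; i < schema.args.length; i++) {
    const spec = schema.args[i];
    const value = argv[i];
    if (value === undefined) {
      if (spec.required) return fail(`Missing argument: ${spec.name}`);
      continue;
    }
    if (spec.kind === "uuid" && !UUID_RE.test(value)) {
      return fail(`Invalid UUID for ${spec.name}: ${value}`);
    }
    values[spec.name] = value;
  }
  return { ok: true, values };
}
```

Returning the usage string alongside the error gives the calling agent what it needs to retry with corrected arguments.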

Dashboard

Reconnect guidance when Studio loses agent connection

When your coding agent disconnects from Studio, a popup now appears after 30 seconds with a copiable prompt you can paste into your agent to reconnect. The popup includes your session ID so the agent can rejoin the same session. If you dismiss it, the popup reappears at increasing intervals, and it auto-dismisses if the agent reconnects on its own.
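
One way to picture "increasing intervals" is a simple backoff schedule. In the sketch below only the 30-second initial delay comes from the entry; the doubling factor, the 10-minute cap, and the function name nextPopupDelayMs are assumptions.

```typescript
// Hypothetical backoff schedule for re-showing the reconnect popup.
const INITIAL_DELAY_MS = 30_000; // 30 seconds, from the entry above
const MAX_DELAY_MS = 10 * 60_000; // assumed cap of 10 minutes

// dismissCount is how many times the user has dismissed the popup so far.
function nextPopupDelayMs(dismissCount: number): number {
  const steps = Math.max(0, Math.floor(dismissCount));
  // Assumption: the delay doubles after each dismissal until it hits the cap.
  return Math.min(INITIAL_DELAY_MS * Math.pow(2, steps), MAX_DELAY_MS);
}
```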

Plugins

Plugins v0.6.46

Dataset mode continues to diagnosis and experiments

When you run /bitfab:assistant dataset, the flow now continues past labeling into Phase 4 (diagnosis and experiments), matching the behavior of the full /bitfab:assistant flow. Previously, dataset mode stopped after labeling, requiring you to restart in the default mode to run experiments on the same dataset.

Plugins, Dashboard

Plugins v0.6.45

Live experiment verdicts in Studio

When your replay script returns trace IDs (requires SDK v0.13.5+), the assistant now opens the experiments page in Studio before running evaluations, so you can watch pass/fail verdicts populate in real time. If your SDK predates trace ID support, the assistant prompts you to update and falls back to showing evaluation results as text in the agent.

Plugins

Plugins v0.6.43

Code-change diffs in the experiment viewer

When the assistant runs experiments, it now captures before-and-after file snapshots for every edit and attaches them to the replay. The experiment viewer can then display the literal code change alongside pass/fail results, so you can see exactly what was tried in each iteration. Existing replay scripts that predate this feature continue to work; the assistant detects whether the script supports the new --code-change flag and gracefully skips the metadata if it doesn’t.

TypeScript SDK, Python SDK, Ruby SDK, Dashboard

TypeScript SDK v0.13.5, Python SDK v0.13.2, Ruby SDK v0.12.2, Plugins v0.6.42

Replay results include trace IDs

Each replay result item now includes a traceId (or trace_id in Python/Ruby) that links directly to the server-side trace created during replay. Previously, matching a replay result back to its trace in the dashboard required heuristics based on input similarity. Now you can navigate straight to the trace.

const result = await bitfab.replay("my-function-key", myFunction, {
  limit: 5,
});
for (const item of result.items) {
  console.log(`Trace: ${item.traceId}`);
}

Available in TypeScript SDK v0.13.5, Python SDK v0.13.2, and Ruby SDK v0.12.2.

Dashboard, Plugins

Plugins v0.6.39

Experiments distinguish errored traces from pending

When a replay trace fails during execution (before it can be graded), the experiments page now shows it as “errored” instead of lumping it in with pending traces. Errored traces appear with a red “Error” badge, a red row tint, and a dedicated segment in the progress bar. The pass-rate pill also now shows context-appropriate states: “N errored” when all traces errored, “N pending” with a spinner when grading is in progress, and “X/Y so far” for partially graded runs.
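
The pill states above can be read as a small decision function. This is a sketch under assumptions: the RunCounts shape and pillLabel name are hypothetical, "X/Y" is interpreted as passed over graded, and the finished-run label is a guess since the entry does not describe it.

```typescript
// Hypothetical sketch of choosing the pass-rate pill text for an experiment run.
interface RunCounts {
  passed: number;
  failed: number;
  errored: number;
  pending: number;
}

function pillLabel(c: RunCounts): string {
  const graded = c.passed + c.failed;
  const total = graded + c.errored + c.pending;

  // Every trace failed before it could be graded.
  if (total > 0 && c.errored === total) return `${c.errored} errored`;
  // Grading is in progress and nothing has been graded yet (shown with a spinner).
  if (c.pending > 0 && graded === 0) return `${c.pending} pending`;
  // Partially graded run. Assumption: X is passed and Y is graded so far.
  if (c.pending > 0) return `${c.passed}/${graded} so far`;
  // Assumption: a finished run shows a plain pass rate over graded traces.
  return graded > 0 ? `${Math.round((100 * c.passed) / graded)}%` : "no results";
}
```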

Plugins

Plugins v0.6.38

Assistant reviews existing dataset traces before searching for new ones

When you pick a dataset that already has traces, the assistant now goes straight to the review page instead of asking how to source new candidates. Previously, datasets with unlabeled (but present) traces were treated like empty datasets, which skipped past the traces you already had. Empty datasets still get the “what kind of traces should I find?” prompt as before.

Dashboard

Experiments page updates after replay completion

The experiments page now refreshes automatically when a replay finishes and as labels are applied. Previously, completing a replay didn’t trigger an update, so the page could appear empty until manually refreshed. Summary counts and individual trace labels now appear as soon as they’re available.

Plugins

Plugins v0.6.37

Studio session recovery after agent interruptions

The plugin assistant now automatically reconnects to an existing Studio browser session when the background polling process is interrupted (for example, by a long conversation triggering context compaction). Previously, losing the background process meant the assistant would open a duplicate Studio window. Now it resumes the existing session seamlessly, keeping the same browser tab and session state.

Plugins, Dashboard

Plugins v0.6.35

Experiment and dataset tools for the assistant flow

The plugin assistant can now work with experiments and datasets directly. Two new MCP tools, list_experiments and get_experiment_traces, let the assistant list recent experiments for a function and drill into individual trace verdicts (fixed, regressed, still-passing, still-failing). The search_traces tool also now accepts testRunId and datasetId parameters, so you can scope trace searches to a specific experiment run or dataset without manual filtering.

Plugins

Plugins v0.6.34

Fixed investigate mode opening a broken Studio page

Running /bitfab:assistant investigate <key> no longer opens Studio to a non-existent page. The investigate mode now lands on the Studio root, which loads correctly. The investigation itself (trace reading, code exploration, findings summary) was unaffected since it runs via tool calls, not Studio navigation.

Plugins, Dashboard

Plugins v0.6.33

Trace plan links now open correctly in Studio

Clicking a trace plan link from the plugin assistant now opens the trace plan inside the Studio session. Previously, the link pointed to a standalone route outside of Studio, which bypassed the authenticated Studio context.

Dashboard

Experiment results stream in real time

Running experiments on a dataset no longer blocks until every test finishes. The experiments page opens immediately and results stream in trace by trace: the progress bar fills, pass/fail counts update, and trace rows appear as each test completes. If background execution fails, the run is marked as failed and the page updates accordingly instead of getting stuck on “pending.”

Dashboard

Trace list loads reliably for high-volume functions

The traces page now loads correctly on the first visit for functions with very high trace creation rates. Previously, real-time update events could interfere with the initial page load, causing the trace list to appear empty until you navigated away and back.

Plugins, Dashboard

Plugins v0.6.32

Studio navigations that include query parameters (such as opening the experiments page with specific test run IDs) no longer time out with “not responding.” The same fix also ensures that navigating to the same page with different parameters is recognized correctly, so the assistant flow proceeds without interruption.

Dashboard

Dataset traces now stream in real-time in Studio

Traces added to a dataset while viewing it in Studio now appear immediately without a page refresh. A recent migration to Studio’s route tree accidentally dropped the real-time event connection, so newly added traces were invisible until you reloaded.

Plugins

Plugins v0.6.29

The plugin assistant’s experiment mode now correctly opens the experiments page. Previously, it attempted to navigate to a non-existent per-function experiments route.

Dashboard

Studio reconnects automatically after sleep

Studio connections now recover automatically when your laptop wakes from sleep. Previously, closing your lid and reopening could leave the session unresponsive until you manually refreshed the page.

Dashboard

Dataset page shows errors instead of misleading empty state

The dataset review page now displays a clear error message when traces fail to load, instead of incorrectly showing “No traces yet.” This helps you quickly identify loading failures, such as viewing a dataset while signed into the wrong organization.

TypeScript SDK, Dashboard, Plugins

TypeScript SDK v0.13.4, Plugins v0.6.27

Per-trace DB branching for replay (alpha)

Replay can now run against the database state at the moment a trace was recorded, not your current production database. This raises fidelity for agents whose behavior depends on stored state, like a refund decision that read a since-cancelled order or a retrieval agent that saw last week’s index. Available in the TypeScript SDK; backed by Neon preview branches on the Bitfab service.

Wire it up:

import { Bitfab } from "bitfab"
import { Pool } from "pg" // assumed Postgres client (node-postgres)

const bitfab = new Bitfab({
  apiKey: process.env.BITFAB_API_KEY,
  dbSnapshot: { provider: "neon" },
})

const env = new Bitfab.ReplayEnvironment()

await bitfab.replay("refund-agent", async (orderId) => {
  const url = env.active ? env.databaseUrl : process.env.DATABASE_URL
  const pool = new Pool({ connectionString: url })
  return decideRefund(orderId, pool)
}, { environment: env })

Alpha caveats. Single-tenant in this release; per-org Neon project configuration and the Ardent customer-LSN to replica-LSN mapping land in a follow-up. Reach out if you want it turned on for your account.

Dashboard

Studio sessions survive page refreshes

Refreshing the Studio tab no longer kills your coding agent’s session. Previously, a browser refresh was indistinguishable from closing the tab, so the plugin would immediately tear down the connection. Now the plugin waits up to 10 seconds for the page to reload before ending the session, so you can refresh freely without interrupting your workflow.

Plugins

Plugins v0.6.26

When your project has a .bitfab/credentials.local.json file (used to isolate credentials per project), logging in now writes the new API key to that file instead of the global credentials store. Previously, login always wrote to ~/.config/bitfab/credentials.json, which meant the project-local file stayed empty and the plugin fell back to the global key, defeating the isolation.
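
The resolution rule described above could look roughly like the following. The function name credentialsPath and the injected exists callback are hypothetical; only the two file paths come from the entry.

```typescript
// Hypothetical sketch: choose the credentials file that login should write to.
function credentialsPath(
  projectDir: string,
  homeDir: string,
  exists: (path: string) => boolean,
): string {
  const local = `${projectDir}/.bitfab/credentials.local.json`;
  // A project-local file opts this project into isolated credentials.
  if (exists(local)) return local;
  // Otherwise fall back to the global credentials store.
  return `${homeDir}/.config/bitfab/credentials.json`;
}
```

In a Node.js script you would typically pass fs.existsSync as the exists callback; injecting it keeps the rule easy to test.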

Plugins, Dashboard

Plugins v0.6.25

Studio auto-switches org to match the plugin session

When your browser’s active org differs from the org bound to the plugin’s API key, Studio now detects the mismatch and automatically switches to the correct org on load. This fixes the “Awaiting agent” stuck state that could occur when you belong to multiple organizations and your browser happened to be on a different one than your plugin.

If you’re not a member of the session’s org, the plugin now cleanly aborts and tells you why, instead of hanging indefinitely.

Dashboard

Studio agent connection recovers after laptop sleep

Studio now correctly restores the agent connection indicator after your laptop sleeps and wakes. Previously, the “agent disconnected” banner could get stuck even though the agent had successfully reconnected. The fix ensures heartbeat tracking refreshes on every poll and that the browser verifies the actual connection state when resuming after a gap.

Plugins, TypeScript SDK

Plugins v0.6.23, TypeScript SDK v0.13.3

TypeScript SDK is now @bitfab/sdk

The TypeScript SDK package has moved from bitfab to @bitfab/sdk. New installs should use the scoped name:

npm install @bitfab/sdk

import { Bitfab } from "@bitfab/sdk";

The old bitfab package continues to work but now prints a deprecation warning on import. Running the plugin’s update command (/bitfab:update) detects the legacy package and warns you to switch, even if the version number is current.

Plugins, Dashboard

Plugins v0.6.21

Studio is now the single browser surface for all plugin flows

Every plugin CLI flow (login, trace plan confirmation, dataset review, template preview) now opens inside Studio instead of launching a separate browser window. If Studio is already open, the plugin navigates it in place rather than opening a new window. This means fewer browser tabs, a consistent UI, and the ability to stay in one window while working with the assistant.

Headless login is now available for environments where a browser can’t reach your terminal (SSH, cloud IDEs, CI). Visit /studio/auth/claude in any browser, sign in, copy the token, and paste it back into your coding agent.

Closing Studio during a trace plan confirmation now cleanly cancels the operation instead of leaving the CLI in an error state.

Plugins, Plugin Lib

Plugins v0.6.20

Automatic package rename in update flow

Running /bitfab:update now detects the legacy bitfab npm package and offers to switch it to @bitfab/sdk. The update flow removes the old package, installs the new one, and rewrites imports in your source files. If you’re already on @bitfab/sdk, nothing changes; the flow works as before.

Plugins

Plugins v0.6.19

Studio URL guard and auth verification

Studio now always opens at the correct /studio path. Previously, certain launch conditions could cause the Studio window to open at the site root instead of the Studio interface. A path guard now normalizes the URL before the browser window opens.

The plugin’s auth status check now verifies your API key against the current server. If you switch between servers (e.g., local development to production), the status command correctly reports that re-authentication is needed instead of showing a stale “authenticated” state.

Dashboard, Plugins

Plugins v0.6.18

Live-streaming dataset pages in Studio

When the Studio assistant creates or picks a dataset, the dataset review page now opens immediately instead of waiting for all traces to be labeled and attached first. Traces appear on the page in real time as the agent finds, labels, and attaches them. Label changes on traces already in a dataset also update live, so you can watch rows move between the “Agent labeled,” “Labeled,” and “Unlabeled” sections without refreshing.

If the page is empty while the agent is still working, a “Building your dataset” indicator shows that traces are on the way.

Dashboard

Trace plan review page scrolls

The trace plan review page now scrolls when a plan has more captured nodes than fit on screen. Before, long plans clipped the Advanced selection toggle and any inline error messages below the fold; opening a plan with around 40 captured nodes now scrolls cleanly through the full call tree.

TypeScript SDK, Plugins

TypeScript SDK v0.13.2, Plugins v0.6.16

TypeScript SDK available as @bitfab/sdk

The TypeScript SDK is now published under the scoped package name @bitfab/sdk in addition to the existing bitfab package. Both names resolve to the same code and will stay in sync on every release. If you prefer scoped package names for clarity in your package.json, you can switch your import at any time:

npm install @bitfab/sdk

No code changes are required beyond updating the package name in your imports. Both import { Bitfab } from "bitfab" and import { Bitfab } from "@bitfab/sdk" work identically.

TypeScript SDK, Python SDK, Ruby SDK, Plugins

TypeScript SDK v0.13.1, Python SDK v0.13.1, Ruby SDK v0.12.1, Plugins v0.6.14

SDK serialization hardening

Trace spans now ship reliably even when function inputs or outputs are difficult to serialize. Objects with circular references, oversized payloads (over 512 KB), or classes that throw during serialization no longer cause lost spans. Instead, the SDK replaces the problematic value with a descriptive <unserializable: ClassName (reason)> stub so the span still appears in your traces with full timing and metadata.

This fix applies to the TypeScript, Python, and Ruby SDKs. No code changes are needed on your side; update to the latest SDK version to get the improvement automatically.
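
To illustrate the stub behavior described above, here is a minimal sketch of a defensive serializer. It is not the SDK's implementation: the function names are hypothetical, the reason strings are invented, and the size check counts string length as a stand-in for encoded bytes.

```typescript
// Hypothetical sketch of serializing a value without ever throwing.
const MAX_CHARS = 512 * 1024; // 512 KB limit from the entry, approximated in characters

function className(value: unknown): string {
  if (typeof value === "object" && value !== null) {
    // Objects created with Object.create(null) have no constructor at runtime.
    const ctor: { name?: string } | undefined = value.constructor;
    return ctor?.name || "Object";
  }
  return typeof value;
}

// Produce the stub as a JSON string so the output is always valid JSON.
function stub(value: unknown, reason: string): string {
  return JSON.stringify(`<unserializable: ${className(value)} (${reason})>`);
}

function safeSerialize(value: unknown): string {
  try {
    // JSON.stringify returns undefined at runtime for values like functions.
    const json: string | undefined = JSON.stringify(value);
    if (json === undefined) return stub(value, "not JSON-representable");
    if (json.length > MAX_CHARS) return stub(value, "payload over 512 KB");
    return json;
  } catch (err) {
    // Circular references and throwing toJSON methods both land here.
    const message = err instanceof Error ? err.message : String(err);
    return stub(value, message.replace(/\n[\s\S]*$/, ""));
  }
}
```

The point of the pattern is that a bad payload degrades to a readable placeholder rather than dropping the span that carries it.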

Plugins, Dashboard, Plugin Lib

Plugins v0.6.13

Plugin can query Studio browser state

Plugins can now check whether the Studio browser tab is connected and which page is currently active via the new getStudioState function. This lets the plugin make smarter decisions before navigating, for example skipping a navigation command when Studio is already on the target page, or surfacing a connection warning when the browser tab has been closed.

Plugins, Dashboard

Plugins v0.6.12

Studio now shows a recoverable error screen when a runtime error or unexpected crash occurs during a session. Instead of a blank page or a full 404, you see a “Try again” button that retries without losing your session context.

Agent-initiated navigation is now validated against a known route whitelist. When an agent tries to navigate to an invalid or out-of-scope path, it receives an immediate navigation-blocked event with a reason string instead of waiting for a 12-second timeout. This helps agents self-correct faster when a requested page doesn’t exist.
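
A route check of this kind might be sketched as follows. The route patterns, the navigation-ok event name, and the reason strings are illustrative assumptions; only the navigation-blocked event name and the presence of a reason string come from the entry.

```typescript
// Hypothetical route patterns; the real allowlist is not documented here.
const ALLOWED_ROUTES: RegExp[] = [
  /^\/studio$/,
  /^\/studio\/datasets\/[\w-]+$/,
  /^\/studio\/experiments$/,
];

type NavigationResult =
  | { type: "navigation-ok"; path: string }
  | { type: "navigation-blocked"; path: string; reason: string };

function checkNavigation(path: string): NavigationResult {
  // Validate the pathname only; query strings and fragments are ignored.
  const pathname = path.replace(/[?#].*$/, "");

  if (pathname !== "/studio" && !pathname.startsWith("/studio/")) {
    return {
      type: "navigation-blocked",
      path,
      reason: "out of scope: path is outside /studio",
    };
  }
  if (!ALLOWED_ROUTES.some((route) => route.test(pathname))) {
    return { type: "navigation-blocked", path, reason: "unknown route" };
  }
  return { type: "navigation-ok", path };
}
```

Because the check is synchronous, the caller learns about a bad path immediately instead of waiting for a timeout.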

Plugins, Plugin Lib

Plugins v0.6.11

Studio stays connected through long conversations

The Bitfab plugin now persists the link between your coding agent conversation and your Studio session. Previously, when a long conversation triggered context compaction, the agent lost track of which Studio window it had opened, requiring you to reopen Studio manually. Now the mapping is written to disk and recovered automatically after compaction, so Studio commands continue working seamlessly in extended sessions.

Dashboard

Code change diffs in experiments

When an experiment replays traces against a code change, you can now view the exact diff that was tested. Click the file stats on any experiment card or the code-change pill in the trace detail header to open a side-by-side diff modal. The modal also shows how the dataset reacted overall (fixed, regressed, still passing, still failing) or, when opened from a single trace, whether that specific trace flipped.

Dashboard

Signing in to a Bitfab plugin (Claude Code, Cursor, or Codex) via the browser now reliably completes the automatic handoff back to your terminal. Previously, the login page could lose the callback parameters during a redirect, causing every login to fall through to the manual “copy and paste this token” flow even when the browser and terminal were on the same machine.

Plugins

Plugins v0.6.9

Investigate a trace function with /bitfab:assistant

Run /bitfab:assistant investigate [<key>] to characterize an issue in a trace function without going through the full assistant flow. The agent reads recent traces and your code based on what you describe, then offers three follow-ups: stop with an in-chat summary, save a written report under .bitfab/analysis/, or hand off to dataset building when the findings include reproducible failures worth labeling. The function key is optional; when omitted, the agent picks it from your description or asks.

Plugins, Dashboard

Plugins v0.6.8

Agent-initiated Studio session close

Agents can now programmatically end a Studio session when their work is complete. The closeStudio() helper sends a completion event with an optional message, and the browser automatically closes or shows a “Session Complete” screen with the agent’s message. This replaces the need for users to manually click “End session” when the agent is done.

Plugins, Dashboard

Plugins v0.6.7

Trace plan confirmation lands inside Studio

The Bitfab plugin’s trace-plan confirmation page (where you review which spans your function will capture) now renders inside your existing Studio tab during /bitfab:setup instead of spawning a second browser window. Studio’s header and agent indicator stay visible while you decide, and Confirm or Cancel keeps the tab open for the rest of the flow, so there are no more orphan windows.

If no Studio session is running (you invoked /bitfab:setup outside a /bitfab:assistant session), the confirmation falls back to the standalone chromeless window as before.

TypeScript SDK, Python SDK

TypeScript SDK v0.13.0, Python SDK v0.13.0

Annotate a closed trace from any process

The TypeScript and Python SDKs now expose a detached client.getTrace(id) handle that lets you add context, merge metadata, or set the session id on a trace after its root span has closed. The handle works from any process, thread, or agent that knows the trace id, with no shared in-memory state. Useful when a downstream worker or a forked AI agent needs to attach information to the original conversation’s trace.

// From any process, by trace id:
const trace = client.getTrace(traceId);
await trace.addContext({ refund_status: "approved" });
await trace.setMetadata({ region: "us-west" });
await trace.setSessionId("session_xyz");

Available in bitfab v0.13.0 for TypeScript and Python.

Plugins

Plugins v0.6.6

Smarter Studio session management

The assistant flow now reuses an existing Studio session instead of opening a new browser window each time. If the Studio becomes unresponsive (tab closed, page crashed), the agent detects this within 12 seconds and offers options to refresh the tab or open a fresh session.

Plugins, Dashboard

Plugins v0.6.5

Live activity progress in Studio

Studio now shows which phase the assistant is working on in real time. As the skill progresses through steps like identifying the trace function, building a dataset, or running experiments, the header displays the active phase name with a live elapsed timer. When one phase completes and the next begins, you see the previous phase’s duration before it transitions.

If the agent disconnects or crashes mid-phase, the activity indicator automatically resets within 30 seconds instead of showing stale state indefinitely.

Plugins, Dashboard

Plugins v0.6.4

Open a trace plan from inside /bitfab:assistant

Ask the assistant to “open the trace plan for X” (or “show me what’s captured”) and it now routes your open Studio tab to that function’s most recent trace plan in place. The Studio shell stays mounted around the plan, so your agent session, header, and connection indicator persist across the navigation, and no new browser tab pops up. The canonical /trace-plan/[id] URL still works as a standalone shareable link outside Studio.

Dashboard

Click any trace while reviewing a dataset

Fixed a bug that blocked clicks while a trace detail was open. You can now switch traces or press Done without closing the open one first.

Dashboard, Plugins

Plugins v0.6.2

Redesigned trace planner

The trace planner now leads with what you actually need to know: a validation summary at the top that calls out anything blocking replay (live writes inside captured spans, missing samples, disconnected roots), then a flow diagram of the captured spans and a sample-trace preview of how the recorded trace will look in the viewer. The legacy two-pane tree picker is still there, tucked behind an Advanced selection toggle for power users. Confirm and Cancel still flow through the same Cmd+Enter / Esc handoff, so muscle memory carries over.

Plugins, Dashboard

Plugins v0.6.1

Live agent connection indicator in Studio

Studio now shows a real-time connection status in the header. A green dot with “Agent connected” appears when the coding agent is actively polling, and transitions to a gray dot with “Awaiting agent” if the agent disconnects. The indicator updates instantly when the agent reconnects, with no page refresh needed.

Mutual presence detection for plugins

The agent plugin now receives browserConnected in its poll response, indicating whether a user has Studio open in the browser. This enables plugins to adapt their behavior based on whether someone is actively watching the session.

Plugins

Plugins v0.6.0

Studio is now the default assistant mode

The /assistant skill now opens Studio automatically on every invocation. You no longer need to pass a studio argument to get the companion browser surface. Studio is always there, from start to finish.

Studio opens directly at the relevant page

When you start in dataset or experiment mode (/assistant dataset <key> or /assistant experiment <key>), Studio now opens directly at that function’s datasets or experiments page instead of opening at the root and navigating after. This shaves a few seconds off each focused session and puts you in context immediately.

Dashboard

Signing out no longer lands on a blank page. You’re now redirected to the sign-in page, where you can immediately log back in or close the tab.

Plugins

Plugins v0.5.17

Session log capture fix and standalone opt-in

Session log capture now works correctly after opting in during setup. A configuration mismatch previously caused the plugin to silently skip session capture even when you’d consented, so no session data was being collected. You can also now toggle session log capture on or off by running /bitfab:setup session-logs, a standalone mode that doesn’t require authentication.

Dashboard

Trace viewer skips empty spans on open

Opening any trace now lands on the first span that has data instead of a blank trace root or an empty span. This applies across the dashboard: trace detail pages, the labeling panel, the experiments comparison view, the dataset detail panel, and the template preview studio. The hard template filter still hides non-matching spans; you just no longer have to scroll past empty ones to see meaningful content.

Dashboard

When you navigate between pages in Studio, the coding agent now receives real-time navigation events with the current path. This gives the agent immediate awareness of where you are in Studio, so it can tailor its responses and actions to the page you’re viewing without needing to ask.

Plugins, Dashboard

Plugins v0.5.16

When your coding agent opens Studio and you’re not signed in, you now see a branded sign-in page inside the Studio window instead of being redirected to the main Bitfab login. The session context persists across the sign-in flow, so the agent picks up exactly where it left off once you authenticate. The CLI receives real-time auth-required and authenticated events, letting it wait for sign-in without polling.

Plugins

Plugins v0.5.15

Accurate offline SDK update checks

The plugin’s session-start update check now always reports the correct latest SDK versions. Previously the baked version snapshot could lag behind by one release, causing the plugin to miss update notifications or report you were up to date when a newer SDK was available.

Plugins

Plugins v0.5.14

Hill-climb from existing labels in /bitfab:assistant

When you start a new dataset for a function that already has validated labels, the assistant now offers a Reuse option that seeds the dataset with those labels instead of starting from scratch. Pick Reuse when you’re spinning up a different cut for experimentation but want to keep the labeling work you already trust. Define and Open are still there for the from-scratch and broad-sample cases.

Replay verdicts persist with a coverage gate

After a replay in Phase 5, the assistant writes its pass/fail verdicts on the replay traces through a bundled script that verifies every replay trace got a verdict before moving on. Previously a verdict could die mid-session if the agent forgot to persist it; now the script enforces full coverage before continuing. If a trace is genuinely ambiguous, you can record it as an explicit skip rather than leaving it silently unverdicted.
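
The coverage gate amounts to a completeness check over the replay's trace IDs. The sketch below shows the idea; the Verdict type and function names are hypothetical and do not describe the bundled script itself.

```typescript
// Hypothetical sketch of a coverage gate over replay verdicts.
// "skip" records an explicit decision not to judge an ambiguous trace.
type Verdict = "pass" | "fail" | "skip";

function missingVerdicts(
  traceIds: string[],
  verdicts: Map<string, Verdict>,
): string[] {
  return traceIds.filter((id) => !verdicts.has(id));
}

// Throws unless every replay trace has a verdict or an explicit skip.
function assertFullCoverage(
  traceIds: string[],
  verdicts: Map<string, Verdict>,
): void {
  const missing = missingVerdicts(traceIds, verdicts);
  if (missing.length > 0) {
    throw new Error(
      `Missing verdicts for ${missing.length} trace(s): ${missing.join(", ")}`,
    );
  }
}
```

Treating a skip as a recorded verdict is what separates a deliberate omission from one that was forgotten.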

Plugins, Dashboard

Plugins v0.5.13

Plugins surface which Bitfab org they’re writing to

The plugin MCP now flags which Bitfab org it reads from and writes to, so you’ll catch mismatches between your project’s BITFAB_API_KEY and the org open in your Studio tab before traces land somewhere unexpected. Coding agents now call get_api_key_context at the start of a plugin MCP session, and again whenever you mention that data you just wrote isn’t visible in Studio. The same tightening applies to the remote MCP server in the Dashboard for direct (non-plugin) callers.

Ruby SDK

Ruby SDK v0.12.0

Ruby SDK: skip child spans during replay

When you replay historical traces through client.replay(...), you can now have child spans return their recorded outputs instead of running real code. Three strategies control which children get short-circuited:

mock: "none" (default) reruns every child span as before.
mock: "all" returns historical output for every child.
mock: "marked" returns historical output only for spans declared with mock_on_replay: true, and runs everything else real.

class Pipeline
  include Bitfab::Traceable
  bitfab_function "pipeline"

  bitfab_span :classify, type: "llm", mock_on_replay: true
  def classify(text)
    # paid LLM call
  end

  bitfab_span :process, type: "agent"
  def process(text)
    classify(text)
  end
end

client.replay(Pipeline.new, :process,
  trace_function_key: "pipeline",
  mock: "marked")

Use mock: "marked" to iterate on agent logic without paying for the marked child calls on each replay. Use mock: "all" for the cheapest possible replay (only the root function runs real code). Brings the Ruby SDK to parity with the existing mock option in the Python and TypeScript SDKs.

Ruby SDK: fluent wrapper for shared trace function keys

client.get_function(key) returns a wrapper bound to that trace function key, so you can wrap multiple methods or classes without repeating the key on every call.

fn = Bitfab.client.get_function("openai")
fn.wrap(OpenAI::Client, :chat, name: "Chat", type: "llm")
fn.wrap(OpenAI::Client, :embeddings, name: "Embed", type: "llm")

Matches client.get_function in the Python SDK and client.getFunction in TypeScript.

Dashboard, Plugins

Accurate experiment counts with multi-label traces

Experiment pass/fail counts now correctly deduplicate traces that have labels from multiple sources (human review, approved agent, unapproved agent). Previously, a trace with both a human and an agent label could be double-counted in experiment totals. The viewer now picks the highest-priority label per trace: human labels take precedence over approved agent labels, which take precedence over unapproved ones.
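
The precedence rule above can be sketched as a reduce over labels that keeps one label per trace. The type names and source identifiers here are assumptions; only the ordering (human, then approved agent, then unapproved agent) comes from the entry.

```typescript
// Hypothetical sketch of picking one label per trace by source priority.
type LabelSource = "human" | "approved_agent" | "unapproved_agent";

interface Label {
  traceId: string;
  source: LabelSource;
  verdict: "pass" | "fail";
}

// Lower number means higher priority.
const PRIORITY: Record<LabelSource, number> = {
  human: 0,
  approved_agent: 1,
  unapproved_agent: 2,
};

function effectiveLabels(labels: Label[]): Map<string, Label> {
  const best = new Map<string, Label>();
  for (const label of labels) {
    const current = best.get(label.traceId);
    if (
      current === undefined ||
      PRIORITY[label.source] < PRIORITY[current.source]
    ) {
      best.set(label.traceId, label);
    }
  }
  return best;
}

// Count each trace exactly once, using its highest-priority label.
function countResults(labels: Label[]): { passed: number; failed: number } {
  let passed = 0;
  let failed = 0;
  effectiveLabels(labels).forEach((label) => {
    if (label.verdict === "pass") passed += 1;
    else failed += 1;
  });
  return { passed, failed };
}
```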

Experiments auto-label replayed traces

When you run an experiment through the assistant, replayed traces now receive agent labels automatically. The experiment viewer shows pass/fail results immediately after a replay completes, without requiring a manual labeling step first.

Plugins, TypeScript SDK, Python SDK

Plugins v0.5.11, TypeScript SDK v0.12.2, Python SDK v0.12.1

Replay with mocks: shared-key spans return the correct output

Fixed an off-by-one in the replay mockTree when the function under test and one of its children share one traceFunctionKey (the canonical getFunction(key).withSpan(...) pattern). The marked child was returning the root’s historical output instead of its own. The mockTree is now keyed by (traceFunctionKey, spanName, callIndex), which also unblocks recursive same-key replays.

const summarize = bitfab.getFunction("summarize-thread")
const buildTranscript = summarize.withSpan(
  { name: "buildTranscript", mockOnReplay: true },
  async (id) => db.getTranscript(id),
)
const processSummarize = summarize.withSpan(
  { name: "processSummarize" },
  async (id) => generate(await buildTranscript(id)),
)
// `mock: "marked"` now returns buildTranscript's own historical
// transcript string, not processSummarize's full result object.
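
For intuition about why a composite key fixes the collision, here is a small sketch of a lookup keyed by (traceFunctionKey, spanName, callIndex). The RecordedSpan shape and helper names are hypothetical and are not the SDK's internal data structures.

```typescript
// Hypothetical sketch of a mock lookup keyed by function key, span name, and call index.
interface RecordedSpan {
  traceFunctionKey: string;
  spanName: string;
  output: unknown;
}

function mockKey(
  traceFunctionKey: string,
  spanName: string,
  callIndex: number,
): string {
  // JSON encoding avoids ambiguity if a key or name contains a separator.
  return JSON.stringify([traceFunctionKey, spanName, callIndex]);
}

function buildMockTree(spans: RecordedSpan[]): Map<string, unknown> {
  const tree = new Map<string, unknown>();
  const seen = new Map<string, number>(); // calls seen so far per (key, span name)
  for (const span of spans) {
    const pair = JSON.stringify([span.traceFunctionKey, span.spanName]);
    const callIndex = seen.get(pair) ?? 0;
    seen.set(pair, callIndex + 1);
    tree.set(
      mockKey(span.traceFunctionKey, span.spanName, callIndex),
      span.output,
    );
  }
  return tree;
}
```

Because the span name is part of the key, a root and a child that share one traceFunctionKey no longer collide, and the call index keeps repeated or recursive calls to the same span distinct.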

Mocked non-async Promise-returning functions stay Promises

If you wrap a function fetchX() { return fetch(...) } (no async, but returns a Promise) and mock it during replay, the mocked return is now a Promise, not a raw value. Downstream .then(...) callers no longer crash. Detected at wrap time.

`/bitfab:assistant` experiments auto-pick parallel or serial

Phase 5 of the assistant skill now checks whether subagent worktrees inherit bypass permissions before forking parallel experiments. If permissions.defaultMode: "bypassPermissions" is set in committed .claude/settings.json or ~/.claude/settings.json, experiments fork to worktree-isolated subagents; otherwise they run serially in the main agent. Cursor and Codex always run serial since they don’t support worktree-isolated subagent calls.
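The decision described above might look roughly like this. The sketch assumes the settings files are plain JSON; the helper names and the `supports_worktree_subagents` flag are hypothetical and only illustrate the rule, not the skill's real logic.

```python
import json
from pathlib import Path


def load_settings(path):
    """Read a settings file, treating a missing or invalid file as empty."""
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def bypass_enabled(settings):
    """True when permissions.defaultMode is "bypassPermissions"."""
    permissions = settings.get("permissions")
    return isinstance(permissions, dict) and (
        permissions.get("defaultMode") == "bypassPermissions"
    )


def experiment_mode(settings_list, supports_worktree_subagents=True):
    """Return "parallel" or "serial" for a list of parsed settings dicts."""
    if not supports_worktree_subagents:
        return "serial"  # e.g. Cursor and Codex
    if any(bypass_enabled(s) for s in settings_list):
        return "parallel"
    return "serial"
```

A caller would pass the parsed contents of the project's .claude/settings.json and of ~/.claude/settings.json, in either order, since one match is enough.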

Dashboard

Organization switcher fix

Fixed the organization switcher dropdown not appearing in the header. After upgrading to Clerk v7, the switcher silently returned no memberships, making it impossible to switch between teams. The switcher now reliably shows all your organizations, with your personal workspace listed first and the rest sorted alphabetically.
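The ordering described above amounts to a two-part sort key. The snippet below is a sketch only: the `personal` and `name` fields are hypothetical, and case-insensitive comparison is an assumption rather than documented behavior.

```python
def order_organizations(orgs):
    """Personal workspace first, then the remaining organizations by name."""
    # False sorts before True, so personal workspaces lead the list.
    return sorted(orgs, key=lambda o: (not o["personal"], o["name"].casefold()))
```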

Plugins, Dashboard, Plugin Lib

Plugins v0.5.10

Live agent activity in Studio

The Studio home page now shows what your coding agent is doing in real time. While the assistant is working, the agent card highlights green and displays the current tool action (e.g., “Reading traces…”, “Creating grader…”). When the agent finishes or goes idle, the card fades back to its neutral state. Activity persists across page navigation within Studio, so you won’t lose track of the agent’s progress.

Plugins, Dashboard, Plugin Lib

Plugins v0.5.9

Studio detects which coding agent opened it

When Studio is launched from Cursor or Codex, the UI now shows that agent’s logo and name instead of defaulting to Claude Code. The welcome page, header, and “Return to” button all reflect the agent that started the session.

Plugins, TypeScript SDK, Dashboard

Plugins v0.5.8, TypeScript SDK v0.12.1

Replay failure handling in the assistant skill

/bitfab:assistant now separates infrastructure failures (missing DB rows, rejected writes) from real regressions during replay, and keeps unreplayable traces out of the pass rate. When a child span fails for environmental reasons, it suggests either marking the span with mockOnReplay or pointing replay at the trace’s source environment.
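As a rough illustration of keeping unreplayable traces out of the pass rate, the sketch below scores only traces that actually replayed. The outcome strings and the function name are hypothetical and do not reflect the skill's internal representation.

```python
def replay_pass_rate(outcomes):
    """Return (pass_rate, skipped) for outcomes of "pass", "fail", or "infra_error".

    Infrastructure failures are reported separately instead of counting as fails.
    """
    scored = [o for o in outcomes if o in ("pass", "fail")]
    skipped = len(outcomes) - len(scored)
    if not scored:
        return None, skipped  # nothing replayable, so no rate to report
    return sum(o == "pass" for o in scored) / len(scored), skipped
```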

Replay mocks return the correct child span’s output

Fixed ordering bugs in mock: "marked" that caused a marked child span to return a sibling’s historical output instead of its own. Upgrade to TypeScript SDK 0.12.1 if you’re using mock: "marked" on 0.12.0.

Plugins, TypeScript SDK, Python SDK, Dashboard

Plugins v0.5.7, TypeScript SDK v0.12.0, Python SDK v0.12.0

Mock child spans during replay

When you replay a recorded trace against new code, child spans sometimes fail locally for reasons unrelated to what you’re iterating on, like a paid API key you don’t have set, a flaky external service, or a production database row that isn’t seeded in your local environment. Replay now supports skipping those children and returning their recorded outputs instead, so the root function can still run.

Pass a mock strategy to replay() to control it. "none" (default) runs every child for real. "all" returns the historical output for every descendant. "marked" only short-circuits descendants you’ve tagged at definition time, leaving everything else to run for real, which is the iteration-friendly mode.

Tag a span with mockOnReplay: true in TypeScript or mock_on_replay=True in Python:

bitfab.withSpan("fetch-article-from-db", { mockOnReplay: true }, async (id) => db.find(id))

@bitfab.span("fetch-article-from-db", mock_on_replay=True)
def fetch_article_from_db(article_id): ...

Then replay with mock: "marked" (TS) or mock="marked" (Python). The flagged child returns its recorded output and downstream spans run real code, so you can iterate on the analysis or formatting steps without standing up the upstream dependency.

When the assistant skill is replaying a function and a child span fails environmentally, it’ll now suggest this fix directly. Full docs: TypeScript SDK and Python SDK reference under “Mocking child spans during replay”.
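The three strategies reduce to a small decision per span. The function below is a sketch of that rule as described here, not code from either SDK; its name and parameters are hypothetical.

```python
def should_mock(strategy, is_root, mock_on_replay):
    """Decide whether a span returns its recorded output during replay."""
    if strategy not in ("none", "all", "marked"):
        raise ValueError(f"unknown mock strategy: {strategy!r}")
    if is_root:
        return False  # the function under test always runs for real
    if strategy == "all":
        return True  # every descendant is short-circuited
    if strategy == "marked":
        return mock_on_replay  # only spans tagged at definition time
    return False  # "none": every child runs for real
```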

Plugins

Plugins v0.5.6

Reliable focus restoration for macOS terminals

When clicking “Return to coding agent” in Studio, focus now reliably returns to the correct terminal app. The previous approach could target the wrong window if the terminal’s environment was modified (common inside Claude Code). The plugin now identifies your terminal by walking the process tree to find the parent application. For iTerm2 users with multiple windows, focus targets the exact session pane.

Plugins, Dashboard

Plugins v0.5.5

Persistent Studio session for the assistant flow

Add studio to any /bitfab:assistant invocation (e.g., /bitfab:assistant studio) to keep a single Studio window open for the entire flow. Dataset review and experiment results open inside the same window instead of launching separate ones, so you stay in one place while iterating. Without the studio argument, the flow works exactly as before.

Plugins, Dashboard

Plugins v0.5.4

See which template renders each span at a glance

Iterating on the right template is faster when you can tell which one runs for the span you’re looking at. In the template preview, click or arrow-key through any span in the trace viewer and the matching card in the left rail lights up in that span’s color. That’s the template to edit.

Know when your coding agent is mid-edit

Stay out of the agent’s way and watch its work land in context. When your agent saves a template, the studio names who is editing (Claude Code, Cursor, or Codex), pulses the affected card in the rail, and outlines the exact region inside the rendered span, even on instant saves, so you don’t miss it.

Plugins, Dashboard

Plugins v0.5.2

Chat session capture

Bitfab plugins can now capture your coding-agent chat sessions and send them to the dashboard. Session capture is opt-in: enable it by setting BITFAB_CAPTURE_SESSIONS=true or adding "captureSessions": true to ~/.config/bitfab/config.json. Nothing is captured until you explicitly turn it on.

Once enabled, sessions are only recorded after you invoke a Bitfab tool or slash command in the same conversation, so ordinary non-Bitfab conversations are never captured. Works across Claude Code, Cursor, and Codex.
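The two-step gate (opt in first, then a Bitfab invocation in the same conversation) can be sketched like this. How the plugins actually parse the environment variable and config file is not documented here, so the exact-match check on "true" and both function names are assumptions.

```python
import json
from pathlib import Path


def capture_enabled(env, config_path):
    """True if capture is opted into via the environment or the config file."""
    if env.get("BITFAB_CAPTURE_SESSIONS") == "true":
        return True
    try:
        config = json.loads(Path(config_path).read_text())
    except (OSError, ValueError):
        return False  # missing or unreadable config means not opted in
    return isinstance(config, dict) and config.get("captureSessions") is True


def should_record(enabled, bitfab_invoked_in_conversation):
    """Record only when opted in and a Bitfab tool or command was used."""
    return enabled and bitfab_invoked_in_conversation
```

A caller would typically pass `os.environ` and the expanded path to ~/.config/bitfab/config.json.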

Cross-platform focus restoration

When a plugin opens a browser window (OAuth login, Studio preview), focus now returns to your terminal or editor automatically on Linux and Windows. Previously this only worked on macOS. If platform tools aren’t available (e.g., Wayland on Linux), the handoff completes normally without focus restoration.

Plugins, Dashboard

Plugins v0.5.1

Studio connection errors surface immediately

When your coding agent opens the Studio preview, connection problems (expired API key, network timeout) are now caught before the browser window opens. Previously, errors could surface mid-session after you’d already started editing.

Plugins, Dashboard

Plugins v0.5.0

Click-to-target template editing

In the template preview studio, you can now click directly on a rendered span to tell your coding agent exactly which region you want changed. No more “make the user message smaller” guesswork: point at the element and describe the change.

Live preview auto-refresh

Templates saved in the studio now re-render in the preview automatically. Previously, you had to reload the page to see your changes.

Template reference for coding agents

The new get_template_reference MCP tool returns a catalog of every editable region in the standard template, so coding agents can discover what’s available without you having to describe it.

Dashboard

Template preview is faster on large functions

The template preview page loads significantly faster for functions with many spans. Pages that previously made dozens of parallel requests now resolve in a single batched call.

Dashboard

Template rendering page: template-first layout

The template rendering page now starts from the templates instead of starting from a trace. You pick a template and see exactly which spans it affects.

The new three-column layout shows all templates for a function on the left, the current trace in the center (with non-matching spans dimmed), and affected spans across recent traces on the right. If the current trace has no spans for the selected template, the viewer auto-navigates to one that does.

Plugins, Dashboard

Plugins v0.4.15

API key context

Coding agents can now call get_api_key_context to find out which organization and environment their API key belongs to before sending traces. No more guesswork.

Organization: Acme, Inc. (5 members)
Your role: admin
Environment: production
API key: "Claude Plugin" (bf_c659...bfae)
User: Ankur Toshniwal <ankur@example.com>

Available in all three plugins (Claude Code, Cursor, Codex) as of v0.4.15.

API key descriptions

You can now add a description when creating API keys in the dashboard. Descriptions show up in the key list and are returned by get_api_key_context, so your coding agent can tell you which key it’s using without you having to check.

