Claude Plugin - Bitfab

The Bitfab Claude Code plugin brings the full evaluation workflow into Claude Code. It provides MCP tools for trace inspection and diagnostics, slash commands for setup, authentication, and updates, and automatic session notifications, so you never have to leave your editor.

Installation

Run the CLI from your project directory:

npx bitfab-cli init --editor claude

This installs the Bitfab plugin, opens your browser to log in, and launches /bitfab:setup.

Manual installation (without the CLI)

Add the Bitfab plugin marketplace and install:

> /plugin marketplace add Project-White-Rabbit/bitfab-claude-plugin
> /plugin install bitfab
> /exit

Then restart Claude Code and run the setup command to authenticate and instrument your codebase:

claude --continue

> /bitfab:setup

What the Plugin Does

Automatic Setup

The /bitfab:setup command runs a multi-phase workflow:

Login: Opens your browser for OAuth authentication and saves credentials to ~/.config/bitfab/credentials.json.
Instrument + Replay (in parallel, per workflow): Reads your codebase, finds all AI workflows (LLM calls, agents, AI-driven decisions), and presents them as a numbered list. You choose which to instrument. For each chosen workflow, it adds tracing with minimal diffs and, at the same time, generates a replay script, so you can regression-test your trace functions against production data with one command.

You can run individual phases:

> /bitfab:setup login        # Auth only
> /bitfab:setup instrument   # Trace instrumentation only
> /bitfab:setup replay       # Replay script creation only

The setup is interactive — it presents 2-5 concrete options per decision point with a recommended choice, so you stay in control throughout.

Assistant

The /bitfab:assistant command turns production traces into code improvements. Your agent will do the mechanical work and collaborate with you on three steps:

Build a dataset from production traces — search for failures, label them with expected outcomes
Experiment against that dataset — make isolated code changes, replay, compare results
Hill climb — repeat until the best change is found, then present results

Run it with an optional trace function key:

> /bitfab:assistant
> /bitfab:assistant order-processing
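
The hill-climbing step above can be sketched generically. This is a minimal illustration of greedy hill climbing, not the plugin's actual implementation: `hillClimb`, `propose`, and `score` are hypothetical names. In the assistant's setting, a candidate would be a code change and the score would be its pass rate on the labeled dataset.

```typescript
// Generic greedy hill climbing. All names here are hypothetical.
// `propose` returns candidate variations of the current best;
// `score` rates a candidate (higher is better).
function hillClimb<T>(
  start: T,
  propose: (current: T) => T[],
  score: (candidate: T) => number,
  maxRounds = 10,
): { best: T; bestScore: number; rounds: number } {
  let best = start;
  let bestScore = score(start);
  let rounds = 0;

  while (rounds < maxRounds) {
    rounds++;
    let improved = false;
    // Evaluate every candidate proposed from the current best and keep
    // the highest-scoring one that beats it.
    for (const candidate of propose(best)) {
      const candidateScore = score(candidate);
      if (candidateScore > bestScore) {
        best = candidate;
        bestScore = candidateScore;
        improved = true;
      }
    }
    // Stop as soon as a full round brings no improvement.
    if (!improved) break;
  }
  return { best, bestScore, rounds };
}
```

The loop ends either when a round produces no better candidate or when `maxRounds` is reached, which mirrors the "repeat until the best change is found" step while keeping the number of rounds bounded.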

Building the Dataset

Your coding agent does the data wrangling — it searches production traces for failures, reads full inputs and outputs, and identifies edge cases. It then presents edge cases for your judgment: is this a failure (and what should the output be), correct, or irrelevant? This labeled dataset becomes the benchmark for all experiments. The plugin opens a rich UI for navigating and labeling the dataset, then brings you back to your coding agent so you stay in flow. You can label every trace yourself, or label a few and let the agent classify the rest based on the patterns you’ve established.
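
The three labeling outcomes described above can be modeled as a small tagged union. The sketch below is illustrative only: `TraceLabel`, `LabeledTrace`, and `benchmarkCases` are hypothetical names, and Bitfab's real dataset schema may look different.

```typescript
// Hypothetical shapes for illustration; not Bitfab's actual schema.
type TraceLabel =
  | { kind: "failure"; expectedOutput: string } // wrong output, plus what it should be
  | { kind: "correct" } // the production output was already right
  | { kind: "irrelevant" }; // not useful as a benchmark case

interface LabeledTrace {
  traceId: string;
  input: string;
  actualOutput: string;
  label: TraceLabel;
}

interface BenchmarkCase {
  traceId: string;
  input: string;
  expected: string;
}

// Turn labeled traces into benchmark cases. Failures use the corrected
// output, correct traces keep their production output, and irrelevant
// traces are dropped.
function benchmarkCases(traces: LabeledTrace[]): BenchmarkCase[] {
  const cases: BenchmarkCase[] = [];
  for (const trace of traces) {
    const label = trace.label;
    if (label.kind === "irrelevant") continue;
    const expected =
      label.kind === "failure" ? label.expectedOutput : trace.actualOutput;
    cases.push({ traceId: trace.traceId, input: trace.input, expected });
  }
  return cases;
}
```

Keeping correct traces in the benchmark alongside the failures is what lets later experiments detect regressions as well as fixes.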

Running Experiments

The command reads your code, diagnoses failure patterns, and categorizes proposed changes:

Code fixes — deterministic bugs, bundled into one experiment as a foundation
Judgment-based fixes — prompt changes, search tuning, output formatting — each gets its own experiment
Infrastructure proposals — larger changes noted for future work, not experimented on

Independent experiments run in parallel — each in its own isolated subagent on a separate git worktree. Each subagent edits the code, runs the replay script against your labeled dataset, and compares new outputs to expected outcomes.
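
The categorization above can be expressed as a small planning function. This is a sketch under assumptions: the category names, the `ProposedChange` and `Experiment` types, and `planExperiments` are all hypothetical, and the sketch only groups changes. It does not model worktrees, subagents, or whether judgment-based experiments build on the foundation.

```typescript
// Hypothetical types for illustration; not the plugin's internal model.
type ChangeCategory = "code-fix" | "judgment" | "infrastructure";

interface ProposedChange {
  description: string;
  category: ChangeCategory;
}

interface Experiment {
  name: string;
  changes: ProposedChange[];
}

// Group proposed changes into experiments:
// - all code fixes are bundled into one "foundation" experiment,
// - each judgment-based fix gets its own experiment,
// - infrastructure proposals are deferred and never run.
function planExperiments(proposals: ProposedChange[]): {
  experiments: Experiment[];
  deferred: ProposedChange[];
} {
  const codeFixes = proposals.filter((p) => p.category === "code-fix");
  const judgment = proposals.filter((p) => p.category === "judgment");
  const deferred = proposals.filter((p) => p.category === "infrastructure");

  const experiments: Experiment[] = [];
  if (codeFixes.length > 0) {
    experiments.push({ name: "foundation", changes: codeFixes });
  }
  judgment.forEach((change, index) => {
    experiments.push({ name: `judgment-${index + 1}`, changes: [change] });
  });
  return { experiments, deferred };
}
```

Giving each judgment-based change its own experiment keeps its effect on the pass rate attributable to that change alone.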

Results

After each round, you see which traces now match expected outcomes, which still diverge, and whether any regressions occurred. You decide whether to keep iterating or stop. The final summary shows pass rate improvement and all files changed — uncommitted in your working tree for review.
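
The per-round comparison described above amounts to diffing two sets of pass/fail outcomes. The following sketch assumes each trace reduces to a boolean "matched the expected outcome"; the `Outcomes` and `RoundSummary` types and the `summarizeRound` function are hypothetical and do not reflect the plugin's actual report format.

```typescript
// Hypothetical: maps a trace ID to whether its replayed output matched
// the labeled expected outcome.
type Outcomes = Record<string, boolean>;

interface RoundSummary {
  fixed: string[]; // failed at baseline, pass now
  stillFailing: string[]; // failed at baseline, still fail
  regressions: string[]; // passed at baseline, fail now
  passRateBefore: number;
  passRateAfter: number;
}

function summarizeRound(baseline: Outcomes, experiment: Outcomes): RoundSummary {
  const fixed: string[] = [];
  const stillFailing: string[] = [];
  const regressions: string[] = [];
  let passedBefore = 0;
  let passedAfter = 0;

  const ids = Object.keys(baseline);
  for (const id of ids) {
    const was = baseline[id] === true;
    // Assumption: a trace with no replay result counts as a failure.
    const now = experiment[id] === true;
    if (was) passedBefore++;
    if (now) passedAfter++;
    if (!was && now) fixed.push(id);
    else if (!was && !now) stillFailing.push(id);
    else if (was && !now) regressions.push(id);
  }

  const total = ids.length;
  return {
    fixed,
    stillFailing,
    regressions,
    // Avoid dividing by zero on an empty dataset.
    passRateBefore: total === 0 ? 0 : passedBefore / total,
    passRateAfter: total === 0 ? 0 : passedAfter / total,
  };
}
```

Note that the pass rate alone can hide regressions: one fix and one regression leave the rate unchanged, which is why the regressions are listed separately.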

MCP Tools

The plugin registers MCP tools that Claude Code can call during conversations. These let you inspect traces, diagnose failures, and improve your code without leaving the editor.

`get_bitfab_api_key`

Retrieve your API key for environment variable configuration.

`list_trace_functions`

List all traced functions in your organization with stats at a glance — what’s being traced and how it’s performing.

"Show me all my traced functions"

`get_trace_function_diagnostics`

Deep-dive into failures for a specific trace function. Returns failure summaries, recent failures with trace IDs, and exact reasons why traces are failing.

"Why is my order-processing function failing?"

`search_traces`

Search and filter traces with full-text search, date ranges, status filters, and regex matching. Supports drill-down to narrow results progressively.

"Find all failed traces for order-processing from the last week"

`read_traces`

Read one or more traces by ID with full details — span summaries or complete input/output/reasoning/context/errors per span.

"Show me the full details of trace abc-123"

Slash Commands

| Command | Description |
| --- | --- |
| `/bitfab:setup` | Full setup workflow: authenticate, instrument, create replay scripts |
| `/bitfab:setup login` | Auth only |
| `/bitfab:assistant` | Build a dataset from traces, experiment with code changes, improve pass rates |
| `/bitfab:assistant <key>` | Iterate on a specific trace function |
| `/bitfab:logout` | Remove saved credentials |
| `/bitfab:status` | Check auth status, plugin version, and available updates |
| `/bitfab:update` | Update the plugin to the latest version |

Session Notifications

The plugin runs a hook on every session start and resume that checks:

Authentication: If you’re not logged in, it reminds you to run /bitfab:setup
Updates: If a new plugin version is available, it tells you how to update (or auto-updates if you’ve enabled it)

Example Workflows

Instrument a new project

> /bitfab:setup

The agent detects your project language, finds AI workflows, presents options, and instruments your chosen workflows — all interactively.

Diagnose and fix a failing function

Ask Claude Code naturally:

"My order-processing traces are failing. What's going wrong and can you fix it?"

Claude Code calls get_trace_function_diagnostics to analyze failures, then search_traces and read_traces to inspect specific failing traces, and suggests code fixes directly.

Iterate on a trace function

> /bitfab:assistant memory-search

The agent finds failing traces, walks you through labeling them with expected outcomes, diagnoses the failure patterns in your code, then runs experiments — editing prompts or code, replaying against your labeled dataset, and reporting what improved. You stay in control at every decision point.

Replay after a code change

After updating a function, run your replay script to test against production data:

npx tsx scripts/replay.ts extraction --limit 20

Or ask Claude Code to do it for you — it can run the script and interpret the results.

Configuration

Credentials

Credentials are stored in ~/.config/bitfab/credentials.json (created by /bitfab:setup login with owner-readable permissions).

Environment Variables

| Variable | Description |
| --- | --- |
| `BITFAB_API_KEY` | Override the stored API key |
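
The override order described above (environment variable first, stored credentials second) can be sketched as a pure function. Everything beyond the `BITFAB_API_KEY` name is an assumption: `resolveApiKey` is a hypothetical helper, the `apiKey` field name inside credentials.json is a guess at the file's shape, and treating a blank environment variable as unset is a design choice of this sketch rather than documented behavior.

```typescript
// Hypothetical helper; the plugin's real lookup logic may differ.
// `env` is the process environment (or a stand-in for it) and
// `credentialsJson` is the raw content of credentials.json, or null
// if the file does not exist.
function resolveApiKey(
  env: Record<string, string | undefined>,
  credentialsJson: string | null,
): string | null {
  // 1. The environment variable overrides the stored key.
  const fromEnv = env["BITFAB_API_KEY"];
  if (fromEnv !== undefined && fromEnv.trim() !== "") return fromEnv;

  // 2. Fall back to the stored credentials file.
  if (credentialsJson === null) return null;
  try {
    const parsed: unknown = JSON.parse(credentialsJson);
    // ASSUMPTION: the key is stored under an `apiKey` field.
    if (typeof parsed === "object" && parsed !== null && "apiKey" in parsed) {
      const key = (parsed as { apiKey: unknown }).apiKey;
      return typeof key === "string" && key !== "" ? key : null;
    }
  } catch {
    // A malformed file is treated the same as a missing one.
  }
  return null;
}
```

A `null` result corresponds to the "Not authenticated" state covered under Troubleshooting.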

Troubleshooting

Not authenticated

If you see “Not authenticated” on session start:

Run /bitfab:setup login to authenticate via browser
Check that ~/.config/bitfab/credentials.json exists and contains your API key
If using an environment variable, verify BITFAB_API_KEY is set

MCP tools not available

If Claude Code can’t access the Bitfab tools:

Run /bitfab:status to check connection status
Try restarting Claude Code — the MCP server initializes on startup
Verify the plugin is installed: check /plugin list

Stale session

The plugin automatically detects and recovers from stale MCP sessions. If tools stop working mid-conversation, they’ll reconnect on the next call.

Plugin updates

Run /bitfab:status to check for updates, then /bitfab:update to install the latest version. Restart Claude Code after updating.
