Labeling - Bitfab

Overview

Labeling lets you review traces and mark them with pass/fail outcomes or tags. Use it to:

Build evaluation datasets from real production data
Train automated graders by providing labeled examples
Track quality trends over time
Identify patterns in failures

Viewing Traces

Navigate to a trace function and open the Labeling view to see all captured traces.

Trace List

The list shows:

Column	Description
Status	Success or error
Duration	Execution time
Grader Results	Pass/fail outcomes from automated graders
Labels	Human-applied labels
Created	When the trace was recorded

Filtering

Filter traces by:

Status: Success or error
Date Range: Filter by time period
Grader Results: Pass or fail on specific graders
Labels: Filter by human-applied labels
Tags: Filter by assigned tags
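Each filter above can be treated as a predicate over a trace. The sketch below is a minimal local model built on two assumptions: a hypothetical `Trace` record (its field names are illustrative, not Bitfab's schema), and active filters combining with AND. How the Labeling view actually combines filters is not specified here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


# Hypothetical trace record; field names are illustrative, not Bitfab's schema.
@dataclass
class Trace:
    trace_id: str
    status: str  # "success" or "error"
    created: datetime
    grader_results: dict[str, bool] = field(default_factory=dict)  # grader name -> passed
    labels: set[str] = field(default_factory=set)  # human-applied labels
    tags: set[str] = field(default_factory=set)


def filter_traces(
    traces: list[Trace],
    status: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    grader: Optional[tuple[str, bool]] = None,  # (grader name, required outcome)
    label: Optional[str] = None,
    tag: Optional[str] = None,
) -> list[Trace]:
    """Keep traces that satisfy every filter that is set (AND semantics is an assumption)."""
    kept = []
    for t in traces:
        if status is not None and t.status != status:
            continue
        if start is not None and t.created < start:
            continue
        if end is not None and t.created > end:
            continue
        if grader is not None and t.grader_results.get(grader[0]) != grader[1]:
            continue
        if label is not None and label not in t.labels:
            continue
        if tag is not None and tag not in t.tags:
            continue
        kept.append(t)
    return kept


traces = [
    Trace("t1", "success", datetime(2024, 5, 1), {"correctness": True}, {"pass"}, {"checkout"}),
    Trace("t2", "error", datetime(2024, 5, 2), {"correctness": False}, {"fail"}, {"checkout"}),
    Trace("t3", "success", datetime(2024, 6, 1), {"correctness": False}, set(), {"search"}),
]
failing_checkout = filter_traces(traces, grader=("correctness", False), tag="checkout")
print([t.trace_id for t in failing_checkout])  # ['t2']
```

In this model a trace with no result for the named grader is excluded, because a missing result matches neither pass nor fail.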

Searching

Use the search bar to find traces by:

Input content
Output content
Error messages
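Searching these three fields can be approximated as a substring match. This sketch makes two assumptions: matching is case-insensitive, and structured inputs and outputs are serialized to JSON text before matching. The `Trace` record is hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical trace record for illustration only.
@dataclass
class Trace:
    trace_id: str
    input: Any
    output: Any
    error: Optional[str] = None  # error message, if the trace failed


def search_traces(traces: list[Trace], query: str) -> list[Trace]:
    """Return traces whose input, output, or error message contains `query` (case-insensitive)."""
    needle = query.lower()
    matches = []
    for t in traces:
        # Serialize structured values so nested fields are searchable as text.
        fields = [json.dumps(t.input), json.dumps(t.output), t.error or ""]
        if any(needle in text.lower() for text in fields):
            matches.append(t)
    return matches


traces = [
    Trace("t1", {"order_id": "order-123", "items": ["item1", "item2"]},
          {"order_id": "order-123", "total": 20}),
    Trace("t2", {"order_id": "order-456", "items": []}, None,
          error="ValueError: cart is empty"),
]
print([t.trace_id for t in search_traces(traces, "ORDER-123")])  # ['t1']
print([t.trace_id for t in search_traces(traces, "cart is empty")])  # ['t2']
```

Because the serialized JSON includes keys, a query such as "item" also matches the key name `items` in this model. The real search may behave differently.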

Trace Details

Click a trace to view its details:

Input

The exact input passed to the function:

{
  "order_id": "order-123",
  "items": ["item1", "item2"]
}

Output

The output the function returned:

{
  "order_id": "order-123",
  "total": 20
}
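Before labeling a trace like this one, it can help to write down what a correct output looks like. The sketch below encodes two such expectations for the order example as a plain function: the output echoes the input's `order_id`, and `total` is a non-negative number. Both rules are hypothetical and chosen for illustration. They are not Bitfab graders or checks.

```python
import json

# Input and output copied from the example trace above.
trace_input = json.loads('{"order_id": "order-123", "items": ["item1", "item2"]}')
trace_output = json.loads('{"order_id": "order-123", "total": 20}')


def review_order(inp: dict, out: dict) -> list[str]:
    """Hypothetical review rules; an empty list means the trace looks like a pass."""
    problems = []
    if out.get("order_id") != inp.get("order_id"):
        problems.append("output order_id does not match input")
    total = out.get("total")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        problems.append("total is missing or not a number")
    elif total < 0:
        problems.append("total is negative")
    return problems


problems = review_order(trace_input, trace_output)
print("pass" if not problems else f"fail: {problems}")  # pass
```

Writing rules down like this can help reviewers apply pass/fail labels consistently.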

Span Tree

When a function calls other wrapped functions, the trace shows them as a hierarchy of spans:

Trace: agent-workflow
├── Span: validate-input (guardrail)
├── Span: fetch-data (function)
└── Span: generate-response (llm)

Click an individual span to see its inputs, outputs, context, and timing.
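A span hierarchy is a tree, so rendering one is a depth-first walk. The sketch below uses a hypothetical `Span` type (name, kind, duration in milliseconds, children) to reproduce the layout of the example tree, with timing shown optionally. The type and the sample durations are made up for illustration and do not reflect Bitfab's data model.

```python
from dataclasses import dataclass, field


# Hypothetical span type for illustration; not Bitfab's data model.
@dataclass
class Span:
    name: str
    kind: str  # e.g. "guardrail", "function", "llm"
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def render_tree(trace_name: str, spans: list[Span], show_timing: bool = False) -> str:
    """Render spans depth-first using the same box-drawing layout as the example tree."""
    lines = [f"Trace: {trace_name}"]

    def walk(children: list[Span], prefix: str) -> None:
        for i, span in enumerate(children):
            last = i == len(children) - 1
            timing = f" [{span.duration_ms:.0f} ms]" if show_timing else ""
            lines.append(f"{prefix}{'└── ' if last else '├── '}Span: {span.name} ({span.kind}){timing}")
            # Children of a non-final span need a vertical bar to keep the branch visible.
            walk(span.children, prefix + ("    " if last else "│   "))

    walk(spans, "")
    return "\n".join(lines)


spans = [
    Span("validate-input", "guardrail", 12),
    Span("fetch-data", "function", 85),
    Span("generate-response", "llm", 1430),  # sample durations are made up
]
print(render_tree("agent-workflow", spans))
```

Called without timing, this prints the same four-line `agent-workflow` tree as the example. Nested children get an extra indentation level under their parent.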

Applying Labels

Label traces to indicate quality:

Open a trace
Review the input, output, and span tree
Apply a pass/fail label or add tags

Labels feed into grader training: the more labeled examples you provide, the better automated graders perform.
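One way to picture how labels become training material: each labeled trace contributes an (input, output, outcome) example, and unlabeled traces contribute nothing. The sketch below models this with hypothetical types. It is not Bitfab's storage format, and it does not describe how Bitfab actually consumes labels during grader training. It also assumes that relabeling a trace replaces its previous outcome.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"


# Hypothetical trace record for illustration only.
@dataclass
class Trace:
    trace_id: str
    input: dict
    output: dict
    outcome: Optional[Outcome] = None  # None until a reviewer labels it
    tags: set[str] = field(default_factory=set)


def apply_label(trace: Trace, outcome: Outcome, tags: tuple[str, ...] = ()) -> None:
    """Record a pass/fail outcome (replacing any earlier one) and add optional tags."""
    trace.outcome = outcome
    trace.tags.update(tags)


def labeled_examples(traces: list[Trace]) -> list[tuple[dict, dict, str]]:
    """Collect (input, output, outcome) triples from labeled traces only."""
    return [(t.input, t.output, t.outcome.value) for t in traces if t.outcome is not None]


t1 = Trace("t1", {"order_id": "order-123"}, {"order_id": "order-123", "total": 20})
t2 = Trace("t2", {"order_id": "order-456"}, {"order_id": "order-999", "total": 5})
t3 = Trace("t3", {"order_id": "order-789"}, {"order_id": "order-789", "total": 12})
apply_label(t1, Outcome.PASS)
apply_label(t2, Outcome.FAIL, tags=("wrong-order-id",))
print(len(labeled_examples([t1, t2, t3])))  # 2 (t3 is unlabeled)
```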

Tagging

Organize traces with tags for filtering and dataset building:

Adding Tags

Open a trace
Click Add Tag
Select an existing tag or create a new one
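Choosing between an existing tag and a new one is a get-or-create lookup. The sketch below assumes tag names are matched case-insensitively after trimming whitespace, so "Checkout" and " checkout " resolve to the same tag. That normalization rule and the `TagRegistry` type are hypothetical, not documented Bitfab behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    name: str
    color: str = "gray"  # placeholder default color


# Hypothetical registry of tags, keyed by a normalized name.
@dataclass
class TagRegistry:
    tags: dict[str, Tag] = field(default_factory=dict)

    def get_or_create(self, name: str) -> Tag:
        key = name.strip().lower()  # assumed normalization: trim and lowercase
        if key not in self.tags:
            self.tags[key] = Tag(name.strip())
        return self.tags[key]


def add_tag(trace_tags: set[str], registry: TagRegistry, name: str) -> Tag:
    """Attach a tag to a trace, reusing an existing tag when the name matches."""
    tag = registry.get_or_create(name)
    trace_tags.add(tag.name)
    return tag


registry = TagRegistry()
trace_tags: set[str] = set()
add_tag(trace_tags, registry, "checkout")
add_tag(trace_tags, registry, " Checkout ")  # resolves to the existing tag
print(sorted(trace_tags), len(registry.tags))  # ['checkout'] 1
```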

Managing Tags

Navigate to Tags in the user menu to:

Create new tags
Edit tag names and colors
Archive unused tags
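The three management actions map onto a small store. This sketch keys tags by a stable numeric id, so renaming or recoloring a tag does not detach it from traces that already carry it. It also treats archiving as hiding a tag from selection rather than deleting it. Both are design assumptions for illustration, and the `TagStore` type is hypothetical. It does not describe how Bitfab stores tags.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Tag:
    tag_id: int
    name: str
    color: str
    archived: bool = False


# Hypothetical tag store: ids are stable, names and colors are editable.
@dataclass
class TagStore:
    tags: dict[int, Tag] = field(default_factory=dict)
    _next_id: Iterator[int] = field(default_factory=lambda: itertools.count(1), init=False, repr=False)

    def create(self, name: str, color: str = "gray") -> Tag:
        tag = Tag(next(self._next_id), name, color)
        self.tags[tag.tag_id] = tag
        return tag

    def edit(self, tag_id: int, name: Optional[str] = None, color: Optional[str] = None) -> Tag:
        tag = self.tags[tag_id]  # raises KeyError for an unknown id
        if name is not None:
            tag.name = name
        if color is not None:
            tag.color = color
        return tag

    def archive(self, tag_id: int) -> None:
        # Assumed semantics: hide the tag from selection but keep its record.
        self.tags[tag_id].archived = True

    def selectable(self) -> list[Tag]:
        """Tags offered when adding a tag to a trace (archived tags excluded)."""
        return [t for t in self.tags.values() if not t.archived]


store = TagStore()
checkout = store.create("checkout", "green")
legacy = store.create("legacy-flow")
store.edit(checkout.tag_id, name="checkout-v2", color="blue")
store.archive(legacy.tag_id)
print([t.name for t in store.selectable()])  # ['checkout-v2']
```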

Building Evaluation Datasets

Use labeled traces to build evaluation datasets:

Filter to traces you want to include
Label traces with pass/fail outcomes
Apply tags to organize by feature, environment, or use case
Use these labeled traces to train and validate graders
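The steps above reduce to four operations: select traces, keep the labeled ones, carry their tags along, and set some aside for validation. The sketch below does this locally with a hypothetical `Trace` record and writes JSON Lines. It splits by hashing each trace id, so a given trace lands in the same split on every run. The record shape, the JSONL output, and the 20% validation fraction are illustrative assumptions, not a Bitfab export format.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical trace record for illustration only.
@dataclass
class Trace:
    trace_id: str
    input: Any
    output: Any
    label: Optional[str] = None  # "pass", "fail", or None if unlabeled
    tags: set[str] = field(default_factory=set)


def build_dataset(traces: list[Trace], required_tag: Optional[str] = None) -> list[dict]:
    """Keep labeled traces (optionally only those carrying a given tag) as dataset rows."""
    rows = []
    for t in traces:
        if t.label is None:
            continue  # unlabeled traces carry no ground truth
        if required_tag is not None and required_tag not in t.tags:
            continue
        rows.append({"trace_id": t.trace_id, "input": t.input, "output": t.output,
                     "label": t.label, "tags": sorted(t.tags)})
    return rows


def split(rows: list[dict], validation_fraction: float = 0.2) -> tuple[list[dict], list[dict]]:
    """Deterministic train/validation split based on a hash of each trace id."""
    train, validation = [], []
    for row in rows:
        bucket = int(hashlib.sha256(row["trace_id"].encode()).hexdigest(), 16) % 100
        (validation if bucket < validation_fraction * 100 else train).append(row)
    return train, validation


def to_jsonl(rows: list[dict]) -> str:
    """One JSON object per line, with sorted keys for stable diffs."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


traces = [
    Trace("t1", {"order_id": "order-123"}, {"total": 20}, "pass", {"checkout"}),
    Trace("t2", {"order_id": "order-456"}, {"total": -5}, "fail", {"checkout"}),
    Trace("t3", {"order_id": "order-789"}, {"total": 12}, None, {"checkout"}),
    Trace("t4", {"query": "shoes"}, {"results": []}, "fail", {"search"}),
]
rows = build_dataset(traces, required_tag="checkout")
train, validation = split(rows)
print(len(rows))  # 2
```

Hashing the id rather than shuffling keeps the validation set stable as new labeled traces are added, so a grader is not validated on examples it was trained on in an earlier run.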

Best Practices

Label regularly: Consistent labeling improves grader accuracy
Tag strategically: Use tags to organize by feature, environment, or failure mode
Label both passes and failures: Graders need examples of both to learn effectively
Start with the most impactful function: Focus labeling effort on the trace function with the highest volume or most critical failures
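The last two practices can be checked mechanically. The sketch below summarizes labels per trace function and flags functions whose labels lack either passes or failures. It then picks a starting function by ranking on error count, then volume. That ranking is one assumed reading of "most impactful", and the `Trace` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical trace record for illustration only.
@dataclass
class Trace:
    function: str
    status: str  # "success" or "error"
    label: Optional[str] = None  # "pass", "fail", or None


def summarize(traces: list[Trace]) -> dict[str, dict[str, int]]:
    """Per-function counts of traces, errors, and pass/fail labels."""
    summary: dict[str, dict[str, int]] = {}
    for t in traces:
        s = summary.setdefault(t.function, {"volume": 0, "errors": 0, "pass": 0, "fail": 0})
        s["volume"] += 1
        if t.status == "error":
            s["errors"] += 1
        if t.label in ("pass", "fail"):
            s[t.label] += 1
    return summary


def unbalanced(summary: dict[str, dict[str, int]]) -> list[str]:
    """Functions whose labeled examples lack either passes or failures."""
    return sorted(fn for fn, s in summary.items() if s["pass"] == 0 or s["fail"] == 0)


def start_with(summary: dict[str, dict[str, int]]) -> str:
    """Assumed ranking: most errors first, then highest volume."""
    return max(summary, key=lambda fn: (summary[fn]["errors"], summary[fn]["volume"]))


traces = [
    Trace("checkout", "error", "fail"),
    Trace("checkout", "success", "pass"),
    Trace("checkout", "error"),
    Trace("search", "success", "pass"),
    Trace("search", "success"),
    Trace("search", "success"),
    Trace("search", "success"),
]
summary = summarize(traces)
print(unbalanced(summary))  # ['search']
print(start_with(summary))  # checkout
```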
