Overview

Labeling lets you review traces and mark them with pass/fail outcomes or tags. Use it to:
  • Build evaluation datasets from real production data
  • Train automated graders by providing labeled examples
  • Track quality trends over time
  • Identify patterns in failures

Viewing Traces

Navigate to a trace function and open the Labeling view to see all captured traces.

Trace List

The list shows:
  • Status: Success or error
  • Duration: Execution time
  • Grader Results: Pass/fail outcomes from automated graders
  • Labels: Human-applied labels
  • Created: When the trace was recorded

Filtering

Filter traces by:
  • Status: Success or error
  • Date Range: Filter by time period
  • Grader Results: Pass or fail on specific graders
  • Labels: Filter by human-applied labels
  • Tags: Filter by assigned tags
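As an illustration only, the status and date-range filters above behave like a predicate over trace records. The sketch below runs the same logic over in-memory dictionaries; the field names `status` and `created_at` are assumptions for the example, not the product's schema.

```python
from datetime import datetime

# Hypothetical trace records; field names are illustrative only.
traces = [
    {"status": "success", "created_at": datetime(2024, 5, 1)},
    {"status": "error", "created_at": datetime(2024, 5, 2)},
    {"status": "error", "created_at": datetime(2024, 4, 1)},
]

def filter_traces(traces, status=None, since=None):
    """Return traces matching an optional status and minimum creation time."""
    result = traces
    if status is not None:
        result = [t for t in result if t["status"] == status]
    if since is not None:
        result = [t for t in result if t["created_at"] >= since]
    return result

# Errors from the last part of the period only.
recent_errors = filter_traces(traces, status="error", since=datetime(2024, 4, 15))
print(len(recent_errors))  # 1
```

Combining filters narrows the list the same way the UI does: each added criterion removes traces that fail it.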

Searching

Use the search bar to find traces by:
  • Input content
  • Output content
  • Error messages
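Conceptually, this search is a case-insensitive text match across those three fields. A minimal sketch, assuming traces expose `input`, `output`, and `error` text fields (an assumption, not the actual schema):

```python
def search_traces(traces, query):
    """Case-insensitive substring match over input, output, and error text."""
    q = query.lower()
    return [
        t for t in traces
        if any(q in (t.get(field) or "").lower()
               for field in ("input", "output", "error"))
    ]

# Hypothetical records for illustration.
traces = [
    {"input": "order-123", "output": "total 20", "error": None},
    {"input": "order-456", "output": None, "error": "Timeout fetching data"},
]
print(len(search_traces(traces, "timeout")))  # 1
```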

Trace Details

Click on a trace to view details:

Input

The exact input passed to the function:
{
  "order_id": "order-123",
  "items": ["item1", "item2"]
}

Output

The output returned:
{
  "order_id": "order-123",
  "total": 20
}

Span Tree

When functions call other wrapped functions, you see a hierarchical view:
Trace: agent-workflow
├── Span: validate-input (guardrail)
├── Span: fetch-data (function)
└── Span: generate-response (llm)
Click on individual spans to see their inputs, outputs, context, and timing.
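To show how nesting arises, here is a toy tracer: each wrapped function records a span under whichever span is currently active, so nested calls produce the hierarchy shown above. The `traced` decorator is a self-contained illustration of the idea, not the product's SDK API.

```python
import functools

# Toy tracer: a stack of open spans; each wrapped call records a child
# span under the current top of the stack. Illustration only.
_stack = [{"name": "root", "kind": "root", "children": []}]

def traced(kind):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            span = {"name": fn.__name__, "kind": kind, "children": []}
            _stack[-1]["children"].append(span)
            _stack.append(span)           # calls made inside fn nest here
            try:
                return fn(*args, **kwargs)
            finally:
                _stack.pop()
        return inner
    return wrap

@traced("function")
def fetch_data():
    return {"items": 2}

@traced("llm")
def generate_response():
    return "ok"

@traced("trace")
def agent_workflow():
    fetch_data()
    return generate_response()

agent_workflow()
root = _stack[0]["children"][0]  # the agent-workflow span
print([s["name"] for s in root["children"]])
```

Because each span is appended under the span that was active when it started, the recorded tree mirrors the call structure.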

Applying Labels

Label traces to indicate quality:
  1. Open a trace
  2. Review the input, output, and span tree
  3. Apply a pass/fail label or add tags
Labels feed into grader training — the more labeled examples you provide, the better automated graders perform.
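For intuition, a label applied in step 3 amounts to attaching a small record to the trace. The shape below is an assumption for illustration, not the product's storage schema:

```python
import json

# Hypothetical label record; field names are illustrative only.
label = {
    "trace_id": "trace-abc",
    "outcome": "fail",               # the pass/fail judgment
    "tags": ["checkout", "prod"],    # optional organizing tags
    "note": "total omitted tax",     # optional reviewer comment
}
print(json.dumps(label, indent=2))
```

Records like this are what accumulate into the labeled examples that grader training draws on.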

Tagging

Organize traces with tags for filtering and dataset building:

Adding Tags

  1. Open a trace
  2. Click Add Tag
  3. Select an existing tag or create a new one

Managing Tags

Navigate to Tags in the user menu to:
  • Create new tags
  • Edit tag names and colors
  • Archive unused tags

Building Evaluation Datasets

Use labeled traces to build evaluation datasets:
  1. Filter to traces you want to include
  2. Label traces with pass/fail outcomes
  3. Apply tags to organize by feature, environment, or use case
  4. Use these labeled traces to train and validate graders
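The four steps above can be sketched as a selection over labeled traces: keep only traces that carry a label and the tag of interest, and emit (input, output, expected outcome) examples. Field names here are assumptions for illustration, not the product's schema.

```python
# Hypothetical labeled traces; field names are illustrative only.
traces = [
    {"id": "t1", "label": "pass", "tags": ["checkout"], "input": "a", "output": "x"},
    {"id": "t2", "label": "fail", "tags": ["checkout"], "input": "b", "output": "y"},
    {"id": "t3", "label": None,   "tags": ["search"],   "input": "c", "output": "z"},
]

def build_dataset(traces, tag):
    """Keep labeled traces carrying the given tag (steps 1-3) and emit
    examples usable for grader training and validation (step 4)."""
    return [
        {"input": t["input"], "output": t["output"], "expected": t["label"]}
        for t in traces
        if t["label"] is not None and tag in t["tags"]
    ]

dataset = build_dataset(traces, "checkout")
print(len(dataset))  # 2
```

Note the resulting dataset keeps both a pass and a fail example, which matches the best practice below of labeling both outcomes.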

Best Practices

  • Label regularly: Consistent labeling improves grader accuracy
  • Tag strategically: Use tags to organize by feature, environment, or failure mode
  • Label both passes and failures: Graders need examples of both to learn effectively
  • Start with the most impactful function: Focus labeling effort on the trace function with the highest volume or most critical failures