Overview
Labeling lets you review traces and mark them with pass/fail outcomes or tags. Use it to:
- Build evaluation datasets from real production data
- Train automated graders by providing labeled examples
- Track quality trends over time
- Identify patterns in failures
Viewing Traces
Navigate to a trace function and open the Labeling view to see all captured traces.
Trace List
The list shows:

| Column | Description |
|---|---|
| Status | Success or error |
| Duration | Execution time |
| Grader Results | Pass/fail outcomes from automated graders |
| Labels | Human-applied labels |
| Created | When the trace was recorded |
Filtering
Filter traces by:
- Status: Success or error
- Date Range: Filter by time period
- Grader Results: Pass or fail on specific graders
- Labels: Filter by human-applied labels
- Tags: Filter by assigned tags
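The filters above can be pictured as predicates applied to each trace. The following is a minimal Python sketch of that idea, not the product's API: the `Trace` record, its field names, and the `filter_traces` helper are hypothetical, and it assumes that multiple filters combine with AND semantics.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Trace:
    # Hypothetical record mirroring the trace list columns; not the real schema.
    status: str                                   # "success" or "error"
    created: datetime                             # when the trace was recorded
    grader_results: dict[str, bool] = field(default_factory=dict)  # grader name -> passed
    label: Optional[str] = None                   # human-applied "pass" / "fail"
    tags: set[str] = field(default_factory=set)


def filter_traces(traces, status=None, start=None, end=None,
                  grader=None, grader_passed=None, label=None, tag=None):
    """Return traces matching every filter that is set (AND semantics assumed)."""
    result = []
    for t in traces:
        if status is not None and t.status != status:
            continue
        if start is not None and t.created < start:
            continue
        if end is not None and t.created > end:
            continue
        # Grader filter: the named grader must have produced the requested outcome.
        if grader is not None and t.grader_results.get(grader) != grader_passed:
            continue
        if label is not None and t.label != label:
            continue
        if tag is not None and tag not in t.tags:
            continue
        result.append(t)
    return result


traces = [
    Trace("success", datetime(2024, 1, 5), {"tone": True}, "pass", {"checkout"}),
    Trace("error", datetime(2024, 1, 6), {"tone": False}, "fail", {"checkout"}),
    Trace("success", datetime(2024, 2, 1), {"tone": False}, None, {"search"}),
]

# Traces where the hypothetical "tone" grader failed.
failed_tone = filter_traces(traces, grader="tone", grader_passed=False)

# Traces tagged "checkout" recorded up to the end of January.
january_checkout = filter_traces(traces, tag="checkout", end=datetime(2024, 1, 31))
```

Leaving a filter unset (`None`) means it does not restrict the result, which matches how an empty filter control would typically behave.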
Searching
Use the search bar to find traces by:
- Input content
- Output content
- Error messages
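As a rough mental model, search checks the query against the three fields listed above. The sketch below assumes case-insensitive substring matching, which may differ from how the search bar actually matches; the dictionary keys and the `search_traces` helper are hypothetical.

```python
def search_traces(traces, query):
    """Case-insensitive substring match over input, output, and error message."""
    needle = query.lower()
    searchable = ("input", "output", "error")
    return [
        t for t in traces
        # Missing or None fields are treated as empty strings.
        if any(needle in str(t.get(key) or "").lower() for key in searchable)
    ]


traces = [
    {"id": "t1", "input": "Refund order 1182", "output": "Refund issued", "error": None},
    {"id": "t2", "input": "Track order 2290", "output": None, "error": "TimeoutError: upstream"},
]

timeout_ids = [t["id"] for t in search_traces(traces, "timeout")]
order_ids = [t["id"] for t in search_traces(traces, "order")]
```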
Trace Details
Click a trace to view its details.
Input
The exact input passed to the function.
Output
The output the function returned.
Span Tree
When a function calls other wrapped functions, the calls appear as a hierarchical view.
Applying Labels
Label traces to indicate quality:
- Open a trace
- Review the input, output, and span tree
- Apply a pass/fail label or add tags
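To make the review steps concrete, here is a minimal sketch of what a trace under review might contain: an input, an output, a tree of spans, and the label or tags a reviewer attaches. All class, field, and function names are hypothetical and chosen for illustration; they are not taken from the product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # One wrapped function call; children are the wrapped functions it called.
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


@dataclass
class TraceRecord:
    input: str
    output: str
    root: Span
    label: Optional[str] = None          # "pass" or "fail" once reviewed
    tags: set[str] = field(default_factory=set)


def render_tree(span, depth=0):
    """Return the span tree as indented lines, one per span."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)"]
    for child in span.children:
        lines.extend(render_tree(child, depth + 1))
    return lines


def apply_label(trace, label):
    """Record a human pass/fail judgment on the trace."""
    if label not in ("pass", "fail"):
        raise ValueError("label must be 'pass' or 'fail'")
    trace.label = label


trace = TraceRecord(
    input="Summarize ticket 42",
    output="Customer requests a refund.",
    root=Span("summarize_ticket", 840, [
        Span("fetch_ticket", 120),
        Span("call_model", 700),
    ]),
)

tree_lines = render_tree(trace.root)   # review the call hierarchy
apply_label(trace, "pass")             # record the judgment
trace.tags.add("checkout")             # optionally tag for later filtering
```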
Tagging
Organize traces with tags for filtering and dataset building.
Adding Tags
- Open a trace
- Click Add Tag
- Select an existing tag or create a new one
Managing Tags
Navigate to Tags in the user menu to:
- Create new tags
- Edit tag names and colors
- Archive unused tags
Building Evaluation Datasets
Use labeled traces to build evaluation datasets:
- Filter to traces you want to include
- Label traces with pass/fail outcomes
- Apply tags to organize by feature, environment, or use case
- Use these labeled traces to train and validate graders
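The workflow above can be sketched as a small pipeline: keep only traces with a pass/fail label, optionally restrict them to one tag, then hold some examples back for validation. This is an illustrative sketch only. The trace dictionaries, the `build_dataset` and `split_dataset` helpers, and the 25% validation fraction are assumptions, not part of the product.

```python
import json
import random


def build_dataset(traces, tag=None):
    """Keep labeled traces (optionally restricted to one tag) as evaluation examples."""
    examples = []
    for t in traces:
        if t.get("label") not in ("pass", "fail"):
            continue  # unlabeled traces cannot serve as ground truth
        if tag is not None and tag not in t.get("tags", ()):
            continue
        examples.append({"input": t["input"], "output": t["output"], "expected": t["label"]})
    return examples


def split_dataset(examples, validation_fraction=0.25, seed=0):
    """Shuffle deterministically, then split into training and validation sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


traces = [
    {"input": "q1", "output": "a1", "label": "pass", "tags": ["checkout"]},
    {"input": "q2", "output": "a2", "label": "fail", "tags": ["checkout"]},
    {"input": "q3", "output": "a3", "label": "pass", "tags": ["search"]},
    {"input": "q4", "output": "a4", "label": None, "tags": ["checkout"]},
    {"input": "q5", "output": "a5", "label": "fail", "tags": ["checkout", "prod"]},
]

examples = build_dataset(traces)
checkout_examples = build_dataset(traces, tag="checkout")
train, validation = split_dataset(examples)

# One JSON object per line is a common interchange format for such datasets.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

Fixing the shuffle seed keeps the split reproducible, so grader results can be compared across runs on the same validation examples.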
Best Practices
- Label regularly: Consistent labeling improves grader accuracy
- Tag strategically: Use tags to organize by feature, environment, or failure mode
- Label both passes and failures: Graders need examples of both to learn effectively
- Start with the most impactful function: Focus labeling effort on the trace function with the highest volume or most critical failures
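Two of these practices lend themselves to a quick check: finding which trace function to label first, and spotting functions that lack examples of one outcome. The sketch below assumes hypothetical trace dictionaries with `function`, `status`, and `label` keys, and ranks by error count and then volume, which is one possible reading of "most impactful".

```python
from collections import Counter, defaultdict


def labeling_summary(traces):
    """Count traces, errors, and pass/fail labels per trace function."""
    summary = defaultdict(Counter)
    for t in traces:
        stats = summary[t["function"]]
        stats["traces"] += 1
        if t["status"] == "error":
            stats["errors"] += 1
        if t.get("label") in ("pass", "fail"):
            stats[t["label"]] += 1
    return summary


def prioritize(summary):
    """Order functions by error count, then by trace volume, highest first."""
    return sorted(
        summary,
        key=lambda fn: (summary[fn]["errors"], summary[fn]["traces"]),
        reverse=True,
    )


traces = [
    {"function": "summarize", "status": "success", "label": "pass"},
    {"function": "summarize", "status": "error", "label": None},
    {"function": "summarize", "status": "error", "label": "pass"},
    {"function": "classify", "status": "success", "label": "pass"},
    {"function": "classify", "status": "success", "label": "fail"},
]

summary = labeling_summary(traces)
order = prioritize(summary)

# Functions that still have no "fail" examples for a grader to learn from.
missing_failures = [fn for fn in summary if summary[fn]["fail"] == 0]
```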