List evaluations
GET /evaluations
List evaluations with optional filtering by pipeline, team, example set, status, or date range.
Authorizations
Parameters
Query Parameters
Comma-separated list of evaluation IDs to filter by
Comma-separated list of team IDs to filter by
Comma-separated list of pipeline IDs to filter by
Comma-separated list of example set IDs to filter by
Comma-separated list of statuses to filter by (pending, running, succeeded, failed, cancelled)
Filter by user who created the evaluation
Filter to evaluations created after this date
Filter to evaluations created before this date
Page number for pagination (1-indexed)
Number of results per page (1-100, default: 20)
Field to sort results by
Sort direction: ascending or descending
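The query parameters above can be combined into a single query string. The following is a minimal sketch rather than an official client: the parameter names (`pipeline_ids`, `statuses`, `page`, `page_size`) are hypothetical placeholders, since this page does not show the real names. Only the constraints it enforces (comma-separated lists, the five status values, 1-indexed pages, a page size of 1-100 with a default of 20) come from the descriptions above.

```python
from urllib.parse import urlencode

# Status values listed in the parameter description above.
VALID_STATUSES = {"pending", "running", "succeeded", "failed", "cancelled"}


def build_list_evaluations_query(pipeline_ids=None, statuses=None, page=1, page_size=20):
    """Build a query string for GET /evaluations.

    The parameter names used here are hypothetical placeholders;
    substitute the real names from the API before use.
    """
    if page < 1:
        raise ValueError("page is 1-indexed and must be at least 1")
    if not 1 <= page_size <= 100:
        raise ValueError("page size must be between 1 and 100")

    params = {"page": page, "page_size": page_size}
    if pipeline_ids:
        # List filters are sent as one comma-separated value.
        params["pipeline_ids"] = ",".join(pipeline_ids)
    if statuses:
        unknown = set(statuses) - VALID_STATUSES
        if unknown:
            raise ValueError(f"unknown statuses: {sorted(unknown)}")
        params["statuses"] = ",".join(statuses)

    # Keep commas unescaped so the comma-separated lists stay readable.
    return urlencode(params, safe=",")


query = build_list_evaluations_query(
    pipeline_ids=["p1", "p2"], statuses=["running", "failed"]
)
print(f"/evaluations?{query}")
```

Validating the status values and page size on the client side avoids a round trip that would otherwise end in a Bad Request response.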
Responses
Evaluations retrieved successfully
object
Array of evaluations matching the query
object
Unique identifier for the evaluation (nanoid format)
ID of the team that owns this evaluation
ID of the pipeline being evaluated
Pipeline configuration snapshot at evaluation time
ID of the example set used for evaluation
Example set configuration snapshot at evaluation time
Evaluation method: ‘llm_judge’, ‘exact_match’, or ‘semantic’
Evaluator-specific configuration
How inputs/outputs are mapped: ‘explicit’
Slot-to-slot mappings between example set and pipeline
object
Maps example input slots to pipeline input slots
object
Slot ID from the example set schema
Slot ID from the pipeline schema
Maps example output slots to pipeline output slots for comparison
object
Slot ID from the example set schema
Slot ID from the pipeline schema
Status: ‘pending’, ‘running’, ‘succeeded’, ‘failed’, or ‘cancelled’
Total examples in evaluation
Number of passed examples
Number of failed examples
Number of examples with execution errors
Number of skipped examples
Overall evaluation score (0.0 to 1.0)
LLM-generated summary (deferred feature)
ID of the job processing this evaluation
Error message if evaluation failed
When evaluation started (ISO 8601)
When evaluation completed (ISO 8601)
When the evaluation was created (ISO 8601)
ID of the user who triggered the evaluation
Total number of evaluations matching the query (before pagination)
Current page number
Number of results per page
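Because the response reports the total number of matches before pagination together with the page size, a client can work out how many pages to request. The sketch below assumes hypothetical field names (`total`, `passed`), since this page lists field descriptions but not their keys. The pass-rate helper is an illustrative calculation from the example counts and is not necessarily the same as the overall evaluation score field.

```python
import math


def total_pages(total, page_size):
    """Number of pages needed to cover `total` results at `page_size` per page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return math.ceil(total / page_size)


def pass_rate(evaluation):
    """Fraction of examples that passed, or None when there are no examples.

    The keys "total" and "passed" are hypothetical names for the
    "Total examples" and "Number of passed examples" fields.
    """
    total = evaluation["total"]
    if total == 0:
        return None
    return evaluation["passed"] / total


# 45 matching evaluations at the default page size of 20 need 3 requests.
print(total_pages(45, 20))
```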
Bad Request - Validation error or invalid input
object
Unauthorized - Authentication required or invalid token
object
Forbidden - Insufficient permissions
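A client may want to turn the error responses above into exceptions. This is a rough sketch under stated assumptions: the page does not list numeric status codes, so the mapping uses the standard HTTP codes for these reason phrases (400, 401, 403), and `EvaluationsApiError` and `check_status` are hypothetical names, not part of the API.

```python
class EvaluationsApiError(Exception):
    """Hypothetical error type for non-success responses from GET /evaluations."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


# Standard HTTP status codes assumed for the error responses described above.
ERROR_MEANINGS = {
    400: "Bad Request - validation error or invalid input",
    401: "Unauthorized - authentication required or invalid token",
    403: "Forbidden - insufficient permissions",
}


def check_status(status_code):
    """Return None for 2xx responses; raise EvaluationsApiError otherwise."""
    if 200 <= status_code < 300:
        return
    message = ERROR_MEANINGS.get(status_code, "unexpected response")
    raise EvaluationsApiError(status_code, message)
```

A 401 usually calls for refreshing credentials, while a 400 points at the query parameters, so keeping the status code on the exception lets callers branch on it.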