Evaluate pipeline against example set
POST /pipelines/{pipelineId}/evaluate
Creates a pending evaluation and queues a job that runs the pipeline against each example in the example set. The evaluation runs asynchronously: poll GET /evaluations/{id} to check its status. Once the status is ‘succeeded’, aggregate results are available.
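The polling step described above could look roughly like the following sketch. It is not an official client: `fetch_evaluation` is a hypothetical callable that performs GET /evaluations/{id} and returns the decoded JSON, the `status` key name is assumed, and the interval and timeout defaults are arbitrary. The set of terminal statuses is taken from the status values listed in the response schema.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the evaluation will not change again
# (from the documented list: pending, running, succeeded, failed, cancelled).
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_evaluation(
    fetch_evaluation: Callable[[str], Dict[str, Any]],
    evaluation_id: str,
    interval_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the evaluation reaches a terminal status or the timeout elapses.

    fetch_evaluation is a hypothetical helper wrapping GET /evaluations/{id}.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        evaluation = fetch_evaluation(evaluation_id)
        # "status" as the JSON key is an assumption, not confirmed by the reference.
        if evaluation.get("status") in TERMINAL_STATUSES:
            return evaluation
        if time.monotonic() >= deadline:
            raise TimeoutError(f"evaluation {evaluation_id} did not finish in time")
        sleep(interval_seconds)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.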
Authorizations
Parameters
Path Parameters
ID of the pipeline to evaluate
Request Body
object
ID of the example set to evaluate against
Evaluation method: ‘llm_judge’, ‘exact_match’, or ‘semantic’ (default: ‘llm_judge’)
Evaluator-specific configuration
Slot-to-slot mappings between example set inputs/outputs and pipeline inputs/outputs
object
Maps example input slots to pipeline input slots
object
Slot ID from the example set schema
Slot ID from the pipeline schema
Maps example output slots to pipeline output slots for comparison
object
Slot ID from the example set schema
Slot ID from the pipeline schema
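As an illustration of how the request body might be assembled, here is a minimal sketch. The reference above lists only field descriptions, so every JSON key used here (`exampleSetId`, `evaluator`, `evaluatorConfig`, `mappings`, `inputs`, `outputs`, `exampleSlotId`, `pipelineSlotId`) is a hypothetical name, and representing each mapping as a list of slot pairs is likewise an assumption. Check the actual schema before relying on any of them.

```python
from typing import Any, Dict, List, Optional, Tuple

# (slot ID from the example set schema, slot ID from the pipeline schema)
SlotPair = Tuple[str, str]


def build_evaluate_body(
    example_set_id: str,
    input_pairs: List[SlotPair],
    output_pairs: List[SlotPair],
    evaluator: str = "llm_judge",
    evaluator_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body for POST /pipelines/{pipelineId}/evaluate.

    All key names below are assumed for illustration only.
    """

    def to_mapping(pairs: List[SlotPair]) -> List[Dict[str, str]]:
        return [
            {"exampleSlotId": example_slot, "pipelineSlotId": pipeline_slot}
            for example_slot, pipeline_slot in pairs
        ]

    body: Dict[str, Any] = {
        "exampleSetId": example_set_id,
        "evaluator": evaluator,
        "mappings": {
            "inputs": to_mapping(input_pairs),
            "outputs": to_mapping(output_pairs),
        },
    }
    # Only send evaluator-specific configuration when the caller provides it.
    if evaluator_config is not None:
        body["evaluatorConfig"] = evaluator_config
    return body
```

Input mappings feed example values into the pipeline, while output mappings pair the expected example outputs with the pipeline outputs they are compared against.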
Responses
Evaluation initiated - poll for completion
object
Unique identifier for the evaluation (nanoid format)
ID of the team that owns this evaluation
ID of the pipeline being evaluated
Pipeline configuration snapshot at evaluation time
ID of the example set used for evaluation
Example set configuration snapshot at evaluation time
Evaluation method: ‘llm_judge’, ‘exact_match’, or ‘semantic’
Evaluator-specific configuration
How inputs/outputs are mapped: ‘explicit’
Slot-to-slot mappings between example set and pipeline
object
Maps example input slots to pipeline input slots
object
Slot ID from the example set schema
Slot ID from the pipeline schema
Maps example output slots to pipeline output slots for comparison
object
Slot ID from the example set schema
Slot ID from the pipeline schema
Status: ‘pending’, ‘running’, ‘succeeded’, ‘failed’, or ‘cancelled’
Total examples in evaluation
Number of passed examples
Number of failed examples
Number of examples with execution errors
Number of skipped examples
Overall evaluation score (0.0 to 1.0)
LLM-generated summary (deferred feature)
ID of the job processing this evaluation
Error message if evaluation failed
When evaluation started (ISO 8601)
When evaluation completed (ISO 8601)
When the evaluation was created (ISO 8601)
ID of the user who triggered the evaluation
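A caller might reduce the evaluation object to its aggregate fields along the following lines. This is a sketch under assumptions: the key names (`status`, `totalExamples`, `passedCount`, `failedCount`, `errorCount`, `skippedCount`, `score`) are hypothetical, since the reference gives descriptions only, and treating missing or null counts as zero is a choice made here for evaluations that have not finished.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class EvaluationSummary:
    status: str
    total: int
    passed: int
    failed: int
    errored: int
    skipped: int
    score: Optional[float]

    @property
    def results_ready(self) -> bool:
        # Aggregate results are documented as available once status is 'succeeded'.
        return self.status == "succeeded"


def summarize(evaluation: Dict[str, Any]) -> EvaluationSummary:
    """Extract aggregate fields from an evaluation object (key names assumed)."""
    score = evaluation.get("score")
    # The overall score is documented as ranging from 0.0 to 1.0.
    if score is not None and not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return EvaluationSummary(
        status=evaluation["status"],
        total=evaluation.get("totalExamples") or 0,
        passed=evaluation.get("passedCount") or 0,
        failed=evaluation.get("failedCount") or 0,
        errored=evaluation.get("errorCount") or 0,
        skipped=evaluation.get("skippedCount") or 0,
        score=score,
    )
```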
Bad Request - Validation error or invalid input
object
Unauthorized - Authentication required or invalid token
object
Forbidden - Insufficient permissions
object
Not Found - Resource does not exist
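The error responses above could be surfaced to callers with a small helper such as this sketch. The exception class and function names are hypothetical, the status codes are the standard HTTP codes for the listed reason phrases, and treating any 2xx code as success is an assumption, since the reference does not state the exact success code.

```python
from typing import Dict

# Meanings copied from the documented error responses.
ERROR_MEANINGS: Dict[int, str] = {
    400: "Bad Request: validation error or invalid input",
    401: "Unauthorized: authentication required or invalid token",
    403: "Forbidden: insufficient permissions",
    404: "Not Found: resource does not exist",
}


class EvaluateRequestError(Exception):
    """Raised when the evaluate request returns a non-success status (hypothetical)."""

    def __init__(self, status_code: int, detail: str = "") -> None:
        meaning = ERROR_MEANINGS.get(status_code, "Unexpected response")
        message = f"{status_code} {meaning}"
        if detail:
            message = f"{message} ({detail})"
        super().__init__(message)
        self.status_code = status_code
        self.detail = detail


def raise_for_evaluate_status(status_code: int, detail: str = "") -> None:
    """Return silently on 2xx; otherwise raise with the documented meaning."""
    if 200 <= status_code < 300:
        return
    raise EvaluateRequestError(status_code, detail)
```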