Evaluate pipeline against example set

POST

/pipelines/{pipelineId}/evaluate

Production API

Creates a pending evaluation and queues a job that runs the pipeline against each example in the example set. The evaluation runs asynchronously: the endpoint returns 202 immediately, and you poll GET /evaluations/{id} to check its status. Once the status is ‘succeeded’, aggregate results are available.
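The polling flow can be sketched in Python as follows. This is a minimal sketch, not an official client: `wait_for_evaluation` and the caller-supplied `get_evaluation` function (which would perform GET /evaluations/{id} and return the decoded JSON) are hypothetical names, and the 5-second interval and 10-minute timeout are arbitrary. It assumes ‘succeeded’, ‘failed’, and ‘cancelled’ are terminal, based on the status values listed in the 202 response.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states, taken from the allowed values of `status`.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_evaluation(
    get_evaluation: Callable[[str], Dict[str, Any]],
    evaluation_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the evaluation reaches a terminal status, then return it.

    `get_evaluation` is a hypothetical caller-supplied function that performs
    GET /evaluations/{id} and returns the decoded JSON body as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        evaluation = get_evaluation(evaluation_id)
        if evaluation["status"] in TERMINAL_STATUSES:
            return evaluation
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"evaluation {evaluation_id} still {evaluation['status']!r} after {timeout_s}s"
            )
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable; a real client might also back off between polls. Only when the returned status is ‘succeeded’ should the aggregate fields be read.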

Authorizations

bearerAuth
cookieAuth

Parameters

Path Parameters

pipelineId

required

ID of the pipeline to evaluate

string
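A minimal sketch of building this request with Python's standard library, assuming bearer authentication. The base URL is a placeholder because no server URL is given here, and `build_evaluate_request` is a hypothetical helper. The pipeline ID is percent-encoded so that it always occupies exactly one path segment.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_evaluate_request(
    base_url: str, pipeline_id: str, body: Dict[str, Any], token: str
) -> urllib.request.Request:
    """Build (but do not send) POST /pipelines/{pipelineId}/evaluate."""
    # Percent-encode the ID so characters such as "/" cannot alter the path.
    path = f"/pipelines/{urllib.parse.quote(pipeline_id, safe='')}/evaluate"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            # bearerAuth; cookieAuth (not shown) authenticates with a cookie instead.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` should yield the 202 response; `urlopen` raises `urllib.error.HTTPError` for the 4xx responses.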

Request Body

object

exampleSetId

required

ID of the example set to evaluate against

string

evaluatorType

Evaluation method (default: ‘llm_judge’)

string

Allowed values: llm_judge exact_match semantic

evaluatorConfig

Evaluator-specific configuration

nullable

mappingConfig

required

Slot-to-slot mappings between example set inputs/outputs and pipeline inputs/outputs

object

inputMappings

required

Maps example input slots to pipeline input slots

Array<object>

object

exampleSlotId

required

Slot ID from the example set schema

string

pipelineSlotId

required

Slot ID from the pipeline schema

string

outputMappings

required

Maps example output slots to pipeline output slots for comparison

Array<object>

object

exampleSlotId

required

Slot ID from the example set schema

string

pipelineSlotId

required

Slot ID from the pipeline schema

string
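The request body can be assembled and checked on the client before sending. This sketch mirrors the schema above: `exampleSetId` and both mapping arrays are always sent, `evaluatorType` is checked against the allowed values and omitted when unset so the server default (‘llm_judge’) applies, and `evaluatorConfig` is passed through untouched because its shape depends on the evaluator and is not documented here. The helper name and the tuple-based mapping input are illustrative choices, not part of the API.

```python
from typing import Any, Dict, List, Optional, Tuple

ALLOWED_EVALUATORS = {"llm_judge", "exact_match", "semantic"}


def build_evaluate_body(
    example_set_id: str,
    input_mappings: List[Tuple[str, str]],
    output_mappings: List[Tuple[str, str]],
    evaluator_type: Optional[str] = None,
    evaluator_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body; each mapping is (exampleSlotId, pipelineSlotId)."""
    if evaluator_type is not None and evaluator_type not in ALLOWED_EVALUATORS:
        raise ValueError(f"unsupported evaluatorType: {evaluator_type!r}")

    def to_objects(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
        return [{"exampleSlotId": e, "pipelineSlotId": p} for e, p in pairs]

    body: Dict[str, Any] = {
        "exampleSetId": example_set_id,
        "mappingConfig": {
            "inputMappings": to_objects(input_mappings),
            "outputMappings": to_objects(output_mappings),
        },
    }
    # Leave evaluatorType out when unset so the server default applies.
    if evaluator_type is not None:
        body["evaluatorType"] = evaluator_type
    # evaluatorConfig is evaluator-specific; forward it as given.
    if evaluator_config is not None:
        body["evaluatorConfig"] = evaluator_config
    return body
```

For example, `build_evaluate_body("es_123", [("question", "prompt")], [("answer", "response")], evaluator_type="exact_match")`, where the slot IDs are hypothetical and would come from the example set and pipeline schemas.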

Responses

202

Accepted - Evaluation initiated; poll GET /evaluations/{id} for completion

object

evaluationId

required

Unique identifier for the evaluation (nanoid format)

string

teamId

required

ID of the team that owns this evaluation

string

pipelineId

required

ID of the pipeline being evaluated

string

pipelineConfigurationId

required

ID of the pipeline configuration snapshot taken at evaluation time

string

exampleSetId

required

ID of the example set used for evaluation

string

exampleSetConfigurationId

required

ID of the example set configuration snapshot taken at evaluation time

string

evaluatorType

required

Evaluation method: ‘llm_judge’, ‘exact_match’, or ‘semantic’

string

Allowed values: llm_judge exact_match semantic

evaluatorConfig

Evaluator-specific configuration

nullable

mappingType

required

How inputs and outputs are mapped; the only allowed value is ‘explicit’

string

Allowed values: explicit

mappingConfig

required

Slot-to-slot mappings between example set and pipeline

object

inputMappings

required

Maps example input slots to pipeline input slots

Array<object>

object

exampleSlotId

required

Slot ID from the example set schema

string

pipelineSlotId

required

Slot ID from the pipeline schema

string

outputMappings

required

Maps example output slots to pipeline output slots for comparison

Array<object>

object

exampleSlotId

required

Slot ID from the example set schema

string

pipelineSlotId

required

Slot ID from the pipeline schema

string

status

required

Status: ‘pending’, ‘running’, ‘succeeded’, ‘failed’, or ‘cancelled’

string

Allowed values: pending running succeeded failed cancelled

totalExamples

required

Total number of examples in the evaluation

number

nullable

passedCount

required

Number of passed examples

number

nullable

failedCount

required

Number of failed examples

number

nullable

errorCount

required

Number of examples with execution errors

number

nullable

skippedCount

required

Number of skipped examples

number

nullable

aggregateScore

required

Overall evaluation score (0.0 to 1.0)

number

nullable

summary

required

LLM-generated summary (deferred feature)

string

nullable

jobId

required

ID of the job processing this evaluation

string

nullable

errorMessage

required

Error message if the evaluation failed

string

nullable

startedAt

required

When the evaluation started (ISO 8601)

string format: date-time

nullable

completedAt

required

When the evaluation completed (ISO 8601)

string format: date-time

nullable

createdAt

required

When the evaluation was created (ISO 8601)

string format: date-time

createdBy

required

ID of the user who triggered the evaluation

string

nullable
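Many fields of the evaluation record are nullable, so code that reads them should tolerate missing values. A sketch that formats a one-line summary, assuming records shaped like the 202 response above (and presumably like the GET /evaluations/{id} response, which is not documented on this page); `summarize_evaluation` is a hypothetical helper. Timestamps are parsed as ISO 8601, with a trailing `Z` normalized because `datetime.fromisoformat` accepts it only from Python 3.11 on.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 date-time string; None stays None."""
    if value is None:
        return None
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_evaluation(ev: Dict[str, Any]) -> str:
    """Format one line from an evaluation record; null fields print as '?'."""

    def count(key: str) -> str:
        value = ev.get(key)
        return "?" if value is None else str(int(value))

    started = _parse_ts(ev.get("startedAt"))
    completed = _parse_ts(ev.get("completedAt"))
    if started is not None and completed is not None:
        duration = f"{(completed - started).total_seconds():.0f}s"
    else:
        duration = "?"

    score = ev.get("aggregateScore")  # 0.0 to 1.0; may be null
    score_text = "?" if score is None else f"{score:.2f}"

    return (
        f"{ev['evaluationId']} [{ev['status']}] score={score_text} "
        f"passed={count('passedCount')}/{count('totalExamples')} "
        f"failed={count('failedCount')} errors={count('errorCount')} "
        f"skipped={count('skippedCount')} duration={duration}"
    )
```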

400

Bad Request - Validation error or invalid input

object

error

required

string

code

string

details

nullable

retryable

boolean

401

Unauthorized - Authentication required or invalid token

object

error

required

string

code

string

details

nullable

retryable

boolean

403

Forbidden - Insufficient permissions

object

error

required

string

code

string

details

nullable

retryable

boolean

404

Not Found - Resource does not exist

object

error

required

string

code

string

details

nullable

retryable

boolean
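All four error responses share one body shape, in which only `error` is required. A hedged sketch of decoding that body into a small dataclass, falling back to defaults when optional fields are missing or the body is not JSON. The `ApiError` type and `parse_api_error` name are hypothetical, and `retryable` is assumed to be a server hint about whether resending the same request might succeed.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApiError:
    status: int
    error: str
    code: Optional[str] = None
    details: Any = None
    retryable: bool = False


def parse_api_error(status: int, raw_body: bytes) -> ApiError:
    """Decode a 400/401/403/404 body into an ApiError.

    Only `error` is required by the schema, so missing optional fields fall
    back to None (code, details) or False (retryable).
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        payload = None
    if not isinstance(payload, dict):
        # Non-JSON or unexpected body: keep the raw text as the message.
        return ApiError(status=status, error=raw_body.decode("utf-8", "replace"))
    return ApiError(
        status=status,
        error=str(payload.get("error", "")),
        code=payload.get("code"),
        details=payload.get("details"),
        retryable=bool(payload.get("retryable", False)),
    )
```

With `urllib`, a failed call raises `urllib.error.HTTPError`, so a caller could write `except HTTPError as exc: err = parse_api_error(exc.code, exc.read())`.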