Citations

Citations enable attribution and provenance tracking for AI-generated outputs. When a pipeline execution succeeds, its response includes an outputCitations array that maps output fields to source documents, so you can show users exactly where each piece of information came from.

Citations are pointers to source data at various levels of granularity:

  • File-level - References an entire file
  • PDF Character Range - References specific text within a PDF (with page numbers)
  • CSV/XLSX Character Range - References specific cells or rows
  • File Chunk - References a semantic chunk from processed documents

Each citation can be resolved to displayable information such as:

  • File name and metadata
  • Page numbers (for PDFs)
  • Bounding box coordinates (for PDFs)
  • Cited text excerpts
  • Source references (e.g., “data.csv!B5” for CSV cells)

When a pipeline execution succeeds, check the outputCitations field:

Get execution with citations

curl https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN \
-H "Authorization: Bearer $API_TOKEN"

Response includes outputCitations:

{
  "executionId": "GkR8I6rHBms3W4Qfa2-FN",
  "status": "succeeded",
  "output": {
    "answer": "The total amount of new purchases in December 2024 was $999.00"
  },
  "outputCitations": [
    {
      "outputPointer": "",
      "citations": [
        {
          "type": "pdf_character_range",
          "pdfFileExtractionId": "extr_abc123",
          "charStart": 763,
          "charEnd": 787
        }
      ]
    }
  ]
}

Each entry in outputCitations contains:

  • outputPointer - JSON Pointer (RFC 6901) to the output field. Empty string "" means the root output.
  • citations - Array of citation objects pointing to source documents
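
To show a citation next to the output value it supports, the outputPointer has to be evaluated against the execution's output. The sketch below is one minimal way to do that, following RFC 6901 escaping rules. The helper name resolveJsonPointer and the sample output object are illustrative assumptions, not part of the API.

```typescript
// Resolve an RFC 6901 JSON Pointer against a parsed JSON value.
// `resolveJsonPointer` is a hypothetical helper, not part of the Catalyzed API.
function resolveJsonPointer(root: unknown, pointer: string): unknown {
  // The empty pointer refers to the whole document (the root output).
  if (pointer === "") return root;
  if (!pointer.startsWith("/")) {
    throw new Error(`Invalid JSON Pointer: ${pointer}`);
  }
  let current: unknown = root;
  for (const rawToken of pointer.slice(1).split("/")) {
    // Unescape per RFC 6901: "~1" becomes "/" first, then "~0" becomes "~".
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (Array.isArray(current)) {
      // Array indices must be non-negative integers without leading zeros.
      if (!/^(0|[1-9]\d*)$/.test(token)) return undefined;
      current = current[Number(token)];
    } else if (typeof current === "object" && current !== null) {
      current = (current as Record<string, unknown>)[token];
    } else {
      return undefined; // The pointer descends into a primitive value.
    }
  }
  return current;
}

// Illustrative output object (made up for this example).
const output = { answer: "The total was $999.00", items: [{ amount: 999 }] };
console.log(resolveJsonPointer(output, "/answer"));
console.log(resolveJsonPointer(output, "/items/0/amount"));
```

With the empty pointer from the response above, the helper returns the whole output object, which matches the "root output" meaning described in the list.
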

Each citation object takes one of the following forms, depending on its type:

{
  "type": "pdf_character_range",
  "pdfFileExtractionId": "extr_abc123",
  "charStart": 763,
  "charEnd": 787
}

References a character range in the extracted PDF content. Can be resolved to page numbers and bounding boxes.

{
  "type": "csv_character_range",
  "csvFileExtractionId": "extr_def456",
  "charStart": 500,
  "charEnd": 550
}

References a character range in CSV content. Can be resolved to row/column references.

{
  "type": "xlsx_character_range",
  "xlsxFileExtractionId": "extr_ghi789",
  "charStart": 1200,
  "charEnd": 1250
}

References a character range in XLSX content. Can be resolved to sheet/row/column references.

{
  "type": "file_chunk",
  "fileChunkId": "chunk_xyz"
}

References a semantic chunk from processed documents.

{
  "type": "file",
  "fileId": "file_abc123"
}

References an entire file.
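
In TypeScript, the five citation shapes above can be modeled as a discriminated union keyed on the type field, so the compiler checks that each branch only reads fields that exist for that type. This is a sketch inferred from the JSON samples: the field names come from the examples, while the Citation type and the describeCitation helper are hypothetical names.

```typescript
// Citation shapes inferred from the JSON examples above. Field names match the
// samples; the type name and helper below are hypothetical, not from an SDK.
type Citation =
  | { type: "pdf_character_range"; pdfFileExtractionId: string; charStart: number; charEnd: number }
  | { type: "csv_character_range"; csvFileExtractionId: string; charStart: number; charEnd: number }
  | { type: "xlsx_character_range"; xlsxFileExtractionId: string; charStart: number; charEnd: number }
  | { type: "file_chunk"; fileChunkId: string }
  | { type: "file"; fileId: string };

// Build a short debug description; the switch is exhaustive over `type`.
function describeCitation(c: Citation): string {
  switch (c.type) {
    case "pdf_character_range":
      return `PDF extraction ${c.pdfFileExtractionId}, chars ${c.charStart}-${c.charEnd}`;
    case "csv_character_range":
      return `CSV extraction ${c.csvFileExtractionId}, chars ${c.charStart}-${c.charEnd}`;
    case "xlsx_character_range":
      return `XLSX extraction ${c.xlsxFileExtractionId}, chars ${c.charStart}-${c.charEnd}`;
    case "file_chunk":
      return `Chunk ${c.fileChunkId}`;
    case "file":
      return `File ${c.fileId}`;
  }
}

console.log(describeCitation({ type: "file", fileId: "file_abc123" }));
```
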

Raw citations contain pointers (like pdfFileExtractionId and character ranges) that need to be resolved to displayable data. Use the /citations/resolve endpoint:

Resolve citations

curl -X POST https://api.catalyzed.ai/citations/resolve \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
    "citations": [
      {
        "type": "pdf_character_range",
        "pdfFileExtractionId": "extr_abc123",
        "charStart": 763,
        "charEnd": 787
      }
    ]
  }'

Response:

{
  "resolvedCitations": [
    {
      "type": "pdf_character_range",
      "original": {
        "type": "pdf_character_range",
        "pdfFileExtractionId": "extr_abc123",
        "charStart": 763,
        "charEnd": 787
      },
      "file": {
        "fileId": "file_abc123",
        "fileName": "credit-card-statement.pdf",
        "fileSize": 2322,
        "mimeType": "application/pdf"
      },
      "citedText": "CURRENT BALANCE: $999.00",
      "pageNumber": 1,
      "boundingBox": {
        "x1": 100,
        "y1": 200,
        "x2": 300,
        "y2": 220
      }
    }
  ]
}

Resolved citations include:

  • type - Citation type (same as original)
  • original - The raw citation that was resolved
  • file - File metadata (fileId, fileName, fileSize, mimeType)
  • citedText - The actual text from the source (for character range citations)
  • pageNumber - Page number (for PDF citations)
  • boundingBox - Coordinates on the page (for PDF citations, may be null)
  • chunkIndex - Chunk index (for file chunk citations)
  • sourceType - Source type (for file chunk citations)
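
Because most of these fields only appear for some citation types, display code usually has to treat them as optional. The sketch below turns a resolved citation into a one-line label. The ResolvedCitationLike interface and formatCitationLabel helper are hypothetical names; the field names follow the list above, and marking them optional is an assumption based on the "for ... citations" qualifiers.

```typescript
// Loose shape of a resolved citation, based on the field list above.
// Optionality is an assumption: most fields apply only to some citation types.
interface ResolvedCitationLike {
  type: string;
  file?: { fileId: string; fileName: string; fileSize: number; mimeType: string };
  citedText?: string;
  pageNumber?: number;
  boundingBox?: { x1: number; y1: number; x2: number; y2: number } | null;
  chunkIndex?: number;
  sourceType?: string;
}

// Build a short human-readable label from whichever fields are present.
function formatCitationLabel(r: ResolvedCitationLike): string {
  const parts: string[] = [r.file?.fileName ?? "unknown file"];
  if (r.pageNumber !== undefined) parts.push(`p. ${r.pageNumber}`);
  if (r.chunkIndex !== undefined) parts.push(`chunk ${r.chunkIndex}`);
  const label = parts.join(", ");
  return r.citedText ? `${label}: "${r.citedText}"` : label;
}

console.log(
  formatCitationLabel({
    type: "pdf_character_range",
    file: { fileId: "file_abc123", fileName: "credit-card-statement.pdf", fileSize: 2322, mimeType: "application/pdf" },
    citedText: "CURRENT BALANCE: $999.00",
    pageNumber: 1,
  })
);
```
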

Here’s a complete flow for displaying citations in your UI:

async function displayExecutionWithCitations(
  executionId: string,
  apiToken: string // Bearer token for the Authorization header
) {
  // 1. Get execution
  const execution = await fetch(
    `https://api.catalyzed.ai/pipeline-executions/${executionId}`,
    { headers: { Authorization: `Bearer ${apiToken}` } }
  ).then(r => r.json());

  // Citations are only available on successful executions
  if (execution.status !== "succeeded") {
    return;
  }

  // 2. Extract citations (outputCitations may be missing or empty)
  const citations = (execution.outputCitations ?? []).flatMap(
    (oc) => oc.citations
  );
  if (citations.length === 0) {
    // No citations to display
    return;
  }

  // 3. Resolve all citations in a single batch request
  const { resolvedCitations } = await fetch(
    "https://api.catalyzed.ai/citations/resolve",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        teamId: execution.teamId,
        citations,
      }),
    }
  ).then(r => r.json());

  // 4. Display in UI
  for (const resolved of resolvedCitations) {
    // Error entries carry no file metadata, so skip them after logging
    if (resolved.type === "error") {
      console.error("Citation resolution failed:", resolved.error);
      continue;
    }
    console.log(`Cited from: ${resolved.file.fileName}`);
    if (resolved.type === "pdf_character_range") {
      console.log(`Page ${resolved.pageNumber}: "${resolved.citedText}"`);
    } else if (resolved.citedText) {
      console.log(`Text: "${resolved.citedText}"`);
    }
  }
}

If a citation fails to resolve, the response includes an error citation:

{
  "type": "error",
  "original": { ... },
  "error": {
    "code": "CITATION_NOT_FOUND",
    "message": "Extraction not found"
  }
}

The /citations/resolve endpoint returns a 500 error if any citation fails to resolve (no partial success).
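
A defensive client can guard against both behaviors described here: check the HTTP status before parsing the body, then separate any entries of type "error" from the usable ones. The sketch below covers the second step as a pure function. The names ResolveErrorEntry, isResolveError, and partitionResolved are hypothetical; only the error entry shape is taken from the example above.

```typescript
// Shape of an error entry, taken from the example above.
interface ResolveErrorEntry {
  type: "error";
  original: unknown;
  error: { code: string; message: string };
}

// Type guard: error entries are identified by type === "error".
function isResolveError(entry: { type: string }): entry is ResolveErrorEntry {
  return entry.type === "error";
}

// Split resolved entries into usable citations and failures.
// Hypothetical helper; call it only after confirming the HTTP response was OK.
function partitionResolved<T extends { type: string }>(entries: T[]) {
  const ok: T[] = [];
  const failed: ResolveErrorEntry[] = [];
  for (const entry of entries) {
    if (isResolveError(entry)) {
      failed.push(entry);
    } else {
      ok.push(entry);
    }
  }
  return { ok, failed };
}

const sample = partitionResolved([
  { type: "pdf_character_range" },
  { type: "error", original: {}, error: { code: "CITATION_NOT_FOUND", message: "Extraction not found" } },
]);
console.log(sample.ok.length, sample.failed.length);
```

Keeping the failures in a separate list lets the UI render the citations that did resolve while logging or surfacing the error codes for the rest.
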

Request limits:

  • Maximum 100 citations per resolve request
  • At least 1 citation required

Best practices:

  1. Check for citations - Always check if outputCitations exists and has entries
  2. Resolve in batch - Collect all citations from an execution and resolve them in one request
  3. Handle errors - Check for error citations in the resolved response
  4. Display context - Show file names, page numbers, and cited text to users
  5. Link to sources - Provide links to download or view the original files
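
When an execution yields more than 100 citations, batching has to respect the per-request limit. The sketch below splits a list into request-sized groups. The batchCitations helper and the MAX_CITATIONS_PER_REQUEST constant are hypothetical names; the value 100 comes from the limit stated above.

```typescript
// Limit stated in this document; the constant name is an assumption.
const MAX_CITATIONS_PER_REQUEST = 100;

// Split citations into groups that each fit in one resolve request.
// An empty input yields no batches, so no request with zero citations is sent.
function batchCitations<T>(citations: T[], size: number = MAX_CITATIONS_PER_REQUEST): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < citations.length; i += size) {
    batches.push(citations.slice(i, i + size));
  }
  return batches;
}

// 250 placeholder citations split into groups of at most 100.
const placeholders = Array.from({ length: 250 }, (_, i) => ({ type: "file", fileId: `file_${i}` }));
console.log(batchCitations(placeholders).map((b) => b.length));
```

Each batch can then be sent to /citations/resolve in turn and the results concatenated before display.
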

See the Citations API for complete endpoint documentation.