Citations
Citations enable attribution and provenance tracking for AI-generated outputs. When a pipeline execution completes, it includes outputCitations that map output fields to source documents, allowing you to show users exactly where information came from.
What Are Citations?
Citations are pointers to source data at various levels of granularity:
- File-level - References an entire file
- PDF Character Range - References specific text within a PDF (with page numbers)
- CSV/XLSX Character Range - References specific cells or rows
- File Chunk - References a semantic chunk from processed documents
Each citation includes metadata that can be resolved to displayable information like:
- File name and metadata
- Page numbers (for PDFs)
- Bounding box coordinates (for PDFs)
- Cited text excerpts
- Source references (e.g., “data.csv!B5” for CSV cells)
Accessing Citations from Executions
When a pipeline execution succeeds, check the outputCitations field:
Get execution with citations

curl:

    curl https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN \
      -H "Authorization: Bearer $API_TOKEN"

JavaScript:

    const response = await fetch(
      "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
      { headers: { Authorization: `Bearer ${apiToken}` } }
    );
    const execution = await response.json();

    // Access citations
    const citations = execution.outputCitations;

Python:

    import requests

    response = requests.get(
        "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
        headers={"Authorization": f"Bearer {api_token}"}
    )
    execution = response.json()

    # Access citations
    citations = execution["outputCitations"]

Response includes outputCitations:

    {
      "executionId": "GkR8I6rHBms3W4Qfa2-FN",
      "status": "succeeded",
      "output": {
        "answer": "The total amount of new purchases in December 2024 was $999.00"
      },
      "outputCitations": [
        {
          "outputPointer": "",
          "citations": [
            {
              "type": "pdf_character_range",
              "pdfFileExtractionId": "extr_abc123",
              "charStart": 763,
              "charEnd": 787
            }
          ]
        }
      ]
    }

Citation Structure
Each entry in outputCitations contains:
- outputPointer - JSON Pointer (RFC 6901) to the output field. Empty string "" means the root output.
- citations - Array of citation objects pointing to source documents
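To show which part of the output a citation supports, you can follow each outputPointer into the output object. The following is a minimal sketch, assuming pointers are evaluated relative to the execution's output field (as the "root output" wording suggests). The helper names resolve_json_pointer and cited_values are illustrative and not part of the API.

```python
from typing import Any


def resolve_json_pointer(document: Any, pointer: str) -> Any:
    """Return the value that an RFC 6901 JSON Pointer refers to."""
    if pointer == "":
        # The empty pointer refers to the whole document (the root output).
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"Invalid JSON Pointer: {pointer!r}")

    current = document
    for raw_token in pointer[1:].split("/"):
        # RFC 6901 escaping: decode "~1" to "/" first, then "~0" to "~".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


def cited_values(execution: dict) -> list:
    """Pair each cited output value with its raw citations.

    Assumption: outputPointer is relative to execution["output"].
    """
    output = execution.get("output")
    return [
        (resolve_json_pointer(output, entry["outputPointer"]), entry["citations"])
        for entry in execution.get("outputCitations") or []
    ]
```

With the example response above, cited_values returns a single pair: the whole output object together with its one pdf_character_range citation.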
Citation Types
PDF Character Range

    {
      "type": "pdf_character_range",
      "pdfFileExtractionId": "extr_abc123",
      "charStart": 763,
      "charEnd": 787
    }

References a character range in the extracted PDF content. Can be resolved to page numbers and bounding boxes.
CSV Character Range

    {
      "type": "csv_character_range",
      "csvFileExtractionId": "extr_def456",
      "charStart": 500,
      "charEnd": 550
    }

References a character range in CSV content. Can be resolved to row/column references.
XLSX Character Range

    {
      "type": "xlsx_character_range",
      "xlsxFileExtractionId": "extr_ghi789",
      "charStart": 1200,
      "charEnd": 1250
    }

References a character range in XLSX content. Can be resolved to sheet/row/column references.
File Chunk

    {
      "type": "file_chunk",
      "fileChunkId": "chunk_xyz"
    }

References a semantic chunk from processed documents.
File-Level

    {
      "type": "file",
      "fileId": "file_abc123"
    }

References an entire file.
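Because each citation type carries a different ID field, it can help to branch on type when logging or debugging raw citations. The sketch below builds a short label from the fields shown in the examples above. The ID_FIELD_BY_TYPE table and describe_citation helper are illustrative names, and the bracketed range is only a display format, not a statement about whether charEnd is inclusive.

```python
# ID field carried by each citation type, taken from the examples above.
ID_FIELD_BY_TYPE = {
    "pdf_character_range": "pdfFileExtractionId",
    "csv_character_range": "csvFileExtractionId",
    "xlsx_character_range": "xlsxFileExtractionId",
    "file_chunk": "fileChunkId",
    "file": "fileId",
}


def describe_citation(citation: dict) -> str:
    """Build a short debug label for a raw (unresolved) citation."""
    citation_type = citation.get("type")
    id_field = ID_FIELD_BY_TYPE.get(citation_type)
    if id_field is None:
        return f"unknown citation type: {citation_type!r}"

    label = f"{citation_type} {citation.get(id_field)}"
    # Only the character range types carry charStart/charEnd.
    if "charStart" in citation and "charEnd" in citation:
        label += f" [{citation['charStart']}:{citation['charEnd']}]"
    return label
```

Labels like these are only useful for logs. To show file names, page numbers, or cited text to users, resolve the citation first.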
Resolving Citations
Raw citations contain pointers (like pdfFileExtractionId and character ranges) that need to be resolved to displayable data. Use the /citations/resolve endpoint:
Resolve citations

curl:

    curl -X POST https://api.catalyzed.ai/citations/resolve \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
        "citations": [
          {
            "type": "pdf_character_range",
            "pdfFileExtractionId": "extr_abc123",
            "charStart": 763,
            "charEnd": 787
          }
        ]
      }'

JavaScript:

    // First, extract citations from outputCitations
    const execution = await fetch(
      "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
      { headers: { Authorization: `Bearer ${apiToken}` } }
    ).then(r => r.json());

    // Flatten citations from all output fields
    const citations = execution.outputCitations.flatMap(
      (oc) => oc.citations
    );

    // Resolve citations
    const resolveResponse = await fetch(
      "https://api.catalyzed.ai/citations/resolve",
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiToken}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          teamId: "ZkoDMyjZZsXo4VAO_nJLk",
          citations,
        }),
      }
    );

    const { resolvedCitations } = await resolveResponse.json();

Python:

    import requests

    # First, extract citations from outputCitations
    response = requests.get(
        "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
        headers={"Authorization": f"Bearer {api_token}"}
    )
    execution = response.json()

    # Flatten citations from all output fields
    citations = []
    for oc in execution["outputCitations"]:
        citations.extend(oc["citations"])

    # Resolve citations
    resolve_response = requests.post(
        "https://api.catalyzed.ai/citations/resolve",
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
            "citations": citations
        }
    )
    resolved_citations = resolve_response.json()["resolvedCitations"]

Response:

    {
      "resolvedCitations": [
        {
          "type": "pdf_character_range",
          "original": {
            "type": "pdf_character_range",
            "pdfFileExtractionId": "extr_abc123",
            "charStart": 763,
            "charEnd": 787
          },
          "file": {
            "fileId": "file_abc123",
            "fileName": "credit-card-statement.pdf",
            "fileSize": 2322,
            "mimeType": "application/pdf"
          },
          "citedText": "CURRENT BALANCE: $999.00",
          "pageNumber": 1,
          "boundingBox": { "x1": 100, "y1": 200, "x2": 300, "y2": 220 }
        }
      ]
    }

Resolved Citation Fields
Resolved citations include:
- type - Citation type (same as original)
- original - The raw citation that was resolved
- file - File metadata (fileId, fileName, fileSize, mimeType)
- citedText - The actual text from the source (for character range citations)
- pageNumber - Page number (for PDF citations)
- boundingBox - Coordinates on the page (for PDF citations, may be null)
- chunkIndex - Chunk index (for file chunk citations)
- sourceType - Source type (for file chunk citations)
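Since several of these fields appear only for certain citation types, display code should treat them as optional. The following is a minimal sketch of a formatter that uses whichever fields are present. The format_resolved_citation name and the output wording are illustrative choices, not part of the API.

```python
def format_resolved_citation(resolved: dict) -> str:
    """Turn a resolved citation into a one-line, user-facing label."""
    file_name = (resolved.get("file") or {}).get("fileName", "unknown file")
    parts = [file_name]

    # pageNumber is present for PDF citations only.
    if resolved.get("pageNumber") is not None:
        parts.append(f"page {resolved['pageNumber']}")
    # chunkIndex is present for file chunk citations only.
    if resolved.get("chunkIndex") is not None:
        parts.append(f"chunk {resolved['chunkIndex']}")

    label = ", ".join(parts)
    cited_text = resolved.get("citedText")
    return f'{label}: "{cited_text}"' if cited_text else label
```

For the resolved citation in the response above, this returns `credit-card-statement.pdf, page 1: "CURRENT BALANCE: $999.00"`.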
Complete Example
Here’s a complete flow for displaying citations in your UI:

    async function displayExecutionWithCitations(executionId: string) {
      // 1. Get execution
      const execution = await fetch(
        `https://api.catalyzed.ai/pipeline-executions/${executionId}`,
        { headers: { Authorization: `Bearer ${apiToken}` } }
      ).then(r => r.json());

      if (execution.status !== "succeeded") {
        return;
      }

      // 2. Extract citations (outputCitations may be missing or empty)
      const citations = (execution.outputCitations ?? []).flatMap(
        (oc) => oc.citations
      );

      if (citations.length === 0) {
        // No citations to display
        return;
      }

      // 3. Resolve citations
      const { resolvedCitations } = await fetch(
        "https://api.catalyzed.ai/citations/resolve",
        {
          method: "POST",
          headers: {
            Authorization: `Bearer ${apiToken}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            teamId: execution.teamId,
            citations,
          }),
        }
      ).then(r => r.json());

      // 4. Display in UI
      for (const resolved of resolvedCitations) {
        if (resolved.type === "error") {
          console.error("Citation resolution failed:", resolved.error);
          continue;
        }

        console.log(`Cited from: ${resolved.file.fileName}`);
        if (resolved.type === "pdf_character_range") {
          console.log(`Page ${resolved.pageNumber}: "${resolved.citedText}"`);
        } else if (resolved.citedText) {
          console.log(`Text: "${resolved.citedText}"`);
        }
      }
    }

Error Handling
If a citation fails to resolve, the response includes an error citation:

    {
      "type": "error",
      "original": { ... },
      "error": {
        "code": "CITATION_NOT_FOUND",
        "message": "Extraction not found"
      }
    }

The /citations/resolve endpoint returns a 500 error if any citation fails to resolve (no partial success).
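Client code may therefore see a failure either as a non-success HTTP status or as an entry with type "error". The sketch below is one defensive way to handle both, written as a pure function over the status code and parsed body so it does not depend on a particular HTTP client. CitationResolutionError and extract_resolved_citations are hypothetical names, and treating any error entry as a failure of the whole request is an assumption based on the "no partial success" note.

```python
class CitationResolutionError(Exception):
    """Raised when a resolve response contains no usable citations."""


def extract_resolved_citations(status_code: int, body: dict) -> list:
    """Return resolvedCitations, or raise if the request or any citation failed."""
    if not 200 <= status_code < 300:
        raise CitationResolutionError(
            f"Resolve request failed with HTTP {status_code}"
        )

    resolved = body.get("resolvedCitations") or []
    failures = [c for c in resolved if c.get("type") == "error"]
    if failures:
        # Collect the error codes so the caller can log what went wrong.
        codes = ", ".join(
            (c.get("error") or {}).get("code", "UNKNOWN") for c in failures
        )
        raise CitationResolutionError(f"Citations failed to resolve: {codes}")
    return resolved
```

With the requests library, you would call it as extract_resolved_citations(resolve_response.status_code, resolve_response.json()), provided the failure body is JSON.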
Limits
- Maximum 100 citations per resolve request
- At least 1 citation required
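If an execution produces more citations than one request allows, you can split them into batches before calling /citations/resolve. This is a small sketch of that idea. The constant and function names are illustrative, and an empty input yields no batches, so no request with zero citations is ever built.

```python
MAX_CITATIONS_PER_REQUEST = 100  # limit stated above


def batch_citations(citations: list, batch_size: int = MAX_CITATIONS_PER_REQUEST) -> list:
    """Split citations into lists of at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        citations[start:start + batch_size]
        for start in range(0, len(citations), batch_size)
    ]
```

Each batch can then be sent as the citations array of its own resolve request, and the resolvedCitations lists concatenated in order.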
Best Practices
- Check for citations - Always check if outputCitations exists and has entries
- Resolve in batch - Collect all citations from an execution and resolve them in one request
- Handle errors - Check for error citations in the resolved response
- Display context - Show file names, page numbers, and cited text to users
- Link to sources - Provide links to download or view the original files
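The first two practices can be combined in one helper that tolerates a missing or empty outputCitations field and gathers everything for a single resolve call. The sketch below also drops exact duplicates, which is an optional extra not described in this guide. It assumes that two citations with identical fields point at the same source. collect_citations is an illustrative name.

```python
import json


def collect_citations(execution: dict) -> list:
    """Flatten outputCitations into one list, skipping exact duplicates."""
    seen = set()
    collected = []
    # "or []" covers both a missing field and an explicit null.
    for entry in execution.get("outputCitations") or []:
        for citation in entry.get("citations") or []:
            # Serialize with sorted keys so field order does not matter.
            key = json.dumps(citation, sort_keys=True)
            if key not in seen:
                seen.add(key)
                collected.append(citation)
    return collected
```

If the returned list is empty, skip the resolve request entirely, since at least one citation is required.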
API Reference
See the Citations API for complete endpoint documentation.