Citations

Citations enable attribution and provenance tracking for AI-generated outputs. When a pipeline execution succeeds, its response includes an outputCitations array that maps output fields to source documents, so you can show users exactly where information came from.

What Are Citations?

Citations are pointers to source data at various levels of granularity:

File-level - References an entire file
PDF Character Range - References specific text within a PDF (with page numbers)
CSV/XLSX Character Range - References specific cells or rows
File Chunk - References a semantic chunk from processed documents

Each citation includes metadata that can be resolved to displayable information like:

File name and metadata
Page numbers (for PDFs)
Bounding box coordinates (for PDFs)
Cited text excerpts
Source references (e.g., “data.csv!B5” for CSV cells)

Accessing Citations from Executions

When a pipeline execution succeeds, check the outputCitations field:

Get execution with citations

curl https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN \
  -H "Authorization: Bearer $API_TOKEN"

const response = await fetch(
  "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
  { headers: { Authorization: `Bearer ${apiToken}` } }
);
const execution = await response.json();

// Access citations
const citations = execution.outputCitations;

import requests

response = requests.get(
    "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
    headers={"Authorization": f"Bearer {api_token}"}
)
execution = response.json()

# Access citations
citations = execution["outputCitations"]

The response includes outputCitations:

{
  "executionId": "GkR8I6rHBms3W4Qfa2-FN",
  "status": "succeeded",
  "output": {
    "answer": "The total amount of new purchases in December 2024 was $999.00"
  },
  "outputCitations": [
    {
      "outputPointer": "",
      "citations": [
        {
          "type": "pdf_character_range",
          "pdfFileExtractionId": "extr_abc123",
          "charStart": 763,
          "charEnd": 787
        }
      ]
    }
  ]
}

Citation Structure

Each entry in outputCitations contains:

outputPointer - JSON Pointer (RFC 6901) to the output field. Empty string "" means the root output.
citations - Array of citation objects pointing to source documents
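
To show which output value a group of citations supports, the outputPointer can be evaluated against the output object. The following is a minimal sketch in Python, assuming the pointer follows RFC 6901 as stated above. The helper name resolve_pointer and the sample output are hypothetical and not part of the API:

```python
def resolve_pointer(document, pointer):
    """Evaluate an RFC 6901 JSON Pointer against parsed JSON (hypothetical helper)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" becomes "/" first, then "~0" becomes "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array elements are addressed by index
        else:
            current = current[token]       # object members are addressed by key
    return current


# Made-up output used only for illustration
output = {"answer": "The total was $999.00", "items": [{"amount": 999.0}]}
print(resolve_pointer(output, "/answer"))          # The total was $999.00
print(resolve_pointer(output, "/items/0/amount"))  # 999.0
```

With a helper like this, each outputCitations entry can be paired with the exact value it annotates, for example resolve_pointer(execution["output"], entry["outputPointer"]).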

Citation Types

PDF Character Range

{
  "type": "pdf_character_range",
  "pdfFileExtractionId": "extr_abc123",
  "charStart": 763,
  "charEnd": 787
}

References a character range in the extracted PDF content. Can be resolved to page numbers and bounding boxes.

CSV Character Range

{
  "type": "csv_character_range",
  "csvFileExtractionId": "extr_def456",
  "charStart": 500,
  "charEnd": 550
}

References a character range in CSV content. Can be resolved to row/column references.

XLSX Character Range

{
  "type": "xlsx_character_range",
  "xlsxFileExtractionId": "extr_ghi789",
  "charStart": 1200,
  "charEnd": 1250
}

References a character range in XLSX content. Can be resolved to sheet/row/column references.

File Chunk

{
  "type": "file_chunk",
  "fileChunkId": "chunk_xyz"
}

References a semantic chunk from processed documents.

File-Level

{
  "type": "file",
  "fileId": "file_abc123"
}

References an entire file.
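
Because each citation type carries its identifier in a different field, client code often needs a small dispatch table. The sketch below, in Python, uses only the type names and field names from the examples above. The helpers source_key and group_by_source are hypothetical names introduced for illustration:

```python
# Identifier field for each citation type, taken from the examples above.
ID_FIELD_BY_TYPE = {
    "pdf_character_range": "pdfFileExtractionId",
    "csv_character_range": "csvFileExtractionId",
    "xlsx_character_range": "xlsxFileExtractionId",
    "file_chunk": "fileChunkId",
    "file": "fileId",
}


def source_key(citation):
    """Return (type, source id) for a raw citation (hypothetical helper)."""
    citation_type = citation.get("type")
    id_field = ID_FIELD_BY_TYPE.get(citation_type)
    if id_field is None:
        raise ValueError(f"unknown citation type: {citation_type!r}")
    return citation_type, citation[id_field]


def group_by_source(citations):
    """Group raw citations that point at the same extraction, chunk, or file."""
    groups = {}
    for citation in citations:
        groups.setdefault(source_key(citation), []).append(citation)
    return groups


raw = [
    {"type": "pdf_character_range", "pdfFileExtractionId": "extr_abc123",
     "charStart": 763, "charEnd": 787},
    {"type": "file", "fileId": "file_abc123"},
]
for (citation_type, source_id), items in group_by_source(raw).items():
    print(citation_type, source_id, len(items))
```

Raising on an unknown type is a deliberate choice in this sketch: it surfaces new citation types early instead of silently dropping them.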

Resolving Citations

Raw citations contain pointers (like pdfFileExtractionId and character ranges) that need to be resolved to displayable data. Use the /citations/resolve endpoint:

Resolve citations

curl -X POST https://api.catalyzed.ai/citations/resolve \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
    "citations": [
      {
        "type": "pdf_character_range",
        "pdfFileExtractionId": "extr_abc123",
        "charStart": 763,
        "charEnd": 787
      }
    ]
  }'

// First, fetch the execution that contains outputCitations
const execution = await fetch(
  "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
  { headers: { Authorization: `Bearer ${apiToken}` } }
).then(r => r.json());

// Flatten citations from all output fields
const citations = execution.outputCitations.flatMap(
  (oc) => oc.citations
);

// Resolve citations
const resolveResponse = await fetch(
  "https://api.catalyzed.ai/citations/resolve",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      teamId: "ZkoDMyjZZsXo4VAO_nJLk",
      citations,
    }),
  }
);

const { resolvedCitations } = await resolveResponse.json();

import requests

# First, fetch the execution that contains outputCitations
response = requests.get(
    "https://api.catalyzed.ai/pipeline-executions/GkR8I6rHBms3W4Qfa2-FN",
    headers={"Authorization": f"Bearer {api_token}"}
)
execution = response.json()

# Flatten citations from all output fields
citations = []
for oc in execution["outputCitations"]:
    citations.extend(oc["citations"])

# Resolve citations
resolve_response = requests.post(
    "https://api.catalyzed.ai/citations/resolve",
    headers={"Authorization": f"Bearer {api_token}"},
    json={
        "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
        "citations": citations
    }
)
resolved_citations = resolve_response.json()["resolvedCitations"]

Response:

{
  "resolvedCitations": [
    {
      "type": "pdf_character_range",
      "original": {
        "type": "pdf_character_range",
        "pdfFileExtractionId": "extr_abc123",
        "charStart": 763,
        "charEnd": 787
      },
      "file": {
        "fileId": "file_abc123",
        "fileName": "credit-card-statement.pdf",
        "fileSize": 2322,
        "mimeType": "application/pdf"
      },
      "citedText": "CURRENT BALANCE: $999.00",
      "pageNumber": 1,
      "boundingBox": {
        "x1": 100,
        "y1": 200,
        "x2": 300,
        "y2": 220
      }
    }
  ]
}

Resolved Citation Fields

Resolved citations include:

type - Citation type (same as original)
original - The raw citation that was resolved
file - File metadata (fileId, fileName, fileSize, mimeType)
citedText - The actual text from the source (for character range citations)
pageNumber - Page number (for PDF citations)
boundingBox - Coordinates on the page (for PDF citations, may be null)
chunkIndex - Chunk index (for file chunk citations)
sourceType - Source type (for file chunk citations)
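
These fields can be combined into a short display label. The following is a rough sketch in Python, assuming every successfully resolved citation carries a file object and that the type-specific fields are simply absent or null when they do not apply. format_resolved is a hypothetical helper:

```python
def format_resolved(resolved):
    """Build a one-line display label from a resolved citation (hypothetical helper)."""
    parts = [resolved["file"]["fileName"]]
    if resolved.get("pageNumber") is not None:
        parts.append(f"page {resolved['pageNumber']}")    # PDF citations
    if resolved.get("chunkIndex") is not None:
        parts.append(f"chunk {resolved['chunkIndex']}")   # file chunk citations
    label = ", ".join(parts)
    cited_text = resolved.get("citedText")
    return f'{label}: "{cited_text}"' if cited_text else label


# Trimmed copy of the resolved citation from the response above
example = {
    "type": "pdf_character_range",
    "file": {"fileId": "file_abc123", "fileName": "credit-card-statement.pdf"},
    "citedText": "CURRENT BALANCE: $999.00",
    "pageNumber": 1,
    "boundingBox": {"x1": 100, "y1": 200, "x2": 300, "y2": 220},
}
print(format_resolved(example))
# credit-card-statement.pdf, page 1: "CURRENT BALANCE: $999.00"
```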

Complete Example

Here’s a complete flow for displaying citations in your UI:

async function displayExecutionWithCitations(executionId: string) {
  // 1. Get execution
  const execution = await fetch(
    `https://api.catalyzed.ai/pipeline-executions/${executionId}`,
    { headers: { Authorization: `Bearer ${apiToken}` } }
  ).then(r => r.json());

  if (execution.status !== "succeeded") {
    return;
  }

  // 2. Extract citations (outputCitations may be missing or empty)
  const citations = (execution.outputCitations ?? []).flatMap(
    (oc) => oc.citations
  );

  if (citations.length === 0) {
    // No citations to display
    return;
  }

  // 3. Resolve citations
  const { resolvedCitations } = await fetch(
    "https://api.catalyzed.ai/citations/resolve",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        teamId: execution.teamId,
        citations,
      }),
    }
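  ).then(r => {
    // A failed request has no resolvedCitations to iterate over
    if (!r.ok) throw new Error(`Citation resolution failed: HTTP ${r.status}`);
    return r.json();
  });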

  // 4. Display in UI
  for (const resolved of resolvedCitations) {
    if (resolved.type === "error") {
      console.error("Citation resolution failed:", resolved.error);
      continue;
    }

    console.log(`Cited from: ${resolved.file.fileName}`);
    if (resolved.type === "pdf_character_range") {
      console.log(`Page ${resolved.pageNumber}: "${resolved.citedText}"`);
    } else if (resolved.citedText) {
      console.log(`Text: "${resolved.citedText}"`);
    }
  }
}

Error Handling

If a citation fails to resolve, the response includes an error citation:

{
  "type": "error",
  "original": { ... },
  "error": {
    "code": "CITATION_NOT_FOUND",
    "message": "Extraction not found"
  }
}

The /citations/resolve endpoint returns a 500 error if any citation fails to resolve (no partial success), so check the HTTP status before reading resolvedCitations.
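
Error entries can be separated from usable results before rendering. The sketch below is in Python and relies only on the field names shown in the error example above. split_resolved is a hypothetical helper and the sample data is made up. In a real client it is also prudent to check the HTTP status first, for example with raise_for_status() when using requests:

```python
def split_resolved(resolved_citations):
    """Separate successfully resolved citations from error entries."""
    resolved, errors = [], []
    for item in resolved_citations:
        if item.get("type") == "error":
            errors.append(item)
        else:
            resolved.append(item)
    return resolved, errors


# Made-up sample: one resolved citation and one error entry
sample = [
    {"type": "pdf_character_range", "citedText": "CURRENT BALANCE: $999.00"},
    {"type": "error",
     "original": {"type": "file", "fileId": "file_missing"},
     "error": {"code": "CITATION_NOT_FOUND", "message": "Extraction not found"}},
]
resolved, errors = split_resolved(sample)
for item in errors:
    print(f"{item['error']['code']}: {item['error']['message']}")
# CITATION_NOT_FOUND: Extraction not found
```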

Limits

Maximum 100 citations per resolve request
At least 1 citation required
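
An execution with more than 100 citations therefore needs several resolve requests. Here is a minimal sketch of the batching step in Python. batch_citations is a hypothetical helper, and it assumes the results of separate requests can simply be concatenated afterwards:

```python
MAX_CITATIONS_PER_REQUEST = 100  # limit stated above


def batch_citations(citations, size=MAX_CITATIONS_PER_REQUEST):
    """Split citations into lists of at most `size` items.

    An empty input yields no batches, so no request is sent with zero citations.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [citations[i:i + size] for i in range(0, len(citations), size)]


# Made-up citations used only to show the batch sizes
citations = [{"type": "file", "fileId": f"file_{n}"} for n in range(250)]
print([len(batch) for batch in batch_citations(citations)])  # [100, 100, 50]
```

Each batch can then be sent as the citations array of its own /citations/resolve request.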

Best Practices

Check for citations - Always check if outputCitations exists and has entries
Resolve in batch - Collect all citations from an execution and resolve them in one request
Handle errors - Check for error citations in the resolved response
Display context - Show file names, page numbers, and cited text to users
Link to sources - Provide links to download or view the original files
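
The first two practices can be combined into one defensive helper. The following Python sketch assumes outputCitations may be missing, null, or empty. collect_citations is a hypothetical name, and dropping exact duplicates is an optional extra that the text above does not require:

```python
import json


def collect_citations(execution):
    """Flatten outputCitations into one list, tolerating missing fields.

    Exact duplicate citations are dropped so each one is resolved only once.
    """
    seen = set()
    unique = []
    for entry in execution.get("outputCitations") or []:
        for citation in entry.get("citations") or []:
            key = json.dumps(citation, sort_keys=True)  # stable key for comparison
            if key not in seen:
                seen.add(key)
                unique.append(citation)
    return unique


# Made-up execution in which two output fields cite the same file
execution = {
    "status": "succeeded",
    "outputCitations": [
        {"outputPointer": "/answer", "citations": [{"type": "file", "fileId": "file_abc123"}]},
        {"outputPointer": "/total", "citations": [{"type": "file", "fileId": "file_abc123"}]},
    ],
}
print(collect_citations(execution))  # [{'type': 'file', 'fileId': 'file_abc123'}]
```

If the UI needs to know which output field each citation belongs to, skip the deduplication and keep the outputPointer alongside each citation instead.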

API Reference

See the Citations API for complete endpoint documentation.