Get extracted content from processed file

GET
/files/{fileId}/extracted-content

Returns the content extracted while processing a file. For PDF and DOCX files, the response contains markdown text; PDFs may also include a structured extraction. For XLSX and CSV files, it contains TOON-formatted content. For text files, it contains the raw text.
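As a rough illustration, the request could be issued from Python with only the standard library. This is a minimal sketch, not an official client: the base URL passed in by the caller and the bearer-token `Authorization` header are assumptions, since this page does not state the host or the authentication scheme. The helper names `extracted_content_url` and `get_extracted_content` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def extracted_content_url(base_url: str, file_id: str) -> str:
    """Build the URL for GET /files/{fileId}/extracted-content."""
    # Percent-encode the ID so unusual characters cannot alter the path.
    safe_id = urllib.parse.quote(file_id, safe="")
    return f"{base_url.rstrip('/')}/files/{safe_id}/extracted-content"


def get_extracted_content(base_url: str, file_id: str, token: str) -> dict:
    """Fetch the endpoint and decode the JSON response body.

    Assumption: the API accepts a bearer token; adjust to the real scheme.
    """
    request = urllib.request.Request(
        extracted_content_url(base_url, file_id),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping URL construction in its own function makes it easy to check without a network call.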

fileId
required

Unique identifier of the file to retrieve

string

Extracted content found

object
type
required

File type category

string
Allowed values: pdf docx xlsx csv text unsupported
extractionId
required

Unique identifier of the extraction record, or null if not processed

string
nullable
content
required

Extracted content, or null if not processed or extraction failed

object
markdown

Extracted markdown/text content (PDF, DOCX)

string
totalPages

Total number of pages in the PDF

number
totalBlocks

Total number of text blocks in the PDF

number
totalParagraphs

Total number of paragraphs (DOCX)

number
totalTables

Total number of tables (DOCX)

number
toon

TOON-formatted content (XLSX/CSV)

string
totalSheets

Total number of sheets (XLSX)

number
totalRows

Total number of rows

number
csvMetadata

CSV metadata (delimiter, encoding, etc.)

nullable
sheetsMetadata

XLSX sheets metadata

nullable
content

Raw text content (text files)

string
contentType

Content type (text/markdown or text/plain)

string
charCount

Character count (text files)

number
lineCount

Line count (text files)

number
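Because the field that carries the main text depends on the file type, a client may want a small lookup. The sketch below follows the field descriptions above (markdown for PDF and DOCX, toon for XLSX and CSV, the nested content field for text files); the function name `primary_text` and the choice to return `None` for unsupported types are assumptions.

```python
from typing import Optional

# Which field of the content object carries the main text for each
# file type, following the field descriptions in this reference.
_TEXT_FIELD_BY_TYPE = {
    "pdf": "markdown",
    "docx": "markdown",
    "xlsx": "toon",
    "csv": "toon",
    "text": "content",
}


def primary_text(response: dict) -> Optional[str]:
    """Return the main extracted text, or None if there is none."""
    content = response.get("content")
    if content is None:  # not processed, or extraction failed
        return None
    field = _TEXT_FIELD_BY_TYPE.get(response.get("type"))
    if field is None:  # e.g. type == "unsupported"
        return None
    return content.get(field)
```

Note that for text files the raw text sits at `content.content`, one level below the top-level content object.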
structuredExtraction
required

Structured extraction (PDF only), or null if not available

object
extractionId
required

Unique identifier of the structured extraction

string
extractionStatus
required

Status of the structured extraction: ‘complete’ (all sections succeeded), ‘partial’ (some sections failed, but below the failure threshold), or ‘failed’ (too many sections failed)

string
nullable
Allowed values: complete partial failed
sections

Document sections/hierarchy

nullable
schemas

Per-section JSON schemas (null for failed sections)

nullable
extractedData

Per-section extracted data (null for failed sections)

nullable
metadata
required

Metadata about extraction including error details for failed sections

object
totalSections
required

Total number of sections in the document

number
sectionsSucceeded
required

Number of successfully extracted sections

number
sectionsFailed
required

Number of failed sections

number
failedContentRatio
required

Ratio of failed content (0.0 to 1.0)

number
failedSectionIndices
required

Indices of failed sections

Array<number>
sectionErrors
required

Error details for failed sections

Array<object>
object
index
required
number
stage
required
string
message
required
string
contentCoverageThreshold
required

Threshold used for failure determination (0.3)

number
processorVersion
required

Version of the processor used

string
nullable
processedAt
required

When the extraction was performed

string format: date-time
nullable
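The relationship between the metadata counters and extractionStatus can be sketched as follows. This is an illustration under stated assumptions, not the server's actual rule: it assumes "below threshold" means failedContentRatio is strictly less than contentCoverageThreshold. The function names and the sample stage value used in testing are hypothetical.

```python
def classify_status(metadata: dict) -> str:
    """Re-derive a status from the structuredExtraction metadata.

    Assumption: "below threshold" means failedContentRatio is strictly
    less than contentCoverageThreshold; the server's rule may differ.
    """
    if metadata["sectionsFailed"] == 0:
        return "complete"
    if metadata["failedContentRatio"] < metadata["contentCoverageThreshold"]:
        return "partial"
    return "failed"


def failed_section_report(metadata: dict) -> list[str]:
    """Format one human-readable line per entry in sectionErrors."""
    return [
        f"section {err['index']} failed at {err['stage']}: {err['message']}"
        for err in metadata["sectionErrors"]
    ]
```

In practice a client would normally trust the extractionStatus field returned by the API; a helper like this is mainly useful for logging or for explaining why a result was marked partial.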
isEmpty
required

Whether the extracted content is empty

boolean
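Several top-level fields are nullable, so a client usually needs to distinguish "not processed yet" from "processed but empty". The sketch below is one way to do that from the fields described above; the state labels and the function name `extraction_state` are hypothetical, and treating a present extractionId with null content as a distinct state is an inference from the field descriptions.

```python
def extraction_state(response: dict) -> str:
    """Summarize a response as one of four illustrative states."""
    if response.get("extractionId") is None:
        # extractionId is null when the file has not been processed.
        return "not_processed"
    if response.get("content") is None:
        # content is null when not processed or when extraction failed.
        return "no_content"
    if response.get("isEmpty"):
        return "empty"
    return "ready"
```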

Bad Request - Validation error or invalid input

object
error
required
string
code
string
details
nullable
retryable
boolean

Unauthorized - Authentication required or invalid token

object
error
required
string
code
string
details
nullable
retryable
boolean

Forbidden - Insufficient permissions

object
error
required
string
code
string
details
nullable
retryable
boolean

Not Found - Resource does not exist

object
error
required
string
code
string
details
nullable
retryable
boolean
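All four error responses share the same body shape, so a single parser can cover them. The following is a minimal sketch: the `ApiError` class and `parse_error` function are hypothetical names, and treating a missing retryable flag as `False` is an assumption, since only error is marked as required.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApiError:
    error: str                   # required human-readable message
    code: Optional[str] = None   # optional machine-readable code
    details: Any = None          # optional, nullable extra information
    retryable: bool = False      # assumption: absent means not retryable


def parse_error(body: dict) -> ApiError:
    """Convert a decoded JSON error body into an ApiError."""
    return ApiError(
        error=body["error"],
        code=body.get("code"),
        details=body.get("details"),
        retryable=bool(body.get("retryable", False)),
    )
```

A caller could then retry only when `parse_error(body).retryable` is true and surface the error otherwise.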