Get extracted content from a processed file

Authorizations

bearerAuth
cookieAuth

Path Parameters

fileId

required

Unique identifier of the file to retrieve

string

Query Parameters

includeSourceMap

When true, the response includes a sourceMap that maps character offsets in the extracted content back to positions in the source file. The sourceMap is a discriminated union on mappingType: ‘spatial’ (PDF, with bounding boxes), ‘structural’ (DOCX, with XPath), or ‘tabular’ (XLSX/CSV, with cell references).

string

default: false

Allowed values: true false
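
The parameters above could be assembled into a request roughly as follows. This is a minimal sketch, not the documented client: the base URL and the `path_template` default are placeholders (this page does not state the endpoint path), and sending the bearerAuth credential as an `Authorization: Bearer` header is an assumption based on the usual meaning of that scheme name. Note that includeSourceMap is typed as a string, so the boolean is serialized as `true` or `false`.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_extraction_request(
    base_url: str,
    file_id: str,
    token: str,
    include_source_map: bool = False,
    # Hypothetical path: substitute the real route for this endpoint.
    path_template: str = "/files/{fileId}/extracted-content",
) -> Request:
    """Build (but do not send) a GET request for a file's extracted content."""
    # Percent-encode the path parameter so unusual characters stay in one segment.
    path = path_template.format(fileId=quote(file_id, safe=""))
    # The query parameter is a string with allowed values "true" and "false".
    query = urlencode({"includeSourceMap": "true" if include_source_map else "false"})
    url = f"{base_url.rstrip('/')}{path}?{query}"
    # Assumed header form for bearerAuth.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")
```

The returned object can be passed to `urllib.request.urlopen`; building it separately keeps the URL logic easy to test without network access.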

200

Extracted content found

object

type

required

File type category

string

Allowed values: pdf docx xlsx csv text unsupported

extractionId

required

Unique identifier of the extraction record, or null if not processed

string

nullable

content

required

Extracted content, or null if not processed or extraction failed

object

markdown

Extracted markdown/text content (PDF, DOCX)

string

totalPages

Total number of pages in the PDF

number

totalBlocks

Total number of text blocks in the PDF

number

totalParagraphs

Total number of paragraphs (DOCX)

number

totalTables

Total number of tables (DOCX)

number

toon

TOON-formatted content (XLSX/CSV)

string

totalSheets

Total number of sheets (XLSX)

number

totalRows

Total number of rows

number

csvMetadata

CSV metadata (delimiter, encoding, etc.)

nullable

sheetsMetadata

XLSX sheets metadata

nullable

content

Raw text content (text files)

string

contentType

Content type (text/markdown or text/plain)

string

charCount

Character count (text files)

number

lineCount

Line count (text files)

number
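
Because the populated fields of content depend on type, a caller typically branches on type before reading text. The sketch below follows the field descriptions above (markdown for PDF and DOCX, toon for XLSX and CSV, the nested content field for text files); the function name `primary_text` is illustrative, and the input is assumed to be the already-parsed 200 response body.

```python
from typing import Optional


def primary_text(response: dict) -> Optional[str]:
    """Return the main extracted text for a 200 response, or None if absent."""
    content = response.get("content")
    if content is None:
        # content is null when the file was not processed or extraction failed.
        return None
    file_type = response.get("type")
    if file_type in ("pdf", "docx"):
        return content.get("markdown")
    if file_type in ("xlsx", "csv"):
        return content.get("toon")
    if file_type == "text":
        # For text files the raw text sits in a nested field also named "content".
        return content.get("content")
    # "unsupported" or any unrecognized type has no primary text field.
    return None
```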

structuredExtraction

required

Structured extraction (PDF only), or null if not available

object

extractionId

required

Unique identifier of the structured extraction

string

extractionStatus

required

Status: ‘complete’ (all sections succeeded), ‘partial’ (some failed but below threshold), ‘failed’ (too many sections failed)

string

nullable

Allowed values: complete partial failed

sections

Document sections/hierarchy

nullable

schemas

Per-section JSON schemas (null for failed sections)

nullable

extractedData

Per-section extracted data (null for failed sections)

nullable

metadata

required

Metadata about extraction including error details for failed sections

object

totalSections

required

Total number of sections in the document

number

sectionsSucceeded

required

Number of successfully extracted sections

number

sectionsFailed

required

Number of failed sections

number

failedContentRatio

required

Ratio of failed content (0.0 to 1.0)

number

failedSectionIndices

required

Indices of failed sections

Array<number>

sectionErrors

required

Error details for failed sections

Array<object>

object

index

required

number

stage

required

string

message

required

string

contentCoverageThreshold

required

Threshold used for failure determination (0.3)

number
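
The metadata fields above are enough to report how a structured extraction went without inspecting each section. A minimal sketch, assuming the parsed structuredExtraction object is passed in as a dict; the function name and the wording of the summary are illustrative only.

```python
from typing import Optional


def summarize_structured_extraction(extraction: Optional[dict]) -> str:
    """Produce a short human-readable summary of structuredExtraction."""
    if extraction is None:
        # structuredExtraction is null when not available (it is PDF only).
        return "not available"
    meta = extraction["metadata"]
    # extractionStatus is nullable, so fall back to a neutral label.
    status = extraction.get("extractionStatus") or "unknown"
    lines = [
        f"{status}: {meta['sectionsSucceeded']}/{meta['totalSections']} sections succeeded "
        f"(failed content ratio {meta['failedContentRatio']:.2f}, "
        f"threshold {meta['contentCoverageThreshold']:.2f})"
    ]
    # One line per failed section, using the index, stage and message fields.
    for err in meta["sectionErrors"]:
        lines.append(f"  section {err['index']} failed at {err['stage']}: {err['message']}")
    return "\n".join(lines)
```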

sourceMap

Any of:

object

mappingType

required

string

Allowed values: spatial

spans

required

Array<object>

object

spanId

required

Unique span/block identifier from the PDF parser

string

charStart

required

Start character offset in markdown (inclusive)

integer

charEnd

required

End character offset in markdown (exclusive)

integer

pageId

required

0-indexed page number

integer

blockType

Block type (Text, SectionHeader, ListItem, Table, etc.)

string

bbox

Bounding box coordinates on the page

object

page_id

required

integer

x1

required

number

y1

required

number

x2

required

number

y2

required

number

totalSpans

required

integer
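
For a spatial source map, each span covers the half-open character range from charStart (inclusive) to charEnd (exclusive) in the markdown. The sketch below looks up the span containing a given offset; it uses a linear scan because this page does not state whether spans are sorted or non-overlapping, and the function name `span_at` is illustrative.

```python
from typing import Optional


def span_at(source_map: dict, offset: int) -> Optional[dict]:
    """Return the first span whose [charStart, charEnd) range contains offset."""
    if source_map.get("mappingType") != "spatial":
        # Only the spatial variant is described here; other variants are not handled.
        return None
    for span in source_map["spans"]:
        # charStart is inclusive and charEnd is exclusive.
        if span["charStart"] <= offset < span["charEnd"]:
            return span
    return None
```

The returned span carries pageId and, when present, bbox, which is what a viewer would need to highlight the matching region of the PDF page.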

processorVersion

required

Version of the processor used

string

nullable

processedAt

required

When the extraction was performed

string format: date-time

nullable

isEmpty

required

Whether the extracted content is empty

boolean

400

Bad Request - Validation error or invalid input

object

error

required

string

code

string

details

nullable

retryable

boolean

401

Unauthorized - Authentication required or invalid token

object

error

required

string

code

string

details

nullable

retryable

boolean

403

Forbidden - Insufficient permissions

object

error

required

string

code

string

details

nullable

retryable

boolean

404

Not Found - Resource does not exist

object

error

required

string

code

string

details

nullable

retryable

boolean
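
The 400, 401, 403 and 404 responses share one body shape (error, code, details, retryable), so a client can map them onto a single exception type. A minimal sketch, assuming the status code and parsed JSON body are already in hand; the class and function names are hypothetical, and treating a missing retryable field as false is a choice made here, not something this page specifies.

```python
from typing import Any, Optional


class ApiError(Exception):
    """Error raised for the documented 4xx responses."""

    def __init__(self, status: int, body: dict) -> None:
        self.status = status
        # "error" is the only required field in the error body.
        self.message: str = body.get("error", "unknown error")
        self.code: Optional[str] = body.get("code")
        self.details: Any = body.get("details")
        # Assumption: treat an absent retryable flag as not retryable.
        self.retryable: bool = bool(body.get("retryable", False))
        super().__init__(f"{status}: {self.message}")


def raise_for_error(status: int, body: dict) -> None:
    """Raise ApiError for the error statuses listed above; otherwise do nothing."""
    if status in (400, 401, 403, 404):
        raise ApiError(status, body)
```

A caller can then decide whether to retry by checking the `retryable` attribute of the caught exception.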
