Tables
Tables are where your data lives. Each table has a defined schema and supports SQL queries, indexes, and schema evolution.
Creating a Table
Section titled “Creating a Table”Tables are created within a dataset:
Create a table
curl -X POST https://api.catalyzed.ai/dataset-tables \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "datasetId": "HoIEJNIPiQIy6TjVRxjwz", "tableName": "orders", "description": "Customer orders", "fields": [ {"name": "order_id", "arrowType": "utf8", "nullable": false}, {"name": "customer_id", "arrowType": "utf8", "nullable": false}, {"name": "amount", "arrowType": "float64", "nullable": false}, {"name": "status", "arrowType": "utf8", "nullable": false}, {"name": "created_at", "arrowType": "timestamp", "nullable": false} ], "primaryKeyColumns": ["order_id"] }'const response = await fetch("https://api.catalyzed.ai/dataset-tables", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ datasetId: "HoIEJNIPiQIy6TjVRxjwz", tableName: "orders", description: "Customer orders", fields: [ { name: "order_id", arrowType: "utf8", nullable: false }, { name: "customer_id", arrowType: "utf8", nullable: false }, { name: "amount", arrowType: "float64", nullable: false }, { name: "status", arrowType: "utf8", nullable: false }, { name: "created_at", arrowType: "timestamp", nullable: false }, ], primaryKeyColumns: ["order_id"], }),});response = requests.post( "https://api.catalyzed.ai/dataset-tables", headers={"Authorization": f"Bearer {api_token}"}, json={ "datasetId": "HoIEJNIPiQIy6TjVRxjwz", "tableName": "orders", "description": "Customer orders", "fields": [ {"name": "order_id", "arrowType": "utf8", "nullable": False}, {"name": "customer_id", "arrowType": "utf8", "nullable": False}, {"name": "amount", "arrowType": "float64", "nullable": False}, {"name": "status", "arrowType": "utf8", "nullable": False}, {"name": "created_at", "arrowType": "timestamp", "nullable": False} ], "primaryKeyColumns": ["order_id"] })Supported Data Types
Section titled “Supported Data Types”Catalyzed uses Apache Arrow data types. Type names are case-insensitive. Here’s a quick summary:
| Category | Types |
|---|---|
| Integer | int8, int16, int32, int64, uint8, uint16, uint32, uint64 |
| Floating Point | float16, float32 (float), float64 (double) |
| String | utf8 (string), largeutf8 |
| Binary | binary, largebinary |
| Boolean | bool (boolean) |
| Date/Time | date32, date64, timestamp, timestamp[s], timestamp[ms], timestamp[us], timestamp[ns] |
| Other | null, list<T> |
For the complete reference with value ranges, type coercion rules, aliases, and usage guidance, see the Data Types page.
Writing Data
Section titled “Writing Data”Write Rows
Section titled “Write Rows”Write data to a table using the /rows endpoint. The write mode is specified as a query parameter, and the request body is a JSON array of row objects.
Insert rows
curl -X POST "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=append" \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '[ { "order_id": "ORD-001", "customer_id": "CUST-100", "amount": 99.99, "status": "completed", "created_at": "2024-01-15T10:30:00Z" }, { "order_id": "ORD-002", "customer_id": "CUST-101", "amount": 149.50, "status": "pending", "created_at": "2024-01-15T14:45:00Z" } ]'await fetch( "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=append", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify([ { order_id: "ORD-001", customer_id: "CUST-100", amount: 99.99, status: "completed", created_at: "2024-01-15T10:30:00Z", }, { order_id: "ORD-002", customer_id: "CUST-101", amount: 149.50, status: "pending", created_at: "2024-01-15T14:45:00Z", }, ]), });requests.post( "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=append", headers={"Authorization": f"Bearer {api_token}"}, json=[ { "order_id": "ORD-001", "customer_id": "CUST-100", "amount": 99.99, "status": "completed", "created_at": "2024-01-15T10:30:00Z" }, { "order_id": "ORD-002", "customer_id": "CUST-101", "amount": 149.50, "status": "pending", "created_at": "2024-01-15T14:45:00Z" } ])Query Parameters
Section titled “Query Parameters”| Parameter | Required | Description |
|---|---|---|
mode | Yes | Write operation mode (see below) |
idempotency_key | No | Unique key for exactly-once write semantics |
skip_validation | No | Skip schema validation for faster writes |
Write Modes
Section titled “Write Modes”The mode query parameter controls how data is written:
| Mode | Description |
|---|---|
append | Insert new rows without duplicate checking (fastest) |
upsert | Insert new rows or update existing by primary key |
overwrite | Replace all existing data in the table |
delete | Delete rows by primary key |
Upsert Example
Section titled “Upsert Example”Update existing rows or insert new ones based on primary key:
curl -X POST "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=upsert" \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '[{"order_id": "ORD-001", "status": "shipped", "amount": 99.99}]'Delete Example
Section titled “Delete Example”Delete rows by primary key values:
curl -X POST "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=delete" \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '["ORD-001", "ORD-002"]'For tables with composite primary keys, pass objects:
[{"order_id": "ORD-001", "tenant_id": "T1"}, {"order_id": "ORD-002", "tenant_id": "T1"}]Arrow IPC Format
Section titled “Arrow IPC Format”For high-performance data ingestion, send data in Apache Arrow IPC format instead of JSON. This is ideal for:
- Large batch inserts (millions of rows)
- Direct integration with pandas, Polars, or DuckDB
- Avoiding JSON serialization overhead
Write with Arrow IPC
# Arrow IPC data must be generated programmaticallycurl -X POST "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=append" \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/vnd.apache.arrow.stream" \ --data-binary @data.arrowimport * as arrow from "apache-arrow";
// Create Arrow tableconst table = arrow.tableFromArrays({ order_id: ["ORD-001", "ORD-002"], customer_id: ["CUST-100", "CUST-101"], amount: [99.99, 149.5], status: ["completed", "pending"],});
// Serialize to IPC stream formatconst ipcBytes = arrow.tableToIPC(table, "stream");
// Send to APIawait fetch( "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=append", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/vnd.apache.arrow.stream", }, body: ipcBytes, });import pyarrow as paimport requests
# Create Arrow table from pandas DataFramedf = pd.DataFrame({ "order_id": ["ORD-001", "ORD-002"], "customer_id": ["CUST-100", "CUST-101"], "amount": [99.99, 149.50], "status": ["completed", "pending"],})table = pa.Table.from_pandas(df)
# Serialize to IPC stream formatsink = pa.BufferOutputStream()with pa.ipc.new_stream(sink, table.schema) as writer: writer.write_table(table)arrow_bytes = sink.getvalue().to_pybytes()
# Send to APIrequests.post( "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/rows?mode=append", headers={ "Authorization": f"Bearer {api_token}", "Content-Type": "application/vnd.apache.arrow.stream" }, data=arrow_bytes)Querying Data
Section titled “Querying Data”Use the /queries endpoint to query your tables with SQL. You can query a single table or join multiple tables together.
Query table data
curl -X POST https://api.catalyzed.ai/queries \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT * FROM orders WHERE status = '\''completed'\'' ORDER BY created_at DESC LIMIT 10", "tables": { "orders": "KzaMsfA0LSw_Ld0KyaXIS" } }'const response = await fetch("https://api.catalyzed.ai/queries", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ sql: "SELECT * FROM orders WHERE status = 'completed' ORDER BY created_at DESC LIMIT 10", tables: { orders: "KzaMsfA0LSw_Ld0KyaXIS", }, }),});const data = await response.json();response = requests.post( "https://api.catalyzed.ai/queries", headers={"Authorization": f"Bearer {api_token}"}, json={ "sql": "SELECT * FROM orders WHERE status = 'completed' ORDER BY created_at DESC LIMIT 10", "tables": { "orders": "KzaMsfA0LSw_Ld0KyaXIS" } })data = response.json()The tables parameter maps table names used in your SQL to their table IDs. This works the same way whether you’re querying one table or joining multiple tables.
Query Response
Section titled “Query Response”{ "queryId": "qry_abc123", "columns": [ {"name": "order_id", "type": "Utf8"}, {"name": "customer_id", "type": "Utf8"}, {"name": "amount", "type": "Float64"}, {"name": "status", "type": "Utf8"}, {"name": "created_at", "type": "Timestamp(Microsecond, Some(\"UTC\"))"} ], "rows": [ {"order_id": "ORD-001", "customer_id": "CUST-100", "amount": 99.99, "status": "completed", "created_at": "2024-01-15T10:30:00Z"}, {"order_id": "ORD-002", "customer_id": "CUST-101", "amount": 149.50, "status": "completed", "created_at": "2024-01-15T09:15:00Z"} ], "rowCount": 2, "truncated": false, "stats": { "executionTimeMs": 42, "planningTimeMs": 5, "bytesScanned": 1024, "rowsScanned": 100 }}The stats field is included when includeStats: true is passed in the request. A usage field with detailed I/O metrics is also returned for billing purposes.
Joining Tables
Section titled “Joining Tables”Query across multiple tables by including them in the tables mapping:
Join multiple tables
curl -X POST https://api.catalyzed.ai/queries \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT o.order_id, c.name, o.amount FROM orders o JOIN customers c ON o.customer_id = c.customer_id", "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" } }'const response = await fetch("https://api.catalyzed.ai/queries", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ sql: `SELECT o.order_id, c.name, o.amount FROM orders o JOIN customers c ON o.customer_id = c.customer_id`, tables: { orders: "Ednc5U676CO4hn-FqsXeA", customers: "6fTBbbj4uv8TVMVh0gVch", }, }),});response = requests.post( "https://api.catalyzed.ai/queries", headers={"Authorization": f"Bearer {api_token}"}, json={ "sql": """ SELECT o.order_id, c.name, o.amount FROM orders o JOIN customers c ON o.customer_id = c.customer_id """, "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" } })See the Querying Data guide for more SQL examples and best practices.
Table Schema
Section titled “Table Schema”Get Schema
Section titled “Get Schema”Get table schema
curl https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/schema \ -H "Authorization: Bearer $API_TOKEN"const response = await fetch( "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/schema", { headers: { Authorization: `Bearer ${apiToken}` } });const schema = await response.json();response = requests.get( "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/schema", headers={"Authorization": f"Bearer {api_token}"})schema = response.json()Schema Versioning
Section titled “Schema Versioning”Tables track schema versions. Each modification creates a new version:
curl https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/schema/versions \ -H "Authorization: Bearer $API_TOKEN"See the Schema Management guide for migration details.
System Columns
Section titled “System Columns”Every table has an internal _rowid system column — a stable, unique identifier for each row assigned by the storage engine. This column is excluded from SELECT * and schema endpoints by default to keep query results focused on your data.
When you need row-level tracking (e.g., lineage, citations, or deduplication), you can opt in by setting includeRowId: true on the table binding. See the Querying Data guide for usage details.
Indexes
Section titled “Indexes”Indexes improve query performance for filtered columns.
Create an Index
Section titled “Create an Index”Create an index
curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "indexName": "idx_customer_id", "columnName": "customer_id", "indexType": "btree" }'await fetch("https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ indexName: "idx_customer_id", columnName: "customer_id", indexType: "btree", }),});requests.post( "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes", headers={"Authorization": f"Bearer {api_token}"}, json={ "indexName": "idx_customer_id", "columnName": "customer_id", "indexType": "btree" })Index Types
Section titled “Index Types”| Type | Use Case |
|---|---|
btree | Equality and range queries on scalar columns |
ivf_pq | Vector similarity search (ANN) |
ivf_hnsw_pq | High-recall vector search |
ivf_hnsw_sq | Memory-efficient vector search |
List Indexes
Section titled “List Indexes”curl https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes \ -H "Authorization: Bearer $API_TOKEN"Drop an Index
Section titled “Drop an Index”curl -X DELETE https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes/idx_customer_id \ -H "Authorization: Bearer $API_TOKEN"Table Operations
Section titled “Table Operations”Get Execution Plan
Section titled “Get Execution Plan”Preview how a query will execute using the /queries/explain endpoint:
curl -X POST https://api.catalyzed.ai/queries/explain \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT * FROM orders WHERE customer_id = '\''CUST-100'\''", "tables": {"orders": "KzaMsfA0LSw_Ld0KyaXIS"} }'Compact Table
Section titled “Compact Table”Optimize storage by merging small files:
curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/compact \ -H "Authorization: Bearer $API_TOKEN"Compute Statistics
Section titled “Compute Statistics”Update table statistics for query optimization:
curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/statistics \ -H "Authorization: Bearer $API_TOKEN"Deleting a Table
Section titled “Deleting a Table”curl -X DELETE https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA \ -H "Authorization: Bearer $API_TOKEN"API Reference
Section titled “API Reference”See the Dataset Tables API for complete endpoint documentation.