Managing Tables
Tables hold your data within a dataset. Each table has a defined schema, supports multiple ingestion formats, and provides SQL querying. This guide covers the full table lifecycle.
Creating a Table
A table requires a parent dataset, a name, and at least one field:
Create a table
curl:

    curl -X POST https://api.catalyzed.ai/dataset-tables \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
        "tableName": "orders",
        "description": "Customer orders",
        "fields": [
          {"name": "order_id", "arrowType": "utf8", "nullable": false},
          {"name": "customer_id", "arrowType": "utf8", "nullable": false},
          {"name": "amount", "arrowType": "float64", "nullable": false},
          {"name": "status", "arrowType": "utf8", "nullable": false},
          {"name": "created_at", "arrowType": "timestamp", "nullable": false}
        ],
        "primaryKeyColumns": ["order_id"]
      }'

JavaScript:

    const response = await fetch("https://api.catalyzed.ai/dataset-tables", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        datasetId: "HoIEJNIPiQIy6TjVRxjwz",
        tableName: "orders",
        description: "Customer orders",
        fields: [
          { name: "order_id", arrowType: "utf8", nullable: false },
          { name: "customer_id", arrowType: "utf8", nullable: false },
          { name: "amount", arrowType: "float64", nullable: false },
          { name: "status", arrowType: "utf8", nullable: false },
          { name: "created_at", arrowType: "timestamp", nullable: false },
        ],
        primaryKeyColumns: ["order_id"],
      }),
    });
    const table = await response.json();

Python:

    import requests

    response = requests.post(
        "https://api.catalyzed.ai/dataset-tables",
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
            "tableName": "orders",
            "description": "Customer orders",
            "fields": [
                {"name": "order_id", "arrowType": "utf8", "nullable": False},
                {"name": "customer_id", "arrowType": "utf8", "nullable": False},
                {"name": "amount", "arrowType": "float64", "nullable": False},
                {"name": "status", "arrowType": "utf8", "nullable": False},
                {"name": "created_at", "arrowType": "timestamp", "nullable": False},
            ],
            "primaryKeyColumns": ["order_id"],
        },
    )
    table = response.json()

Response:

    {
      "tableId": "Ednc5U676CO4hn-FqsXeA",
      "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
      "tableName": "orders",
      "description": "Customer orders",
      "createdAt": "2024-01-15T10:30:00Z",
      "updatedAt": "2024-01-15T10:30:00Z"
    }

Table Properties
| Property | Type | Required | Description |
|---|---|---|---|
| datasetId | string | Yes | Parent dataset ID |
| tableName | string | Yes | Table name (1-255 characters, unique per dataset) |
| description | string | No | Optional description |
| fields | array | Yes | Column definitions (at least 1). See Data Types |
| primaryKeyColumns | string[] | No | Columns forming the primary key |
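
The constraints in the table above can be checked client-side before sending a create request. The following is a minimal sketch, not part of the API: the function name `validate_create_table` is hypothetical, and the rule that every primary key column must appear in `fields` is an assumption rather than documented behavior.

```python
def validate_create_table(payload: dict) -> list[str]:
    """Return a list of problems found in a create-table request body."""
    problems = []

    # datasetId: required string
    dataset_id = payload.get("datasetId")
    if not isinstance(dataset_id, str) or not dataset_id:
        problems.append("datasetId is required")

    # tableName: required, 1-255 characters
    name = payload.get("tableName")
    if not isinstance(name, str) or not (1 <= len(name) <= 255):
        problems.append("tableName must be 1-255 characters")

    # fields: required, at least one column definition
    fields = payload.get("fields")
    if not isinstance(fields, list) or len(fields) < 1:
        problems.append("fields must contain at least one column")
        fields = []

    # Assumption: primary key columns should refer to declared fields.
    field_names = {f.get("name") for f in fields if isinstance(f, dict)}
    for key in payload.get("primaryKeyColumns") or []:
        if key not in field_names:
            problems.append(f"primaryKeyColumns references unknown column: {key}")

    return problems
```

An empty list means the payload passes these local checks; the server may still reject it for reasons not covered here, such as a duplicate table name.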
Designing Schemas
Choosing Data Types
See the Data Types reference for the full list. Common patterns:
| Use Case | Recommended Type | Why |
|---|---|---|
| IDs, names, labels | utf8 | Flexible, no size constraints |
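| Counts, quantities | int64 | Wide range; overflow is unlikely for realistic counts |
| Prices, measurements | float64 | Double precision keeps rounding error small (binary floating point is still inexact for some decimal fractions) |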
| Yes/no flags | bool | Compact, queryable |
| Dates and times | timestamp | Microsecond default, supports formatting |
| Tags, categories | list<utf8> | Variable-length arrays |
| Embeddings | list<float32> | Efficient for vector search |
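
To make the mapping above concrete, here is a small sketch that suggests a column type from a sample Python value. The helper names `suggest_arrow_type` and `suggest_fields` are hypothetical, and treating every list of floats as an embedding is an assumption that will not fit all data. Columns are marked nullable, following the advice under Best Practices.

```python
from datetime import datetime


def suggest_arrow_type(value) -> str:
    """Suggest an arrowType string for a sample value, following the table above."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, datetime):
        return "timestamp"
    if isinstance(value, str):
        return "utf8"
    if isinstance(value, list) and value:
        if all(isinstance(v, str) for v in value):
            return "list<utf8>"
        # Assumption: a list of floats is an embedding vector.
        if all(isinstance(v, float) for v in value):
            return "list<float32>"
    raise ValueError(f"no suggestion for value of type {type(value).__name__}")


def suggest_fields(sample: dict) -> list[dict]:
    """Build a draft `fields` array from one sample record."""
    return [
        {"name": name, "arrowType": suggest_arrow_type(value), "nullable": True}
        for name, value in sample.items()
    ]
```

Treat the result as a starting point to review by hand, since a single sample record cannot reveal every value a column will hold.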
Primary Key Design
Primary keys are required for upsert and delete write modes. Choose based on your data:
Single column (one field uniquely identifies a row):

    "primaryKeyColumns": ["order_id"]

Composite key (uniqueness requires multiple columns):

    "primaryKeyColumns": ["tenant_id", "order_id"]

Listing and Filtering Tables
List tables in a dataset
curl:

    curl "https://api.catalyzed.ai/dataset-tables?datasetIds=HoIEJNIPiQIy6TjVRxjwz&orderBy=tableName&orderDirection=asc" \
      -H "Authorization: Bearer $API_TOKEN"

JavaScript:

    const params = new URLSearchParams({
      datasetIds: "HoIEJNIPiQIy6TjVRxjwz",
      orderBy: "tableName",
      orderDirection: "asc",
    });

    const response = await fetch(`https://api.catalyzed.ai/dataset-tables?${params}`, {
      headers: { Authorization: `Bearer ${apiToken}` },
    });
    const { tables, total } = await response.json();

Python:

    import requests

    response = requests.get(
        "https://api.catalyzed.ai/dataset-tables",
        params={
            "datasetIds": "HoIEJNIPiQIy6TjVRxjwz",
            "orderBy": "tableName",
            "orderDirection": "asc",
        },
        headers={"Authorization": f"Bearer {api_token}"},
    )
    result = response.json()
    tables = result["tables"]

Filter Parameters
| Parameter | Type | Description |
|---|---|---|
| datasetTableIds | string | Comma-separated table IDs |
| datasetIds | string | Comma-separated dataset IDs |
| tableName | string | Partial match on table name |
| managed | boolean | Filter by managed status |
| page | number | Page number, 1-indexed (default: 1) |
| pageSize | number | Results per page, 1-100 (default: 20) |
| orderBy | string | Sort by: createdAt, tableName, updatedAt |
| orderDirection | string | asc or desc |
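
The `page` and `pageSize` parameters can be combined to walk through every table in a dataset. The sketch below is illustrative only: `build_list_url` and `iter_tables` are hypothetical helper names, `fetch_json` is a caller-supplied function that performs the authenticated GET and returns the parsed body, and reading `total` as the number of matching tables across all pages is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.catalyzed.ai/dataset-tables"


def build_list_url(dataset_ids, page=1, page_size=20,
                   order_by="tableName", order_direction="asc") -> str:
    """Build a list-tables URL from the documented filter parameters."""
    if page < 1:
        raise ValueError("page is 1-indexed")
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")
    params = {
        "datasetIds": ",".join(dataset_ids),  # comma-separated IDs
        "page": page,
        "pageSize": page_size,
        "orderBy": order_by,
        "orderDirection": order_direction,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def iter_tables(fetch_json, dataset_ids, page_size=100):
    """Yield tables page by page until all of them have been seen."""
    page = 1
    seen = 0
    while True:
        body = fetch_json(build_list_url(dataset_ids, page=page, page_size=page_size))
        tables = body["tables"]
        yield from tables
        seen += len(tables)
        # Stop on an empty page or once `total` rows have been returned.
        if not tables or seen >= body["total"]:
            break
        page += 1
```

Passing the fetch function in keeps the paging logic separate from authentication and HTTP details, so it works with whichever client library you already use.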
Updating a Table
Update a table’s name or description:
Update table
curl:

    curl -X PUT https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "tableName": "customer_orders",
        "description": "Customer orders with shipping info"
      }'

JavaScript:

    const response = await fetch(
      "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA",
      {
        method: "PUT",
        headers: {
          Authorization: `Bearer ${apiToken}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          tableName: "customer_orders",
          description: "Customer orders with shipping info",
        }),
      }
    );
    const updated = await response.json();

Python:

    import requests

    response = requests.put(
        "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA",
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "tableName": "customer_orders",
            "description": "Customer orders with shipping info",
        },
    )
    updated = response.json()

Ingestion Formats
Tables accept data in four formats. Choose based on your use case:
| Format | Content-Type | Best For |
|---|---|---|
| JSON | application/json | Simple integrations, small batches (under 1,000 rows) |
| CSV | text/csv | Spreadsheet exports, human-readable data |
| Parquet | application/parquet | Large datasets, columnar analytics tools |
| Arrow IPC | application/vnd.apache.arrow.stream | Maximum performance, typed data pipelines |
CSV and Parquet data is automatically coerced to match the table schema. Column name matching is case-insensitive for CSV.
See Ingesting Data for detailed examples of each format and write mode.
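
When uploading files, the Content-Type header has to match the format. A minimal sketch of picking it from a file name follows; the helper name `content_type_for` and the extension-to-format mapping (in particular `.arrow` for Arrow IPC) are assumptions for illustration, while the Content-Type values come from the table above.

```python
from pathlib import Path

# Content-Type values from the Ingestion Formats table.
# Assumption: the file extensions used as keys are a local naming convention.
CONTENT_TYPES = {
    ".json": "application/json",
    ".csv": "text/csv",
    ".parquet": "application/parquet",
    ".arrow": "application/vnd.apache.arrow.stream",
}


def content_type_for(path: str) -> str:
    """Return the ingestion Content-Type for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported ingestion format: {suffix or path}") from None
```

Failing loudly on an unknown extension avoids sending a file with a Content-Type the table would not accept.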
Table Statistics
Get row counts and activity metrics for a table:

    curl https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/stats \
      -H "Authorization: Bearer $API_TOKEN"

Response:

    {
      "tableId": "Ednc5U676CO4hn-FqsXeA",
      "rowCount": 15000,
      "totalQueries": 142,
      "totalIngests": 28,
      "lastQueryAt": "2025-01-15T14:30:00Z",
      "lastIngestAt": "2025-01-15T06:00:00Z"
    }

Usage Timeseries
Track query and ingestion activity over time:

    curl "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/usage/timeseries?granularity=day&startDate=2025-01-01&endDate=2025-01-31" \
      -H "Authorization: Bearer $API_TOKEN"

Table Maintenance
Compaction
After many write operations, a table may accumulate small storage fragments. Compaction merges these for better query performance:

    curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/compact \
      -H "Authorization: Bearer $API_TOKEN"

Run compaction periodically on tables with frequent appends or upserts.
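
One way to make "periodically" concrete is to count writes and trigger compaction every N of them. This is only a sketch: `CompactionPolicy`, `compaction_request`, and the default threshold of 50 writes are hypothetical choices, not recommendations from the API. The request is built but not sent, so you can pass it to `urllib.request.urlopen` or translate it to another HTTP client.

```python
import urllib.request

BASE_URL = "https://api.catalyzed.ai/dataset-tables"


class CompactionPolicy:
    """Signal that compaction is due after a fixed number of writes."""

    def __init__(self, writes_per_compaction: int = 50):
        if writes_per_compaction < 1:
            raise ValueError("writes_per_compaction must be at least 1")
        self.writes_per_compaction = writes_per_compaction
        self._writes = 0

    def record_write(self) -> bool:
        """Count one append or upsert; return True when compaction is due."""
        self._writes += 1
        if self._writes >= self.writes_per_compaction:
            self._writes = 0  # start counting toward the next compaction
            return True
        return False


def compaction_request(table_id: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that compacts a table."""
    return urllib.request.Request(
        f"{BASE_URL}/{table_id}/compact",
        method="POST",
        headers={"Authorization": f"Bearer {api_token}"},
    )
```

A time-based schedule (for example, a nightly job) would work equally well; the counter simply ties compaction to actual write volume.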
Compute Statistics
Update internal statistics used by the query optimizer. Run this after large data loads:

    curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/statistics \
      -H "Authorization: Bearer $API_TOKEN"

Indexes
Indexes speed up queries on frequently filtered columns.
Create an Index
Create a btree index

curl:

    curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "indexName": "idx_customer_id",
        "columnName": "customer_id",
        "indexType": "btree"
      }'

JavaScript:

    const response = await fetch(
      "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes",
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiToken}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          indexName: "idx_customer_id",
          columnName: "customer_id",
          indexType: "btree",
        }),
      }
    );
    // Returns 202 - index is built asynchronously
    const { operationId } = await response.json();

Python:

    import requests

    response = requests.post(
        "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes",
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "indexName": "idx_customer_id",
            "columnName": "customer_id",
            "indexType": "btree",
        },
    )
    # Returns 202 - index is built asynchronously
    result = response.json()

Index creation returns 202 Accepted; the index is built asynchronously.
Index Types
| Type | Use Case |
|---|---|
btree | Equality and range queries on scalar columns |
ivf_pq | Vector similarity search (ANN) |
| ivf_flat | Vector search over uncompressed vectors (slower than ivf_pq, higher recall) |
List Indexes

    curl https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes \
      -H "Authorization: Bearer $API_TOKEN"

Response includes index status (pending, built, failed), creation time, and error messages if applicable.
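
Because indexes are built asynchronously, a client usually polls the list endpoint until the status changes. The sketch below shows one way to do that. The function `wait_for_index` is hypothetical, `list_indexes` is a caller-supplied function that calls the endpoint above and returns a list of dicts, and the field names `indexName`, `status`, and `error` are assumptions about the response shape, which this guide does not spell out.

```python
import time


def wait_for_index(list_indexes, index_name, timeout_s=300.0, interval_s=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the named index is built; raise if it fails or times out."""
    deadline = clock() + timeout_s
    while True:
        for index in list_indexes():
            # Assumed response fields: "indexName", "status", "error".
            if index.get("indexName") != index_name:
                continue
            status = index.get("status")
            if status == "built":
                return index
            if status == "failed":
                raise RuntimeError(
                    f"index {index_name} failed: {index.get('error', 'no details')}"
                )
        if clock() >= deadline:
            raise TimeoutError(f"index {index_name} not built after {timeout_s}s")
        sleep(interval_s)  # still pending (or not listed yet); try again
```

The `sleep` and `clock` parameters exist so the loop can be exercised in tests without real waiting.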
Drop an Index

    curl -X DELETE https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes/idx_customer_id \
      -H "Authorization: Bearer $API_TOKEN"

You can also manage indexes through schema migrations.
Deleting a Table
Delete table

curl:

    curl -X DELETE https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA \
      -H "Authorization: Bearer $API_TOKEN"

JavaScript:

    await fetch("https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA", {
      method: "DELETE",
      headers: { Authorization: `Bearer ${apiToken}` },
    });

Python:

    import requests

    requests.delete(
        "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA",
        headers={"Authorization": f"Bearer {api_token}"},
    )

Returns 204 No Content on success.
Error Handling
Common errors when working with tables:
| Error Code | Status | Cause |
|---|---|---|
| DATASET_NOT_FOUND | 404 | Parent dataset doesn’t exist |
| TABLE_NOT_FOUND | 404 | Table ID doesn’t exist |
| TABLE_NAME_ALREADY_EXISTS | 409 | Table name already used in this dataset |
| MISSING_COLUMNS | 400 | Ingested data is missing required columns |
| COERCION_FAILED | 400 | Data values can’t be converted to schema types |
| FILE_TOO_LARGE | 413 | Request body exceeds 100MB limit |
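
Client code can turn these codes into actionable messages. The following sketch assumes the error response body is JSON with a top-level `code` field; that shape is not documented in this guide, so adjust the parsing to match real responses. The names `TableApiError`, `HINTS`, and `error_from_response` are hypothetical.

```python
import json

# Suggested next steps per error code, paraphrasing the causes listed above.
HINTS = {
    "DATASET_NOT_FOUND": "check the datasetId",
    "TABLE_NOT_FOUND": "check the table ID",
    "TABLE_NAME_ALREADY_EXISTS": "choose a different table name in this dataset",
    "MISSING_COLUMNS": "add the required columns to the ingested data",
    "COERCION_FAILED": "fix values that cannot be converted to the schema types",
    "FILE_TOO_LARGE": "split the upload so each request stays under 100MB",
}


class TableApiError(Exception):
    """An API error with its HTTP status, error code, and a suggested fix."""

    def __init__(self, status: int, code: str, hint: str):
        super().__init__(f"{status} {code}: {hint}")
        self.status = status
        self.code = code
        self.hint = hint


def error_from_response(status: int, body: str) -> TableApiError:
    """Build a TableApiError from an HTTP status and a raw response body."""
    try:
        # Assumption: the body looks like {"code": "TABLE_NOT_FOUND", ...}.
        code = json.loads(body).get("code", "UNKNOWN")
    except (ValueError, AttributeError):
        code = "UNKNOWN"  # body was not JSON, or not a JSON object
    hint = HINTS.get(code, "see the API response for details")
    return TableApiError(status, code, hint)
```

Unknown or unparseable errors fall back to a generic hint instead of raising a second exception while handling the first.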
Best Practices
- Define primary keys upfront if you plan to use upsert or delete modes
- Use utf8 for IDs: avoids integer overflow issues with large identifiers
- Make columns nullable by default: easier to evolve the schema later
- Run compaction after bulk data loads to optimize query performance
- Name tables descriptively: customer_orders is better than tbl1
- Keep related tables in one dataset: enables cross-table joins and unified management
Next Steps
- Data Types - Full reference for all supported column types
- Ingesting Data - Write data in JSON, CSV, Parquet, or Arrow IPC
- Schema Management - Add columns, change types, manage indexes
- Querying Data - SQL queries across one or more tables