
Managing Tables

Tables hold your data within a dataset. Each table has a defined schema, supports multiple ingestion formats, and provides SQL querying. This guide covers the full table lifecycle.

A table requires a parent dataset, a name, and at least one field:

Create a table

curl -X POST https://api.catalyzed.ai/dataset-tables \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
    "tableName": "orders",
    "description": "Customer orders",
    "fields": [
      {"name": "order_id", "arrowType": "utf8", "nullable": false},
      {"name": "customer_id", "arrowType": "utf8", "nullable": false},
      {"name": "amount", "arrowType": "float64", "nullable": false},
      {"name": "status", "arrowType": "utf8", "nullable": false},
      {"name": "created_at", "arrowType": "timestamp", "nullable": false}
    ],
    "primaryKeyColumns": ["order_id"]
  }'

Response:

{
  "tableId": "Ednc5U676CO4hn-FqsXeA",
  "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
  "tableName": "orders",
  "description": "Customer orders",
  "createdAt": "2024-01-15T10:30:00Z",
  "updatedAt": "2024-01-15T10:30:00Z"
}
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| datasetId | string | Yes | Parent dataset ID |
| tableName | string | Yes | Table name (1-255 characters, unique per dataset) |
| description | string | No | Optional description |
| fields | array | Yes | Column definitions (at least 1). See Data Types |
| primaryKeyColumns | string[] | No | Columns forming the primary key |
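It can be useful to validate these constraints client-side before sending the request. A minimal Python sketch (the helper name and checks are illustrative, not part of any official SDK):

```python
def build_create_table_payload(dataset_id, table_name, fields,
                               description=None, primary_key=None):
    """Build the create-table request body, enforcing the documented constraints."""
    if not 1 <= len(table_name) <= 255:
        raise ValueError("tableName must be 1-255 characters")
    if not fields:
        raise ValueError("at least one field is required")
    field_names = {f["name"] for f in fields}
    for col in primary_key or []:
        if col not in field_names:
            raise ValueError(f"primary key column {col!r} is not a field")
    payload = {"datasetId": dataset_id, "tableName": table_name, "fields": fields}
    if description is not None:
        payload["description"] = description
    if primary_key:
        payload["primaryKeyColumns"] = primary_key
    return payload
```

Catching a bad primary key or an empty field list locally avoids a round trip that would fail with a 400.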

Choosing data types

See the Data Types reference for the full list. Common patterns:

| Use Case | Recommended Type | Why |
| --- | --- | --- |
| IDs, names, labels | utf8 | Flexible, no size constraints |
| Counts, quantities | int64 | Wide range, no overflow risk |
| Prices, measurements | float64 | Double precision avoids rounding |
| Yes/no flags | bool | Compact, queryable |
| Dates and times | timestamp | Microsecond default, supports formatting |
| Tags, categories | list<utf8> | Variable-length arrays |
| Embeddings | list<float32> | Efficient for vector search |
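Putting these patterns together, a product table mixing scalar and list columns might declare its fields like this (the column names are illustrative; verify the exact list-type spellings against the Data Types reference):

```python
# A "fields" array mixing the common patterns above, including a
# list<float32> column suitable for embeddings.
fields = [
    {"name": "product_id", "arrowType": "utf8", "nullable": False},
    {"name": "quantity", "arrowType": "int64", "nullable": True},
    {"name": "price", "arrowType": "float64", "nullable": True},
    {"name": "in_stock", "arrowType": "bool", "nullable": True},
    {"name": "updated_at", "arrowType": "timestamp", "nullable": True},
    {"name": "tags", "arrowType": "list<utf8>", "nullable": True},
    {"name": "embedding", "arrowType": "list<float32>", "nullable": True},
]
```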

Primary keys

Primary keys are required for the upsert and delete write modes. Choose based on your data:

Single column — when one field uniquely identifies a row:

"primaryKeyColumns": ["order_id"]

Composite key — when uniqueness requires multiple columns:

"primaryKeyColumns": ["tenant_id", "order_id"]

List tables in a dataset

curl "https://api.catalyzed.ai/dataset-tables?datasetIds=HoIEJNIPiQIy6TjVRxjwz&orderBy=tableName&orderDirection=asc" \
-H "Authorization: Bearer $API_TOKEN"
| Parameter | Type | Description |
| --- | --- | --- |
| datasetTableIds | string | Comma-separated table IDs |
| datasetIds | string | Comma-separated dataset IDs |
| tableName | string | Partial match on table name |
| managed | boolean | Filter by managed status |
| page | number | Page number, 1-indexed (default: 1) |
| pageSize | number | Results per page, 1-100 (default: 20) |
| orderBy | string | Sort by: createdAt, tableName, updatedAt |
| orderDirection | string | asc or desc |
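These parameters compose into a query string in the usual way. A small Python sketch building the list URL (the helper is hypothetical; any HTTP client works):

```python
from urllib.parse import urlencode

def list_tables_url(dataset_ids, page=1, page_size=20,
                    order_by="tableName", order_direction="asc"):
    """Build the list-tables URL; comma-separated ID lists are joined client-side."""
    params = {
        "datasetIds": ",".join(dataset_ids),
        "page": page,
        "pageSize": page_size,
        "orderBy": order_by,
        "orderDirection": order_direction,
    }
    return "https://api.catalyzed.ai/dataset-tables?" + urlencode(params)
```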

Update a table’s name or description:

Update table

curl -X PUT https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "tableName": "customer_orders",
    "description": "Customer orders with shipping info"
  }'

Ingestion formats

Tables accept data in four formats. Choose based on your use case:

| Format | Content-Type | Best For |
| --- | --- | --- |
| JSON | application/json | Simple integrations, small batches (under 1,000 rows) |
| CSV | text/csv | Spreadsheet exports, human-readable data |
| Parquet | application/parquet | Large datasets, columnar analytics tools |
| Arrow IPC | application/vnd.apache.arrow.stream | Maximum performance, typed data pipelines |

CSV and Parquet data is automatically coerced to match the table schema. Column name matching is case-insensitive for CSV.

See Ingesting Data for detailed examples of each format and write mode.
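In client code, the format choice boils down to picking the right Content-Type header. A sketch of that mapping (the helper is illustrative, not an official SDK function):

```python
# Content-Type header for each supported ingestion format,
# taken from the formats table above.
CONTENT_TYPES = {
    "json": "application/json",
    "csv": "text/csv",
    "parquet": "application/parquet",
    "arrow": "application/vnd.apache.arrow.stream",
}

def ingest_headers(fmt, token):
    """Headers for an ingest request in the given format."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": CONTENT_TYPES[fmt],
    }
```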

Table statistics

Get row counts and activity metrics for a table:

curl https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/stats \
-H "Authorization: Bearer $API_TOKEN"

Response:

{
  "tableId": "Ednc5U676CO4hn-FqsXeA",
  "rowCount": 15000,
  "totalQueries": 142,
  "totalIngests": 28,
  "lastQueryAt": "2025-01-15T14:30:00Z",
  "lastIngestAt": "2025-01-15T06:00:00Z"
}
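One practical use of this response is a freshness check, e.g. alerting when a table has not been ingested recently. A hedged Python sketch (the helper name is hypothetical):

```python
from datetime import datetime, timezone

def hours_since_last_ingest(stats, now=None):
    """Hours elapsed since the table's lastIngestAt timestamp."""
    last = datetime.fromisoformat(stats["lastIngestAt"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (now - last).total_seconds() / 3600
```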

Usage time series

Track query and ingestion activity over time:

curl "https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/usage/timeseries?granularity=day&startDate=2025-01-01&endDate=2025-01-31" \
-H "Authorization: Bearer $API_TOKEN"

Compaction

After many write operations, a table may accumulate small storage fragments. Compaction merges these fragments to improve query performance:

curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/compact \
-H "Authorization: Bearer $API_TOKEN"

Run compaction periodically on tables with frequent appends or upserts.

Optimize statistics

Update the internal statistics used by the query optimizer. Run this after large data loads:

curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/statistics \
-H "Authorization: Bearer $API_TOKEN"

Indexes

Indexes speed up queries on frequently filtered columns.

Create a btree index

curl -X POST https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "indexName": "idx_customer_id",
    "columnName": "customer_id",
    "indexType": "btree"
  }'

Index creation returns 202 Accepted — the index is built asynchronously.

| Type | Use Case |
| --- | --- |
| btree | Equality and range queries on scalar columns |
| ivf_pq | Vector similarity search (ANN) |
| ivf_flat | Exact vector search (slower, higher recall) |
List the indexes on a table:
curl https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes \
-H "Authorization: Bearer $API_TOKEN"

Response includes index status (pending, built, failed), creation time, and error messages if applicable.
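Because index builds are asynchronous, clients typically poll the index listing until the status leaves pending. A minimal sketch, where fetch_status is a placeholder for your own HTTP call that returns the status string:

```python
import time

def wait_for_index(fetch_status, poll_seconds=5, timeout_seconds=600):
    """Poll until the index status is "built" or "failed", or time out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("built", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("index build did not finish within the timeout")
```

Checking for "failed" as well as "built" matters: a failed build would otherwise spin the loop until the timeout.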

Delete an index:
curl -X DELETE https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA/indexes/idx_customer_id \
-H "Authorization: Bearer $API_TOKEN"

You can also manage indexes through schema migrations.

Delete table

curl -X DELETE https://api.catalyzed.ai/dataset-tables/Ednc5U676CO4hn-FqsXeA \
-H "Authorization: Bearer $API_TOKEN"

Returns 204 No Content on success.

Errors

Common errors when working with tables:

| Error Code | Status | Cause |
| --- | --- | --- |
| DATASET_NOT_FOUND | 404 | Parent dataset doesn't exist |
| TABLE_NOT_FOUND | 404 | Table ID doesn't exist |
| TABLE_NAME_ALREADY_EXISTS | 409 | Table name already used in this dataset |
| MISSING_COLUMNS | 400 | Ingested data is missing required columns |
| COERCION_FAILED | 400 | Data values can't be converted to schema types |
| FILE_TOO_LARGE | 413 | Request body exceeds 100MB limit |
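All of these are client errors, so retrying the same request will not help; the fix is to correct the request. A hedged sketch of a client-side error dispatcher (the response body shape, an errorCode field, is an assumption; adapt to the actual error payload):

```python
def explain_error(error_code):
    """Suggest a client-side fix for a documented error code."""
    messages = {
        "DATASET_NOT_FOUND": "check the datasetId",
        "TABLE_NOT_FOUND": "check the table ID",
        "TABLE_NAME_ALREADY_EXISTS": "pick a different tableName for this dataset",
        "MISSING_COLUMNS": "include every required column in the ingested data",
        "COERCION_FAILED": "fix values that do not match the schema types",
        "FILE_TOO_LARGE": "split the upload into batches under 100MB",
    }
    return messages.get(error_code, "see the API error reference")
```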
Best practices

  1. Define primary keys upfront if you plan to use upsert or delete modes
  2. Use utf8 for IDs — avoids integer overflow issues with large identifiers
  3. Make columns nullable by default — easier to evolve the schema later
  4. Run compaction after bulk data loads to optimize query performance
  5. Name tables descriptively: customer_orders is better than tbl1
  6. Keep related tables in one dataset — enables cross-table joins and unified management