
Schema Management

Tables in Catalyzed support schema evolution. You can add columns, modify types, and track schema versions over time.

Every table maintains a history of schema changes. Each modification creates a new version:

v1: Initial schema (id, name, email)
v2: Added column (id, name, email, phone)
v3: Renamed column (id, name, email, phone_number)

Get current schema

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema \
-H "Authorization: Bearer $API_TOKEN"

Response:

{
"schemaVersionId": "sch_abc123",
"versionNumber": 3,
"arrowSchema": "...",
"fields": [
{"name": "id", "dataType": "string", "nullable": false},
{"name": "name", "dataType": "string", "nullable": false},
{"name": "email", "dataType": "string", "nullable": true},
{"name": "phone_number", "dataType": "string", "nullable": true}
],
"primaryKeyColumns": ["id"],
"appliedAt": "2024-03-15T14:30:00Z"
}
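
As a rough illustration of how a client might consume this response, the Python sketch below parses the sample body shown above and derives the non-nullable columns and a name-to-type map. The helper names `required_columns` and `column_types` are hypothetical and not part of the Catalyzed API; only the field names come from the response.

```python
import json

# Sample body copied from the response above ("arrowSchema" is elided in the source).
schema_response = json.loads("""
{
  "schemaVersionId": "sch_abc123",
  "versionNumber": 3,
  "arrowSchema": "...",
  "fields": [
    {"name": "id", "dataType": "string", "nullable": false},
    {"name": "name", "dataType": "string", "nullable": false},
    {"name": "email", "dataType": "string", "nullable": true},
    {"name": "phone_number", "dataType": "string", "nullable": true}
  ],
  "primaryKeyColumns": ["id"],
  "appliedAt": "2024-03-15T14:30:00Z"
}
""")


def required_columns(schema):
    """Return the names of fields whose "nullable" flag is false."""
    return [f["name"] for f in schema["fields"] if not f["nullable"]]


def column_types(schema):
    """Map each column name to its declared dataType."""
    return {f["name"]: f["dataType"] for f in schema["fields"]}


print(required_columns(schema_response))
print(column_types(schema_response))
```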

List schema versions

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions \
-H "Authorization: Bearer $API_TOKEN"

Response:

{
"versions": [
{
"schemaVersionId": "sch_abc123",
"versionNumber": 3,
"appliedAt": "2024-03-15T14:30:00Z"
},
{
"schemaVersionId": "sch_abc122",
"versionNumber": 2,
"appliedAt": "2024-02-10T09:15:00Z"
},
{
"schemaVersionId": "sch_abc121",
"versionNumber": 1,
"appliedAt": "2024-01-15T10:00:00Z"
}
]
}
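
A minimal sketch, assuming the response shape above, of picking the most recent version on the client side rather than relying on the order of the list (the source does not say whether ordering is guaranteed). `latest_version` is a hypothetical helper name.

```python
# Sample data copied from the response above.
versions_response = {
    "versions": [
        {"schemaVersionId": "sch_abc123", "versionNumber": 3,
         "appliedAt": "2024-03-15T14:30:00Z"},
        {"schemaVersionId": "sch_abc122", "versionNumber": 2,
         "appliedAt": "2024-02-10T09:15:00Z"},
        {"schemaVersionId": "sch_abc121", "versionNumber": 1,
         "appliedAt": "2024-01-15T10:00:00Z"},
    ]
}


def latest_version(response):
    """Return the version entry with the highest versionNumber."""
    return max(response["versions"], key=lambda v: v["versionNumber"])


print(latest_version(versions_response)["schemaVersionId"])
```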

Migrations allow you to evolve your schema in a controlled way.

Operation                  Description
add_column                 Add a new column
drop_column                Remove a column
rename_column              Rename a column
change_column_type         Change column data type
alter_column_nullability   Make column nullable or non-nullable
set_primary_key            Set primary key columns
drop_primary_key           Remove primary key

Add column migration

curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"operations": [
{
"type": "add_column",
"params": {
"name": "phone",
"dataType": "string",
"nullable": true
}
}
]
}'
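
For readers calling the API from Python rather than curl, the following sketch builds the same request with the standard library. It only constructs the request object and does not send it. The function name `build_migration_request` is hypothetical; the URL, headers, and body mirror the curl command above.

```python
import json
import os
import urllib.request

API_BASE = "https://api.catalyzed.ai"
TABLE_ID = "KzaMsfA0LSw_Ld0KyaXIS"  # table ID used in the examples above


def build_migration_request(table_id, operations, token):
    """Build (but do not send) a POST request to the migrations endpoint."""
    body = json.dumps({"operations": operations}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/dataset-tables/{table_id}/migrations",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_migration_request(
    TABLE_ID,
    [{"type": "add_column",
      "params": {"name": "phone", "dataType": "string", "nullable": True}}],
    os.environ.get("API_TOKEN", ""),
)
# To actually submit the migration: urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```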

Drop column migration

{
"operations": [
{
"type": "drop_column",
"params": {
"name": "legacy_field"
}
}
]
}

Rename column migration

{
"operations": [
{
"type": "rename_column",
"params": {
"oldName": "phone",
"newName": "phone_number"
}
}
]
}

Change column type migration

{
"operations": [
{
"type": "change_column_type",
"params": {
"name": "amount",
"oldType": "int32",
"newType": "float64"
}
}
]
}

Type conversion rules:

  • string → any type (if values are parseable)
  • int32 → int64, float32, float64, string
  • float32 → float64, string
  • timestamp → string, date
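
The conversion rules above can be encoded as a small lookup table for a client-side pre-check. This is a sketch under two assumptions: that the list is exhaustive (source types not listed, such as int64, are treated as non-convertible), and that the "parseable values" condition for string columns is checked by the server. `can_convert` is a hypothetical helper.

```python
# Allowed target types per source type, taken from the rules listed above.
ALLOWED_CONVERSIONS = {
    "int32": {"int64", "float32", "float64", "string"},
    "float32": {"float64", "string"},
    "timestamp": {"string", "date"},
}


def can_convert(old_type, new_type):
    """Return True if the listed rules permit old_type -> new_type.

    string may target any type, provided the stored values are parseable;
    that data-dependent check is not modelled here.
    """
    if old_type == "string":
        return True
    # Assumption: unlisted source types have no allowed conversions.
    return new_type in ALLOWED_CONVERSIONS.get(old_type, set())


print(can_convert("int32", "float64"))
print(can_convert("timestamp", "int64"))
```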

Alter column nullability migration

{
"operations": [
{
"type": "alter_column_nullability",
"params": {
"name": "email",
"nullable": false
}
}
]
}

Set primary key migration

{
"operations": [
{
"type": "set_primary_key",
"params": {
"columns": ["id"],
"enforced": true
}
}
]
}

Drop primary key migration

{
"operations": [
{
"type": "drop_primary_key",
"params": {}
}
]
}

Apply multiple changes in a single migration:

{
"operations": [
{
"type": "add_column",
"params": {"name": "updated_at", "dataType": "timestamp", "nullable": true}
},
{
"type": "rename_column",
"params": {"oldName": "created", "newName": "created_at"}
},
{
"type": "drop_column",
"params": {"name": "deprecated_field"}
}
]
}

Operations are applied in order.
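
To see why ordering matters, here is a small local simulation that applies operations to a list of column names one at a time. It is only a model of the behavior described above, not the server's implementation; placing new columns at the end and the specific error cases are assumptions.

```python
def apply_operations(columns, operations):
    """Apply add/rename/drop operations to a list of column names, in order."""
    cols = list(columns)
    for op in operations:
        params = op["params"]
        if op["type"] == "add_column":
            if params["name"] in cols:
                raise ValueError(f"Column '{params['name']}' already exists")
            cols.append(params["name"])  # assumption: new columns go last
        elif op["type"] == "rename_column":
            index = cols.index(params["oldName"])  # ValueError if missing
            cols[index] = params["newName"]
        elif op["type"] == "drop_column":
            cols.remove(params["name"])  # ValueError if missing
        else:
            raise NotImplementedError(op["type"])
    return cols


operations = [
    {"type": "add_column",
     "params": {"name": "updated_at", "dataType": "timestamp", "nullable": True}},
    {"type": "rename_column",
     "params": {"oldName": "created", "newName": "created_at"}},
    {"type": "drop_column", "params": {"name": "deprecated_field"}},
]

result = apply_operations(["id", "created", "deprecated_field"], operations)
print(result)
```

Because each operation sees the result of the previous one, a later operation can refer to a column by the name an earlier rename gave it, but not the other way around.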

Before applying a migration, validate it:

Plan migration (dry run)

curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"operations": [
{"type": "add_column", "params": {"name": "phone", "dataType": "string", "nullable": true}}
]
}'

Response:

{
"isValid": true,
"validationErrors": [],
"estimatedDurationMs": 1500,
"indexImpacts": [],
"requiresRewrite": false
}

Response when validation fails:

{
"isValid": false,
"validationErrors": [
"Column 'email' already exists",
"Cannot change type from 'timestamp' to 'int64'"
],
"estimatedDurationMs": null,
"indexImpacts": [],
"requiresRewrite": false
}
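
One way to use the plan response is as a gate before submitting the real migration. The sketch below works on the two sample responses above. Refusing plans with `requiresRewrite` set unless explicitly allowed is a policy chosen for illustration, not something the API requires; `should_apply` and `describe_plan` are hypothetical helpers.

```python
# Sample plan responses copied from above (JSON null becomes None).
valid_plan = {
    "isValid": True,
    "validationErrors": [],
    "estimatedDurationMs": 1500,
    "indexImpacts": [],
    "requiresRewrite": False,
}
invalid_plan = {
    "isValid": False,
    "validationErrors": [
        "Column 'email' already exists",
        "Cannot change type from 'timestamp' to 'int64'",
    ],
    "estimatedDurationMs": None,
    "indexImpacts": [],
    "requiresRewrite": False,
}


def should_apply(plan, allow_rewrite=False):
    """Decide whether to go ahead with the migration described by plan."""
    if not plan["isValid"]:
        return False
    # Illustrative policy: treat a rewrite as opt-in.
    if plan["requiresRewrite"] and not allow_rewrite:
        return False
    return True


def describe_plan(plan):
    """Summarize a plan response in one line for logs or CI output."""
    if plan["isValid"]:
        return f"valid, estimated {plan['estimatedDurationMs']} ms"
    return "invalid: " + "; ".join(plan["validationErrors"])


print(describe_plan(valid_plan))
print(describe_plan(invalid_plan))
```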

Migrations run as background operations. List migrations for a table:

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
-H "Authorization: Bearer $API_TOKEN"

Response:

{
"migrations": [
{
"migrationId": "cWBPHwTmJ-MaDFmei1WKF",
"status": "completed",
"fromVersionNumber": 1,
"toVersionNumber": 2,
"createdAt": "2024-01-15T10:30:00Z",
"completedAt": "2024-01-15T10:30:45Z"
}
],
"total": 1,
"page": 1,
"pageSize": 20
}
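
As an illustration of working with this list, the sketch below computes how long a migration took from its `createdAt` and `completedAt` timestamps. It assumes `completedAt` is absent or null while a migration is still running, which the source does not state; the helper names are hypothetical.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def migration_duration_seconds(migration):
    """Return elapsed seconds, or None if the migration has not completed."""
    completed = migration.get("completedAt")
    if completed is None:  # assumption: missing/null until the run finishes
        return None
    started = parse_timestamp(migration["createdAt"])
    return (parse_timestamp(completed) - started).total_seconds()


# Sample entry copied from the response above.
migration = {
    "migrationId": "cWBPHwTmJ-MaDFmei1WKF",
    "status": "completed",
    "fromVersionNumber": 1,
    "toVersionNumber": 2,
    "createdAt": "2024-01-15T10:30:00Z",
    "completedAt": "2024-01-15T10:30:45Z",
}

print(migration_duration_seconds(migration))
```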

Run the plan endpoint before applying migrations to catch errors early.

When adding columns, make them nullable initially:

{
"type": "add_column",
"params": {"name": "new_field", "dataType": "string", "nullable": true}
}

Then backfill the data and, if needed, make the column non-nullable with an alter_column_nullability migration.
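
The add-then-tighten sequence can be expressed as two separate migration payloads, with the backfill happening in between. This sketch only builds the JSON bodies using the operation shapes shown earlier; the function names are hypothetical, and how the backfill itself is performed is outside its scope.

```python
import json


def add_nullable_column(name, data_type):
    """Phase 1 payload: add the column as nullable so existing rows stay valid."""
    return {"operations": [{
        "type": "add_column",
        "params": {"name": name, "dataType": data_type, "nullable": True},
    }]}


def make_non_nullable(name):
    """Phase 2 payload: tighten the column once every row has a value."""
    return {"operations": [{
        "type": "alter_column_nullability",
        "params": {"name": name, "nullable": False},
    }]}


phase_one = add_nullable_column("new_field", "string")
# ... backfill new_field for existing rows here ...
phase_two = make_non_nullable("new_field")

print(json.dumps(phase_one))
print(json.dumps(phase_two))
```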

These operations may break existing queries:

  • Dropping columns
  • Renaming columns
  • Changing column types

Coordinate with API consumers before applying them.
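
A simple pre-flight check can flag these operations in a migration payload before it is submitted, for example in a review script. This is a sketch; the set of breaking types comes from the list above, and `breaking_operations` is a hypothetical helper.

```python
# Operation types the list above identifies as potentially breaking.
BREAKING_TYPES = {"drop_column", "rename_column", "change_column_type"}


def breaking_operations(payload):
    """Return the operations in payload that may break existing queries."""
    return [op for op in payload["operations"] if op["type"] in BREAKING_TYPES]


payload = {
    "operations": [
        {"type": "add_column",
         "params": {"name": "updated_at", "dataType": "timestamp", "nullable": True}},
        {"type": "rename_column",
         "params": {"oldName": "created", "newName": "created_at"}},
        {"type": "drop_column", "params": {"name": "deprecated_field"}},
    ]
}

flagged = breaking_operations(payload)
print([op["type"] for op in flagged])
```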

Choose clear, descriptive names:

  • created_at instead of created
  • customer_id instead of cid
  • total_amount instead of amt

Track migrations in your application’s version control alongside code changes.

Indexes may need rebuilding after schema changes. Check index status:

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
-H "Authorization: Bearer $API_TOKEN"

To rebuild an index, drop and recreate it:

# Drop the index
curl -X DELETE https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes/idx_name \
-H "Authorization: Bearer $API_TOKEN"
# Recreate the index
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"indexName": "idx_name", "columnName": "column_name", "indexType": "btree"}'