
Schema Management

Tables in Catalyzed support schema evolution. You can add columns, modify types, and track schema versions over time.

Every table maintains a history of schema changes. Each modification creates a new version:

v1: Initial schema (id, name, email)
v2: Added column phone (id, name, email, phone)
v3: Renamed phone to phone_number (id, name, email, phone_number)

Get current schema

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema \
-H "Authorization: Bearer $API_TOKEN"

Response:

{
  "schemaVersionId": "sch_abc123",
  "versionNumber": 3,
  "arrowSchema": "...",
  "fields": [
    {"name": "id", "dataType": "string", "nullable": false},
    {"name": "name", "dataType": "string", "nullable": false},
    {"name": "email", "dataType": "string", "nullable": true},
    {"name": "phone_number", "dataType": "string", "nullable": true}
  ],
  "primaryKeyColumns": ["id"],
  "appliedAt": "2024-03-15T14:30:00Z"
}
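For scripted access, the same GET request can be built with Python's standard library. This is a minimal sketch, not an official client. The helper names `schema_request`, `fetch_schema`, and `non_nullable_columns` are hypothetical. Only the URL, the `Authorization` header, and the response fields come from the example above.

```python
import json
import urllib.request

# Base URL taken from the curl examples on this page.
API_BASE = "https://api.catalyzed.ai"


def schema_request(table_id: str, token: str) -> urllib.request.Request:
    """Build (without sending) the GET request for a table's current schema."""
    return urllib.request.Request(
        f"{API_BASE}/dataset-tables/{table_id}/schema",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_schema(table_id: str, token: str) -> dict:
    """Send the request and decode the JSON response body (performs network I/O)."""
    with urllib.request.urlopen(schema_request(table_id, token)) as resp:
        return json.load(resp)


def non_nullable_columns(schema: dict) -> list:
    """Return the names of fields whose "nullable" flag is false."""
    return [field["name"] for field in schema["fields"] if not field["nullable"]]
```

Applied to the example response above, `non_nullable_columns` returns `["id", "name"]`.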

List schema versions

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions \
-H "Authorization: Bearer $API_TOKEN"

Response:

{
  "versions": [
    {
      "schemaVersionId": "sch_abc123",
      "versionNumber": 3,
      "appliedAt": "2024-03-15T14:30:00Z"
    },
    {
      "schemaVersionId": "sch_abc122",
      "versionNumber": 2,
      "appliedAt": "2024-02-10T09:15:00Z"
    },
    {
      "schemaVersionId": "sch_abc121",
      "versionNumber": 1,
      "appliedAt": "2024-01-15T10:00:00Z"
    }
  ]
}
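The example response happens to list the newest version first. Client code that should not depend on that ordering can sort by `versionNumber` itself. Here is a small sketch over the decoded JSON. `latest_version` and `history_oldest_first` are hypothetical helper names.

```python
def latest_version(versions_response: dict) -> dict:
    """Return the version entry with the highest versionNumber."""
    return max(versions_response["versions"], key=lambda v: v["versionNumber"])


def history_oldest_first(versions_response: dict) -> list:
    """Return (versionNumber, schemaVersionId, appliedAt) tuples, oldest first."""
    ordered = sorted(versions_response["versions"], key=lambda v: v["versionNumber"])
    return [(v["versionNumber"], v["schemaVersionId"], v["appliedAt"]) for v in ordered]
```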

Migrations let you evolve a table's schema in a controlled way. Each migration is a list of operations; the supported operation types are:

| Operation | Description |
|---|---|
| add_column | Add a new column |
| drop_column | Remove a column |
| rename_column | Rename a column |
| change_column_type | Change a column's data type |
| alter_column_nullability | Make a column nullable or non-nullable |
| set_primary_key | Set the primary key columns |
| drop_primary_key | Remove the primary key |
| add_scalar_index | Add a btree or bitmap index |
| drop_scalar_index | Remove a scalar index |
| add_vector_index | Add a vector similarity index |
| drop_vector_index | Remove a vector index |
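Every operation in the table uses the same `{"type": ..., "params": {...}}` shape seen in the request bodies on this page. A hypothetical client-side helper can build these entries and catch a misspelled operation type before any request is sent. Checking the parameters is still left to the API and its plan endpoint. `operation` and `migration_body` are illustrative names, not part of any Catalyzed SDK.

```python
import json

# Operation types from the table above.
SUPPORTED_OPERATIONS = frozenset({
    "add_column", "drop_column", "rename_column", "change_column_type",
    "alter_column_nullability", "set_primary_key", "drop_primary_key",
    "add_scalar_index", "drop_scalar_index",
    "add_vector_index", "drop_vector_index",
})


def operation(op_type: str, **params) -> dict:
    """Build one {"type": ..., "params": {...}} entry, rejecting unknown types locally."""
    if op_type not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported migration operation: {op_type!r}")
    return {"type": op_type, "params": params}


def migration_body(*operations: dict) -> str:
    """Serialize operations into the {"operations": [...]} request body."""
    return json.dumps({"operations": list(operations)})
```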

Add column migration

curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
  "operations": [
    {
      "type": "add_column",
      "params": {
        "name": "phone",
        "dataType": "string",
        "nullable": true
      }
    }
  ]
}'
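The same POST can be built in Python with only the standard library. This sketch mirrors the curl command's URL, headers, and body. `migration_request` is a hypothetical helper, and the request is only constructed, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.catalyzed.ai"  # base URL from the curl examples


def migration_request(table_id: str, token: str, operations: list) -> urllib.request.Request:
    """Build (without sending) the POST that submits a migration."""
    body = json.dumps({"operations": operations}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/dataset-tables/{table_id}/migrations",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


add_phone = [{
    "type": "add_column",
    "params": {"name": "phone", "dataType": "string", "nullable": True},
}]
req = migration_request("KzaMsfA0LSw_Ld0KyaXIS", "YOUR_TOKEN", add_phone)
# Pass req to urllib.request.urlopen(req) to actually submit it.
```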
Drop column migration

Request body (POST to the same /migrations endpoint):

{
  "operations": [
    {
      "type": "drop_column",
      "params": {
        "name": "legacy_field"
      }
    }
  ]
}

Rename column migration

{
  "operations": [
    {
      "type": "rename_column",
      "params": {
        "oldName": "phone",
        "newName": "phone_number"
      }
    }
  ]
}

Change column type migration

{
  "operations": [
    {
      "type": "change_column_type",
      "params": {
        "name": "amount",
        "oldType": "int32",
        "newType": "float64"
      }
    }
  ]
}

Type conversion rules:

  • string → any type (if values are parseable)
  • int32 → int64, float32, float64, string
  • float32 → float64, string
  • timestamp → string, date
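The rules above can be encoded as a local pre-check before a `change_column_type` operation is built. This is a sketch with hypothetical names. It treats any conversion not listed above as unsupported, which is an assumption: the plan endpoint remains the authority. It also cannot check the "if values are parseable" condition for string sources, because that depends on the stored data.

```python
# Conversions listed in the type conversion rules; anything else is treated
# as unsupported here (an assumption, not a documented guarantee).
ALLOWED_CONVERSIONS = {
    "int32": {"int64", "float32", "float64", "string"},
    "float32": {"float64", "string"},
    "timestamp": {"string", "date"},
}


def conversion_allowed(old_type: str, new_type: str) -> bool:
    """Return True if the documented rules permit old_type -> new_type."""
    if old_type == "string":
        # Permitted in principle, but every stored value must parse as new_type.
        return True
    return new_type in ALLOWED_CONVERSIONS.get(old_type, set())


def change_type_op(name: str, old_type: str, new_type: str) -> dict:
    """Build a change_column_type operation, refusing conversions not in the rules."""
    if not conversion_allowed(old_type, new_type):
        raise ValueError(f"cannot change type from {old_type!r} to {new_type!r}")
    return {
        "type": "change_column_type",
        "params": {"name": name, "oldType": old_type, "newType": new_type},
    }
```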

Alter column nullability migration

{
  "operations": [
    {
      "type": "alter_column_nullability",
      "params": {
        "name": "email",
        "nullable": false
      }
    }
  ]
}

Set primary key migration

{
  "operations": [
    {
      "type": "set_primary_key",
      "params": {
        "columns": ["id"],
        "enforced": true
      }
    }
  ]
}

Drop primary key migration

{
  "operations": [
    {
      "type": "drop_primary_key",
      "params": {}
    }
  ]
}

Apply multiple changes in a single migration:

{
  "operations": [
    {
      "type": "add_column",
      "params": {"name": "updated_at", "dataType": "timestamp", "nullable": true}
    },
    {
      "type": "rename_column",
      "params": {"oldName": "created", "newName": "created_at"}
    },
    {
      "type": "drop_column",
      "params": {"name": "deprecated_field"}
    }
  ]
}

Operations are applied in the order they appear in the operations array.
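Because operations run in sequence, each one sees the schema left by the one before it. For example, renaming a column that an earlier operation dropped cannot succeed. The sketch below is a hypothetical, purely local model of that ordering for the three column operations used in the example. It is not the service's implementation, and it skips every other operation type. Its duplicate-column error is modeled on the "Column ... already exists" message shown in the plan response on this page.

```python
def simulate(columns: list, operations: list) -> list:
    """Apply add/drop/rename operations in order to a list of column names."""
    cols = list(columns)
    for op in operations:
        kind, params = op["type"], op["params"]
        if kind == "add_column":
            if params["name"] in cols:
                raise ValueError(f"Column '{params['name']}' already exists")
            cols.append(params["name"])
        elif kind == "drop_column":
            cols.remove(params["name"])  # raises ValueError if the column is missing
        elif kind == "rename_column":
            if params["newName"] in cols:
                raise ValueError(f"Column '{params['newName']}' already exists")
            # list.index raises ValueError if oldName no longer exists
            cols[cols.index(params["oldName"])] = params["newName"]
        # Other operation types do not change the column list in this model.
    return cols
```

For the migration above, `simulate(["id", "created", "deprecated_field"], operations)` yields `["id", "created_at", "updated_at"]`.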

Before applying a migration, validate it:

Plan migration (dry run)

curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
  "operations": [
    {"type": "add_column", "params": {"name": "phone", "dataType": "string", "nullable": true}}
  ]
}'

Response:

{
  "isValid": true,
  "validationErrors": [],
  "estimatedDurationMs": 1500,
  "indexImpacts": [],
  "requiresRewrite": false
}

A plan for a migration that fails validation returns isValid: false and lists each problem:

{
  "isValid": false,
  "validationErrors": [
    "Column 'email' already exists",
    "Cannot change type from 'timestamp' to 'int64'"
  ],
  "estimatedDurationMs": null,
  "indexImpacts": [],
  "requiresRewrite": false
}
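A common pattern is to dry-run first and submit only if the plan is valid. In this sketch the HTTP call is injected as a `post(path, body)` function returning decoded JSON, so the flow can be tested without network access. `plan_then_apply` and `PostFn` are hypothetical names. The response of the apply call is not shown on this page, so the function simply returns whatever `post` returns.

```python
from typing import Callable

# post(path, body) -> decoded JSON response; injected so no real HTTP is needed.
PostFn = Callable[[str, dict], dict]


def plan_then_apply(table_id: str, operations: list, post: PostFn) -> dict:
    """Dry-run a migration via /migrations/plan and submit it only if it is valid."""
    body = {"operations": operations}
    base = f"/dataset-tables/{table_id}/migrations"
    plan = post(f"{base}/plan", body)
    if not plan["isValid"]:
        # Surface every validation error reported by the plan endpoint.
        raise ValueError("migration rejected: " + "; ".join(plan["validationErrors"]))
    return post(base, body)
```

The plan's `estimatedDurationMs` and `requiresRewrite` fields could also feed a decision about when to run the migration.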

Migrations run as background operations. List migrations for a table:

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
-H "Authorization: Bearer $API_TOKEN"

Response:

{
  "migrations": [
    {
      "migrationId": "cWBPHwTmJ-MaDFmei1WKF",
      "status": "completed",
      "fromVersionNumber": 1,
      "toVersionNumber": 2,
      "createdAt": "2024-01-15T10:30:00Z",
      "completedAt": "2024-01-15T10:30:45Z"
    }
  ],
  "total": 1,
  "page": 1,
  "pageSize": 20
}
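Because migrations run in the background, a script may need to poll until one finishes. This sketch polls the list endpoint shown above through an injected `get(path)` function, so it can be tested offline. Only the `"completed"` status appears in the example. The failure status name `"failed"` is an assumption, so adjust it to what the API actually returns. The sketch also checks only the first page of results, since pagination parameters are not documented here. `wait_for_migration` is a hypothetical name.

```python
import time
from typing import Callable

# "failed" is an assumed terminal status name; only "completed" is documented above.
ASSUMED_FAILURE_STATUSES = {"failed"}


def wait_for_migration(
    table_id: str,
    migration_id: str,
    get: Callable[[str], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll the migrations list until migration_id reports "completed"."""
    deadline = clock() + timeout_s
    while True:
        listing = get(f"/dataset-tables/{table_id}/migrations")
        # Only the first page of the listing is inspected.
        entry = next(
            (m for m in listing["migrations"] if m["migrationId"] == migration_id), None
        )
        if entry is not None:
            if entry["status"] == "completed":
                return entry
            if entry["status"] in ASSUMED_FAILURE_STATUSES:
                raise RuntimeError(f"migration {migration_id} ended as {entry['status']!r}")
        if clock() >= deadline:
            raise TimeoutError(f"migration {migration_id} not completed after {timeout_s}s")
        sleep(interval_s)
```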

Run the plan endpoint before applying migrations to catch errors early.

When adding columns, make them nullable initially:

{
  "type": "add_column",
  "params": {"name": "new_field", "dataType": "string", "nullable": true}
}

Then backfill the column. If it should be required, finish with an alter_column_nullability operation that sets "nullable": false.
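The add, backfill, then tighten pattern spans two separate migrations, with the backfill in between. A hypothetical helper can produce both request bodies. The backfill itself is not a migration operation, so it happens through whatever write path your pipeline already uses.

```python
def nullable_first_plan(name: str, data_type: str) -> tuple:
    """Return (first, last) migration bodies for the add, backfill, tighten pattern."""
    # Step 1: add the column as nullable so existing rows remain valid.
    first = {"operations": [{
        "type": "add_column",
        "params": {"name": name, "dataType": data_type, "nullable": True},
    }]}
    # Step 3 (after backfilling every row): make the column required.
    last = {"operations": [{
        "type": "alter_column_nullability",
        "params": {"name": name, "nullable": False},
    }]}
    return first, last
```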

These operations may break existing queries:

  • Dropping columns
  • Renaming columns
  • Changing column types

Coordinate with API consumers before applying.
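One way to enforce that coordination is a small check, run in CI or a review script, that flags the operation types listed above before a migration is submitted. `breaking_operations` and `require_sign_off` are hypothetical helpers. The `approved` flag stands in for whatever sign-off process your team uses.

```python
# Operation types listed above as potentially breaking for existing queries.
BREAKING_TYPES = {"drop_column", "rename_column", "change_column_type"}


def breaking_operations(operations: list) -> list:
    """Return (position, type) for each operation that may break existing queries."""
    return [(i, op["type"]) for i, op in enumerate(operations) if op["type"] in BREAKING_TYPES]


def require_sign_off(operations: list, approved: bool) -> None:
    """Raise unless potentially breaking operations have been explicitly approved."""
    flagged = breaking_operations(operations)
    if flagged and not approved:
        raise RuntimeError(f"breaking operations need sign-off: {flagged}")
```

For the multi-operation example earlier on this page, `breaking_operations` flags `[(1, "rename_column"), (2, "drop_column")]`.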

Choose clear, descriptive names:

  • created_at instead of created
  • customer_id instead of cid
  • total_amount instead of amt

Track migrations in your application’s version control alongside code changes.

You can also create and drop indexes as migration operations. This is an alternative to using the /indexes endpoint directly.

Create a btree or bitmap index on one or more columns:

{
  "operations": [
    {
      "type": "add_scalar_index",
      "params": {
        "name": "idx_customer_status",
        "columns": ["customer_id", "status"],
        "indexType": "btree"
      }
    }
  ]
}

| Index type | Use case |
|---|---|
| btree | Equality and range queries (=, >, <, BETWEEN) |
| bitmap | Low-cardinality columns (status, category, boolean) |
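The table suggests a simple rule of thumb: use btree when range predicates matter, and bitmap when a column has few distinct values. The sketch below encodes that rule next to a builder for the `add_scalar_index` operation. The `max_bitmap_ratio` threshold is an arbitrary illustrative value, not a documented limit. The function names are hypothetical.

```python
SCALAR_INDEX_TYPES = {"btree", "bitmap"}


def suggest_scalar_index_type(distinct_values: int, row_count: int,
                              needs_range: bool = False,
                              max_bitmap_ratio: float = 0.01) -> str:
    """Rule of thumb from the table: btree for range queries, bitmap for low cardinality."""
    if needs_range:
        return "btree"  # the table lists range predicates under btree
    if row_count > 0 and distinct_values / row_count <= max_bitmap_ratio:
        return "bitmap"
    return "btree"


def add_scalar_index_op(name: str, columns: list, index_type: str) -> dict:
    """Build an add_scalar_index operation with basic local checks."""
    if index_type not in SCALAR_INDEX_TYPES:
        raise ValueError(f"indexType must be one of {sorted(SCALAR_INDEX_TYPES)}")
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "type": "add_scalar_index",
        "params": {"name": name, "columns": list(columns), "indexType": index_type},
    }
```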

Drop a scalar index by name:

{
  "operations": [
    {
      "type": "drop_scalar_index",
      "params": {
        "name": "idx_customer_status"
      }
    }
  ]
}

Create a vector similarity index for ANN (approximate nearest neighbor) search:

{
  "operations": [
    {
      "type": "add_vector_index",
      "params": {
        "name": "idx_embedding",
        "column": "embedding",
        "config": {
          "indexType": "ivf_pq",
          "metric": "cosine",
          "dimensions": 384,
          "numPartitions": 256,
          "numSubVectors": 16
        }
      }
    }
  ]
}

| Index type | Description |
|---|---|
| ivf_pq | IVF with product quantization: good balance of speed and recall |
| ivf_hnsw_pq | IVF + HNSW with PQ: higher recall, more memory |
| hnsw | Hierarchical navigable small world: fast search, highest memory usage |
| flat | Exact search: no approximation, slower on large datasets |

| Metric | Description |
|---|---|
| cosine | Cosine similarity (most common for text embeddings) |
| l2 | Euclidean distance |
| dot | Dot product similarity |
| hamming | Hamming distance (binary vectors) |
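A builder for `add_vector_index` can check the index type and metric against the two tables before submitting. The divisibility check is an assumption that follows from how standard product quantization works, not a documented Catalyzed rule: it splits each vector into equal-length sub-vectors, so for the PQ-based types `dimensions` must be a multiple of `numSubVectors`. The example's 384 dimensions and 16 sub-vectors satisfy this, giving 24 dimensions per sub-vector. `add_vector_index_op` is a hypothetical name, and extra tuning keys are passed through unchanged.

```python
VECTOR_INDEX_TYPES = {"ivf_pq", "ivf_hnsw_pq", "hnsw", "flat"}
VECTOR_METRICS = {"cosine", "l2", "dot", "hamming"}


def add_vector_index_op(name: str, column: str, index_type: str, metric: str,
                        dimensions: int, **tuning) -> dict:
    """Build an add_vector_index operation, checking values against the tables above."""
    if index_type not in VECTOR_INDEX_TYPES:
        raise ValueError(f"unknown indexType {index_type!r}")
    if metric not in VECTOR_METRICS:
        raise ValueError(f"unknown metric {metric!r}")
    sub_vectors = tuning.get("numSubVectors")
    # Assumption: PQ splits vectors into equal-length sub-vectors.
    if index_type.endswith("_pq") and sub_vectors and dimensions % sub_vectors:
        raise ValueError(f"{dimensions} dimensions are not divisible by {sub_vectors} sub-vectors")
    config = {"indexType": index_type, "metric": metric, "dimensions": dimensions, **tuning}
    return {
        "type": "add_vector_index",
        "params": {"name": name, "column": column, "config": config},
    }
```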

Drop a vector index by name:

{
  "operations": [
    {
      "type": "drop_vector_index",
      "params": {
        "name": "idx_embedding"
      }
    }
  ]
}

Indexes may need rebuilding after schema changes. Check index status:

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
-H "Authorization: Bearer $API_TOKEN"

To rebuild an index, drop and recreate it:

# Drop the index
curl -X DELETE https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes/idx_name \
-H "Authorization: Bearer $API_TOKEN"
# Recreate the index
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"indexName": "idx_name", "columnName": "column_name", "indexType": "btree"}'
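The two curl calls above can be prepared together in Python so they are always issued in the right order. This sketch only builds the requests. The body fields `indexName`, `columnName`, and `indexType` are copied from the POST example, and `rebuild_index_requests` is a hypothetical helper. The index does not exist between the two calls, so queries that rely on it may run slower during that window.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.catalyzed.ai"  # base URL from the curl examples


def rebuild_index_requests(table_id: str, token: str, index_name: str,
                           column_name: str, index_type: str = "btree") -> tuple:
    """Build (without sending) the DELETE and POST requests that rebuild an index."""
    base = f"{API_BASE}/dataset-tables/{table_id}/indexes"
    auth = {"Authorization": f"Bearer {token}"}
    drop = urllib.request.Request(
        f"{base}/{urllib.parse.quote(index_name, safe='')}", headers=auth, method="DELETE"
    )
    create = urllib.request.Request(
        base,
        data=json.dumps({"indexName": index_name, "columnName": column_name,
                         "indexType": index_type}).encode("utf-8"),
        headers={**auth, "Content-Type": "application/json"},
        method="POST",
    )
    return drop, create


# Send in order, e.g.: for req in rebuild_index_requests(...): urllib.request.urlopen(req)
```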