Schema Management
Tables in Catalyzed support schema evolution. You can add columns, modify types, and track schema versions over time.
Schema Versioning
Every table maintains a history of schema changes. Each modification creates a new version:
```text
v1: Initial schema (id, name, email)
v2: Added column (id, name, email, phone)
v3: Renamed column (id, name, email, phone_number)
```

View Current Schema
Get current schema
```bash
curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema \
  -H "Authorization: Bearer $API_TOKEN"
```

```javascript
const apiToken = process.env.API_TOKEN;
const response = await fetch(
  "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema",
  { headers: { Authorization: `Bearer ${apiToken}` } }
);
const schema = await response.json();
```

```python
import os
import requests

api_token = os.environ["API_TOKEN"]
response = requests.get(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema",
    headers={"Authorization": f"Bearer {api_token}"},
)
schema = response.json()
```

Response:
```json
{
  "schemaVersionId": "sch_abc123",
  "versionNumber": 3,
  "arrowSchema": "...",
  "fields": [
    {"name": "id", "dataType": "string", "nullable": false},
    {"name": "name", "dataType": "string", "nullable": false},
    {"name": "email", "dataType": "string", "nullable": true},
    {"name": "phone_number", "dataType": "string", "nullable": true}
  ],
  "primaryKeyColumns": ["id"],
  "appliedAt": "2024-03-15T14:30:00Z"
}
```

View Schema History
List schema versions
```bash
curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions \
  -H "Authorization: Bearer $API_TOKEN"
```

```javascript
const apiToken = process.env.API_TOKEN;
const response = await fetch(
  "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions",
  { headers: { Authorization: `Bearer ${apiToken}` } }
);
const versions = await response.json();
```

```python
import os
import requests

api_token = os.environ["API_TOKEN"]
response = requests.get(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions",
    headers={"Authorization": f"Bearer {api_token}"},
)
versions = response.json()
```

Response:
```json
{
  "versions": [
    {"schemaVersionId": "sch_abc123", "versionNumber": 3, "appliedAt": "2024-03-15T14:30:00Z"},
    {"schemaVersionId": "sch_abc122", "versionNumber": 2, "appliedAt": "2024-02-10T09:15:00Z"},
    {"schemaVersionId": "sch_abc121", "versionNumber": 1, "appliedAt": "2024-01-15T10:00:00Z"}
  ]
}
```

Schema Migrations
Migrations let you evolve a table's schema in a controlled way: you submit an ordered list of operations, and each completed migration moves the table to a new schema version.
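Every migration request on this page has the same shape: a JSON body of the form `{"operations": [...]}` POSTed to `/dataset-tables/{id}/migrations`, or to `/migrations/plan` for a dry run. As a minimal sketch using only the Python standard library, such a request can be built like this; `build_migration_request` is a hypothetical helper, not part of any Catalyzed SDK.

```python
import json
import urllib.request

# Base URL used by every example on this page.
API_BASE = "https://api.catalyzed.ai"


def build_migration_request(table_id: str, operations: list, api_token: str,
                            plan_only: bool = False) -> urllib.request.Request:
    """Build (without sending) a POST to /migrations, or to /migrations/plan for a dry run."""
    path = f"/dataset-tables/{table_id}/migrations"
    if plan_only:
        path += "/plan"
    body = json.dumps({"operations": operations}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response with `json.load`; the `requests` examples on this page do the same with less code.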
Supported Operations
| Operation | Description |
|---|---|
| `add_column` | Add a new column |
| `drop_column` | Remove a column |
| `rename_column` | Rename a column |
| `change_column_type` | Change column data type |
| `alter_column_nullability` | Make column nullable or non-nullable |
| `set_primary_key` | Set primary key columns |
| `drop_primary_key` | Remove primary key |
| `add_scalar_index` | Add a btree or bitmap index |
| `drop_scalar_index` | Remove a scalar index |
| `add_vector_index` | Add a vector similarity index |
| `drop_vector_index` | Remove a vector index |
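Each operation in the table is sent as an object with a `type` (one of the names above) and a `params` object. A sketch of a small builder that rejects misspelled operation types before the request leaves your machine; the helper names are hypothetical, and the server (or the plan endpoint) remains the authority on which `params` each type accepts.

```python
# Operation types from the "Supported Operations" table.
SUPPORTED_OPERATIONS = frozenset({
    "add_column", "drop_column", "rename_column", "change_column_type",
    "alter_column_nullability", "set_primary_key", "drop_primary_key",
    "add_scalar_index", "drop_scalar_index",
    "add_vector_index", "drop_vector_index",
})


def op(op_type: str, **params) -> dict:
    """Build one {"type", "params"} operation, rejecting unknown types client-side."""
    if op_type not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation type: {op_type!r}")
    return {"type": op_type, "params": params}


def migration_body(*operations: dict) -> dict:
    """Wrap operations in the request body accepted by the migration endpoints."""
    return {"operations": list(operations)}
```

For example, `migration_body(op("rename_column", oldName="phone", newName="phone_number"))` produces the same body as the rename example further down.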
Add a Column
Add column migration
```bash
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "operations": [
      {
        "type": "add_column",
        "params": {"name": "phone", "dataType": "string", "nullable": true}
      }
    ]
  }'
```

```javascript
const apiToken = process.env.API_TOKEN;
await fetch("https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    operations: [
      {
        type: "add_column",
        params: { name: "phone", dataType: "string", nullable: true },
      },
    ],
  }),
});
```

```python
import os
import requests

api_token = os.environ["API_TOKEN"]
requests.post(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations",
    headers={"Authorization": f"Bearer {api_token}"},
    json={
        "operations": [
            {
                "type": "add_column",
                "params": {"name": "phone", "dataType": "string", "nullable": True},
            }
        ]
    },
)
```

Drop a Column
```json
{
  "operations": [
    { "type": "drop_column", "params": { "name": "legacy_field" } }
  ]
}
```

Rename a Column
```json
{
  "operations": [
    {
      "type": "rename_column",
      "params": { "oldName": "phone", "newName": "phone_number" }
    }
  ]
}
```

Change Column Type
```json
{
  "operations": [
    {
      "type": "change_column_type",
      "params": { "name": "amount", "oldType": "int32", "newType": "float64" }
    }
  ]
}
```

Type conversion rules:
- `string` → any type (if the values are parseable)
- `int32` → `int64`, `float32`, `float64`, `string`
- `float32` → `float64`, `string`
- `timestamp` → `string`, `date`
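These rules can be encoded as a lookup table for a quick client-side pre-check before calling the plan endpoint. This is a sketch: types not listed above are treated as non-convertible (a conservative assumption), and a `string` source still depends on every stored value being parseable, which only the server can check.

```python
# Source type -> allowed target types, transcribed from the rules above.
ALLOWED_CONVERSIONS = {
    "int32": frozenset({"int64", "float32", "float64", "string"}),
    "float32": frozenset({"float64", "string"}),
    "timestamp": frozenset({"string", "date"}),
}


def conversion_allowed(old_type: str, new_type: str) -> bool:
    """Client-side pre-check for change_column_type; the plan endpoint is authoritative."""
    if old_type == "string":
        # Any target is allowed, but only if every stored value parses.
        return True
    # Types not covered by the rules are treated as non-convertible (conservative).
    return new_type in ALLOWED_CONVERSIONS.get(old_type, frozenset())
```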
Alter Nullability
```json
{
  "operations": [
    {
      "type": "alter_column_nullability",
      "params": { "name": "email", "nullable": false }
    }
  ]
}
```

Set Primary Key
```json
{
  "operations": [
    {
      "type": "set_primary_key",
      "params": { "columns": ["id"], "enforced": true }
    }
  ]
}
```

Drop Primary Key
```json
{
  "operations": [
    { "type": "drop_primary_key", "params": {} }
  ]
}
```

Multiple Operations
Apply multiple changes in a single migration:
```json
{
  "operations": [
    {
      "type": "add_column",
      "params": {"name": "updated_at", "dataType": "timestamp", "nullable": true}
    },
    {
      "type": "rename_column",
      "params": {"oldName": "created", "newName": "created_at"}
    },
    {
      "type": "drop_column",
      "params": {"name": "deprecated_field"}
    }
  ]
}
```

Operations are applied in the order they are listed.
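Because operations run in order, each one sees the column names left by the previous ones. The sketch below replays `add_column`, `drop_column`, and `rename_column` on a plain list of names to reason about the end state of a multi-operation migration. It is a local illustration of ordering only, not the service's validation logic; other operation types are ignored, and `apply_locally` is a hypothetical name.

```python
def apply_locally(columns: list[str], operations: list[dict]) -> list[str]:
    """Replay add/drop/rename operations, in order, on a list of column names."""
    cols = list(columns)  # leave the caller's list untouched

    def require(name: str) -> None:
        if name not in cols:
            raise ValueError(f"Column '{name}' does not exist")

    for operation in operations:
        kind, params = operation["type"], operation["params"]
        if kind == "add_column":
            if params["name"] in cols:
                raise ValueError(f"Column '{params['name']}' already exists")
            cols.append(params["name"])
        elif kind == "drop_column":
            require(params["name"])
            cols.remove(params["name"])
        elif kind == "rename_column":
            require(params["oldName"])
            cols[cols.index(params["oldName"])] = params["newName"]
        # Other operation types do not change column names and are skipped here.
    return cols
```

Applied to the example above with starting columns `["id", "created", "deprecated_field"]`, the result is `["id", "created_at", "updated_at"]`; swapping the order so that `created` is dropped before it is renamed fails, because the rename no longer finds its source column.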
Planning Migrations
Before applying a migration, validate it with the plan endpoint. This is a dry run: it reports whether the operations are valid without applying them.
Plan migration (dry run)
```bash
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "operations": [
      {"type": "add_column", "params": {"name": "phone", "dataType": "string", "nullable": true}}
    ]
  }'
```

```javascript
const apiToken = process.env.API_TOKEN;
const response = await fetch(
  "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      operations: [
        { type: "add_column", params: { name: "phone", dataType: "string", nullable: true } },
      ],
    }),
  }
);
const plan = await response.json();
```

```python
import os
import requests

api_token = os.environ["API_TOKEN"]
response = requests.post(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan",
    headers={"Authorization": f"Bearer {api_token}"},
    json={
        "operations": [
            {"type": "add_column", "params": {"name": "phone", "dataType": "string", "nullable": True}}
        ]
    },
)
plan = response.json()
```

Response:
```json
{
  "isValid": true,
  "validationErrors": [],
  "estimatedDurationMs": 1500,
  "indexImpacts": [],
  "requiresRewrite": false
}
```

Validation Errors
When a migration is invalid, `isValid` is `false` and `validationErrors` lists each problem:

```json
{
  "isValid": false,
  "validationErrors": [
    "Column 'email' already exists",
    "Cannot change type from 'timestamp' to 'int64'"
  ],
  "estimatedDurationMs": null,
  "indexImpacts": [],
  "requiresRewrite": false
}
```

Migration Status
Migrations run as background operations. To check their status, list the migrations for a table:
```bash
curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
  -H "Authorization: Bearer $API_TOKEN"
```

Response:
```json
{
  "migrations": [
    {
      "migrationId": "cWBPHwTmJ-MaDFmei1WKF",
      "status": "completed",
      "fromVersionNumber": 1,
      "toVersionNumber": 2,
      "createdAt": "2024-01-15T10:30:00Z",
      "completedAt": "2024-01-15T10:30:45Z"
    }
  ],
  "total": 1,
  "page": 1,
  "pageSize": 20
}
```

Best Practices
1. Always Plan First
Run the plan endpoint before applying migrations to catch errors early.
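One way to enforce this in a script is to make the apply step conditional on the plan result. In this sketch, `plan` and `apply` are caller-supplied functions (for example, thin wrappers around POSTs to `/migrations/plan` and `/migrations`), and `check_plan`, `plan_then_apply`, and `MigrationPlanError` are hypothetical names. Only `isValid` and `validationErrors` are interpreted; the other plan fields are passed through unchanged for the caller to inspect.

```python
from typing import Any, Callable


class MigrationPlanError(Exception):
    """Raised when the plan endpoint reports that a migration is invalid."""


def check_plan(plan: dict) -> dict:
    """Return the plan response unchanged if it is valid; raise otherwise."""
    if not plan.get("isValid", False):
        errors = plan.get("validationErrors") or ["no validation errors reported"]
        raise MigrationPlanError("; ".join(errors))
    return plan


def plan_then_apply(operations: list,
                    plan: Callable[[list], dict],
                    apply: Callable[[list], Any]) -> Any:
    """Call plan(operations) and only call apply(operations) if the plan is valid."""
    check_plan(plan(operations))
    return apply(operations)
```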
2. Add Columns as Nullable
When adding columns, make them nullable initially:
```json
{
  "type": "add_column",
  "params": {"name": "new_field", "dataType": "string", "nullable": true}
}
```

Then backfill the data and, if needed, make the column non-nullable with an `alter_column_nullability` operation.
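The two migration bodies in this pattern can be generated as below. How you backfill and how you count the remaining nulls are left to you (for example, a query against the table), so `null_count` is an assumed input; `add_nullable_column` and `make_required` are hypothetical names.

```python
def add_nullable_column(name: str, data_type: str) -> dict:
    """Step 1: add the column as nullable so existing rows stay valid."""
    return {"operations": [{
        "type": "add_column",
        "params": {"name": name, "dataType": data_type, "nullable": True},
    }]}


def make_required(name: str, null_count: int) -> dict:
    """Step 3: tighten to non-nullable, but only once the backfill left no nulls.

    null_count is assumed to come from your own check after the backfill.
    """
    if null_count:
        raise ValueError(f"{null_count} rows in {name!r} are still null; backfill first")
    return {"operations": [{
        "type": "alter_column_nullability",
        "params": {"name": name, "nullable": False},
    }]}
```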
3. Avoid Breaking Changes
These operations may break existing queries:
- Dropping columns
- Renaming columns
- Changing column types
Coordinate with API consumers before applying.
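A simple way to make that coordination hard to skip is to flag the listed operations automatically, for example in a CI check on migration files. A sketch that scans a migration body and describes each potentially breaking operation; the function name and message wording are illustrative.

```python
# Operation types listed above as able to break existing queries.
BREAKING_OPERATIONS = frozenset({"drop_column", "rename_column", "change_column_type"})


def breaking_changes(migration: dict) -> list[str]:
    """Describe each operation in a migration body that may break existing queries."""
    found = []
    for operation in migration["operations"]:
        kind = operation["type"]
        if kind not in BREAKING_OPERATIONS:
            continue
        params = operation.get("params", {})
        if kind == "drop_column":
            found.append(f"drops column {params['name']!r}")
        elif kind == "rename_column":
            found.append(f"renames {params['oldName']!r} to {params['newName']!r}")
        else:  # change_column_type
            found.append(f"changes type of {params['name']!r} to {params['newType']!r}")
    return found
```

Run against the multiple-operations example earlier on this page, it reports the rename of `created` and the drop of `deprecated_field`, but not the nullable `add_column`.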
4. Use Descriptive Column Names
Choose clear, descriptive names:
- `created_at` instead of `created`
- `customer_id` instead of `cid`
- `total_amount` instead of `amt`
5. Document Changes
Track migrations in your application’s version control alongside code changes.
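If each migration body is stored as a JSON file in the repository, a script can load them in a deterministic order before planning and applying them. The directory layout and zero-padded file names (such as `0001_add_phone.json`) are a hypothetical convention, not something the API requires.

```python
import json
from pathlib import Path


def load_migrations(directory: str) -> list[tuple[str, dict]]:
    """Load {"operations": [...]} bodies from *.json files, ordered by file name."""
    migrations = []
    for path in sorted(Path(directory).glob("*.json")):
        body = json.loads(path.read_text(encoding="utf-8"))
        if "operations" not in body:
            raise ValueError(f"{path.name}: missing 'operations' key")
        migrations.append((path.name, body))
    return migrations
```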
Index Migrations
You can also create and drop indexes as migration operations. This is an alternative to using the /indexes endpoint directly.
Add a Scalar Index
Create a btree or bitmap index on one or more columns:
```json
{
  "operations": [
    {
      "type": "add_scalar_index",
      "params": {
        "name": "idx_customer_status",
        "columns": ["customer_id", "status"],
        "indexType": "btree"
      }
    }
  ]
}
```

| Index Type | Use Case |
|---|---|
| `btree` | Equality and range queries (`=`, `>`, `<`, `BETWEEN`) |
| `bitmap` | Low-cardinality columns (status, category, boolean) |
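The table suggests a rule of thumb that can be sketched in code: choose `bitmap` when a column has few distinct values and is filtered by equality, otherwise `btree`. The `max_distinct_for_bitmap` cut-off of 100 is an illustrative assumption, not a documented limit; tune it against your own data. `scalar_index_op` is a hypothetical helper.

```python
def scalar_index_op(name: str, column: str, sample_values: list,
                    needs_range_queries: bool = False,
                    max_distinct_for_bitmap: int = 100) -> dict:
    """Build an add_scalar_index operation for one column, picking the index type."""
    distinct = len(set(sample_values))
    if needs_range_queries or distinct > max_distinct_for_bitmap:
        index_type = "btree"   # equality and range predicates
    else:
        index_type = "bitmap"  # few distinct values, equality filters
    return {
        "type": "add_scalar_index",
        "params": {"name": name, "columns": [column], "indexType": index_type},
    }
```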
Drop a Scalar Index
```json
{
  "operations": [
    { "type": "drop_scalar_index", "params": { "name": "idx_customer_status" } }
  ]
}
```

Add a Vector Index
Create a vector similarity index for ANN (approximate nearest neighbor) search:
```json
{
  "operations": [
    {
      "type": "add_vector_index",
      "params": {
        "name": "idx_embedding",
        "column": "embedding",
        "config": {
          "indexType": "ivf_pq",
          "metric": "cosine",
          "dimensions": 384,
          "numPartitions": 256,
          "numSubVectors": 16
        }
      }
    }
  ]
}
```

Vector Index Types
| Type | Description |
|---|---|
| `ivf_pq` | IVF with product quantization: good balance of speed and recall |
| `ivf_hnsw_pq` | IVF + HNSW with PQ: higher recall, more memory |
| `hnsw` | Hierarchical navigable small world: fast search, highest memory usage |
| `flat` | Exact search: no approximation, slower on large datasets |
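A sketch of a builder for the `add_vector_index` operation that checks names against the index-type table above and the Distance Metrics table that follows. Two assumptions are flagged here and in the code: the page only shows `numPartitions` and `numSubVectors` for `ivf_pq`, so sending them only for the PQ-based types is a guess, and requiring `dimensions` to be divisible by `numSubVectors` reflects how product quantization splits each vector into equal slices, not a rule confirmed for this API. The defaults mirror the example above (384 / 16 = 24 dimensions per sub-vector); `vector_index_op` is a hypothetical name.

```python
VECTOR_INDEX_TYPES = frozenset({"ivf_pq", "ivf_hnsw_pq", "hnsw", "flat"})
DISTANCE_METRICS = frozenset({"cosine", "l2", "dot", "hamming"})
# Assumption: only the IVF + PQ types take numPartitions / numSubVectors.
PQ_INDEX_TYPES = frozenset({"ivf_pq", "ivf_hnsw_pq"})


def vector_index_op(name: str, column: str, index_type: str, metric: str,
                    dimensions: int, num_partitions: int = 256,
                    num_sub_vectors: int = 16) -> dict:
    """Build an add_vector_index operation with basic client-side checks."""
    if index_type not in VECTOR_INDEX_TYPES:
        raise ValueError(f"unknown vector index type: {index_type!r}")
    if metric not in DISTANCE_METRICS:
        raise ValueError(f"unknown distance metric: {metric!r}")
    config = {"indexType": index_type, "metric": metric, "dimensions": dimensions}
    if index_type in PQ_INDEX_TYPES:
        # Assumption: PQ splits each vector into equal-length sub-vectors.
        if dimensions % num_sub_vectors != 0:
            raise ValueError(
                f"dimensions ({dimensions}) must be divisible by "
                f"numSubVectors ({num_sub_vectors})"
            )
        config["numPartitions"] = num_partitions
        config["numSubVectors"] = num_sub_vectors
    return {
        "type": "add_vector_index",
        "params": {"name": name, "column": column, "config": config},
    }
```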
Distance Metrics
| Metric | Description |
|---|---|
| `cosine` | Cosine similarity (most common for text embeddings) |
| `l2` | Euclidean distance |
| `dot` | Dot product similarity |
| `hamming` | Hamming distance (binary vectors) |
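For intuition, the four metrics can be written out in plain Python for two equal-length vectors. These are textbook definitions, not the service's implementation; how scores are scaled or ordered in query results (for example, whether `l2` is reported squared) is not described on this page. Here `cosine` is expressed as a distance (1 minus cosine similarity), so smaller means more similar.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norms


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (straight-line) distance."""
    return math.dist(a, b)


def dot_product(a: list[float], b: list[float]) -> float:
    """Dot product: larger means more similar for comparable vectors."""
    return sum(x * y for x, y in zip(a, b))


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))
```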
Drop a Vector Index
```json
{
  "operations": [
    { "type": "drop_vector_index", "params": { "name": "idx_embedding" } }
  ]
}
```

Index Management After Schema Changes
Indexes may need rebuilding after schema changes. Check index status:
```bash
curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
  -H "Authorization: Bearer $API_TOKEN"
```

To rebuild an index, drop and recreate it:
```bash
# Drop the index
curl -X DELETE https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes/idx_name \
  -H "Authorization: Bearer $API_TOKEN"

# Recreate the index
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"indexName": "idx_name", "columnName": "column_name", "indexType": "btree"}'
```

Next Steps
- Tables - Full table operations reference
- Querying Data - SQL syntax and examples