Schema Management

Tables in Catalyzed support schema evolution. You can add columns, modify types, and track schema versions over time.

Schema Versioning

Every table maintains a history of schema changes. Each modification creates a new version:

v1: Initial schema (id, name, email)
v2: Added column (id, name, email, phone)
v3: Renamed column (id, name, email, phone_number)

View Current Schema

Get current schema

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema \
  -H "Authorization: Bearer $API_TOKEN"

const response = await fetch(
  "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema",
  { headers: { Authorization: `Bearer ${apiToken}` } }
);
const schema = await response.json();

import requests

response = requests.get(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema",
    headers={"Authorization": f"Bearer {api_token}"}
)
schema = response.json()

Response:

{
  "schemaVersionId": "sch_abc123",
  "versionNumber": 3,
  "arrowSchema": "...",
  "fields": [
    {"name": "id", "dataType": "string", "nullable": false},
    {"name": "name", "dataType": "string", "nullable": false},
    {"name": "email", "dataType": "string", "nullable": true},
    {"name": "phone_number", "dataType": "string", "nullable": true}
  ],
  "primaryKeyColumns": ["id"],
  "appliedAt": "2024-03-15T14:30:00Z"
}
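Once the response is parsed, the fields array can be inspected directly. The sketch below is illustrative only: it hard-codes the sample response shown above (with arrowSchema omitted), and the helper names required_columns and column_types are hypothetical, not part of any Catalyzed client.

```python
# Sample payload copied from the response above ("arrowSchema" omitted).
schema = {
    "schemaVersionId": "sch_abc123",
    "versionNumber": 3,
    "fields": [
        {"name": "id", "dataType": "string", "nullable": False},
        {"name": "name", "dataType": "string", "nullable": False},
        {"name": "email", "dataType": "string", "nullable": True},
        {"name": "phone_number", "dataType": "string", "nullable": True},
    ],
    "primaryKeyColumns": ["id"],
    "appliedAt": "2024-03-15T14:30:00Z",
}


def required_columns(schema):
    """Return the names of columns that do not accept nulls."""
    return [f["name"] for f in schema["fields"] if not f["nullable"]]


def column_types(schema):
    """Map each column name to its declared data type."""
    return {f["name"]: f["dataType"] for f in schema["fields"]}


print(required_columns(schema))          # ['id', 'name']
print(column_types(schema)["email"])     # string
```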

View Schema History

List schema versions

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions \
  -H "Authorization: Bearer $API_TOKEN"

const response = await fetch(
  "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions",
  { headers: { Authorization: `Bearer ${apiToken}` } }
);
const versions = await response.json();

import requests

response = requests.get(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/schema/versions",
    headers={"Authorization": f"Bearer {api_token}"}
)
versions = response.json()

Response:

{
  "versions": [
    {
      "schemaVersionId": "sch_abc123",
      "versionNumber": 3,
      "appliedAt": "2024-03-15T14:30:00Z"
    },
    {
      "schemaVersionId": "sch_abc122",
      "versionNumber": 2,
      "appliedAt": "2024-02-10T09:15:00Z"
    },
    {
      "schemaVersionId": "sch_abc121",
      "versionNumber": 1,
      "appliedAt": "2024-01-15T10:00:00Z"
    }
  ]
}
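The version list can answer questions such as "which schema was in effect on a given date?". The following is a minimal sketch that assumes the response shape shown above and timestamps in the exact UTC format used there; version_at is a hypothetical helper name.

```python
from datetime import datetime

# Sample payload copied from the response above.
history = {
    "versions": [
        {"schemaVersionId": "sch_abc123", "versionNumber": 3, "appliedAt": "2024-03-15T14:30:00Z"},
        {"schemaVersionId": "sch_abc122", "versionNumber": 2, "appliedAt": "2024-02-10T09:15:00Z"},
        {"schemaVersionId": "sch_abc121", "versionNumber": 1, "appliedAt": "2024-01-15T10:00:00Z"},
    ]
}


def parse_ts(value):
    """Parse a timestamp of the form 2024-03-15T14:30:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def version_at(versions, when):
    """Return the newest version applied at or before `when`, or None."""
    cutoff = parse_ts(when)
    applied = [v for v in versions if parse_ts(v["appliedAt"]) <= cutoff]
    return max(applied, key=lambda v: v["versionNumber"], default=None)


print(version_at(history["versions"], "2024-03-01T00:00:00Z")["versionNumber"])  # 2
```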

Schema Migrations

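Migrations let you evolve a table's schema in a controlled way: you submit a list of operations to the table's /migrations endpoint, and they are applied in the order given.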

Supported Operations

Operation	Description
`add_column`	Add a new column
`drop_column`	Remove a column
`rename_column`	Rename a column
`change_column_type`	Change column data type
`alter_column_nullability`	Make column nullable or non-nullable
`set_primary_key`	Set primary key columns
`drop_primary_key`	Remove primary key
`add_scalar_index`	Add a btree or bitmap index
`drop_scalar_index`	Remove a scalar index
`add_vector_index`	Add a vector similarity index
`drop_vector_index`	Remove a vector index

Add a Column

Add column migration

curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "operations": [
      {
        "type": "add_column",
        "params": {
          "name": "phone",
          "dataType": "string",
          "nullable": true
        }
      }
    ]
  }'

await fetch("https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    operations: [
      {
        type: "add_column",
        params: {
          name: "phone",
          dataType: "string",
          nullable: true,
        },
      },
    ],
  }),
});

import requests

requests.post(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations",
    headers={"Authorization": f"Bearer {api_token}"},
    json={
        "operations": [
            {
                "type": "add_column",
                "params": {
                    "name": "phone",
                    "dataType": "string",
                    "nullable": True
                }
            }
        ]
    }
)

Drop a Column

{
  "operations": [
    {
      "type": "drop_column",
      "params": {
        "name": "legacy_field"
      }
    }
  ]
}

Rename a Column

{
  "operations": [
    {
      "type": "rename_column",
      "params": {
        "oldName": "phone",
        "newName": "phone_number"
      }
    }
  ]
}

Change Column Type

{
  "operations": [
    {
      "type": "change_column_type",
      "params": {
        "name": "amount",
        "oldType": "int32",
        "newType": "float64"
      }
    }
  ]
}

Type conversion rules:

string → any type (if values are parseable)
int32 → int64, float32, float64, string
float32 → float64, string
timestamp → string, date
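The conversion rules above can be encoded as a small lookup for a client-side pre-check. This is a sketch, not the service's validator: can_convert is a hypothetical helper, and treating any source type not listed above as non-convertible is an assumption.

```python
# Targets allowed for each source type, taken from the rules listed above.
ALLOWED_CONVERSIONS = {
    "int32": {"int64", "float32", "float64", "string"},
    "float32": {"float64", "string"},
    "timestamp": {"string", "date"},
}


def can_convert(old_type, new_type):
    """Return True if the listed rules permit old_type -> new_type.

    For string sources the rules allow any target, but only if the stored
    values are parseable; that part can only be checked against the data.
    """
    if old_type == "string":
        return True
    return new_type in ALLOWED_CONVERSIONS.get(old_type, set())


print(can_convert("int32", "float64"))    # True
print(can_convert("timestamp", "int64"))  # False
```

The second call corresponds to the "Cannot change type from 'timestamp' to 'int64'" validation error that the plan endpoint can return.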

Alter Nullability

{
  "operations": [
    {
      "type": "alter_column_nullability",
      "params": {
        "name": "email",
        "nullable": false
      }
    }
  ]
}

Set Primary Key

{
  "operations": [
    {
      "type": "set_primary_key",
      "params": {
        "columns": ["id"],
        "enforced": true
      }
    }
  ]
}

Drop Primary Key

{
  "operations": [
    {
      "type": "drop_primary_key",
      "params": {}
    }
  ]
}

Multiple Operations

Apply multiple changes in a single migration:

{
  "operations": [
    {
      "type": "add_column",
      "params": {"name": "updated_at", "dataType": "timestamp", "nullable": true}
    },
    {
      "type": "rename_column",
      "params": {"oldName": "created", "newName": "created_at"}
    },
    {
      "type": "drop_column",
      "params": {"name": "deprecated_field"}
    }
  ]
}

Operations are applied in the order listed, so later operations see the result of earlier ones.
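To see why ordering matters, the column-level effect of a migration can be simulated locally. The sketch below is an illustration under assumptions: apply_operations is a hypothetical helper that tracks column names only, and the starting column list is made up for the example.

```python
def apply_operations(columns, operations):
    """Simulate column-level operations, in order, on a list of column names."""
    cols = list(columns)
    for op in operations:
        kind, params = op["type"], op["params"]
        if kind == "add_column":
            if params["name"] in cols:
                raise ValueError(f"Column {params['name']!r} already exists")
            cols.append(params["name"])
        elif kind == "rename_column":
            # list.index raises ValueError if the old name is missing
            cols[cols.index(params["oldName"])] = params["newName"]
        elif kind == "drop_column":
            cols.remove(params["name"])  # raises ValueError if missing
        # Other operation types do not change the set of column names.
    return cols


migration = [
    {"type": "add_column", "params": {"name": "updated_at", "dataType": "timestamp", "nullable": True}},
    {"type": "rename_column", "params": {"oldName": "created", "newName": "created_at"}},
    {"type": "drop_column", "params": {"name": "deprecated_field"}},
]

print(apply_operations(["id", "created", "deprecated_field"], migration))
# ['id', 'created_at', 'updated_at']
```

In this simulation, an operation that refers to a column by a name that an earlier operation has already renamed or dropped raises an error.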

Planning Migrations

Before applying a migration, validate it with the plan endpoint, which checks the operations as a dry run:

Plan migration (dry run)

curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "operations": [
      {"type": "add_column", "params": {"name": "phone", "dataType": "string", "nullable": true}}
    ]
  }'

const response = await fetch(
  "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      operations: [
        { type: "add_column", params: { name: "phone", dataType: "string", nullable: true } },
      ],
    }),
  }
);
const plan = await response.json();

import requests

response = requests.post(
    "https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations/plan",
    headers={"Authorization": f"Bearer {api_token}"},
    json={
        "operations": [
            {"type": "add_column", "params": {"name": "phone", "dataType": "string", "nullable": True}}
        ]
    }
)
plan = response.json()

Response:

{
  "isValid": true,
  "validationErrors": [],
  "estimatedDurationMs": 1500,
  "indexImpacts": [],
  "requiresRewrite": false
}

Validation Errors

When a plan fails validation, isValid is false and validationErrors lists each problem:

{
  "isValid": false,
  "validationErrors": [
    "Column 'email' already exists",
    "Cannot change type from 'timestamp' to 'int64'"
  ],
  "estimatedDurationMs": null,
  "indexImpacts": [],
  "requiresRewrite": false
}
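A deployment script might refuse to continue when the plan is invalid. The following sketch assumes the plan response shape shown above; MigrationPlanError and ensure_valid are hypothetical names introduced for illustration.

```python
class MigrationPlanError(Exception):
    """Raised when a plan response reports validation errors."""


def ensure_valid(plan):
    """Return the plan unchanged if valid, otherwise raise with all errors."""
    if not plan["isValid"]:
        raise MigrationPlanError("; ".join(plan["validationErrors"]))
    return plan


# Sample payloads copied from the responses above.
ok_plan = {
    "isValid": True,
    "validationErrors": [],
    "estimatedDurationMs": 1500,
    "indexImpacts": [],
    "requiresRewrite": False,
}
bad_plan = {
    "isValid": False,
    "validationErrors": [
        "Column 'email' already exists",
        "Cannot change type from 'timestamp' to 'int64'",
    ],
    "estimatedDurationMs": None,
    "indexImpacts": [],
    "requiresRewrite": False,
}

ensure_valid(ok_plan)  # passes silently
try:
    ensure_valid(bad_plan)
except MigrationPlanError as exc:
    print(f"Plan rejected: {exc}")
```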

Migration Status

Migrations run as background operations. List migrations for a table:

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/migrations \
  -H "Authorization: Bearer $API_TOKEN"

Response:

{
  "migrations": [
    {
      "migrationId": "cWBPHwTmJ-MaDFmei1WKF",
      "status": "completed",
      "fromVersionNumber": 1,
      "toVersionNumber": 2,
      "createdAt": "2024-01-15T10:30:00Z",
      "completedAt": "2024-01-15T10:30:45Z"
    }
  ],
  "total": 1,
  "page": 1,
  "pageSize": 20
}
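Because migrations run in the background, a client may want to wait for one to finish. The sketch below polls a caller-supplied function that returns the list response shown above. Only the "completed" status appears in this document, so the helper waits for that value and otherwise times out; wait_for_migration and its parameters are hypothetical.

```python
import time


def wait_for_migration(list_migrations, migration_id,
                       timeout_s=60.0, interval_s=2.0, sleep=time.sleep):
    """Poll until the given migration reports status "completed".

    list_migrations is any callable returning the parsed list response,
    for example a wrapper around GET .../migrations.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        for migration in list_migrations()["migrations"]:
            if (migration["migrationId"] == migration_id
                    and migration["status"] == "completed"):
                return migration
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Migration {migration_id} did not complete in time")
        sleep(interval_s)


# Sample payload copied from the response above, used in place of a live call.
sample_response = {
    "migrations": [
        {
            "migrationId": "cWBPHwTmJ-MaDFmei1WKF",
            "status": "completed",
            "fromVersionNumber": 1,
            "toVersionNumber": 2,
            "createdAt": "2024-01-15T10:30:00Z",
            "completedAt": "2024-01-15T10:30:45Z",
        }
    ],
    "total": 1,
    "page": 1,
    "pageSize": 20,
}

done = wait_for_migration(lambda: sample_response, "cWBPHwTmJ-MaDFmei1WKF")
print(done["toVersionNumber"])  # 2
```

Passing the fetch function in as an argument keeps the polling logic independent of the HTTP client and makes it easy to test without network access.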

Best Practices

1. Always Plan First

Run the plan endpoint before applying migrations to catch errors early.

2. Add Columns as Nullable

When adding columns, make them nullable initially:

{
  "type": "add_column",
  "params": {"name": "new_field", "dataType": "string", "nullable": true}
}

Backfill the data, then make the column non-nullable with alter_column_nullability if needed.
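The add-then-tighten sequence can be prepared as two separate migration bodies, with the backfill happening between them. This is a sketch using the operation shapes documented earlier; required_column_migrations is a hypothetical helper name.

```python
def required_column_migrations(name, data_type):
    """Build the two migration bodies for introducing a non-nullable column.

    Apply the first, backfill the column, then apply the second.
    """
    add_nullable = {
        "operations": [
            {
                "type": "add_column",
                "params": {"name": name, "dataType": data_type, "nullable": True},
            }
        ]
    }
    make_required = {
        "operations": [
            {
                "type": "alter_column_nullability",
                "params": {"name": name, "nullable": False},
            }
        ]
    }
    return add_nullable, make_required


step_1, step_2 = required_column_migrations("new_field", "string")
print(step_1["operations"][0]["type"])  # add_column
print(step_2["operations"][0]["type"])  # alter_column_nullability
```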

3. Avoid Breaking Changes

These operations may break existing queries:

Dropping columns
Renaming columns
Changing column types

Coordinate with API consumers before applying.
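A review step can flag these operations automatically before a migration is submitted. The following is a minimal sketch that assumes the migration body format shown earlier; breaking_operations is a hypothetical helper.

```python
# Operation types listed above as potentially breaking existing queries.
BREAKING_TYPES = {"drop_column", "rename_column", "change_column_type"}


def breaking_operations(migration):
    """Return the operations in a migration body that may break queries."""
    return [op for op in migration["operations"] if op["type"] in BREAKING_TYPES]


migration = {
    "operations": [
        {"type": "add_column", "params": {"name": "updated_at", "dataType": "timestamp", "nullable": True}},
        {"type": "rename_column", "params": {"oldName": "created", "newName": "created_at"}},
        {"type": "drop_column", "params": {"name": "deprecated_field"}},
    ]
}

for op in breaking_operations(migration):
    print(f"Needs coordination: {op['type']} {op['params']}")
```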

4. Use Descriptive Column Names

Choose clear, descriptive names:

created_at instead of created
customer_id instead of cid
total_amount instead of amt

5. Document Changes

Track migrations in your application’s version control alongside code changes.

Index Migrations

You can also create and drop indexes as migration operations. This is an alternative to using the /indexes endpoint directly.

Add a Scalar Index

Create a btree or bitmap index on one or more columns:

{
  "operations": [
    {
      "type": "add_scalar_index",
      "params": {
        "name": "idx_customer_status",
        "columns": ["customer_id", "status"],
        "indexType": "btree"
      }
    }
  ]
}

Index Type	Use Case
`btree`	Equality and range queries (`=`, `>`, `<`, `BETWEEN`)
`bitmap`	Low-cardinality columns (status, category, boolean)

Drop a Scalar Index

{
  "operations": [
    {
      "type": "drop_scalar_index",
      "params": {
        "name": "idx_customer_status"
      }
    }
  ]
}

Add a Vector Index

Create a vector similarity index for ANN (approximate nearest neighbor) search:

{
  "operations": [
    {
      "type": "add_vector_index",
      "params": {
        "name": "idx_embedding",
        "column": "embedding",
        "config": {
          "indexType": "ivf_pq",
          "metric": "cosine",
          "dimensions": 384,
          "numPartitions": 256,
          "numSubVectors": 16
        }
      }
    }
  ]
}

Vector Index Types

Type	Description
`ivf_pq`	IVF with product quantization — good balance of speed and recall
`ivf_hnsw_pq`	IVF + HNSW with PQ — higher recall, more memory
`hnsw`	Hierarchical navigable small world — fast search, highest memory usage
`flat`	Exact search — no approximation, slower on large datasets

Distance Metrics

Metric	Description
`cosine`	Cosine similarity (most common for text embeddings)
`l2`	Euclidean distance
`dot`	Dot product similarity
`hamming`	Hamming distance (binary vectors)
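The index types and metrics in the tables above can be checked on the client before a migration is submitted. The sketch below builds an add_vector_index operation in the shape shown earlier. The helper name is hypothetical. The check that dimensions is divisible by numSubVectors reflects how product quantization splits a vector into equal parts; whether the service enforces it is an assumption.

```python
VECTOR_INDEX_TYPES = {"ivf_pq", "ivf_hnsw_pq", "hnsw", "flat"}
DISTANCE_METRICS = {"cosine", "l2", "dot", "hamming"}


def add_vector_index_op(name, column, index_type, metric, dimensions,
                        num_partitions=None, num_sub_vectors=None):
    """Build an add_vector_index operation after basic validation."""
    if index_type not in VECTOR_INDEX_TYPES:
        raise ValueError(f"Unknown index type: {index_type!r}")
    if metric not in DISTANCE_METRICS:
        raise ValueError(f"Unknown metric: {metric!r}")
    config = {"indexType": index_type, "metric": metric, "dimensions": dimensions}
    if num_partitions is not None:
        config["numPartitions"] = num_partitions
    if num_sub_vectors is not None:
        # Assumption: sub-vectors must divide the vector evenly.
        if dimensions % num_sub_vectors != 0:
            raise ValueError("dimensions must be divisible by num_sub_vectors")
        config["numSubVectors"] = num_sub_vectors
    return {
        "type": "add_vector_index",
        "params": {"name": name, "column": column, "config": config},
    }


op = add_vector_index_op("idx_embedding", "embedding", "ivf_pq", "cosine",
                         dimensions=384, num_partitions=256, num_sub_vectors=16)
print(op["params"]["config"]["numSubVectors"])  # 16
```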

Drop a Vector Index

{
  "operations": [
    {
      "type": "drop_vector_index",
      "params": {
        "name": "idx_embedding"
      }
    }
  ]
}

Index Management After Schema Changes

Indexes may need rebuilding after schema changes. Check index status:

curl https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
  -H "Authorization: Bearer $API_TOKEN"

To rebuild an index, drop and recreate it:

# Drop the index
curl -X DELETE https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes/idx_name \
  -H "Authorization: Bearer $API_TOKEN"

# Recreate the index
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"indexName": "idx_name", "columnName": "column_name", "indexType": "btree"}'
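Since index operations are also available as migration operations, a scalar index rebuild could in principle be expressed as one migration body. The sketch below only builds that body. Whether the service accepts dropping and re-adding the same index name within a single migration is an assumption, so run it through the plan endpoint first; rebuild_scalar_index_migration is a hypothetical helper.

```python
def rebuild_scalar_index_migration(name, columns, index_type="btree"):
    """Build a migration body that drops and then re-adds a scalar index."""
    return {
        "operations": [
            {"type": "drop_scalar_index", "params": {"name": name}},
            {
                "type": "add_scalar_index",
                "params": {
                    "name": name,
                    "columns": list(columns),
                    "indexType": index_type,
                },
            },
        ]
    }


body = rebuild_scalar_index_migration("idx_customer_status", ["customer_id", "status"])
print([op["type"] for op in body["operations"]])
# ['drop_scalar_index', 'add_scalar_index']
```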

Next Steps

Tables - Full table operations reference
Querying Data - SQL syntax and examples