Managing Datasets
Datasets are logical containers that group related tables together within a team. This guide covers the full lifecycle of managing datasets.
Creating a Dataset
Create a dataset by providing a team ID, a name, and optionally a description, tags, and metadata:
Create a dataset
curl:

    curl -X POST https://api.catalyzed.ai/datasets \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
        "name": "Sales Analytics",
        "description": "Sales data for analytics and reporting",
        "tags": ["sales", "analytics", "production"],
        "metadata": {
          "owner": "data-team",
          "source": "salesforce",
          "refreshFrequency": "daily"
        }
      }'

JavaScript:

    const response = await fetch("https://api.catalyzed.ai/datasets", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        teamId: "ZkoDMyjZZsXo4VAO_nJLk",
        name: "Sales Analytics",
        description: "Sales data for analytics and reporting",
        tags: ["sales", "analytics", "production"],
        metadata: {
          owner: "data-team",
          source: "salesforce",
          refreshFrequency: "daily",
        },
      }),
    });
    const dataset = await response.json();

Python:

    import requests

    response = requests.post(
        "https://api.catalyzed.ai/datasets",
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
            "name": "Sales Analytics",
            "description": "Sales data for analytics and reporting",
            "tags": ["sales", "analytics", "production"],
            "metadata": {
                "owner": "data-team",
                "source": "salesforce",
                "refreshFrequency": "daily"
            }
        }
    )
    dataset = response.json()

Response:
    {
      "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
      "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
      "name": "Sales Analytics",
      "description": "Sales data for analytics and reporting",
      "tags": ["sales", "analytics", "production"],
      "metadata": {
        "owner": "data-team",
        "source": "salesforce",
        "refreshFrequency": "daily"
      },
      "managed": false,
      "createdAt": "2024-01-15T10:30:00Z",
      "updatedAt": "2024-01-15T10:30:00Z"
    }

Dataset Properties
| Property | Type | Required | Description |
|---|---|---|---|
| teamId | string | Yes | Team that owns this dataset |
| name | string | Yes | Human-readable name (1-255 characters, unique per team) |
| description | string | No | Optional description |
| tags | string[] | No | Tags for filtering and organization (defaults to []) |
| metadata | object | No | Arbitrary key-value metadata (defaults to {}) |
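
The property rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_create_payload` is a hypothetical helper name, and it only enforces the required fields and the 1-255 character limit on the name. Uniqueness per team can only be verified by the server.

```python
from typing import Any, Dict, List, Optional


def build_create_payload(
    team_id: str,
    name: str,
    description: Optional[str] = None,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a body for POST /datasets (hypothetical helper)."""
    if not team_id:
        raise ValueError("teamId is required")
    if not 1 <= len(name) <= 255:
        raise ValueError("name must be 1-255 characters")

    payload: Dict[str, Any] = {"teamId": team_id, "name": name}
    # Optional fields are left out when unset, so the documented
    # defaults ([] for tags, {} for metadata) apply.
    if description is not None:
        payload["description"] = description
    if tags is not None:
        payload["tags"] = list(tags)
    if metadata is not None:
        payload["metadata"] = dict(metadata)
    return payload
```

The returned dictionary can be passed as the `json=` argument of the Python request shown earlier in this section.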
Listing and Filtering Datasets
List datasets with optional filters, pagination, and sorting:
List datasets with filters
curl:

    curl "https://api.catalyzed.ai/datasets?teamIds=ZkoDMyjZZsXo4VAO_nJLk&tags=production&orderBy=name&orderDirection=asc&page=1&pageSize=10" \
      -H "Authorization: Bearer $API_TOKEN"

JavaScript:

    const params = new URLSearchParams({
      teamIds: "ZkoDMyjZZsXo4VAO_nJLk",
      tags: "production",
      orderBy: "name",
      orderDirection: "asc",
      page: "1",
      pageSize: "10",
    });

    const response = await fetch(`https://api.catalyzed.ai/datasets?${params}`, {
      headers: { Authorization: `Bearer ${apiToken}` },
    });
    const { datasets, total, page, pageSize } = await response.json();

Python:

    import requests

    response = requests.get(
        "https://api.catalyzed.ai/datasets",
        params={
            "teamIds": "ZkoDMyjZZsXo4VAO_nJLk",
            "tags": "production",
            "orderBy": "name",
            "orderDirection": "asc",
            "page": 1,
            "pageSize": 10
        },
        headers={"Authorization": f"Bearer {api_token}"}
    )
    result = response.json()
    datasets = result["datasets"]

Filter Parameters
| Parameter | Type | Description |
|---|---|---|
| teamIds | string | Comma-separated team IDs |
| datasetIds | string | Comma-separated dataset IDs |
| name | string | Partial match on dataset name (case-insensitive) |
| tags | string | Comma-separated tags; returns datasets matching any of the provided tags |
| managed | boolean | Filter by managed status (true for system-managed, false for user-created) |
| page | number | Page number, 1-indexed (default: 1) |
| pageSize | number | Results per page, 1-100 (default: 20) |
| orderBy | string | Sort by: createdAt, name, updatedAt, description |
| orderDirection | string | asc or desc |
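
The parameters in the table can be assembled and range-checked before the request is made. The sketch below is illustrative only: `build_list_params` and `has_all_tags` are hypothetical helper names, and sending `managed` as the lowercase strings `true`/`false` is an assumption about how the boolean is encoded in the query string. Because the `tags` filter matches any of the provided tags, the second helper narrows results on the client to datasets that carry every requested tag.

```python
from typing import Any, Dict, Iterable, Optional

ORDER_BY_FIELDS = {"createdAt", "name", "updatedAt", "description"}


def build_list_params(
    team_ids: Optional[Iterable[str]] = None,
    tags: Optional[Iterable[str]] = None,
    name: Optional[str] = None,
    managed: Optional[bool] = None,
    page: int = 1,
    page_size: int = 20,
    order_by: Optional[str] = None,
    order_direction: Optional[str] = None,
) -> Dict[str, str]:
    """Build query parameters for GET /datasets (hypothetical helper)."""
    if page < 1:
        raise ValueError("page is 1-indexed")
    if not 1 <= page_size <= 100:
        raise ValueError("pageSize must be between 1 and 100")
    if order_by is not None and order_by not in ORDER_BY_FIELDS:
        raise ValueError(f"orderBy must be one of {sorted(ORDER_BY_FIELDS)}")
    if order_direction is not None and order_direction not in ("asc", "desc"):
        raise ValueError("orderDirection must be 'asc' or 'desc'")

    params = {"page": str(page), "pageSize": str(page_size)}
    if team_ids:
        params["teamIds"] = ",".join(team_ids)  # comma-separated list
    if tags:
        params["tags"] = ",".join(tags)  # server matches ANY of these
    if name:
        params["name"] = name
    if managed is not None:
        # Assumption: booleans are sent as lowercase strings.
        params["managed"] = "true" if managed else "false"
    if order_by:
        params["orderBy"] = order_by
    if order_direction:
        params["orderDirection"] = order_direction
    return params


def has_all_tags(dataset: Dict[str, Any], required: Iterable[str]) -> bool:
    """Client-side check that a dataset carries every required tag."""
    return set(required).issubset(dataset.get("tags", []))
```

A typical use is to request `tags=["production", "sales"]` and then keep only the returned datasets for which `has_all_tags(dataset, ["production", "sales"])` is true.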
Response Format
    {
      "datasets": [
        {
          "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
          "teamId": "ZkoDMyjZZsXo4VAO_nJLk",
          "name": "Sales Analytics",
          "description": "Sales data for analytics and reporting",
          "tags": ["sales", "analytics", "production"],
          "metadata": {},
          "managed": false,
          "createdAt": "2024-01-15T10:30:00Z",
          "updatedAt": "2024-01-15T10:30:00Z"
        }
      ],
      "total": 1,
      "page": 1,
      "pageSize": 10
    }

Getting a Dataset
Get dataset by ID
curl:

    curl https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz \
      -H "Authorization: Bearer $API_TOKEN"

JavaScript:

    const response = await fetch(
      "https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz",
      { headers: { Authorization: `Bearer ${apiToken}` } }
    );
    const dataset = await response.json();

Python:

    import requests

    response = requests.get(
        "https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz",
        headers={"Authorization": f"Bearer {api_token}"}
    )
    dataset = response.json()

Table Statistics
Get row counts and table metadata for all tables in a dataset:
    curl https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz/table-stats \
      -H "Authorization: Bearer $API_TOKEN"

Response:
    {
      "datasetId": "HoIEJNIPiQIy6TjVRxjwz",
      "tables": [
        {"tableId": "Ednc5U676CO4hn-FqsXeA", "tableName": "orders", "rowCount": 15000}
      ],
      "totalRows": 15000,
      "tableCount": 1
    }

Updating a Dataset
Update a dataset’s name, description, tags, or metadata:
Update dataset
curl:

    curl -X PUT https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz \
      -H "Authorization: Bearer $API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "Sales Analytics v2",
        "description": "Updated sales data with 2025 metrics",
        "tags": ["sales", "analytics", "production", "2025"],
        "metadata": {
          "owner": "data-team",
          "source": "salesforce",
          "refreshFrequency": "hourly"
        }
      }'

JavaScript:

    const response = await fetch(
      "https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz",
      {
        method: "PUT",
        headers: {
          Authorization: `Bearer ${apiToken}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          name: "Sales Analytics v2",
          description: "Updated sales data with 2025 metrics",
          tags: ["sales", "analytics", "production", "2025"],
          metadata: {
            owner: "data-team",
            source: "salesforce",
            refreshFrequency: "hourly",
          },
        }),
      }
    );
    const updated = await response.json();

Python:

    import requests

    response = requests.put(
        "https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz",
        headers={"Authorization": f"Bearer {api_token}"},
        json={
            "name": "Sales Analytics v2",
            "description": "Updated sales data with 2025 metrics",
            "tags": ["sales", "analytics", "production", "2025"],
            "metadata": {
                "owner": "data-team",
                "source": "salesforce",
                "refreshFrequency": "hourly"
            }
        }
    )
    updated = response.json()

All fields are optional; include only the fields you want to change. Tags and metadata are replaced entirely (not merged), so send the complete set when updating.
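
Because tags and metadata are replaced rather than merged, adding a single tag or changing one metadata key usually means reading the current dataset first and sending back the full merged set. The sketch below shows that merge step only. `build_update_body` is a hypothetical helper name, and `current` is assumed to be a dataset object as returned by the Get endpoint.

```python
from typing import Any, Dict, Iterable, Optional


def build_update_body(
    current: Dict[str, Any],
    add_tags: Iterable[str] = (),
    remove_tags: Iterable[str] = (),
    metadata_changes: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge changes into the current tags and metadata (hypothetical helper).

    Returns a body containing the complete tag list and metadata
    object, since the update replaces both fields entirely.
    """
    removed = set(remove_tags)
    # Keep existing tags in their original order, minus removals.
    tags = [t for t in current.get("tags", []) if t not in removed]
    for tag in add_tags:
        if tag not in tags:
            tags.append(tag)

    # Copy first so the caller's dataset object is not mutated.
    metadata = dict(current.get("metadata", {}))
    metadata.update(metadata_changes or {})
    return {"tags": tags, "metadata": metadata}
```

The result can be sent as the body of the PUT request. Note that a read followed by a write is not atomic, so concurrent updates from another client may still overwrite each other.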
Deleting a Dataset
Delete dataset
curl:

    curl -X DELETE https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz \
      -H "Authorization: Bearer $API_TOKEN"

JavaScript:

    await fetch("https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz", {
      method: "DELETE",
      headers: { Authorization: `Bearer ${apiToken}` },
    });

Python:

    import requests

    requests.delete(
        "https://api.catalyzed.ai/datasets/HoIEJNIPiQIy6TjVRxjwz",
        headers={"Authorization": f"Bearer {api_token}"}
    )

The request returns 204 No Content on success.
Organizing Datasets
By Domain
Group tables by business domain to keep related data together:
    Patent Research (dataset)
    ├── patents (table)
    ├── inventors (table)
    └── citations (table)

    Financial Filings (dataset)
    ├── sec_filings (table)
    ├── companies (table)
    └── financial_metrics (table)

By Environment
Separate production, staging, and development data:
    Customer Data - Production (dataset)
    └── customers (table)

    Customer Data - Staging (dataset)
    └── customers (table)

    Customer Data - Development (dataset)
    └── customers (table)

Use tags to identify environments: tags: ["production"] or tags: ["staging"].
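
One way to keep per-environment datasets consistent is to generate their create payloads from a single base name. The sketch below is only an illustration of the naming and tagging pattern above. `environment_datasets` is a hypothetical helper, and the "Base Name - Environment" format simply mirrors the example names in this section.

```python
from typing import Any, Dict, Iterable, List


def environment_datasets(
    team_id: str,
    base_name: str,
    environments: Iterable[str] = ("production", "staging", "development"),
) -> List[Dict[str, Any]]:
    """Build one create payload per environment (hypothetical helper)."""
    return [
        {
            "teamId": team_id,
            # e.g. "Customer Data - Production"
            "name": f"{base_name} - {env.capitalize()}",
            # Tag each dataset with its environment for later filtering.
            "tags": [env],
        }
        for env in environments
    ]
```

Each returned payload can be sent to the create endpoint in turn.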
Using Tags for Filtering
Tags enable quick filtering across datasets. Common tagging patterns:
- Environment: production, staging, development
- Domain: sales, patents, compliance
- Status: active, archived, deprecated
- Data source: salesforce, snowflake, manual-upload
Filter by tags in list requests:
    # Find datasets tagged production or sales (tags match any of the provided values)
    curl "https://api.catalyzed.ai/datasets?teamIds=...&tags=production,sales" \
      -H "Authorization: Bearer $API_TOKEN"

Using Metadata
Metadata stores structured information about a dataset as key-value pairs. It is useful for tracking data lineage, ownership, and operational context:
    {
      "metadata": {
        "owner": "data-engineering",
        "source": "salesforce-api",
        "refreshFrequency": "daily",
        "lastRefresh": "2025-01-15T00:00:00Z",
        "pii": "true",
        "retentionDays": "365"
      }
    }

Best Practices
- Use descriptive names: Customer Orders 2025 is better than data_v3
- Tag consistently: agree on a tagging taxonomy across your team
- Set metadata at creation: organizing from the start is easier than doing it retroactively
- One dataset per domain: avoid mixing unrelated tables in the same dataset
- Keep names unique and meaningful: names must be unique per team, so include enough context to distinguish similar datasets
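
A shared tagging taxonomy is easier to keep consistent when it is checked in code before datasets are created or updated. The sketch below is one possible approach, not a platform feature. `TAXONOMY` and `unknown_tags` are hypothetical names, and the allowed values are taken from the common tagging patterns listed earlier in this guide.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical team taxonomy, based on the tagging patterns in this guide.
TAXONOMY: Dict[str, Set[str]] = {
    "environment": {"production", "staging", "development"},
    "domain": {"sales", "patents", "compliance"},
    "status": {"active", "archived", "deprecated"},
    "source": {"salesforce", "snowflake", "manual-upload"},
}


def unknown_tags(tags: Iterable[str], taxonomy: Dict[str, Set[str]] = TAXONOMY) -> List[str]:
    """Return the tags that are not part of the agreed taxonomy, sorted."""
    allowed: Set[str] = set().union(*taxonomy.values())
    return sorted(tag for tag in tags if tag not in allowed)
```

Running the check before a create or update request makes misspelled or ad hoc tags visible early, while leaving the decision to reject or extend the taxonomy to the team.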
Next Steps
- Managing Tables - Create and manage tables within datasets
- Ingesting Data - Write data into your tables
- Datasets (Concept) - Understand the data hierarchy