Querying Data
Catalyzed supports standard SQL for querying your data. This guide covers common query patterns and best practices.
Executing Queries
Section titled “Executing Queries”Use the /queries endpoint to run SQL queries against your tables. The tables parameter maps the table names used in your SQL to their table IDs.
Basic SELECT query
curl -X POST https://api.catalyzed.ai/queries \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT * FROM users LIMIT 10", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"} }'const response = await fetch("https://api.catalyzed.ai/queries", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ sql: "SELECT * FROM users LIMIT 10", tables: { users: "KzaMsfA0LSw_Ld0KyaXIS" }, }),});const data = await response.json();response = requests.post( "https://api.catalyzed.ai/queries", headers={"Authorization": f"Bearer {api_token}"}, json={ "sql": "SELECT * FROM users LIMIT 10", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"} })data = response.json()Joining Tables
Section titled “Joining Tables”Query across multiple tables by including them all in the tables mapping:
JOIN across tables
curl -X POST https://api.catalyzed.ai/queries \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT o.order_id, c.name, o.total FROM orders o JOIN customers c ON o.customer_id = c.id", "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" } }'const response = await fetch("https://api.catalyzed.ai/queries", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ sql: `SELECT o.order_id, c.name, o.total FROM orders o JOIN customers c ON o.customer_id = c.id`, tables: { orders: "Ednc5U676CO4hn-FqsXeA", customers: "6fTBbbj4uv8TVMVh0gVch", }, }),});response = requests.post( "https://api.catalyzed.ai/queries", headers={"Authorization": f"Bearer {api_token}"}, json={ "sql": """ SELECT o.order_id, c.name, o.total FROM orders o JOIN customers c ON o.customer_id = c.id """, "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" } })SQL Syntax Reference
Section titled “SQL Syntax Reference”SELECT Clauses
Section titled “SELECT Clauses”-- Select specific columnsSELECT name, email, created_at FROM users
-- Select all columnsSELECT * FROM users
-- Column aliasesSELECT name AS customer_name, email AS contact FROM users
-- ExpressionsSELECT name, price * quantity AS total FROM ordersFiltering (WHERE)
Section titled “Filtering (WHERE)”-- Basic comparisonsSELECT * FROM users WHERE status = 'active'SELECT * FROM orders WHERE amount > 100SELECT * FROM orders WHERE created_at >= '2024-01-01'
-- Multiple conditionsSELECT * FROM orders WHERE status = 'active' AND amount > 100SELECT * FROM orders WHERE status = 'active' OR status = 'pending'
-- NULL checksSELECT * FROM users WHERE deleted_at IS NULLSELECT * FROM orders WHERE notes IS NOT NULL
-- IN operatorSELECT * FROM orders WHERE status IN ('active', 'pending', 'review')
-- LIKE pattern matchingSELECT * FROM users WHERE email LIKE '%@example.com'SELECT * FROM users WHERE name LIKE 'John%'
-- BETWEEN rangeSELECT * FROM orders WHERE amount BETWEEN 100 AND 500Sorting (ORDER BY)
Section titled “Sorting (ORDER BY)”-- Ascending (default)SELECT * FROM orders ORDER BY created_at
-- DescendingSELECT * FROM orders ORDER BY created_at DESC
-- Multiple columnsSELECT * FROM orders ORDER BY status ASC, created_at DESC
-- With NULLS handlingSELECT * FROM orders ORDER BY completed_at NULLS LASTPagination (LIMIT/OFFSET)
Section titled “Pagination (LIMIT/OFFSET)”-- First 10 rowsSELECT * FROM users LIMIT 10
-- Skip first 20, get next 10SELECT * FROM users LIMIT 10 OFFSET 20
-- Recommended: use ORDER BY with paginationSELECT * FROM users ORDER BY id LIMIT 10 OFFSET 20Aggregations
Section titled “Aggregations”-- CountSELECT COUNT(*) FROM ordersSELECT COUNT(DISTINCT customer_id) FROM orders
-- Sum, Average, Min, MaxSELECT SUM(amount) FROM ordersSELECT AVG(amount) FROM orders WHERE status = 'completed'SELECT MIN(created_at), MAX(created_at) FROM orders
-- Group bySELECT status, COUNT(*) as count, SUM(amount) as totalFROM ordersGROUP BY status
-- Having (filter groups)SELECT customer_id, COUNT(*) as order_countFROM ordersGROUP BY customer_idHAVING COUNT(*) > 5-- INNER JOIN (only matching rows)SELECT o.*, c.nameFROM orders oJOIN customers c ON o.customer_id = c.id
-- LEFT JOIN (all from left, matching from right)SELECT c.name, o.order_idFROM customers cLEFT JOIN orders o ON c.id = o.customer_id
-- Multiple joinsSELECT o.order_id, c.name, p.product_nameFROM orders oJOIN customers c ON o.customer_id = c.idJOIN products p ON o.product_id = p.idSubqueries
Section titled “Subqueries”-- Subquery in WHERESELECT * FROM ordersWHERE customer_id IN ( SELECT id FROM customers WHERE status = 'premium')
-- Subquery in FROMSELECT avg_amount, COUNT(*) as customer_countFROM ( SELECT customer_id, AVG(amount) as avg_amount FROM orders GROUP BY customer_id) customer_averagesGROUP BY avg_amountCommon Table Expressions (CTEs)
Section titled “Common Table Expressions (CTEs)”WITH high_value_customers AS ( SELECT customer_id, SUM(amount) as total_spent FROM orders GROUP BY customer_id HAVING SUM(amount) > 10000)SELECT c.name, hvc.total_spentFROM high_value_customers hvcJOIN customers c ON hvc.customer_id = c.idORDER BY hvc.total_spent DESCWindow Functions
Section titled “Window Functions”-- Row numberSELECT name, amount, ROW_NUMBER() OVER (ORDER BY amount DESC) as rankFROM orders
-- Partition bySELECT customer_id, order_date, amount, SUM(amount) OVER (PARTITION BY customer_id) as customer_totalFROM orders
-- Running totalSELECT order_date, amount, SUM(amount) OVER (ORDER BY order_date) as running_totalFROM ordersQuery Response Format
Section titled “Query Response Format”Queries return results in a structured JSON format:
{ "queryId": "qry_abc123", "columns": [ {"name": "id", "type": "Utf8"}, {"name": "name", "type": "Utf8"}, {"name": "email", "type": "Utf8"}, {"name": "created_at", "type": "Timestamp(Microsecond, Some(\"UTC\"))"} ], "rows": [ ], "rowCount": 2, "truncated": false}Query Options
Section titled “Query Options”Limit Results
Section titled “Limit Results”Prevent accidentally returning too many rows:
{ "sql": "SELECT * FROM users", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"}, "maxRows": 1000}Get Execution Plan
Section titled “Get Execution Plan”Preview how a query will execute without running it:
curl -X POST https://api.catalyzed.ai/queries/explain \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT * FROM users WHERE status = '\''active'\''", "tables": {"users": "KzaMsfA0LSw_Ld0KyaXIS"} }'Performance Tips
Section titled “Performance Tips”1. Use Specific Columns
Section titled “1. Use Specific Columns”-- Good: Select only needed columnsSELECT id, name, status FROM users
-- Avoid: Select all columnsSELECT * FROM users2. Filter Early
Section titled “2. Filter Early”-- Good: Filter in WHERESELECT * FROM orders WHERE created_at > '2024-01-01'
-- Avoid: Filter in application codeSELECT * FROM orders -- then filter in code3. Use LIMIT
Section titled “3. Use LIMIT”-- Good: Always limit during developmentSELECT * FROM orders WHERE status = 'active' LIMIT 100
-- Avoid: Unbounded queriesSELECT * FROM orders4. Create Indexes
Section titled “4. Create Indexes”For columns frequently used in WHERE clauses:
curl -X POST https://api.catalyzed.ai/dataset-tables/KzaMsfA0LSw_Ld0KyaXIS/indexes \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "indexName": "idx_status", "columnName": "status", "indexType": "btree" }'5. Use Appropriate Data Types
Section titled “5. Use Appropriate Data Types”- Use
timestampfor dates, not strings - Use numeric types (
int64,float64) for numbers, not strings - This enables efficient comparisons and aggregations
Error Handling
Section titled “Error Handling”Syntax Errors
Section titled “Syntax Errors”{ "error": "QUERY_ERROR", "message": "SQL syntax error at position 45: unexpected token 'FORM'"}Missing Table Bindings
Section titled “Missing Table Bindings”{ "error": "VALIDATION_ERROR", "message": "Table 'orders' referenced in SQL but not found in tables mapping"}Timeout
Section titled “Timeout”Long-running queries may timeout. Use filters and limits to reduce data scanned.
SQL Translation
Section titled “SQL Translation”Convert natural language queries to SQL using the /queries/translate endpoint. The translator understands table schemas, column types, and supports semantic search functions.
Translate natural language to SQL
curl -X POST https://api.catalyzed.ai/queries/translate \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" }, "query": "Find customers who placed orders over $1000 last month", "includeExplanation": true }'const response = await fetch("https://api.catalyzed.ai/queries/translate", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ tables: { orders: "Ednc5U676CO4hn-FqsXeA", customers: "6fTBbbj4uv8TVMVh0gVch", }, query: "Find customers who placed orders over $1000 last month", includeExplanation: true, }),});const { sql, explanation, confidence } = await response.json();response = requests.post( "https://api.catalyzed.ai/queries/translate", headers={"Authorization": f"Bearer {api_token}"}, json={ "tables": { "orders": "Ednc5U676CO4hn-FqsXeA", "customers": "6fTBbbj4uv8TVMVh0gVch" }, "query": "Find customers who placed orders over $1000 last month", "includeExplanation": True })result = response.json()sql = result["sql"]explanation = result.get("explanation")confidence = result["confidence"]Response:
{ "sql": "SELECT DISTINCT c.name, c.email FROM customers c JOIN orders o ON c.id = o.customer_id WHERE o.amount > 1000 AND o.created_at >= date_trunc('month', CURRENT_DATE - INTERVAL '1 month')", "confidence": 0.92, "explanation": "This query joins customers and orders tables, filters for orders over $1000, and restricts to the previous month using date_trunc."}Translation Features
Section titled “Translation Features”- Schema Awareness - Understands table structures and column types
- Semantic Search Support - Recognizes
knn_search(),knn_cosine(),knn_l2(), andtext_to_embedding()functions - SQL Validation - Optionally validates generated SQL syntax
- Confidence Scores - Returns confidence level (0.0-1.0) for the translation
- Explanations - Optional explanation of the generated SQL
Request Options
Section titled “Request Options”| Option | Type | Description |
|---|---|---|
tables | object | Map of table aliases (as used in SQL) to table IDs |
query | string | Natural language query to translate |
includeExplanation | boolean | Include explanation of generated SQL (default: false) |
validateSyntax | boolean | Validate SQL syntax before returning (default: true) |
Example: Semantic Search Query
Section titled “Example: Semantic Search Query”{ "tables": { "products": "tbl_abc123" }, "query": "Find products similar to wireless headphones", "includeExplanation": true}Generated SQL:
SELECT * FROM knn_cosine( 'products', 'embedding', text_to_embedding('wireless headphones'), 10)See the SQL Translation API for complete endpoint documentation.
Vector Search
Section titled “Vector Search”Catalyzed also supports semantic similarity search using vector embeddings. Use knn_search() to find similar documents, products, or any embedded content:
SELECT * FROM knn_cosine( 'products', 'embedding', text_to_embedding('wireless headphones'), 10)See the Vector Search Guide for complete documentation on:
knn_search(),knn_cosine(), andknn_l2()functionstext_to_embedding()for query-time embeddings- Filtered vector search
- Distance metrics and thresholds
Next Steps
Section titled “Next Steps”- Vector Search - Semantic similarity search with embeddings
- Tables - Learn about table schemas and indexes
- Schema Management - Evolve your schemas safely