Full-Text Search
Full-text search (FTS) enables BM25-based keyword matching using inverted indexes. Unlike vector search which matches by meaning, FTS excels at exact term matching, technical identifiers, and boolean queries.
Overview
Section titled “Overview”Full-text search is powered by:
fts_search()- BM25 keyword search over text columns- Inverted indexes - Persistent indexes for fast term lookup on large datasets
- JIT indexing - Ephemeral in-RAM indexes for ad-hoc exploration without pre-built indexes
fts_search Function
Section titled “fts_search Function”fts_search(table, column, query_text, limit, [filter], [enable_jit_index])| Parameter | Type | Required | Description |
|---|---|---|---|
table | string | Yes | Table name containing the text column |
column | string | Yes | Column name with text content |
query_text | string | Yes | Search query (supports boolean operators) |
limit | integer | Yes | Maximum number of results to return |
filter | string | No | SQL WHERE clause for pre-filtering (pass NULL to skip) |
enable_jit_index | boolean | No | Enable JIT indexing for tables without a persistent inverted index (default: false) |
Returns: All columns from the source table plus a _score column with BM25 relevance scores (higher = better match).
Basic Full-Text Search
Section titled “Basic Full-Text Search”Basic FTS
curl -X POST https://api.catalyzed.ai/queries \ -H "Authorization: Bearer $API_TOKEN" \ -H "Content-Type: application/json" \ -d '{ "sql": "SELECT * FROM fts_search('\''documents'\'', '\''content'\'', '\''machine learning'\'', 20)", "tables": {"documents": "KzaMsfA0LSw_Ld0KyaXIS"} }'const response = await fetch("https://api.catalyzed.ai/queries", { method: "POST", headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json", }, body: JSON.stringify({ sql: `SELECT * FROM fts_search( 'documents', 'content', 'machine learning', 20 )`, tables: { documents: "KzaMsfA0LSw_Ld0KyaXIS" }, }),});response = requests.post( "https://api.catalyzed.ai/queries", headers={"Authorization": f"Bearer {api_token}"}, json={ "sql": """ SELECT * FROM fts_search( 'documents', 'content', 'machine learning', 20 ) """, "tables": {"documents": "KzaMsfA0LSw_Ld0KyaXIS"} })Response includes all source columns plus _score:
{ "columns": [ {"name": "doc_id", "type": "Utf8"}, {"name": "content", "type": "Utf8"}, {"name": "_score", "type": "Float32"} ], "rows": [ {"doc_id": "doc_123", "content": "Introduction to machine learning...", "_score": 8.42}, {"doc_id": "doc_456", "content": "Machine learning algorithms...", "_score": 7.89} ]}Filtered Full-Text Search
Section titled “Filtered Full-Text Search”Combine keyword search with attribute filters using the filter parameter:
SELECT * FROM fts_search( 'documents', 'content', 'machine learning', 20, 'category = ''tech'' AND published_at > ''2024-01-01''')The filter is applied before the text search, ensuring efficient execution when scalar columns are indexed.
JIT Indexing
Section titled “JIT Indexing”For smaller datasets (up to 500K rows), you can skip index creation entirely by passing enable_jit_index = true as the 6th parameter. This builds an ephemeral in-RAM index at query time:
-- Search without a pre-built index (pass NULL for filter)SELECT * FROM fts_search('documents', 'content', 'machine learning', 20, NULL, true)
-- JIT index with a filterSELECT * FROM fts_search('documents', 'content', 'machine learning', 20, 'category = ''tech''', true)JIT indexing is useful for prototyping, ad-hoc exploration, and smaller tables where maintaining a persistent index isn’t justified. For large datasets or frequent queries, create a persistent inverted index for better performance.
Creating Inverted Indexes
Section titled “Creating Inverted Indexes”For larger datasets or frequent queries, create an inverted index on the text column:
-- Create inverted index with default configurationCREATE INDEX documents_content_ftsON documents (content)USING inverted;Index Configuration Options
Section titled “Index Configuration Options”| Option | Default | Description |
|---|---|---|
base_tokenizer | simple | Tokenization strategy (simple, whitespace) |
with_stemming | true | Enable word stemming (“running” matches “run”) |
remove_stop_words | true | Remove common words (“the”, “a”, “is”) |
lowercase | true | Normalize to lowercase for case-insensitive search |
with_position | true | Track positions for phrase search |
Example with custom configuration:
await datasetTableOperations.createIndex(tableId, { indexType: "inverted", columnName: "content", indexName: "content_fts", baseTokenizer: "simple", withStemming: true, removeStopWords: true, lowercase: true, withPosition: true,});Full-Text Search with JOINs
Section titled “Full-Text Search with JOINs”Combine FTS results with standard SQL operations:
-- Find matching documents and their authorsSELECT d.title, d._score, a.name as authorFROM fts_search( 'documents', 'content', 'patent infringement', 20) dJOIN authors a ON d.author_id = a.idORDER BY d._score DESC-- Aggregate FTS results by categorySELECT category, COUNT(*) as matches, AVG(_score) as avg_relevanceFROM fts_search( 'articles', 'body', 'climate change policy', 100)GROUP BY categoryORDER BY avg_relevance DESCWhen to Use FTS vs Vector Search
Section titled “When to Use FTS vs Vector Search”| Use Case | FTS (BM25) | Vector Search |
|---|---|---|
| Exact term matching | Best | Poor |
| Technical identifiers (APIs, SKUs) | Best | Poor |
| Boolean queries (AND/OR/NOT) | Supported | Not supported |
| Conceptual similarity | Poor | Best |
| Synonyms and paraphrasing | No | Automatic |
| Multilingual matching | Limited | Good |
Next Steps
Section titled “Next Steps”- Vector Search - Semantic similarity search with embeddings
- Querying Data - Standard SQL query reference
- Tables - Table schemas and configuration