Data Types

Catalyzed uses Apache Arrow data types for table schemas. Type names are case-insensitive and normalized to lowercase.

Quick Reference

Category	Types
Integer	`int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32`, `uint64`
Floating Point	`float16`, `float32` (`float`), `float64` (`double`)
String	`utf8` (`string`), `largeutf8` (`large_utf8`, `largestring`)
Binary	`binary`, `largebinary` (`large_binary`)
Boolean	`bool` (`boolean`)
Date	`date32`, `date64`
Timestamp	`timestamp`, `timestamp[s]`, `timestamp[ms]`, `timestamp[us]`, `timestamp[ns]`
Array	`list<T>`, `fixed_size_list[N]<T>`
Other	`null`

Numeric Types

Signed Integers

Type	Size	Range
`int8`	1 byte	-128 to 127
`int16`	2 bytes	-32,768 to 32,767
`int32`	4 bytes	-2,147,483,648 to 2,147,483,647
`int64`	8 bytes	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Unsigned Integers

Type	Size	Range
`uint8`	1 byte	0 to 255
`uint16`	2 bytes	0 to 65,535
`uint32`	4 bytes	0 to 4,294,967,295
`uint64`	8 bytes	0 to 18,446,744,073,709,551,615
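
The ranges in the two tables above follow directly from the bit width of each type. As a minimal illustrative sketch (the helper `int_range` is hypothetical and not part of Catalyzed), they can be derived like this:

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the inclusive (min, max) range of an integer with the given bit width."""
    if signed:
        # Two's complement: one bit is spent on the sign.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(f"int{bits}: {int_range(bits, True)}  uint{bits}: {int_range(bits, False)}")
```

Checking a column's expected minimum and maximum against these bounds is a quick way to pick the smallest type that fits.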

Floating Point

Type	Aliases	Size	Precision
`float16`		2 bytes	Half precision (~3 decimal digits)
`float32`	`float`	4 bytes	Single precision (~7 decimal digits)
`float64`	`double`	8 bytes	Double precision (~15 decimal digits)
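
To get a feel for the precision column, the sketch below round-trips one value through each IEEE 754 width using Python's standard `struct` module. It only illustrates the storage formats and does not call Catalyzed; the helper `round_trip` is a name chosen for this example.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack a value into the given IEEE 754 format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


value = 3.14159265358979
half = round_trip(value, "<e")    # float16: 2 bytes
single = round_trip(value, "<f")  # float32: 4 bytes
double = round_trip(value, "<d")  # float64: 8 bytes

# The narrower the type, the further the stored value drifts from the input.
for name, stored in (("float16", half), ("float32", single), ("float64", double)):
    print(f"{name}: {stored!r} (error {abs(stored - value):.2e})")
```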

Numeric Type Coercion

When data is ingested as CSV or as string values, each value is automatically coerced to the numeric type of its column:

Input	int32/int64	uint32	float32/float64
`"12345"`	12345	12345	12345.0
`"-9999"`	-9999	Error	-9999.0
`"123.45"`	123 (truncated)	123 (truncated)	123.45
`"1e2"`	100	100	100.0
`"null"`	NULL (if nullable)	NULL (if nullable)	NULL (if nullable)
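
The coercion rules in the table can be approximated in a few lines of Python. This is a sketch for reasoning about the rules, not Catalyzed's actual implementation: the function `coerce_numeric`, its error messages, and the choice to raise `ValueError` for out-of-range values are assumptions made for illustration.

```python
from typing import Optional, Union

# Inclusive bounds for the integer column types shown in the table above.
INT_BOUNDS = {
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
    "uint32": (0, 2**32 - 1),
}
FLOAT_TYPES = {"float32", "float64"}


def coerce_numeric(text: str, arrow_type: str, nullable: bool = True) -> Optional[Union[int, float]]:
    """Hypothetical helper that mimics the coercion table for a single string value."""
    if text == "null":
        if nullable:
            return None
        raise ValueError("null is not allowed in a non-nullable column")
    if arrow_type in FLOAT_TYPES:
        # Note: Python floats are 64-bit, so float32 rounding is not modelled here.
        return float(text)
    low, high = INT_BOUNDS[arrow_type]
    try:
        number = int(text)         # plain integers such as "12345"
    except ValueError:
        number = int(float(text))  # "123.45" -> 123, "1e2" -> 100 (truncates toward zero)
    if not low <= number <= high:
        raise ValueError(f"{text!r} is out of range for {arrow_type}")
    return number
```

For example, `coerce_numeric("123.45", "uint32")` yields `123`, while `coerce_numeric("-9999", "uint32")` raises an error because unsigned types cannot hold negative values.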

String and Binary Types

Type	Aliases	Description
`utf8`	`string`	UTF-8 encoded text (up to 2GB per value)
`largeutf8`	`large_utf8`, `largestring`	Large UTF-8 text (>2GB per value)
`binary`		Raw bytes (up to 2GB per value)
`largebinary`	`large_binary`	Large raw bytes (>2GB per value)

Use `utf8` for most text fields. Use `largeutf8` only if individual values may exceed 2GB.

Boolean Type

Type	Aliases
`bool`	`boolean`

Boolean Coercion

When data is ingested as CSV or as string values, the following inputs are automatically coerced to boolean:

Input	Result
`"true"`, `"1"`, `"YES"`, `"y"`	`true`
`"false"`, `"0"`	`false`
`"null"`, `""`	`NULL` (if nullable)
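
A rough Python model of the table above is shown here. It is a sketch, not the real ingestion code: the function name `coerce_bool` is hypothetical, only the inputs listed in the table are recognised, and case-insensitive matching is an assumption suggested by the `"YES"` example.

```python
from typing import Optional

# Only the inputs listed in the table are recognised by this sketch.
TRUE_TOKENS = {"true", "1", "yes", "y"}
FALSE_TOKENS = {"false", "0"}
NULL_TOKENS = {"null", ""}


def coerce_bool(text: str, nullable: bool = True) -> Optional[bool]:
    """Hypothetical helper that mimics the boolean coercion table."""
    token = text.lower()  # assumption: matching ignores case
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    if token in NULL_TOKENS:
        if nullable:
            return None
        raise ValueError("null is not allowed in a non-nullable column")
    raise ValueError(f"cannot interpret {text!r} as a boolean")
```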

Date Types

Type	Description	Storage
`date32`	Date as days since Unix epoch (1970-01-01)	4 bytes
`date64`	Date as milliseconds since Unix epoch	8 bytes

Queries return date columns as raw epoch numbers (for example, 19737 for a `date32` value, which is 2024-01-15). Use `to_char()` to format them as readable strings:

SELECT to_char(CAST(my_date AS DATE), '%Y-%m-%d') AS formatted_date
FROM my_table
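
If raw epoch values have already reached client code, they can also be decoded there. The following sketch uses only Python's standard library; the helper names `from_date32` and `from_date64` are chosen for this example and are not part of any Catalyzed client.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_date32(days: int) -> date:
    """Decode a date32 value: days since the Unix epoch."""
    return EPOCH_DATE + timedelta(days=days)


def from_date64(millis: int) -> date:
    """Decode a date64 value: milliseconds since the Unix epoch."""
    return (EPOCH_DATETIME + timedelta(milliseconds=millis)).date()


print(from_date32(19737).isoformat())
print(from_date64(19737 * 86_400_000).isoformat())
```

Both calls decode the same calendar day, since 19737 days equal 19737 × 86,400,000 milliseconds.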

Timestamp Type

Type	Precision	Description
`timestamp`	Microseconds (default)	Datetime with microsecond precision
`timestamp[s]`	Seconds	Datetime with second precision
`timestamp[ms]`	Milliseconds	Datetime with millisecond precision
`timestamp[us]`	Microseconds	Same as `timestamp`
`timestamp[ns]`	Nanoseconds	Datetime with nanosecond precision

You can also specify a timezone, for example `timestamp[us, tz=UTC]`.

Timestamp Input Formats

Timestamps accept multiple input formats:

Format	Example	Notes
ISO 8601 with time	`"2025-01-20T14:30:00Z"`	Recommended format
ISO 8601 date only	`"2025-01-20"`	Time defaults to 00:00:00
Unix epoch number	`1737381000000000`	Numeric value interpreted at the column's precision (microseconds in this example)

When using numeric values, the interpretation depends on the column’s precision:

timestamp[s]: value is seconds since epoch
timestamp[ms]: value is milliseconds since epoch
timestamp[us] / timestamp: value is microseconds since epoch
timestamp[ns]: value is nanoseconds since epoch
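
The effect of the column precision on numeric input can be sketched as follows. This is an illustration of the rule above rather than Catalyzed code; `decode_timestamp` and the lookup table are hypothetical names, and because Python's `datetime` only stores microseconds, nanosecond values are truncated in this sketch.

```python
from datetime import datetime, timedelta, timezone

# Number of ticks per second for each timestamp precision.
UNITS_PER_SECOND = {
    "timestamp[s]": 1,
    "timestamp[ms]": 10**3,
    "timestamp[us]": 10**6,
    "timestamp": 10**6,  # default precision is microseconds
    "timestamp[ns]": 10**9,
}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp(value: int, arrow_type: str) -> datetime:
    """Interpret a numeric epoch value according to the column's precision."""
    per_second = UNITS_PER_SECOND[arrow_type]
    seconds, remainder = divmod(value, per_second)
    # Convert the sub-second part to microseconds (truncates nanoseconds).
    micros = remainder * 10**6 // per_second
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


# The same instant written at four different precisions.
print(decode_timestamp(1737381000, "timestamp[s]"))
print(decode_timestamp(1737381000000, "timestamp[ms]"))
print(decode_timestamp(1737381000000000, "timestamp"))
print(decode_timestamp(1737381000000000000, "timestamp[ns]"))
```

Sending a value at the wrong precision shifts the result by a factor of 1,000 or more, so it is worth checking the column's declared type before writing numeric timestamps.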

Formatting Timestamps in Queries

Use to_char() with Chrono strftime format strings:

-- Date only
SELECT to_char(created_at, '%Y-%m-%d') AS date
FROM events

-- Date and time
SELECT to_char(created_at, '%Y-%m-%d %H:%M:%S') AS datetime
FROM events
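
The basic specifiers used above (`%Y`, `%m`, `%d`, `%H`, `%M`, `%S`) have the same meaning in Python's `strftime`, so a format string can be previewed locally before it goes into a query. This is only a convenience sketch: less common specifiers may behave differently between Chrono and Python, and the sample value is made up for the example.

```python
from datetime import datetime, timezone

# Sample value standing in for a created_at column entry.
created_at = datetime(2025, 1, 20, 14, 30, 0, tzinfo=timezone.utc)

date_only = created_at.strftime("%Y-%m-%d")
date_and_time = created_at.strftime("%Y-%m-%d %H:%M:%S")

print(date_only)
print(date_and_time)
```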

Null Type

Type	Description
`null`	All values are null

This type is rarely used directly; it exists primarily for schema compatibility.

Array Types

Type	Description
`list<T>`	Variable-length array of type T
`fixed_size_list[N]<T>`	Fixed-length array of exactly N elements of type T

Variable-Length Lists

Use `list<T>` when array lengths vary between rows:

{"name": "tags", "arrowType": "list<utf8>", "nullable": true}
{"name": "scores", "arrowType": "list<float64>", "nullable": true}
{"name": "matrix", "arrowType": "list<list<int32>>", "nullable": true}

Fixed-Size Lists

Use `fixed_size_list[N]<T>` for vector embeddings and other fixed-dimension arrays. Every row must contain exactly N elements, and vector search (KNN) requires a fixed-size list column:

{"name": "embedding", "arrowType": "fixed_size_list[384]<float32>", "nullable": true}
{"name": "coords_3d", "arrowType": "fixed_size_list[3]<float64>", "nullable": true}
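
Because every row of a fixed-size list column must have the same length, it can help to check vectors before ingesting them. The sketch below is a hypothetical client-side check (the function `find_bad_rows` is not a Catalyzed API); it assumes that `None` stands for a NULL value in a nullable column.

```python
from typing import Optional, Sequence


def find_bad_rows(rows: Sequence[Optional[Sequence[float]]], dimension: int) -> list[int]:
    """Return the indexes of rows whose vector length differs from the declared dimension."""
    return [
        index
        for index, vector in enumerate(rows)
        if vector is not None and len(vector) != dimension  # None models a NULL value
    ]


rows = [
    [0.1, 0.2, 0.3],
    None,               # allowed when the column is nullable
    [0.4, 0.5],         # wrong length for fixed_size_list[3]<float64>
]
print(find_bad_rows(rows, dimension=3))
```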

Type Aliases

Multiple names map to the same underlying type. All aliases are interchangeable:

Alias	Canonical Type
`string`	`utf8`
`boolean`	`bool`
`float`	`float32`
`double`	`float64`
`large_utf8`	`largeutf8`
`largestring`	`largeutf8`
`large_binary`	`largebinary`
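
When comparing schemas in client code, it can be convenient to reduce every type name to its canonical form first. The following sketch mirrors the alias table and the lowercase normalization described at the top of this page; the function `canonical_type` is a hypothetical helper, not part of Catalyzed.

```python
# Alias -> canonical type, copied from the table above.
ALIASES = {
    "string": "utf8",
    "boolean": "bool",
    "float": "float32",
    "double": "float64",
    "large_utf8": "largeutf8",
    "largestring": "largeutf8",
    "large_binary": "largebinary",
}


def canonical_type(name: str) -> str:
    """Lowercase a type name and resolve it to its canonical form if it is an alias."""
    lowered = name.lower()  # type names are case-insensitive
    return ALIASES.get(lowered, lowered)


print(canonical_type("String"))
print(canonical_type("DOUBLE"))
print(canonical_type("int32"))
```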

Type Conversion Compatibility

When changing a column’s type via schema migration, these conversions are supported:

From	To
`utf8` (string)	Any type (if values are parseable)
`int32`	`int64`, `float32`, `float64`, `utf8`
`float32`	`float64`, `utf8`
`timestamp`	`utf8`, `date32`, `date64`
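
The table can be encoded as a small lookup for pre-checking a planned migration. This is a sketch under two assumptions: conversions not listed in the table are treated as unsupported, and converting from `utf8` still depends on the stored values being parseable, which a lookup alone cannot verify. The name `can_convert` is hypothetical.

```python
# Source type -> set of target types, copied from the table above.
SUPPORTED_CONVERSIONS = {
    "int32": {"int64", "float32", "float64", "utf8"},
    "float32": {"float64", "utf8"},
    "timestamp": {"utf8", "date32", "date64"},
}


def can_convert(source: str, target: str) -> bool:
    """Return True if the table lists source -> target as a supported conversion."""
    if source == "utf8":
        # Any target is allowed, provided every stored value parses as that type.
        return True
    return target in SUPPORTED_CONVERSIONS.get(source, set())


print(can_convert("int32", "float64"))
print(can_convert("float32", "int32"))
```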

Field Definitions

When creating a table, each column is defined as a field with these properties:

Property	Type	Required	Description
`name`	string	Yes	Column name (must be non-empty)
`arrowType`	string	Yes	One of the data types above
`nullable`	boolean	No	Whether the column accepts NULL values (default: `true`)
`metadata`	object	No	Key-value string pairs for column-level metadata

Example field definitions:

[
  {"name": "id", "arrowType": "utf8", "nullable": false},
  {"name": "count", "arrowType": "int64", "nullable": false},
  {"name": "score", "arrowType": "float64", "nullable": true},
  {"name": "active", "arrowType": "bool", "nullable": false},
  {"name": "created_at", "arrowType": "timestamp", "nullable": false},
  {"name": "tags", "arrowType": "list<utf8>", "nullable": true},
  {"name": "embedding", "arrowType": "fixed_size_list[384]<float32>", "nullable": true}
]
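
Field definitions can be sanity-checked on the client before a table is created. The sketch below applies only the rules stated in the properties table (non-empty `name`, required `arrowType`, `nullable` defaulting to `true`, string-to-string `metadata`); it does not verify that `arrowType` names a real type, and `validate_field` is a hypothetical helper rather than a Catalyzed API.

```python
def validate_field(field: dict) -> dict:
    """Check one field definition and return it with defaults filled in."""
    name = field.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("name is required and must be a non-empty string")

    arrow_type = field.get("arrowType")
    if not isinstance(arrow_type, str) or not arrow_type:
        raise ValueError(f"field {name!r}: arrowType is required")

    nullable = field.get("nullable", True)  # defaults to true when omitted
    if not isinstance(nullable, bool):
        raise ValueError(f"field {name!r}: nullable must be a boolean")

    metadata = field.get("metadata", {})
    if not isinstance(metadata, dict) or not all(
        isinstance(key, str) and isinstance(value, str) for key, value in metadata.items()
    ):
        raise ValueError(f"field {name!r}: metadata must map strings to strings")

    # Type names are case-insensitive and normalized to lowercase.
    return {"name": name, "arrowType": arrow_type.lower(), "nullable": nullable, "metadata": metadata}


print(validate_field({"name": "score", "arrowType": "Float64"}))
```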

Next Steps

Tables - Create tables with these data types
Schema Management - Evolve column types over time
Ingesting Data - Write data with automatic type coercion