
Data Types

Catalyzed uses Apache Arrow data types for table schemas. Type names are case-insensitive and normalized to lowercase.

| Category | Types |
|---|---|
| Integer | int8, int16, int32, int64, uint8, uint16, uint32, uint64 |
| Floating Point | float16, float32 (float), float64 (double) |
| String | utf8 (string), largeutf8 (large_utf8, largestring) |
| Binary | binary, largebinary (large_binary) |
| Boolean | bool (boolean) |
| Date | date32, date64 |
| Timestamp | timestamp, timestamp[s], timestamp[ms], timestamp[us], timestamp[ns] |
| Array | `list<T>`, `fixed_size_list[N]<T>` |
| Other | null |

Aliases are shown in parentheses.
Signed integers:

| Type | Size | Range |
|---|---|---|
| int8 | 1 byte | -128 to 127 |
| int16 | 2 bytes | -32,768 to 32,767 |
| int32 | 4 bytes | -2,147,483,648 to 2,147,483,647 |
| int64 | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
Unsigned integers:

| Type | Size | Range |
|---|---|---|
| uint8 | 1 byte | 0 to 255 |
| uint16 | 2 bytes | 0 to 65,535 |
| uint32 | 4 bytes | 0 to 4,294,967,295 |
| uint64 | 8 bytes | 0 to 18,446,744,073,709,551,615 |
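Both integer tables follow directly from the bit width: a signed n-bit integer (two's complement) spans -2^(n-1) to 2^(n-1) - 1, and an unsigned one spans 0 to 2^n - 1. A small Python sketch that reproduces the ranges; the helper names are illustrative and not part of Catalyzed:

```python
from typing import Tuple


def signed_range(bits: int) -> Tuple[int, int]:
    """Inclusive range of a two's-complement signed integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits: int) -> Tuple[int, int]:
    """Inclusive range of an unsigned integer of the given width."""
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    lo, hi = signed_range(bits)
    ulo, uhi = unsigned_range(bits)
    print(f"int{bits}: {lo:,} to {hi:,}   uint{bits}: {ulo:,} to {uhi:,}")
```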
Floating-point types:

| Type | Aliases | Size | Precision |
|---|---|---|---|
| float16 | | 2 bytes | Half precision (~3 decimal digits) |
| float32 | float | 4 bytes | Single precision (~7 decimal digits) |
| float64 | double | 8 bytes | Double precision (~15 decimal digits) |
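To see what the precision column means in practice, the sketch below round-trips values through IEEE 754 half, single, and double precision using Python's standard struct module (format codes e, f, and d). It only illustrates the storage formats and does not call Catalyzed; roundtrip is an illustrative helper name:

```python
import struct

# struct format codes: "e" = IEEE 754 half, "f" = single, "d" = double precision
FORMATS = {"float16": "e", "float32": "f", "float64": "d"}


def roundtrip(value: float, arrow_type: str) -> float:
    """Store value at the given precision, then read it back as a Python float."""
    fmt = "<" + FORMATS[arrow_type]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


for arrow_type in FORMATS:
    print(arrow_type, roundtrip(0.1, arrow_type))

# A nine-digit integer is beyond float32's ~7 significant digits.
print(roundtrip(123456789.0, "float32"))  # 123456792.0
```

The last line shows why float64 is the safer choice once values carry more than about seven significant digits.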

When ingesting data as CSV or string values, numeric types are automatically coerced:

| Input | int32/int64 | uint32 | float32/float64 |
|---|---|---|---|
| "12345" | 12345 | 12345 | 12345.0 |
| "-9999" | -9999 | Error | -9999.0 |
| "123.45" | 123 (truncated) | 123 (truncated) | 123.45 |
| "1e2" | 100 | 100 | 100.0 |
| "null" | NULL (if nullable) | NULL (if nullable) | NULL (if nullable) |
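The coercion rules in the table can be sketched as follows. This is a hypothetical illustration of the documented behavior, not Catalyzed's parser: coerce_numeric is an illustrative name, and truncating toward zero, matching only the exact spelling "null", and the range checks are assumptions:

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Union

# Inclusive bounds for the integer column types covered by the table.
INT_BOUNDS = {
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
    "uint32": (0, 2**32 - 1),
}


def coerce_numeric(text: str, arrow_type: str, nullable: bool = True) -> Optional[Union[int, float]]:
    """Hypothetical sketch of the string-to-number coercion table."""
    if text == "null":
        if nullable:
            return None
        raise ValueError("NULL in a non-nullable column")
    if arrow_type in ("float32", "float64"):
        # A real float32 column would additionally round to single precision.
        return float(text)
    try:
        number = Decimal(text)  # accepts "12345", "123.45", and "1e2"
    except InvalidOperation as exc:
        raise ValueError(f"not a number: {text!r}") from exc
    if not number.is_finite():
        raise ValueError(f"not a finite number: {text!r}")
    value = int(number)  # drops the fractional part, truncating toward zero
    low, high = INT_BOUNDS[arrow_type]
    if not low <= value <= high:
        raise ValueError(f"{text!r} is out of range for {arrow_type}")
    return value


for raw in ("12345", "-9999", "123.45", "1e2", "null"):
    for arrow_type in ("int64", "uint32", "float64"):
        try:
            print(raw, arrow_type, coerce_numeric(raw, arrow_type))
        except ValueError as err:
            print(raw, arrow_type, "Error:", err)
```

Parsing integers through Decimal rather than float keeps large int64 inputs exact while still accepting the "123.45" and "1e2" forms from the table.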
String and binary types:

| Type | Aliases | Description |
|---|---|---|
| utf8 | string | UTF-8 encoded text (up to 2GB per value) |
| largeutf8 | large_utf8, largestring | UTF-8 encoded text for values that may exceed 2GB |
| binary | | Raw bytes (up to 2GB per value) |
| largebinary | large_binary | Raw bytes for values that may exceed 2GB |

Use utf8 for most text fields. Only use largeutf8 if individual values may exceed 2GB.
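Following the guidance above, the choice depends on the encoded size of individual values, not their character count. The 2GB figure corresponds to Arrow's 32-bit signed offsets (2^31 - 1 bytes). A minimal sketch of the rule; choose_string_type and its max_bytes parameter are hypothetical, and max_bytes exists only so the logic can be exercised without allocating gigabytes:

```python
from typing import Iterable, Optional

UTF8_MAX_BYTES = 2**31 - 1  # 32-bit signed offsets: about 2GB


def choose_string_type(values: Iterable[Optional[str]], max_bytes: int = UTF8_MAX_BYTES) -> str:
    """Return "utf8" unless some value's UTF-8 encoding exceeds max_bytes."""
    longest = max((len(v.encode("utf-8")) for v in values if v is not None), default=0)
    return "largeutf8" if longest > max_bytes else "utf8"


print(choose_string_type(["hello", None, "héllo"]))
```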

Boolean type:

| Type | Aliases |
|---|---|
| bool | boolean |

When ingesting CSV or string values, the following inputs are automatically coerced to boolean:

| Input | Result |
|---|---|
| "true", "1", "YES", "y" | true |
| "false", "0" | false |
| "null", "" | NULL (if nullable) |
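A sketch of the boolean coercion table in Python. coerce_bool is a hypothetical helper; matching case-insensitively (so "YES" and "yes" both work), trimming surrounding whitespace, and rejecting any other string are assumptions the table does not confirm:

```python
from typing import Optional

# Spellings from the table, lowercased because this sketch matches case-insensitively.
TRUE_STRINGS = {"true", "1", "yes", "y"}
FALSE_STRINGS = {"false", "0"}
NULL_STRINGS = {"null", ""}


def coerce_bool(text: str, nullable: bool = True) -> Optional[bool]:
    """Map a CSV/string input to True, False, or None (NULL)."""
    key = text.strip().lower()
    if key in TRUE_STRINGS:
        return True
    if key in FALSE_STRINGS:
        return False
    if key in NULL_STRINGS:
        if nullable:
            return None
        raise ValueError("NULL in a non-nullable column")
    raise ValueError(f"cannot coerce {text!r} to bool")


for raw in ("true", "YES", "y", "0", "null", ""):
    print(repr(raw), coerce_bool(raw))
```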
Date types:

| Type | Description | Storage |
|---|---|---|
| date32 | Date as days since Unix epoch (1970-01-01) | 4 bytes |
| date64 | Date as milliseconds since Unix epoch | 8 bytes |
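Both date encodings are plain integer offsets from 1970-01-01. The sketch below computes the stored value for a calendar date with Python's datetime module; to_date32 and to_date64 are illustrative helper names:

```python
from datetime import date

EPOCH = date(1970, 1, 1)
MS_PER_DAY = 86_400_000  # 24 * 60 * 60 * 1000


def to_date32(d: date) -> int:
    """Days since 1970-01-01."""
    return (d - EPOCH).days


def to_date64(d: date) -> int:
    """Milliseconds since 1970-01-01, at midnight of the given day."""
    return to_date32(d) * MS_PER_DAY


d = date(2025, 1, 20)
print(to_date32(d), to_date64(d))  # 20108 1737331200000
```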
Timestamp types:

| Type | Precision | Description |
|---|---|---|
| timestamp | Microseconds (default) | Datetime with microsecond precision |
| timestamp[s] | Seconds | Datetime with second precision |
| timestamp[ms] | Milliseconds | Datetime with millisecond precision |
| timestamp[us] | Microseconds | Same as timestamp |
| timestamp[ns] | Nanoseconds | Datetime with nanosecond precision |

You can also specify a timezone: timestamp[us, tz=UTC].

Timestamps accept multiple input formats:

| Format | Example | Notes |
|---|---|---|
| ISO 8601 with time | "2025-01-20T14:30:00Z" | Recommended format |
| ISO 8601 date only | "2025-01-20" | Time defaults to 00:00:00 |
| Unix epoch number | 1737381000000000 | Interpreted using the column's precision (microseconds for timestamp or timestamp[us]) |

When using numeric values, the interpretation depends on the column’s precision:

  • timestamp[s]: value is seconds since epoch
  • timestamp[ms]: value is milliseconds since epoch
  • timestamp[us] / timestamp: value is microseconds since epoch
  • timestamp[ns]: value is nanoseconds since epoch
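The numeric rules above amount to scaling a UTC instant by the column's tick rate. This sketch converts ISO 8601 input into the integer a column of each precision would hold, and interprets a bare number the other way. The helpers are hypothetical; treating inputs without an offset as UTC is an assumption, and because Python's datetime only resolves microseconds, from_epoch drops sub-microsecond digits for timestamp[ns]:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Ticks per second for each timestamp precision; timestamp with no suffix means "us".
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def parse_iso(text: str) -> datetime:
    """Parse ISO 8601 input; date-only strings become midnight, naive values are assumed UTC."""
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def to_epoch(dt: datetime, unit: str = "us") -> int:
    """Integer ticks since the Unix epoch at the given precision, using exact integer math."""
    delta = dt - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * TICKS_PER_SECOND[unit] // 1_000_000


def from_epoch(value: int, unit: str = "us") -> datetime:
    """Interpret a numeric input according to the column's precision."""
    ticks = TICKS_PER_SECOND[unit]
    whole_seconds, remainder = divmod(value, ticks)
    return EPOCH + timedelta(seconds=whole_seconds, microseconds=remainder * 1_000_000 // ticks)


instant = parse_iso("2025-01-20T14:30:00Z")
for unit in TICKS_PER_SECOND:
    print(unit, to_epoch(instant, unit))
print(from_epoch(1737381000000000, "us").isoformat())
```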

To format a timestamp as a string, use to_char() with Chrono strftime format strings:

```sql
-- Date only
SELECT to_char(created_at, '%Y-%m-%d') AS date
FROM events;

-- Date and time
SELECT to_char(created_at, '%Y-%m-%d %H:%M:%S') AS datetime
FROM events;
```
Null type:

| Type | Description |
|---|---|
| null | All values are null |

Rarely used directly. Primarily exists for schema compatibility.

Array types:

| Type | Description |
|---|---|
| `list<T>` | Variable-length array of type T |
| `fixed_size_list[N]<T>` | Fixed-length array of exactly N elements of type T |

Use `list<T>` when array lengths vary between rows:

```json
{"name": "tags", "arrowType": "list<utf8>", "nullable": true}
{"name": "scores", "arrowType": "list<float64>", "nullable": true}
{"name": "matrix", "arrowType": "list<list<int32>>", "nullable": true}
```

Use `fixed_size_list[N]<T>` for vector embeddings and other fixed-dimension arrays. Every non-null row must contain exactly N elements, and vector search (KNN) requires a `fixed_size_list` column:

```json
{"name": "embedding", "arrowType": "fixed_size_list[384]<float32>", "nullable": true}
{"name": "coords_3d", "arrowType": "fixed_size_list[3]<float64>", "nullable": true}
```
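Because array types are written as plain strings, nested definitions can be read with a small recursive parser. The sketch below (hypothetical ListType, parse_type, and validate_vector helpers, not a Catalyzed API) parses `list<T>` and `fixed_size_list[N]<T>` and checks that a vector has exactly N elements; it assumes type names are already lowercase and leaves every non-array type as an unparsed string:

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class ListType:
    item: "ArrowType"           # element type T
    size: Optional[int] = None  # None for list<T>, N for fixed_size_list[N]<T>


ArrowType = Union[str, ListType]  # non-array types stay plain strings


def parse_type(text: str) -> ArrowType:
    text = text.strip()
    if text.startswith("list<") and text.endswith(">"):
        return ListType(parse_type(text[len("list<"):-1]))
    match = re.fullmatch(r"fixed_size_list\[(\d+)\]<(.+)>", text)
    if match:
        return ListType(parse_type(match.group(2)), int(match.group(1)))
    return text


def validate_vector(value: Optional[Sequence[float]], type_text: str, nullable: bool = True) -> None:
    """Raise ValueError unless value fits the fixed_size_list column type."""
    parsed = parse_type(type_text)
    if not isinstance(parsed, ListType) or parsed.size is None:
        raise ValueError(f"{type_text} is not a fixed_size_list type")
    if value is None:
        if not nullable:
            raise ValueError("NULL in a non-nullable column")
        return
    if len(value) != parsed.size:
        raise ValueError(f"expected {parsed.size} elements, got {len(value)}")


print(parse_type("list<list<int32>>"))
print(parse_type("fixed_size_list[384]<float32>"))
validate_vector([0.0] * 384, "fixed_size_list[384]<float32>")
```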

Multiple names map to the same underlying type. All aliases are interchangeable:

| Alias | Canonical Type |
|---|---|
| string | utf8 |
| boolean | bool |
| float | float32 |
| double | float64 |
| large_utf8 | largeutf8 |
| largestring | largeutf8 |
| large_binary | largebinary |
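Combined with the case-insensitivity noted at the top of this page, normalization can be sketched as lowercasing a name and then looking it up in the alias table. canonical_type is a hypothetical helper that handles only scalar names; whether aliases are also resolved inside `list<T>` is not stated here:

```python
ALIASES = {
    "string": "utf8",
    "boolean": "bool",
    "float": "float32",
    "double": "float64",
    "large_utf8": "largeutf8",
    "largestring": "largeutf8",
    "large_binary": "largebinary",
}


def canonical_type(name: str) -> str:
    """Lowercase a scalar type name and map any alias to its canonical form."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


for name in ("STRING", "Double", "Large_UTF8", "int64"):
    print(name, "->", canonical_type(name))
```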

When changing a column’s type via schema migration, these conversions are supported:

| From | To |
|---|---|
| utf8 (string) | Any type (if values are parseable) |
| int32 | int64, float32, float64, utf8 |
| float32 | float64, utf8 |
| timestamp | utf8, date32, date64 |
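The migration table can be encoded as a simple lookup. can_migrate is a hypothetical helper that expects canonical type names and only reports whether the table lists a conversion; for utf8 sources, every value must still parse as the target type, which this sketch does not check:

```python
from typing import Dict, Set

# Supported conversions from the table; utf8 sources are handled separately ("any type").
MIGRATIONS: Dict[str, Set[str]] = {
    "int32": {"int64", "float32", "float64", "utf8"},
    "float32": {"float64", "utf8"},
    "timestamp": {"utf8", "date32", "date64"},
}


def can_migrate(src: str, dst: str) -> bool:
    """True if the table lists src -> dst as a supported type change."""
    if src == "utf8":
        return True  # still subject to every value being parseable as dst
    return dst in MIGRATIONS.get(src, set())


print(can_migrate("int32", "int64"), can_migrate("int64", "int32"))
```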

When creating a table, each column is defined as a field with these properties:

| Property | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Column name (must be non-empty) |
| arrowType | string | Yes | One of the data types above |
| nullable | boolean | No | Whether the column accepts NULL values (default: true) |
| metadata | object | No | Key-value string pairs for column-level metadata |

Example field definitions:

```json
[
  {"name": "id", "arrowType": "utf8", "nullable": false},
  {"name": "count", "arrowType": "int64", "nullable": false},
  {"name": "score", "arrowType": "float64", "nullable": true},
  {"name": "active", "arrowType": "bool", "nullable": false},
  {"name": "created_at", "arrowType": "timestamp", "nullable": false},
  {"name": "tags", "arrowType": "list<utf8>", "nullable": true},
  {"name": "embedding", "arrowType": "fixed_size_list[384]<float32>", "nullable": true}
]
```
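A minimal client-side check for field definitions, using only Python's json module. validate_field is a hypothetical helper that applies the property rules from the table and fills in the default nullable value; it does not verify that arrowType names a supported type, and the metadata values in the sample are made up for illustration:

```python
import json
from typing import Any, Dict


def validate_field(field: Dict[str, Any]) -> Dict[str, Any]:
    """Check one field definition; return a copy with nullable filled in."""
    name = field.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("name is required and must be a non-empty string")
    if not isinstance(field.get("arrowType"), str):
        raise ValueError(f"{name}: arrowType is required and must be a string")
    nullable = field.get("nullable", True)  # documented default
    if not isinstance(nullable, bool):
        raise ValueError(f"{name}: nullable must be a boolean")
    metadata = field.get("metadata", {})
    if not isinstance(metadata, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in metadata.items()
    ):
        raise ValueError(f"{name}: metadata must map string keys to string values")
    return {**field, "nullable": nullable}


schema = json.loads("""
[
  {"name": "id", "arrowType": "utf8", "nullable": false},
  {"name": "tags", "arrowType": "list<utf8>", "metadata": {"source": "import"}}
]
""")
fields = [validate_field(f) for f in schema]
print(fields[1]["nullable"])  # True, because nullable was omitted
```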