Catalyzed uses Apache Arrow data types for table schemas. Type names are case-insensitive and normalized to lowercase.
| Category | Types |
|---|---|
| Integer | `int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32`, `uint64` |
| Floating point | `float16`, `float32` (`float`), `float64` (`double`) |
| String | `utf8` (`string`), `largeutf8` (`large_utf8`, `largestring`) |
| Binary | `binary`, `largebinary` (`large_binary`) |
| Boolean | `bool` (`boolean`) |
| Date | `date32`, `date64` |
| Timestamp | `timestamp`, `timestamp[s]`, `timestamp[ms]`, `timestamp[us]`, `timestamp[ns]` |
| Array | `list<T>`, `fixed_size_list[N]<T>` |
| Other | `null` |
| Type | Size | Range |
|---|---|---|
| `int8` | 1 byte | -128 to 127 |
| `int16` | 2 bytes | -32,768 to 32,767 |
| `int32` | 4 bytes | -2,147,483,648 to 2,147,483,647 |
| `int64` | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| Type | Size | Range |
|---|---|---|
| `uint8` | 1 byte | 0 to 255 |
| `uint16` | 2 bytes | 0 to 65,535 |
| `uint32` | 4 bytes | 0 to 4,294,967,295 |
| `uint64` | 8 bytes | 0 to 18,446,744,073,709,551,615 |
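The ranges in the two integer tables follow directly from the bit width. A signed n-bit type spans -2^(n-1) to 2^(n-1) - 1, and an unsigned one spans 0 to 2^n - 1. The sketch below derives those bounds and checks whether a value fits a column type before ingestion. `INT_BITS`, `int_range`, and `fits` are hypothetical helpers for illustration, not part of Catalyzed.

```python
# Hypothetical helpers (not a Catalyzed API): derive integer ranges from bit widths.
INT_BITS = {
    "int8": 8, "int16": 16, "int32": 32, "int64": 64,
    "uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64,
}


def int_range(type_name: str) -> tuple:
    """Return the inclusive (min, max) range of an Arrow integer type."""
    bits = INT_BITS[type_name]
    if type_name.startswith("u"):
        # Unsigned: 0 .. 2^n - 1
        return 0, 2**bits - 1
    # Signed two's complement: -2^(n-1) .. 2^(n-1) - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(value: int, type_name: str) -> bool:
    """True if `value` is representable in `type_name`."""
    lo, hi = int_range(type_name)
    return lo <= value <= hi


print(int_range("int16"))   # (-32768, 32767)
print(int_range("uint32"))  # (0, 4294967295)
print(fits(300, "uint8"))   # False
```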
| Type | Aliases | Size | Precision |
|---|---|---|---|
| `float16` | | 2 bytes | Half precision (~3 decimal digits) |
| `float32` | `float` | 4 bytes | Single precision (~7 decimal digits) |
| `float64` | `double` | 8 bytes | Double precision (~15 decimal digits) |
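The practical effect of those precision levels shows up when the same value is stored at each width. The sketch below uses Python's standard `struct` module, where `e`, `f`, and `d` are the IEEE 754 half, single, and double formats. It round-trips `0.1` through each width. `FORMATS` and `round_trip` are illustrative names only.

```python
import struct

# struct format codes: 'e' = 16-bit half, 'f' = 32-bit single, 'd' = 64-bit double.
FORMATS = {"float16": "<e", "float32": "<f", "float64": "<d"}


def round_trip(value: float, type_name: str) -> float:
    """Store `value` at the given width and read it back as a Python float."""
    fmt = FORMATS[type_name]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


for name in FORMATS:
    print(name, round_trip(0.1, name))
# float16 0.0999755859375
# float32 0.10000000149011612
# float64 0.1
```

Each narrower type keeps fewer significant digits, which is why `float16` is usually reserved for storage-sensitive data that tolerates rounding.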
When ingesting data as CSV or string values, strings are automatically coerced to the column's numeric type:
| Input | int32/int64 | uint32 | float32/float64 |
|---|---|---|---|
| "12345" | 12345 | 12345 | 12345.0 |
| "-9999" | -9999 | Error | -9999.0 |
| "123.45" | 123 (truncated) | 123 (truncated) | 123.45 |
| "1e2" | 100 | 100 | 100.0 |
| "null" | NULL (if nullable) | NULL (if nullable) | NULL (if nullable) |
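The coercion table can be read as a small decision procedure: parse the string, truncate fractions for integer targets, and reject values outside the target range. The following is a minimal client-side sketch of that reading. `coerce_numeric` and `INT_BOUNDS` are hypothetical names. Two choices are assumptions rather than documented behavior: trying an exact integer parse before falling back to a float, and rejecting out-of-range values.

```python
import math

# Hypothetical bounds for the integer targets covered by this sketch.
INT_BOUNDS = {
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
    "uint32": (0, 2**32 - 1),
}


def coerce_numeric(raw: str, arrow_type: str, nullable: bool = True):
    """Approximate the documented string-to-number coercion (sketch only)."""
    if raw == "null":
        if not nullable:
            raise ValueError("NULL is not allowed in a non-nullable column")
        return None
    if arrow_type in ("float32", "float64"):
        # Python floats are 64-bit; real float32 storage would round further.
        return float(raw)
    if arrow_type not in INT_BOUNDS:
        raise ValueError(f"type not covered by this sketch: {arrow_type}")
    try:
        result = int(raw)  # exact for plain integers such as "12345"
    except ValueError:
        # "123.45" -> 123 and "1e2" -> 100: parse as float, drop the fraction.
        result = math.trunc(float(raw))
    lo, hi = INT_BOUNDS[arrow_type]
    if not lo <= result <= hi:
        # Covers the "-9999" -> uint32 error case from the table.
        raise ValueError(f"{raw!r} is out of range for {arrow_type}")
    return result
```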
| Type | Aliases | Description |
|---|---|---|
| `utf8` | `string` | UTF-8 encoded text (up to 2GB per value) |
| `largeutf8` | `large_utf8`, `largestring` | Large UTF-8 text (>2GB per value) |
| `binary` | | Raw bytes (up to 2GB per value) |
| `largebinary` | `large_binary` | Large raw bytes (>2GB per value) |
Use utf8 for most text fields. Only use largeutf8 if individual values may exceed 2GB.
When ingesting data as CSV or string values, the following inputs are automatically coerced to boolean:
| Input | Result |
|---|---|
| "true", "1", "YES", "y" | true |
| "false", "0" | false |
| "null", "" | NULL (if nullable) |
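A client can pre-validate boolean columns by applying the same mapping before upload. This is a minimal sketch with hypothetical names (`coerce_bool`, `TRUE_INPUTS`, and so on). Two behaviors are assumptions, not documented. First, matching is case-insensitive, inferred from the table listing both "true" and "YES". Second, any input not listed in the table is treated as an error.

```python
# Hypothetical sketch of the documented boolean coercion.
TRUE_INPUTS = {"true", "1", "yes", "y"}
FALSE_INPUTS = {"false", "0"}
NULL_INPUTS = {"null", ""}


def coerce_bool(raw: str, nullable: bool = True):
    """Map a string input to True, False, or None (NULL)."""
    text = raw.lower()  # assumption: case-insensitive matching
    if text in TRUE_INPUTS:
        return True
    if text in FALSE_INPUTS:
        return False
    if text in NULL_INPUTS:
        if not nullable:
            raise ValueError("NULL is not allowed in a non-nullable column")
        return None
    # Assumption: inputs outside the documented table are rejected.
    raise ValueError(f"cannot coerce {raw!r} to bool")
```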
| Type | Description | Storage |
|---|---|---|
| `date32` | Date as days since Unix epoch (1970-01-01) | 4 bytes |
| `date64` | Date as milliseconds since Unix epoch | 8 bytes |
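The two date types store the same calendar day as different integers. The sketch below computes both encodings with Python's standard `datetime` module. The helper names are illustrative only. The comment on `date64` reflects the Apache Arrow specification, which expects `date64` values to be whole days.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
MS_PER_DAY = 86_400_000


def to_date32(d: date) -> int:
    """Days since 1970-01-01 (the integer a date32 column stores)."""
    return (d - EPOCH).days


def to_date64(d: date) -> int:
    """Milliseconds since 1970-01-01 (the integer a date64 column stores)."""
    # Arrow expects date64 values to be multiples of 86,400,000 (whole days).
    return to_date32(d) * MS_PER_DAY


def from_date32(days: int) -> date:
    """Inverse of to_date32."""
    return EPOCH + timedelta(days=days)


print(to_date32(date(2025, 1, 20)))  # 20108
print(to_date64(date(2025, 1, 20)))  # 1737331200000
```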
| Type | Precision | Description |
|---|---|---|
| `timestamp` | Microseconds (default) | Datetime with microsecond precision |
| `timestamp[s]` | Seconds | Datetime with second precision |
| `timestamp[ms]` | Milliseconds | Datetime with millisecond precision |
| `timestamp[us]` | Microseconds | Same as `timestamp` |
| `timestamp[ns]` | Nanoseconds | Datetime with nanosecond precision |
You can also specify a timezone: timestamp[us, tz=UTC].
Timestamps accept multiple input formats:
| Format | Example | Notes |
|---|---|---|
| ISO 8601 with time | "2025-01-20T14:30:00Z" | Recommended format |
| ISO 8601 date only | "2025-01-20" | Time defaults to 00:00:00 |
| Unix microseconds | 1737381000000000 | Numeric value interpreted by schema precision |
When using numeric values, the interpretation depends on the column’s precision:
- `timestamp[s]`: value is seconds since epoch
- `timestamp[ms]`: value is milliseconds since epoch
- `timestamp[us]` / `timestamp`: value is microseconds since epoch
- `timestamp[ns]`: value is nanoseconds since epoch
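To make the precision rule concrete, here is a minimal sketch that converts a numeric input into a UTC datetime according to the column type. `numeric_to_datetime` and `UNITS_PER_SECOND` are hypothetical names. Python's `datetime` holds at most microseconds, so this sketch truncates nanosecond remainders.

```python
from datetime import datetime, timedelta, timezone

# How many units of each precision make up one second.
UNITS_PER_SECOND = {
    "timestamp[s]": 1,
    "timestamp[ms]": 1_000,
    "timestamp[us]": 1_000_000,
    "timestamp": 1_000_000,  # default precision is microseconds
    "timestamp[ns]": 1_000_000_000,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def numeric_to_datetime(value: int, arrow_type: str) -> datetime:
    """Interpret an integer as time since the epoch at the column's precision."""
    per_second = UNITS_PER_SECOND[arrow_type]
    seconds, remainder = divmod(value, per_second)
    # Scale the sub-second part to microseconds (nanoseconds are truncated).
    micros = remainder * 1_000_000 // per_second
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


print(numeric_to_datetime(1737381000000000, "timestamp"))
# 2025-01-20 13:50:00+00:00
```

In this sketch, a number with the wrong unit does not fail. For example, 1737381000 read as `timestamp[us]` lands less than half an hour after the epoch, so the number's unit needs to match the column's precision.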
To format a timestamp as a string, use to_char() with Chrono strftime format strings:
-- "events" is a placeholder table name
SELECT to_char(created_at, '%Y-%m-%d') AS date FROM events;
SELECT to_char(created_at, '%Y-%m-%d %H:%M:%S') AS datetime FROM events;
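To preview what these two format strings produce without running a query, Python's `datetime.strftime` can serve as a stand-in. Chrono and Python agree on the specifiers used here (`%Y`, `%m`, `%d`, `%H`, `%M`, `%S`), but they do not agree on every specifier, so check Chrono's documentation for anything else. The `created_at` value below is a made-up sample.

```python
from datetime import datetime

# Sample value (hypothetical) standing in for a created_at timestamp.
created_at = datetime(2025, 1, 20, 14, 30, 0)

print(created_at.strftime("%Y-%m-%d"))           # 2025-01-20
print(created_at.strftime("%Y-%m-%d %H:%M:%S"))  # 2025-01-20 14:30:00
```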
| Type | Description |
|---|---|
| `null` | All values are null |
The null type is rarely used directly; it exists primarily for schema compatibility.
| Type | Description |
|---|---|
| `list<T>` | Variable-length array of type T |
| `fixed_size_list[N]<T>` | Fixed-length array of exactly N elements of type T |
Use list<T> when array lengths vary between rows:
{ "name": "tags", "arrowType": "list<utf8>", "nullable": true }
{ "name": "scores", "arrowType": "list<float64>", "nullable": true }
{ "name": "matrix", "arrowType": "list<list<int32>>", "nullable": true }
Use fixed_size_list[N]<T> for vector embeddings and other fixed-dimension arrays. Every row must contain exactly N elements, and this type is required for vector search (KNN):
{ "name": "embedding", "arrowType": "fixed_size_list[384]<float32>", "nullable": true }
{ "name": "coords_3d", "arrowType": "fixed_size_list[3]<float64>", "nullable": true }
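Because every row of a fixed_size_list[N]<T> column must hold exactly N elements, it can help to check vectors client-side before ingesting them. The sketch below assumes rows arrive as Python lists, with `None` for NULL. `parse_fixed_size_list` and `validate_vectors` are hypothetical helpers, not Catalyzed APIs.

```python
import re


def parse_fixed_size_list(arrow_type: str) -> tuple:
    """Split 'fixed_size_list[N]<T>' into (N, T)."""
    match = re.fullmatch(r"fixed_size_list\[(\d+)\]<(.+)>", arrow_type.strip().lower())
    if match is None:
        raise ValueError(f"not a fixed_size_list type: {arrow_type!r}")
    return int(match.group(1)), match.group(2)


def validate_vectors(field: dict, rows: list) -> None:
    """Raise ValueError if any non-null row has the wrong number of elements."""
    size, _ = parse_fixed_size_list(field["arrowType"])
    nullable = field.get("nullable", True)  # documented default is true
    for i, vec in enumerate(rows):
        if vec is None:
            if not nullable:
                raise ValueError(f"row {i}: NULL in non-nullable column")
            continue
        if len(vec) != size:
            raise ValueError(f"row {i}: expected {size} elements, got {len(vec)}")


coords_field = {"name": "coords_3d", "arrowType": "fixed_size_list[3]<float64>", "nullable": True}
validate_vectors(coords_field, [[0.0, 1.0, 2.0], None])  # passes
```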
Multiple names map to the same underlying type. All aliases are interchangeable:
| Alias | Canonical Type |
|---|---|
| `string` | `utf8` |
| `boolean` | `bool` |
| `float` | `float32` |
| `double` | `float64` |
| `large_utf8` | `largeutf8` |
| `largestring` | `largeutf8` |
| `large_binary` | `largebinary` |
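Combined with the case-insensitivity rule at the top of this page, the alias table amounts to a lowercase-then-lookup normalization. The sketch below shows that mapping. `ALIASES` and `canonical_type` are hypothetical names. The sketch only normalizes a top-level type name; it does not rewrite aliases nested inside `list<...>` or `fixed_size_list[N]<...>`.

```python
# Alias -> canonical name, taken from the table above.
ALIASES = {
    "string": "utf8",
    "boolean": "bool",
    "float": "float32",
    "double": "float64",
    "large_utf8": "largeutf8",
    "largestring": "largeutf8",
    "large_binary": "largebinary",
}


def canonical_type(name: str) -> str:
    """Lowercase a top-level type name and resolve any alias."""
    lowered = name.strip().lower()
    return ALIASES.get(lowered, lowered)


print(canonical_type("Double"))  # float64
print(canonical_type("STRING"))  # utf8
```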
When changing a column's type via a schema migration, the following conversions are supported:
| From | To |
|---|---|
| `utf8` (`string`) | Any type (if values are parseable) |
| `int32` | `int64`, `float32`, `float64`, `utf8` |
| `float32` | `float64`, `utf8` |
| `timestamp` | `utf8`, `date32`, `date64` |
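To catch an unsupported type change before submitting a migration, the table can be encoded as a lookup. The sketch below assumes types are already in canonical form. `SUPPORTED_MIGRATIONS` and `migration_supported` are hypothetical names. The table does not say whether `timestamp[ms]` and other precisions count as `timestamp`, so this sketch matches only the exact names listed.

```python
# Source type -> allowed target types, from the table above.
SUPPORTED_MIGRATIONS = {
    "int32": {"int64", "float32", "float64", "utf8"},
    "float32": {"float64", "utf8"},
    "timestamp": {"utf8", "date32", "date64"},
}


def migration_supported(from_type: str, to_type: str) -> bool:
    """True if the table lists from_type -> to_type as a supported conversion."""
    if from_type == "utf8":
        # Any target is allowed, but each stored value must still parse.
        return True
    return to_type in SUPPORTED_MIGRATIONS.get(from_type, set())
```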
When creating a table, each column is defined as a field with these properties:
| Property | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Column name (must be non-empty) |
| `arrowType` | string | Yes | One of the data types above |
| `nullable` | boolean | No | Whether the column accepts NULL values (default: true) |
| `metadata` | object | No | Key-value string pairs for column-level metadata |
Example field definitions:
[
  { "name": "id", "arrowType": "utf8", "nullable": false },
  { "name": "count", "arrowType": "int64", "nullable": false },
  { "name": "score", "arrowType": "float64", "nullable": true },
  { "name": "active", "arrowType": "bool", "nullable": false },
  { "name": "created_at", "arrowType": "timestamp", "nullable": false },
  { "name": "tags", "arrowType": "list<utf8>", "nullable": true },
  { "name": "embedding", "arrowType": "fixed_size_list[384]<float32>", "nullable": true }
]
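Field definitions can also be generated programmatically, which makes it easy to enforce the property rules in the table: a non-empty `name`, a required `arrowType`, `nullable` defaulting to true, and string-only `metadata`. This is a minimal sketch. `make_field` is a hypothetical helper, and the `"model"` metadata entry is an invented example. How the resulting list is submitted to Catalyzed is outside the scope of this sketch.

```python
import json


def make_field(name: str, arrow_type: str, nullable: bool = True, metadata=None) -> dict:
    """Build one field definition, checking the documented property rules."""
    if not name:
        raise ValueError("name must be non-empty")
    if not arrow_type:
        raise ValueError("arrowType is required")
    spec = {"name": name, "arrowType": arrow_type, "nullable": nullable}
    if metadata is not None:
        if not all(isinstance(k, str) and isinstance(v, str) for k, v in metadata.items()):
            raise ValueError("metadata must map strings to strings")
        spec["metadata"] = metadata
    return spec


fields = [
    make_field("id", "utf8", nullable=False),
    make_field("score", "float64"),
    # Hypothetical metadata, for illustration only.
    make_field("embedding", "fixed_size_list[384]<float32>", metadata={"model": "example-model"}),
]
print(json.dumps(fields, indent=2))
```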