A table-valued function (TVF) that reads data from Avro files stored in Amazon S3. The function can use either a location object (recommended) or direct credentials to access the data. READ_AVRO returns a table with data from the specified Avro file and supports all Avro data types.
READ_AVRO supports only binary encoded Avro files (typically with the .avro extension). JSON-encoded Avro data is not supported.
Syntax
-- Using location object (recommended)
READ_AVRO (
LOCATION => 'location_name'
[, PATTERN => <pattern>]
)
|
-- Using static credentials
READ_AVRO (
URL => <file_url>
[, AWS_ACCESS_KEY_ID => <aws_access_key_id>]
[, AWS_SECRET_ACCESS_KEY => <aws_secret_access_key>]
[, AWS_SESSION_TOKEN => <aws_session_token>]
[, AWS_ROLE_ARN => <aws_role_arn>]
[, AWS_ROLE_EXTERNAL_ID => <aws_role_external_id>]
)
Parameters
| Parameter | Description | Supported input types |
|---|
LOCATION | The name of a location object that contains the Amazon S3 URL and credentials. Firebolt recommends using LOCATION to store credentials for authentication. LOCATION must be specified as a string literal (e.g., LOCATION => 'my_location'). Unlike URL, it cannot be used as a positional parameter. For a comprehensive guide, see LOCATION objects. | TEXT |
PATTERN | When using LOCATION, an optional glob pattern to filter files within the location’s URL path. The pattern is applied relative to the location’s base path. For example, PATTERN => 'week_1/*.avro' will match all .avro files in the week_1 subdirectory. | TEXT |
URL | The location of the Amazon S3 bucket containing your files. The expected format is s3://{bucket_name}/{full_file_path_glob_pattern}. | TEXT |
AWS_ACCESS_KEY_ID | The AWS access key ID. | TEXT |
AWS_SECRET_ACCESS_KEY | The AWS secret access key. | TEXT |
AWS_SESSION_TOKEN | The AWS session token. | TEXT |
AWS_ROLE_ARN | The AWS role ARN. | TEXT |
AWS_ROLE_EXTERNAL_ID | The AWS role external ID. | TEXT |
- When using static credentials:
- The
URL can be passed as either the first positional parameter or a named parameter
- If you provide either
AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY, you must provide both
- Providing an AWS session token is optional
- Credentials are not required for accessing public buckets
Return Type
The result is a table with data from the Avro files. Columns are read and parsed using their inferred data types based on the Avro schema. All data types are inferred as nullable.
Avro Data Type Mapping
READ_AVRO supports all Avro data types with the following mappings:
| Avro Type | Firebolt Type | Notes |
|---|
null | TEXT (nullable) | Standalone null columns are supported and handled as nullable text columns that are always null. Within unions, null specifies nullability for the resulting type |
boolean | BOOLEAN | |
int | INT | |
long | BIGINT | |
float | REAL | |
double | DOUBLE | |
bytes | BYTEA | |
string | TEXT | |
record | STRUCT | Nested structure with named fields |
enum | TEXT | Enum values are converted to their string representation |
array | ARRAY | |
map | ARRAY<STRUCT<key TEXT, value TYPE>> | Maps are converted to arrays of key-value structs |
union | STRUCT or single type | Single-type unions and dual-type unions with null are inferred as single nullable types; multi-type unions become structs |
fixed | BYTEA | Fixed-length byte arrays |
Special Handling
Maps: Avro maps are converted to arrays of structs with two fields:
key: The map key (always TEXT)
value: The map value (type depends on the Avro map value type)
Unions: Avro union handling depends on the number and types of union members:
- Single-type unions (e.g.,
["string"]): Inferred as the single type directly
- Dual-type unions with null (e.g.,
["null", "string"]): Inferred as a single nullable type (TEXT in this case)
- Multi-type unions (e.g.,
["string", "int", "record"]): Converted to structs with nullable fields for each possible type
For multi-type unions converted to structs, field naming follows this convention:
- First occurrence: Uses the Avro type name without suffix (e.g.,
string, int, record)
- Subsequent occurrences: Adds suffix
_<i> starting from _1 (e.g., record_1, record_2, etc.)
For example, a union with multiple record types would create fields named: record, record_1, record_2.
Enums: Avro enums are converted to their string representation.
Null types: The null type is supported both as a standalone column and within unions. Standalone null columns are handled as nullable text columns that are always null. Within unions, null specifies the nullability of the resulting type itself (e.g., ["null", "string"] creates a nullable text field, and ["null", "string", "int"] creates a nullable struct where the entire struct can be null).
Best practices
Firebolt recommends using a LOCATION object to store credentials for authentication.
When using READ_AVRO(), the URL parameter in the location should contain only Avro files (see location table-valued functions).
Examples
Example 1: Simple Avro file
The following code example reads from a simple Avro file with basic data types:
SELECT *
FROM READ_AVRO(
URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/simple.avro'
)
LIMIT 3;
Returns
| id | name | score | active |
|---|
| 1 | Alice | 95.5 | true |
| 2 | Bob | 87.2 | false |
| 3 | Charlie | 92.8 | true |
Example 2: Avro file with maps
This example shows how Avro maps are converted to arrays of key-value structs:
SELECT
user_id,
preferences -- map converted to array of key-value structs
FROM READ_AVRO(
URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/map_as_record.avro'
)
LIMIT 2;
Returns
| user_id | preferences |
|---|
| 101 | [{"key": "theme", "value": "dark"}, {"key": "language", "value": "en"}] |
| 102 | [{"key": "theme", "value": "light"}, {"key": "notifications", "value": "enabled"}] |
Example 3: Avro file with unions
This example demonstrates how Avro unions are converted to structs with nullable fields:
SELECT
id,
contact_info -- union converted to struct with nullable fields
FROM READ_AVRO(
URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/union_array_struct.avro'
)
LIMIT 2;
Returns
| id | contact_info |
|---|
| 1 | {"string": "email@example.com", "record": null} |
| 2 | {"string": null, "record": {"phone": "555-1234", "address": "123 Main St"}} |
Example 4: Using location object
The following code example uses a LOCATION object to store credentials for authentication:
SELECT *
FROM READ_AVRO(
LOCATION => 'my_avro_location'
)
LIMIT 5;
Example 5: Using location object with pattern
This example shows how to use the PATTERN parameter with a location object to filter specific files:
CREATE LOCATION firebolt_sample_avro WITH
SOURCE = AMAZON_S3
URL = 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/';
SELECT
$source_file_name, *
FROM READ_AVRO(
location => 'firebolt_sample_avro', pattern => 'sample_directory_*/*.avro'
);
Returns
| $source_file_name | name | age | email |
|---|
| help_center_assets/firebolt_sample_avro/sample_directory_1/simple.avro | Alice | 30 | alice@example.com |
| help_center_assets/firebolt_sample_avro/sample_directory_1/simple.avro | Bob | 25 | bob@example.com |
| help_center_assets/firebolt_sample_avro/sample_directory_1/simple.avro | Charlie | 35 | charlie@example.com |
This reads only the Avro files matching the pattern sample_directory_*/*.avro within the location’s base path.
Using URL
- The
URL can be passed as either the first positional parameter or a named parameter. For example, the following two queries will both read the same file:
SELECT * FROM READ_AVRO('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/simple.avro');
SELECT * FROM READ_AVRO(URL => 's3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/simple.avro');
- The
url can represent a single file or a glob pattern. If a glob pattern is used, all files matching the pattern will be read. A special column $source_file_name can be used to identify the source file of each row in the result set:
SELECT *, $source_file_name FROM READ_AVRO('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/*.avro')
When using glob patterns, the wildcard (*) can only be used at the end of the path. You can use it with any text before or after, such as *.avro, date=2025*.avro, or data_*.avro.
The pattern will recursively match files in all subdirectories. For example:
SELECT * FROM READ_AVRO('s3://firebolt-publishing-public/help_center_assets/firebolt_sample_avro/*.avro')
will read all Avro files in the firebolt_sample_avro directory and all of its subdirectories.