SQL External Table Functions
SQL External Table Functions are built-in table functions used to read data from a source external to the database. The table functions are used in SQL statements where external data is used in-place of a regular SQL table or subquery. The location of the data files can be on the local file system, or remote, such as Amazon S3.
The result set of a table function is a temporary table that exists during SQL statement execution. This table can be queried further, joined with other tables, or used in any other way that a regular table can be used.
External Data Source
The following file formats are supported:
-
CSV
-
Parquet
If the source data is not formatted correctly, the data may not be imported as expected, or may fail completely.
Local File Access Configuration
An external data directory must be configured before accessing local files.
Once configured, only files located within this directory may be read using the READ_CSV and READ_PARQUET table functions.
The external data directory can be configured using the external-data-dir database option.
The path must be an absolute path to a directory on the local file system.
An external data directory can be configured for each TE.
Remote File Access Configuration
Remote files can be read directly via HTTPS or Amazon S3. Accessing files via HTTPS does not require any configuration. Accessing files located on Amazon S3 requires configuration of credentials.
The following S3 credentials are required:
-
S3_ACCESS_KEY_ID
-
S3_REGION
-
S3_SECRET_ACCESS_KEY
Use SET to configure the Access Key ID, Region, and Secret Access Key.
For more information, see SET.
Amazon S3 credentials can be configured for each connection. The pseudo table SYSTEM.CONNECTIONPROPERTIES can be used to check if credentials are configured.
READ_CSV
READ_CSV returns the data in the specified CSV file.
It takes only one parameter, the relative or absolute path to the CSV file.
The path can be local or remote.
The path to the file can be relative to external-data-dir, or absolute.
If path is absolute, it must be located within the external-data-dir.
| The CSV source file must be valid UTF-8. |
The data in the CSV file must be of type BOOLEAN, BIGINT, DOUBLE, TIME, DATE, TIMESTAMP, or VARCHAR. Any other data type will be interpreted as VARCHAR.
READ_PARQUET
READ_PARQUET returns the data in the specified parquet file.
It takes only one parameter, the relative or absolute path to the parquet file.
The path can be local or remote.
The path to the file can be relative to external-data-dir, or absolute.
If path is absolute, it must be located within the external-data-dir.
Syntax
READ_PARQUET('path/to/the/parquet/file.parquet')
READ_PARQUET('s3://path/to/the/file.parquet')
READ_PARQUET('https://path/to/the/file.parquet')
Example
SELECT
PRICE, PARKING, FURNISHINGSTATUS
FROM
READ_PARQUET('docs/test/houseprices.parquet')
WHERE
AREA > 12000
ORDER BY PRICE;
PRICE PARKING FURNISHINGSTATUS
-------- -------- -----------------
3500000 0 unfurnished
4900000 2 furnished
5943000 2 semi-furnished
6790000 2 furnished
6930000 1 furnished
9800000 2 furnished
10150000 0 unfurnished