Amazon S3 File Uploader
Amazon Simple Storage Service (S3) can be used to export the following file types:
- json
- csv
- compressed csv
- avro
- parquet
Prerequisites
Exports to S3 require that you create an Amazon AWS Credential key in the Key Editor with access to the storage target.
Parameters
- clear_files boolean
- optional
- Flag indicating whether the bucket to which data is uploaded should be cleared before uploading data. By default, this value is set to “true”.
- compression string
- optional
- Indicates the compression method. Accepted values include gzip. If utilized, the configured file extension in the destination parameter must end with .gz
- destination string
- required
- The path to the location where the file will be uploaded. Our standard for S3 is:
s3://<bucket_name>/<purpose:prod|dev|test>/<source_name>/<file_type:csv|json|parquet|avro>/<report_name>/<report_name>_YYYY-MM-DD_NN.csv.gz
- The following date patterns may be used to fill in the current date automatically:
- 2022-06-01 (YYYY-MM-DD)
- 2022-6-1 (single digit month and date)
- 2022-Jun-1 (three letter month abbreviation)
- 2022-June-1 (full month name)
- The string UUID will be replaced with an eight-character alphanumeric string to ensure the uniqueness of the filename. It may be used in conjunction with date formatting, such that s3://switchboard-example/fileYYYYMMDD_UUID.csv.gz will be formatted as ../file20220727_30FG06et.csv.gz
- If you use these two patterns together, we strongly recommend placing the date first so that the files can be sorted by date.
- format string
- required
- The format in which data should be uploaded. Allowed values include csv, json, parquet, and avro.
- headers boolean
- optional
- Indicates whether or not a header row should be written to the destination file (for csv files)
- single_partition boolean
- optional
- Ensures the uploader creates a single file when file uploads are greater than 100 MB. The maximum size of a single file is 5 GB.
- primary_source_name string
- required
- The name of the initial download from which data is being uploaded. Required for uploads where a date pattern is utilized in the naming convention.
- test_destination string
- optional
- The location to write data for test runs. Unless this is specified, test data will not be uploaded. Note that date-based variables cannot be used in a test destination name.
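To make the date and UUID substitution described under destination more concrete, the following Python sketch mimics it outside of Switchboard. It is not Switchboard's implementation: the function name expand_destination is hypothetical, and it handles only the YYYY-MM-DD and YYYYMMDD patterns, since those are the only pattern strings shown on this page.

```python
import re
import secrets
import string
from datetime import date

# Characters allowed in the generated token (assumption: ASCII letters and digits).
ALPHANUMERIC = string.ascii_letters + string.digits


def expand_destination(template: str, run_date: date) -> str:
    """Illustrative only: fill date patterns and UUID tokens in a destination path."""
    # Substitute the two date patterns that appear in this page's examples.
    result = template.replace("YYYY-MM-DD", run_date.strftime("%Y-%m-%d"))
    result = result.replace("YYYYMMDD", run_date.strftime("%Y%m%d"))

    # Replace each UUID token with eight random alphanumeric characters.
    def random_token(_match: re.Match) -> str:
        return "".join(secrets.choice(ALPHANUMERIC) for _ in range(8))

    return re.sub("UUID", random_token, result)


if __name__ == "__main__":
    print(expand_destination(
        "s3://switchboard-example/fileYYYYMMDD_UUID.csv.gz",
        date(2022, 7, 27),
    ))
```

Because the date comes before the random token, the resulting filenames sort chronologically, which is the reason for the recommendation to put the date first.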
Switchboard Script Syntax
upload example_s3_report to {
type: "s3";
key: "my_aws_key";
// Our standard for s3 storage is:
// s3://<bucket_name>/<purpose:prod|dev|test>/<source_name>/<file_type:csv|json|parquet|avro>/<report_name>/<report_name>_YYYY-MM-DD_NN.csv.gz
destination: "s3://my_aws_storage/prod/my_report_source/my_file_type/my_report_name/example_s3_report_YYYY-MM-DD.csv.gz";
headers: true;
compression: "gzip"; //optional compression method; if utilized, the configured file extension in destination must end with .gz as shown above
format: "csv";
single_partition: true;
clear_files: false;
primary_source_name: "example_s3_report_raw"; //only necessary for uploads where date is utilized in the naming convention, corresponds with the initial download name
test_destination: "s3://my_aws_storage/.../example_report_test.csv.gz" //Test destinations are supported by using this field
};
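The constraints documented in the parameter list can be checked before a script is deployed. The Python sketch below is one possible pre-flight check, assuming the upload settings have been copied into a plain dictionary. The function name check_s3_upload and the DATE_PATTERNS tuple are hypothetical, and the date check covers only the pattern strings shown on this page.

```python
ALLOWED_FORMATS = {"csv", "json", "parquet", "avro"}
# Assumption: only the date pattern strings that appear on this page are checked.
DATE_PATTERNS = ("YYYY-MM-DD", "YYYYMMDD")


def check_s3_upload(settings: dict) -> list[str]:
    """Return a list of problems found in the upload settings (empty if none)."""
    problems = []
    destination = settings.get("destination", "")

    if not destination:
        problems.append("destination is required")
    if settings.get("format") not in ALLOWED_FORMATS:
        problems.append("format must be one of csv, json, parquet, avro")
    # gzip compression requires a destination ending in .gz
    if settings.get("compression") == "gzip" and not destination.endswith(".gz"):
        problems.append("destination must end with .gz when compression is gzip")
    # A date pattern in the destination requires primary_source_name
    uses_date = any(pattern in destination for pattern in DATE_PATTERNS)
    if uses_date and not settings.get("primary_source_name"):
        problems.append("primary_source_name is required when destination uses a date pattern")
    # Date-based variables are not allowed in a test destination
    test_destination = settings.get("test_destination", "")
    if any(pattern in test_destination for pattern in DATE_PATTERNS):
        problems.append("test_destination cannot contain date patterns")

    return problems


if __name__ == "__main__":
    example = {
        "destination": "s3://my_aws_storage/prod/my_report_source/my_file_type/my_report_name/example_s3_report_YYYY-MM-DD.csv.gz",
        "compression": "gzip",
        "format": "csv",
        "primary_source_name": "example_s3_report_raw",
    }
    print(check_s3_upload(example))
```

Returning a list of messages rather than raising on the first failure lets every problem in a block be reported in one pass.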