Amazon S3 File Uploader
Amazon Simple Storage Service (Amazon S3) can be used to export the following file types:
- json
- csv
- compressed csv
Prerequisites
Exports to S3 require that you create an Amazon AWS Credential key in the Key Editor with access to the storage target.
Parameters
- clear_files boolean
- optional
- Flag indicating whether the bucket to which data is uploaded should be cleared before the upload. By default this value is set to “true”.
- compression string
- optional
- Indicates the compression method. Accepted values include gzip. If utilized, the configured file extension in the destination parameter must end with .gz
- destination string
- required
- The path to the location where the file will be uploaded. Our standard for S3 storage is:
s3://<bucket_name>/<purpose:prod|dev|test>/<source_name>/<file_type:csv|json|parquet|avro>/<report_name>/<report_name>_YYYY-MM-DD_UUID.csv.gz
- The following date patterns may be used to fill in the current date automatically:
- 2022-06-01 (YYYY-MM-DD)
- 2022-6-1 (single digit month and date)
- 2022-Jun-1 (three letter month abbreviation)
- 2022-June-1 (full month name)
- The string UUID will be replaced with an eight-character alphanumeric string to ensure the uniqueness of the filename. It may be used in conjunction with date formatting, such that s3://switchboard-example/fileYYYYMMDD_UUID.csv.gz will be formatted as ../file20220727_30FG06et.csv.gz
- If you use these two patterns together, we strongly recommend placing the date first so that the files can be sorted by date.
- format string
- required
- The format in which data should be uploaded. Allowed values include csv, json, parquet, and avro
- headers boolean
- optional
- Indicates whether or not a header row should be written to the destination file (for csv files)
- partition_count integer
- optional
- Some systems can’t handle very large files, so uploads can be “partitioned” (split into smaller files). A single partition is the default and is typically what you want, but multiple partitions may be convenient if downstream processes consuming the data require smaller files.
- primary_source_name string
- required
- The name of the initial download from which data is being uploaded. Required for uploads where a date pattern is utilized in the naming convention.
- test_destination string
- optional
- The location to write data for test runs. Unless this is specified, test data will not be uploaded. Note that date-based variables cannot be used in a test destination name.
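The date and UUID placeholders described for the destination parameter can be illustrated with a short Python sketch. This is not Switchboard's implementation: the function name expand_destination, the use of plain string replacement, and the choice of letters plus digits for the random token are all assumptions made for illustration. Only the zero-padded YYYY, MM and DD placeholders are handled, because the pattern strings for the other date variants are not given above.

```python
import datetime
import secrets
import string
from typing import Optional

# Assumption: the eight-character token is drawn from ASCII letters and digits.
ALPHANUMERIC = string.ascii_letters + string.digits


def expand_destination(template: str, day: datetime.date,
                       token: Optional[str] = None) -> str:
    """Fill YYYY, MM, DD and UUID placeholders in a destination template.

    Hypothetical helper for illustration only.
    """
    if token is None:
        # Generate a random eight-character alphanumeric token.
        token = "".join(secrets.choice(ALPHANUMERIC) for _ in range(8))
    # Substitute the date parts first and the token last, so that a token
    # which happens to contain "MM" or "DD" is never rewritten.
    result = template.replace("YYYY", f"{day.year:04d}")
    result = result.replace("MM", f"{day.month:02d}")
    result = result.replace("DD", f"{day.day:02d}")
    return result.replace("UUID", token)


if __name__ == "__main__":
    print(expand_destination(
        "s3://switchboard-example/fileYYYYMMDD_UUID.csv.gz",
        datetime.date(2022, 7, 27),
        token="30FG06et",
    ))
```

Placing the date before the token, as recommended above, means a plain lexicographic sort of the resulting filenames is also a sort by date.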
Switchboard Script Syntax
upload example_s3_report to {
type: "s3-ng";
key: "my_aws_key";
// Our standard for s3 storage is:
// s3://<bucket_name>/<purpose:prod|dev|test>/<source_name>/<file_type:csv|json|parquet|avro>/<report_name>/<report_name>_YYYY-MM-DD_NN.csv.gz
destination: "s3://my_aws_storage/prod/my_report_source/my_file_type/my_report_name/example_s3_report_YYYY-MM-DD.csv.gz";
headers: true;
compression: "gzip"; // optional compression method; if utilized, the configured file extension in destination must end with .gz, as shown above
format: "csv";
partition_count: 1;
clear_files: false;
primary_source_name: "example_s3_report_raw"; //only necessary for uploads where date is utilized in the naming convention, corresponds with the initial download name
test_destination: "s3://my_aws_storage/.../example_report_test.csv.gz" //Test destinations are supported by using this field
};
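Several of the parameter descriptions state constraints that can be checked before a script is submitted. The Python sketch below is a hypothetical pre-flight check, not part of Switchboard: the function validate_upload_config, the dictionary representation of the upload block, and the use of "YYYY" as the marker for a date pattern are assumptions made for illustration. It treats primary_source_name as needed only when the destination contains a date pattern, following the comment in the script above.

```python
from typing import Dict, List

# Allowed values for the format parameter, as listed in this document.
ALLOWED_FORMATS = ("csv", "json", "parquet", "avro")


def validate_upload_config(config: Dict[str, object]) -> List[str]:
    """Return a list of problems found in a dict mirroring an upload block.

    Hypothetical helper for illustration only.
    """
    problems = []
    destination = str(config.get("destination") or "")
    test_destination = str(config.get("test_destination") or "")
    fmt = config.get("format")

    if not destination:
        problems.append("destination is required")
    if not fmt:
        problems.append("format is required")
    elif fmt not in ALLOWED_FORMATS:
        problems.append("format must be one of " + ", ".join(ALLOWED_FORMATS))

    # gzip output must be written to a path ending in .gz.
    if config.get("compression") == "gzip" and not destination.endswith(".gz"):
        problems.append("destination must end with .gz when compression is gzip")

    # Assumption: a date pattern is detected by the presence of "YYYY".
    if "YYYY" in destination and not config.get("primary_source_name"):
        problems.append("primary_source_name is required when destination uses a date pattern")
    if "YYYY" in test_destination:
        problems.append("test_destination cannot contain date-based variables")
    return problems


if __name__ == "__main__":
    example = {
        "destination": "s3://my_aws_storage/prod/my_report_source/my_file_type/"
                       "my_report_name/example_s3_report_YYYY-MM-DD.csv.gz",
        "format": "csv",
        "compression": "gzip",
        "primary_source_name": "example_s3_report_raw",
    }
    print(validate_upload_config(example))
```

An empty list means none of the checked constraints were violated; it does not guarantee that the upload itself will succeed.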