Amazon S3 File Uploader

Amazon Simple Storage Service (S3) can be used to export the following file types:

  • json
  • csv
  • compressed csv

Prerequisites

Exports to S3 require that you create an Amazon AWS Credential key in the Key Editor with access to the storage target.

Parameters

clear_files boolean
optional
Flag indicating whether the bucket to which data is uploaded should be cleared before uploading. By default this value is set to true.
compression string
optional
Indicates the compression method. Accepted values include gzip. If used, the file extension configured in the destination parameter must end with .gz
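To see what a gzip-compressed CSV upload file looks like on disk, here is a small sketch using only Python's standard library (the filename is illustrative, not a Switchboard convention):

```python
import csv
import gzip

# Write CSV rows into a gzip-compressed file; note the extension
# ends with .gz, matching the compression: "gzip" setting.
rows = [["id", "name"], ["1", "alpha"], ["2", "beta"]]
with gzip.open("example_report.csv.gz", "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading the file back confirms the round trip.
with gzip.open("example_report.csv.gz", "rt", newline="") as f:
    restored = list(csv.reader(f))
```

The first row here doubles as the header row, which corresponds to the headers parameter below.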
destination string
required
The path to the location where the file will be uploaded. Our standard for s3 storage is:
s3://<bucket_name>/<purpose:prod|dev|test>/<source_name>/<file_type:csv|json|parquet|avro>/<report_name>/<report_name>_YYYY-MM-DD_UUID.csv.gz
 
The following date patterns may be used to fill in the current date automatically.
  • 2022-06-01 (YYYY-MM-DD)
  • 2022-6-1 (single-digit month and day)
  • 2022-Jun-1 (three letter month abbreviation)
  • 2022-June-1 (full month name)
 
The string UUID will be replaced with an eight-character alphanumeric string to ensure the uniqueness of the filename. It may be used in conjunction with date formatting, so that s3://switchboard-example/fileYYYYMMDD_UUID.csv.gz will be formatted as ../file20220727_30FG06et.csv.gz
 
We strongly recommend that when you use these two patterns together you place the date first, so that the files sort by date.
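The substitution described above can be approximated with a short sketch (a hypothetical helper for illustration; Switchboard performs this internally, and the real behavior may differ in detail):

```python
import secrets
from datetime import date

def expand_destination(template: str, today: date) -> str:
    """Expand the literal tokens YYYY, MM, DD, and UUID in a destination
    template. Simplified: assumes the tokens appear only in the filename."""
    result = (template
              .replace("YYYY", f"{today.year:04d}")
              .replace("MM", f"{today.month:02d}")
              .replace("DD", f"{today.day:02d}"))
    # UUID becomes an eight-character alphanumeric string.
    return result.replace("UUID", secrets.token_hex(4))
```

For example, `expand_destination("s3://bucket/fileYYYYMMDD_UUID.csv.gz", date(2022, 7, 27))` yields a path of the form `s3://bucket/file20220727_XXXXXXXX.csv.gz`, with the date sorting ahead of the random suffix as recommended above.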
format string
required
The format in which data should be uploaded. Allowed values include csv, json, parquet, and avro
headers boolean
optional
Indicates whether a header row should be written to the destination file (csv files only)
partition_count integer
optional
Some systems can’t handle very large files, so uploads must be “partitioned” (split into smaller files). Typically you want only one partition, which is the default, but more may be convenient if downstream processes consuming the data require smaller files.
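To make the effect of this parameter concrete, here is a rough sketch of splitting rows into a fixed number of partitions (a hypothetical helper, not Switchboard's actual implementation):

```python
def partition_rows(rows: list, partition_count: int) -> list:
    """Split rows into partition_count roughly equal chunks, mirroring
    the effect of writing partition_count smaller output files."""
    size, extra = divmod(len(rows), partition_count)
    chunks, start = [], 0
    for i in range(partition_count):
        # The first `extra` chunks absorb one leftover row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks
```

With partition_count of 1 (the default) the entire dataset lands in a single output file; larger values trade one big file for several smaller ones.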
primary_source_name string
required
The name of the initial download from which data is being uploaded. Required for uploads where a date pattern is utilized in the naming convention.
test_destination string
optional
The location to write data for the purpose of test runs — unless this is specified, test data will not be uploaded. Note that date-based variables cannot be used in a test destination name.

Switchboard Script Syntax


upload example_s3_report to {

    type: "s3-ng";
    key: "my_aws_key";
    
    // Our standard for s3 storage is:
    // s3://<bucket_name>/<purpose:prod|dev|test>/<source_name>/<file_type:csv|json|parquet|avro>/<report_name>/<report_name>_YYYY-MM-DD_UUID.csv.gz
    destination: "s3://my_aws_storage/prod/my_report_source/my_file_type/my_report_name/example_s3_report_YYYY-MM-DD.csv.gz";
    headers: true;
    compression: "gzip"; //optional compression method; if used, the configured file extension in destination must end with .gz as shown above
    format: "csv";
    partition_count: 1;
    clear_files: false;
    
    primary_source_name: "example_s3_report_raw"; //only required for uploads where a date pattern is utilized in the naming convention; corresponds to the initial download name
    test_destination: "s3://my_aws_storage/.../example_report_test.csv.gz"; //test destinations are supported by using this field
};