Amazon S3

The Switchboard Amazon Simple Storage Service (S3) connector provides automated, scheduled ingestion of data from S3 in a variety of formats.

Prerequisites

To configure access to the Amazon S3 connector, you need:

  • Amazon AWS IAM Key/Secret Pair
  • Bucket Path
  • File Path

To add a new Amazon AWS IAM Key/Secret Pair:

  1. Log in to the Dashboard.
  2. Click on the Keys tab. 
  3. In the Options menu, select AWS.

NOTE: The Amazon AWS IAM Key/Secret Pair must have read access to the source bucket.


Scheduling

The Amazon S3 connector can be scheduled to run multiple times a day at a user-defined hour and time zone.

  • To configure this schedule, use the delay_hours parameter.
  • By default, the connector will run once at 6am PT.

Using Switchboard Static IP

If necessary due to IT or security policy, this connector can be configured to route traffic through one of Switchboard’s static IP addresses. To do so, include the parameter static_ip: true; in the Switchboard Script import statement.

Parameters

File Pattern string
required
The requested file patterns, specified with the pattern or regex parameter.
Example: pattern: "s3://source_bucket/my_pattern*";
Example: regex: "s3://source_bucket/my_file_name.*";

datetime_pattern string
optional
Specifies the date and time pattern type.
Example: datetime_pattern: "YYYY-MM-DD";
For additional information, see the Datetime Patterns section.

delay_hours integer
optional
Number of hours the system waits between the previous update and the next update. Connector-specific variations are described in the Scheduling section.
Example: delay_hours: 11;

format string
optional
Specifies a format type.
Example: format: "csv";
For additional CSV-specific parameters, see the File Formats and Encoding section.

lookback_days integer
optional
Limits the number of previous days to which the datetime pattern applies.
Example: lookback_days: 5;

period_hours integer
optional
How often, in hours, the system checks for updates.
Example: period_hours: 3;

Specifying Patterns

File Patterns

Switchboard matches target S3 files based on wildcard patterns or regular expressions. Switchboard polls the source bucket for new files that match the pattern or regular expression provided. By default, Switchboard re-ingests a file when it detects a change in the source file's checksum.

  • To specify a file match by pattern, use the pattern parameter. The * character is used as a wildcard pattern match:
    pattern: "s3://source_bucket/my_pattern*";
    
  • To specify a file match using a regular expression, use the regex parameter containing a valid matching pattern:
    regex: "s3://source_bucket/my_pattern(a|b)_\d{6}\.csv";
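
Before configuring the connector, it can help to check a pattern or regular expression against a few object names locally. The following Python sketch is only an approximation of the matching described above: the object names are hypothetical, and it assumes that * matches any run of characters (including /) and that a regular expression must match the whole object name. Switchboard's actual matching rules may differ.

```python
import fnmatch
import re

# Hypothetical object names, used only for illustration.
keys = [
    "s3://source_bucket/my_patterna_202001.csv",
    "s3://source_bucket/my_patternb_202002.csv",
    "s3://source_bucket/my_patternc_202003.csv",
    "s3://source_bucket/other_file.csv",
]

def match_wildcard(pattern, names):
    """Return the names that match a '*' wildcard pattern (approximation)."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]

def match_regex(expression, names):
    """Return the names that the regular expression matches in full (assumed)."""
    compiled = re.compile(expression)
    return [n for n in names if compiled.fullmatch(n)]

# The wildcard pattern keeps every name that starts with the prefix.
wildcard_hits = match_wildcard("s3://source_bucket/my_pattern*", keys)

# The regex keeps only the "a" and "b" variants followed by six digits.
# The dot before "csv" is escaped so that it matches a literal dot only.
regex_hits = match_regex(r"s3://source_bucket/my_pattern(a|b)_\d{6}\.csv", keys)
```

In this sketch the wildcard pattern selects the first three names, while the regular expression selects only the first two.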
    

Datetime Patterns

Switchboard can poll for file names that match a date pattern. Using a date pattern also makes it possible to import a date-range backfill from the Switchboard UI.

  • Add a datetime_pattern to the import configuration. Since target objects may have multiple dates in the filename, it is important to specify a pattern that matches the specific date string required.
    • To match the first date string in an S3 object name of s3://my_bucket/my_file_name_2020-01-01_2020-01-08.csv, use the following pattern:
      datetime_pattern: "*my_file_name_YYYY-MM-DD_*";
      
  • To limit the number of previous days to which the datetime pattern applies, use the lookback_days parameter. If no files are found within the lookback period, Switchboard treats this as an error.
    • To match files dated within the past 5 days:
      lookback_days: 5;
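
To make the interaction between datetime_pattern and lookback_days more concrete, here is a minimal Python sketch. It is not Switchboard's implementation: it assumes that * matches any run of characters, that YYYY, MM, and DD stand for fixed-width digit groups, and that the lookback window includes today. The helper names are hypothetical, and the simple translation would misread literal text that happens to contain "MM" or "DD".

```python
import re
from datetime import date, timedelta

def datetime_pattern_to_regex(pattern):
    """Translate a pattern such as '*name_YYYY-MM-DD_*' into a regex (assumed semantics)."""
    parts = []
    for piece in re.split(r"(\*|YYYY|MM|DD)", pattern):
        if piece == "*":
            parts.append(".*?")                  # wildcard: any run of characters
        elif piece == "YYYY":
            parts.append(r"(?P<year>\d{4})")
        elif piece == "MM":
            parts.append(r"(?P<month>\d{2})")
        elif piece == "DD":
            parts.append(r"(?P<day>\d{2})")
        else:
            parts.append(re.escape(piece))       # literal text is matched as-is
    return re.compile("".join(parts))

def extract_date(pattern, name):
    """Return the date captured from name, or None if the pattern does not match."""
    match = datetime_pattern_to_regex(pattern).fullmatch(name)
    if match is None:
        return None
    return date(int(match["year"]), int(match["month"]), int(match["day"]))

def within_lookback(file_date, today, lookback_days):
    """True if file_date falls within the past lookback_days days, counting today."""
    return today - timedelta(days=lookback_days) <= file_date <= today

# Hypothetical object name with two dates; the literal prefix pins the first one.
name = "s3://my_bucket/my_file_name_2020-01-01_2020-01-08.csv"
first_date = extract_date("*my_file_name_YYYY-MM-DD_*", name)
```

Because the literal text my_file_name_ appears immediately before the date placeholder, the sketch picks up the first date in the name rather than the second.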
      

File Formats and Encoding

Switchboard imports files with a variety of formats, encodings, and compression schemes.

File Formats

To specify a format type, use the format parameter:

format: "csv";

The available options are:

Format    Description
csv       Character-separated values, one record per row. See CSV specific options.
json      Newline-delimited JSON
parquet   Parquet file format. Note: use the s3:raw downloader instead.
avro      Avro file format

CSV specific options

Option          Description
header_row      Boolean: skip the header row
preamble_rows   Count of leading rows to skip
postamble_rows  Count of trailing rows to skip
delimiter       Delimiter character, one of:
                  comma: ","
                  pipe: "|"
                  tab: "\t"
                  thorn: "þ"
                  space: " "
                  caret: "^"
                  semicolon: ";"
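
The effect of the CSV options can be illustrated with a short Python sketch. It is not how Switchboard parses files: the read_rows helper and the sample text are hypothetical, and the sketch assumes that the header row comes directly after any preamble rows.

```python
import csv
import io

def read_rows(text, delimiter=",", header_row=False, preamble_rows=0, postamble_rows=0):
    """Parse delimited text, dropping preamble, postamble, and header rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    rows = rows[preamble_rows:]          # skip leading rows
    if postamble_rows:
        rows = rows[:-postamble_rows]    # skip trailing rows
    if header_row:
        rows = rows[1:]                  # skip the header (assumed to follow the preamble)
    return rows

# Hypothetical pipe-delimited file with a one-line preamble and a totals line.
sample = (
    "Report generated 2020-01-01\n"
    "name|clicks\n"
    "alpha|10\n"
    "beta|20\n"
    "TOTAL|30\n"
)

data = read_rows(sample, delimiter="|", header_row=True, preamble_rows=1, postamble_rows=1)
```

With these settings only the two data rows (alpha and beta) remain.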

Encodings

By default, Switchboard assumes that files are encoded in UTF-8. Encoding names use the standard Java Charset format.

  • To specify Latin Alphabet No. 1 as the file encoding, provide the encoding parameter in the import statement:
    encoding: "ISO-8859-1";
    

Compression

Switchboard ingests files in gzip or zip compression formats. 

  • To specify files in gzip format, provide the compression parameter in the import statement:
    compression: "gzip";
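
To see why the encoding and compression of a source file need to be declared, consider the following Python sketch. It is only an illustration with hypothetical sample text, not Switchboard code: it builds a gzip-compressed ISO-8859-1 payload, then shows that the bytes decode correctly as ISO-8859-1 but not as UTF-8.

```python
import gzip

# Hypothetical file content containing non-ASCII characters.
original = "caf\u00e9;na\u00efve\n"

# Simulate a source file that is ISO-8859-1 encoded and gzip compressed.
payload = gzip.compress(original.encode("iso-8859-1"))

# Reading it back requires undoing both steps in the right order.
raw = gzip.decompress(payload)
decoded = raw.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 fails, so the default would not work here.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

A file like this one would therefore need both the encoding and compression parameters set in its import statement.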
    

Sample Switchboard Script

import s3_file_raw from {
    type: "s3_ng";
    regex: "s3://source_bucket/my_file_name.*";
    datetime_pattern: "*_YYYY-MM-DD_*";
    key: "aws_key";
    lookback_days: 1;
} using {
    filename: string;
    date: datetime;
    bx_pub_name: string;
    placement: string;
    device: string;
    demand_source: string;
};

For Parquet files, use the raw version of this downloader.

The raw version does not support the format, encoding, and compression parameters described above. It supports only the parquet format, with no other options.

download t from {
    type: "s3:raw";
    pattern: "s3://mybucket/somepath/*.parquet";
    datetime_pattern: "*file-YYYY-MM-DD.parquet";
    format: "parquet";
} using {
    idcol: integer;
    *
};