SFTP
The Switchboard SFTP Connector provides automated, scheduled ingestion of data from SFTP servers in a variety of formats.
Prerequisites
To configure access to the SFTP Connector, you need:
- SFTP Hostname
- SFTP Port
- An SFTP Account Username
- Either an account password or an SSH private key
Scheduling
The SFTP connector can be scheduled to run multiple times a day at a user-defined hour and time zone.
- To configure this schedule, use the delay_hours parameter.
- By default, the connector will run once at 6am PT.
Sample Switchboard Script
import from {
type: "sftp_ng";
pattern: "/my_folder/my_file_pattern*.csv";
key: "amobee_key";
} using {
filename: string;
Day: datetime;
Campaign_Name: string;
Operative_Order_ID: string;
};
For Parquet files, use the raw version of this downloader. The raw version does not support the same format, encoding, and compression parameters as the downloader above. It supports only the parquet format, with no other options.
download t from {
type: "sftp:raw";
pattern: "/my_folder/my_file_pattern*.parquet";
datetime_pattern: "*file-YYYY-MM-DD.parquet";
format: "parquet";
} using {
idcol: integer;
*
};
Parameters
Parameter | Description | Required/Optional? |
---|---|---|
File Pattern | A list of requested file patterns, given with the pattern or regex parameter. For example: pattern: "source_folder/my_pattern"; regex: "/source_folder/my_file_name."; | Required |
lookback_days | Limits the number of previous days to which the datetime pattern applies. For example: lookback_days: 5; | Optional |
period_hours | How often, in hours, the system checks for updates. For example: period_hours: 3; | Optional |
delay_hours | Delay, in hours, that the system waits between the previous update and the next update after midnight. For example: delay_hours: 11; | Optional |
format | Specifies a format type. For example: format: "csv"; For additional CSV specific parameters, see the File Formats and Encoding section. | Optional |
datetime_pattern | Specifies the date and time pattern type. For example: datetime_pattern: "YYYY-MM-DD"; For additional information, see the Datetime Patterns section. | Optional |
Specifying Patterns
File Patterns
Switchboard matches target SFTP files based on wildcard patterns or regular expressions. Switchboard polls the SFTP server for new files that match the pattern or regular expression provided. By default, Switchboard re-ingests a file when it detects a change in the source file's checksum.
- To specify a file match by pattern, use the pattern parameter. The * character is used as a wildcard pattern match:
pattern: "/folder/my_pattern*";
- To specify a file match using a regular expression, use the regex parameter containing a valid matching pattern:
regex: "/folder/my_pattern(a|b)_\d{6}.csv";
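The difference between the two parameters can be illustrated with a short Python sketch. It is not Switchboard code: it assumes that the * wildcard behaves like a shell-style glob and that a regex must match the whole path, neither of which is confirmed on this page, and Switchboard's regex engine may differ from Python's re module. The file names are hypothetical. The sketch also shows that the unescaped . before csv matches any character, so a stricter variant escapes it as \. instead.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical file names on the SFTP server.
names = [
    "/folder/my_patterna_202001.csv",
    "/folder/my_patternb_202002.csv",
    "/folder/my_patternc_202003.csv",
    "/folder/my_patterna_202001xcsv",
]

# Wildcard match (assumed to be glob-like): * stands for any run of characters.
wildcard = "/folder/my_pattern*"

# The regex from the text, and a stricter variant with the dot escaped.
loose = re.compile(r"/folder/my_pattern(a|b)_\d{6}.csv")
strict = re.compile(r"/folder/my_pattern(a|b)_\d{6}\.csv")

wildcard_hits = [n for n in names if fnmatchcase(n, wildcard)]
loose_hits = [n for n in names if loose.fullmatch(n)]
strict_hits = [n for n in names if strict.fullmatch(n)]

print(len(wildcard_hits))  # all four names start with the prefix
print(loose_hits)          # also accepts "...202001xcsv"
print(strict_hits)         # only the two real .csv names with suffix a or b
```

If the stricter behavior is what you want, escaping the dot in the regex parameter is the usual fix in Java-style and Python-style regular expressions alike.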
Datetime Patterns
Configure Switchboard to poll for file names that match a date pattern. This enables importing a date-range backfill from the Switchboard UI.
- Add a datetime_pattern to the import configuration. Because target files may have multiple dates in the file name, it is important to specify a pattern that matches the specific date string required.
- To match the first date string in a file name of /source_folder/my_file_name_2020-01-01_2020_01-08.csv, use the following pattern:
datetime_pattern: "*my_file_name_YYYY-MM-DD_*";
- To limit the number of previous days to which the datetime pattern applies, use the lookback_days parameter. If no files are found within the lookback period, Switchboard treats this as an error.
- To locate files with a DateTime pattern that matches files within the past 5 days:
lookback_days: 5;
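As a rough illustration of how a datetime pattern and a lookback window might interact, the Python sketch below converts a pattern into a regular expression, extracts the date from a file name, and checks it against a window. This is a model under stated assumptions, not Switchboard's implementation: it assumes * matches any run of characters, that YYYY, MM, and DD are the only date tokens, and that the lookback window includes both endpoints. The helper names (compile_datetime_pattern, extract_date, within_lookback) are hypothetical.

```python
import re
from datetime import date, timedelta

# Assumed date tokens and the regex fragment each one stands for.
TOKENS = {
    "YYYY": r"(?P<year>\d{4})",
    "MM": r"(?P<month>\d{2})",
    "DD": r"(?P<day>\d{2})",
}


def compile_datetime_pattern(pattern):
    """Turn a pattern such as '*name_YYYY-MM-DD_*' into a compiled regex."""
    pieces = []
    for part in re.split(r"(YYYY|MM|DD|\*)", pattern):
        if part in TOKENS:
            pieces.append(TOKENS[part])
        elif part == "*":
            pieces.append(".*?")  # wildcard: shortest possible run
        else:
            pieces.append(re.escape(part))  # literal text
    return re.compile("".join(pieces))


def extract_date(pattern, file_name):
    """Return the date captured from file_name, or None if it does not match."""
    match = compile_datetime_pattern(pattern).fullmatch(file_name)
    if match is None:
        return None
    return date(int(match["year"]), int(match["month"]), int(match["day"]))


def within_lookback(file_date, today, lookback_days):
    """True if file_date falls in the last lookback_days days (inclusive)."""
    return today - timedelta(days=lookback_days) <= file_date <= today


name = "/source_folder/my_file_name_2020-01-01_2020_01-08.csv"
found = extract_date("*my_file_name_YYYY-MM-DD_*", name)
print(found)
print(within_lookback(found, date(2020, 1, 5), 5))
```

Because the literal text my_file_name_ sits immediately before the date tokens, the pattern is anchored to the first date in the file name rather than the second.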
File Formats and Encoding
Switchboard imports files with a variety of formats, encodings, and compression schemes.
File Formats
To specify a format type, use the format parameter:
format: "csv";
The available options are:
Format | Description |
---|---|
csv | Character-separated values, one record per row. See CSV specific options |
json | New-line delimited JSON |
parquet | Parquet file format. Use the sftp:raw downloader instead |
avro | Avro file format |
CSV specific options
Options | Description |
---|---|
header_row | Boolean: Skip header row |
preamble_rows | Count of leading rows to skip |
postamble_rows | Count of trailing rows to skip |
delimiter | Delimiter characters: comma: "," pipe: "|" tab: "\t" thorn: "þ" space: " " caret: "^" semicolon: ";" |
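To make the effect of these options concrete, the Python sketch below applies them to a small pipe-delimited sample. It only models the options and is not how Switchboard parses files. It assumes that preamble rows are dropped before the header row is skipped and that postamble rows are counted from the end of the file. The read_rows helper and the sample data are hypothetical.

```python
import csv


def read_rows(text, delimiter=",", header_row=True,
              preamble_rows=0, postamble_rows=0):
    """Parse delimited text, skipping leading, header, and trailing rows."""
    lines = text.splitlines()
    # Drop the leading and trailing rows that are not data.
    body = lines[preamble_rows:len(lines) - postamble_rows]
    rows = list(csv.reader(body, delimiter=delimiter))
    # Skip the header row if one is declared.
    return rows[1:] if header_row else rows


sample = (
    "Report generated 2020-01-01\n"   # preamble row
    "Day|Campaign_Name\n"             # header row
    "2020-01-01|Spring\n"
    "2020-01-02|Summer\n"
    "Total rows: 2\n"                 # postamble row
)

data = read_rows(sample, delimiter="|", preamble_rows=1, postamble_rows=1)
print(data)
```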
Encodings
By default, Switchboard assumes that files are encoded in UTF-8. Switchboard uses standard Java Charset names to identify encodings.
- To specify a file encoding using Latin Alphabet No. 1, provide the encoding parameter in the import statement:
encoding: "ISO-8859-1";
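The Python sketch below shows why the encoding parameter matters for a Latin-1 file. The same bytes decode cleanly as ISO-8859-1 but are not valid UTF-8. The sample bytes are made up for illustration, and the sketch uses Python's codecs rather than Switchboard's Java-based decoding.

```python
# "café,Zürich" stored as ISO-8859-1: é is byte 0xE9, ü is byte 0xFC.
raw = b"caf\xe9,Z\xfcrich\n"

# Decoding with the right charset recovers the text.
decoded = raw.decode("ISO-8859-1")

# Decoding the same bytes as UTF-8 fails, because 0xE9 and 0xFC
# are not valid on their own in UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(decoded.strip())
print(utf8_ok)
```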
Compression
Switchboard ingests files in gzip or zip compression formats.
- To specify files in gzip format, provide the compression parameter in the import statement:
compression: "gzip";
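If you are unsure how a source file is compressed, one way to check before setting the parameter is to inspect its first bytes. The Python sketch below does this with the well-known gzip and zip signatures. The detect_compression helper is hypothetical and runs outside Switchboard. Only the "gzip" value is shown on this page, so treat the "zip" label returned here as a description of the file, not a confirmed parameter value.

```python
import gzip
import io
import zipfile


def detect_compression(data):
    """Guess the compression scheme from the leading magic bytes."""
    if data[:2] == b"\x1f\x8b":       # gzip signature
        return "gzip"
    if data[:4] == b"PK\x03\x04":     # zip local file header signature
        return "zip"
    return None                        # assume uncompressed


# Build small in-memory samples to exercise the helper.
plain = b"Day,Campaign_Name\n2020-01-01,Spring\n"
gz_bytes = gzip.compress(plain)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("my_file.csv", plain)
zip_bytes = buffer.getvalue()

print(detect_compression(gz_bytes))
print(detect_compression(zip_bytes))
print(detect_compression(plain))
```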