Scripting Best Practices

Introduction

Switchboard extracts data from a variety of data sources so it can be transformed into a normalized schema and loaded into any cloud destination for analysis.

We use our pre-built connectors to authenticate and interact with different data sources and repositories. We extract data from data sources using downloaders. Similarly, we have corresponding uploaders that allow us to push your data to your desired destination.

You will use our Switchboard Script (SBS) to indicate what data to download into our system (and how) via the connectors, perform any necessary transformations, and then send the result to a destination you select with an uploader.

This guide will help you gather the information necessary to set up your connectors so you can start working on your scripts.

Basic Requirements

Data Source and Destination Identification

Before you begin, you will need to ensure that you have identified the data sources you wish Switchboard to access and have set up the data warehouse location into which you want your data uploaded.

Obtaining Credentials

Switchboard will need access to your data sources in order to retrieve their data.

The method by which access is obtained will vary depending upon the data source, but typically this requires a username/password or a token (such as an API Key). In some cases we will require you to log in via OAuth in order to obtain the credentials.

Our connector documentation guides you through the process of obtaining the credentials Switchboard will need, or tells you whom to contact for a given data source. If you need assistance, please check in with your Switchboard technical account manager to enable API reporting.

Adding Credentials to Switchboard

Moving data from its various sources into a data warehouse requires that Switchboard have credentials (“Keys”) to all the third-party systems to which it connects.

The Key Editor of our application allows Switchboard to securely store credentials to those systems.

To add a Key:

  • From the left hand menu, select “Keys”
  • Press the “+ Key” button. The “Add Key” panel will appear
  • Select the type of data source for which you wish to add credentials from the “Credential Type” drop-down
  • Give your key a sensible name and add the required values (username/password, API Key, etc.)

For more information, see the Key Editor Guide.

Understand Your Data Source

  • Determine what your reporting data requirements are, such as metrics and dimensions
  • Similarly, you will need to know how often new data will be available for Switchboard so you can properly configure scheduling
  • Determine if there are particular properties that are required for your data source like “Network ID” or report names. Check our connector documentation for the data source or contact support@switchboard-software.com for additional information
  • How many connections will accessing the data require?
    • Does having multiple accounts/seats or accessing different reports (e.g., Google Ad Manager with “Revenue Report” and “Yield Partner Report”) require Switchboard to make multiple connections? This can be an issue with email reports and API data sources.
  • How is your data accessed? Is it API based or file based?

Downstream Requirements

To take advantage of Switchboard’s transformation capabilities, consider how the consumers of your data will be processing it.

  • Does the data need to be normalized?
  • Does the data need to adhere to specific naming conventions on upload, such as file patterns?
  • Is the data part of a larger ecosystem that uses a dashboard or another data visualization tool?
  • Will you need access to the raw data? While the data going into a data warehouse where it’s accessed by reporting/visualization tools may be modified (e.g., for normalization purposes), the raw data may also be piped into a separate cold storage location.

Scripting Basics

See our Scripting Guide for full details about working with our script editor.

Published Scriptbook vs Draft

The scripts that Switchboard executes to extract, transform, and load your data are part of a “scriptbook” for which there are two states.

The published version of the scriptbook is what is currently running in production. It cannot be edited.

Draft versions of the scriptbook may be created and used for testing, editing, and adding new configurations before they are published. All modifications to scripts (or “scripting”) occur in this draft state.

A history is kept of all your published versions of scripts in case you ever need to revert to a previous version, and multiple draft versions may exist at once that can be merged into the published scriptbook.

Working With Drafts

To create a draft script, select Scripts » Drafts and click the “+ Draft” button.

Using the Diff Editor

The Diff Editor allows you to see the changes between your draft version of the scriptbook and the published version of the scriptbook.

This will include files that have been added, removed, or changed as well as the individual lines that have been modified within a file.

Below are examples showing where lines have been removed and where lines have been removed and replaced with new lines.

The diff editor will default to showing only the changed portions of files.

Diff editor example

From within the diff editor, you can accept the version as published or as drafted. Or you can keep both changes and resolve the duplication yourself.

To access the Diff Editor, open a draft version of your script and click the page icon with the plus sign on it at the bottom of the left-hand column of the script editor.

Body of the Script

A Switchboard Script consists of statements that will be executed to do one of three things:

  • import the data from a source into Switchboard as part of a download declaration
  • transform the imported data
  • export the data from Switchboard to a data warehouse location as part of an upload declaration

A very simple script showing the download and upload declarations is shown below, where the source and destination are a Google Sheet.

import google_sheet_example_import from {
    type: "google_sheets";
    key: "my_google_oauth_key";
    spreadsheet_id: "{your spreadsheet ID}";
    data_range: "A:H";
} using {
    "name": string;
    "phone": string;
    "email": string;
    "address": string;
    "postalZip": string;
    "region": string;
    "country": string;
    "currency": string;
};

export google_sheet_example_import to {
    type: "google_sheets";
    key: "my_google_oauth_key";
    primary_source_name: "google_sheet_test_import";
    folder_name: "switchboard_uploads";
    spreadsheet_name: "Sample Upload"; // Required
    test_spreadsheet_name: "Sample Upload";
};

Download Declaration

The download declaration consists of two blocks demarcated by curly braces { }.

The first block is preceded by the keyword import.

It is followed by the name you are giving to the download, as well as parameters that describe the means by which the download is obtained, including the connector, the credentials, and any other parameters that will define the scope of the data being retrieved.

Some parameters defined within this block, such as scheduling parameters, are common across connectors; others are specific to a particular connector.

Immediately after the import block is the using block, which defines the schema of the data being downloaded; that is, the names of the data fields and their data types.

We’ll discuss the different parts of the download declaration in more detail below.

Download Name

required

  • The string of characters that appears immediately after the import keyword
  • The download name must be unique within a scriptbook. You can see a list of the downloads you’ve named in the script editor by selecting the “Imports” tab from the left-hand sidebar (the downward-pointing arrow, second from the top)
  • Name the initial download table sensibly, so that:
    • any user can understand the source of uploaded or otherwise referenced data
    • users can easily differentiate between data sources
    • Example: gam_report_raw is not sensible when there are multiple Google Ad Manager (GAM) connections. Instead, be more explicit with names like gam_yield_report_raw and gam_custom_dimensions_report_raw.
  • Consistently differentiate the state of the data using suffixes such as raw, normalized, or final

Connector Type

required

  • Must be a valid connector within Switchboard. See our list of connectors for more info

Credentials

required

  • Must be a valid Key for the connector type (see the connector specific documentation for details)
  • Must have been previously defined in the Keys section

Reporting Fields

data source dependent

  • Metrics/dimensions or otherwise named
  • Different requirements depending on the data source - App ID, Network ID, Report Name

Scheduling

optional

For full details see our scheduling parameters guide. Some parameters may be combined, but not all; a sketch showing where these parameters sit appears after the list below.

  • By default our connectors pull data once a day at 6 AM Pacific
  • repull_days
    • Commonly used: repulls the previous N days of reporting to capture any restatements for API-based connections
  • lookback_days
    • Commonly used: looks back at the previous N days for any new or restated files. Can be applied to some API-based connections
  • Other options
    • timezone
      • Set the timezone for your API call schedule (e.g., 6 AM GMT)
    • period_hours
    • hour_of_day
    • day_of_week
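
As a rough sketch, scheduling parameters sit alongside the other parameters in the import block. This builds on the Google Sheets example above; the values are illustrative, and not every connector accepts every parameter, so check the connector documentation:

import google_sheet_scheduled_import from {
    type: "google_sheets";
    key: "my_google_oauth_key";
    spreadsheet_id: "{your spreadsheet ID}";
    data_range: "A:B";
    timezone: "GMT";    // schedule relative to GMT instead of the Pacific default
    hour_of_day: 6;     // pull once a day at 6 AM in the timezone above
    repull_days: 3;     // also repull the previous 3 days to capture restatements
} using {
    "name": string;
    "value": string;
};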

Schema

The schema is defined in the using section of the import statement. It consists of the field names and data types of the columns being imported.

  • Depending on the data source, the schema will be positional (placement matters, but the field name does not have to match the source) or JSON-based (the field name must exactly match the source, but placement does not matter)
  • Declare the column name and data type; you can alter the data type in transformation tables downstream as needed (see the excerpt below)
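
For instance, a minimal using excerpt for a positional source (the field names here are placeholders; for a positional source only the column order must match the data):

} using {
    // Positional source: the first column in the data range maps to "name",
    // the second to "signup_date", and so on. The names are yours to choose.
    "name": string;
    "signup_date": string;
    "value": string;
};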

Transformation Tables

Before exporting data to its data warehouse, Switchboard allows you to do numerous modifications or “transformations” to the data.

These transformations tables will reference previously declared download or other table names.

Below is a simple transformation that uses one of our built-in date/time functions to stamp each row with a processing timestamp.

table my_transformed_csv_data is
    select
        name,
        date,
        value,
        processing_datetime() as processed_timestamp
    from my_csv_from_s3;

The transformation table is declared using the keyword table followed by a name unique to the scriptbook.

After the name comes the keyword is, followed by a SQL-like select statement.

Below are some use cases for transformations. For additional documentation, see Switchboard Script for SQL Users

Normalization

Below are some examples of changes you may want to make to the data to normalize it, especially if you are combining data from multiple sources; a sketch follows the list.

  • renaming fields
  • stripping characters not supported by data type or data warehouse
  • removing duplicate data (de-duping)
  • reordering fields
  • changing data type
    • ex: cast datetime to date, integer to float
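
As a sketch of renaming and reordering (the report and field names below are placeholders; for de-duping and type casting, see Switchboard Script for SQL Users):

table gam_yield_report_normalized is
    select
        Date as report_date,          // rename to your house conventions
        AdUnit as ad_unit,            // fields are reordered simply by listing
        Impressions as impressions,   // them in the desired order
        Revenue as revenue
    from gam_yield_report_raw;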

Applying Business Rules

  • sums
  • case statements
  • haircuts (e.g., sum(revenue * 0.8) as revenue; see the sketch below)
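
A minimal sketch of a row-level haircut, building on the hypothetical normalized table above (the 0.8 multiplier and field names are illustrative):

table gam_yield_report_net is
    select
        report_date,
        ad_unit,
        impressions,
        revenue * 0.8 as net_revenue  // apply a 20% haircut
    from gam_yield_report_normalized;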

Enriching Data

Joins

Switchboard supports SQL-like syntax for joins, including outer (or left) joins and unions, as sketched below.
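
A sketch, assuming the join syntax mirrors standard SQL (see Switchboard Script for SQL Users for the exact form; the metadata table here is hypothetical):

table gam_yield_report_enriched is
    select
        n.report_date,
        n.ad_unit,
        n.net_revenue,
        m.site_name                   // pulled in from the lookup table
    from gam_yield_report_net as n
    left join ad_unit_metadata as m
        on n.ad_unit = m.ad_unit;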

String Manipulation

Switchboard offers numerous string manipulation functions such as:

  • concat
  • extract
  • replace
  • split

For example, extract can pull a value out of a delimited key-value string using a regular expression:

extract(CustomTargeting, literal "(?:^|.*;?)indb2b=(.*?)(?:;.*|$)")[1] as indb2b

Json Unnesting and Exploding Data

When uploading data you can declare a field to be JSON as a datatype and retrieve individual subproperties or explode the data.

Upload Declarations

As with the download declaration, the upload declaration is contained within curly braces { }, this time preceded by the export keyword.

Data Source to Upload

required

The string of characters between the export and to keywords indicates where the uploaded data comes from. This will be a previously defined download or table name.

That is, you may reference either the raw download (i.e., the original data as downloaded) or a table that has been used to normalize or otherwise transform the data.
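
For example, to upload the transformation table from earlier rather than the raw download (a sketch reusing the Google Sheets uploader parameters from the example above):

export my_transformed_csv_data to {
    type: "google_sheets";
    key: "my_google_oauth_key";
    folder_name: "switchboard_uploads";
    spreadsheet_name: "Transformed Upload"; // Required
};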

Upload Type

required

The upload type will correspond to one of the Switchboard uploaders. This will be a file-based location (e.g., Amazon S3 or Google Cloud Storage) or a data warehouse (e.g., BigQuery or Snowflake).

See our list of uploaders for the details.

Credentials

required

As with the download declaration, you must specify the name of a Key for the uploader defined in the Keys section of the Switchboard App.

Truncate

optional

Truncate is a boolean parameter used to indicate whether the current data in the table should be dropped (true) or appended to (false).

The default value is true.
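
For placement, truncate sits alongside the other uploader parameters (a sketch extending the export block above; whether a particular uploader accepts truncate depends on the destination, so check the uploader documentation):

export my_transformed_csv_data to {
    type: "google_sheets";
    key: "my_google_oauth_key";
    folder_name: "switchboard_uploads";
    spreadsheet_name: "Transformed Upload"; // Required
    truncate: false; // append to the existing data rather than replacing it
};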

Destination Fields

data source dependent

Each uploader will have specific optional and required parameters based on how it stores data.

Using Date Patterns in Uploader Destinations

Often you will want to use date patterns in the path or location where uploaded data is kept in order to distinguish it from previous uploads.

Switchboard uses a consistent convention for specifying the format using characters found in ISO date formats (ISO 8601), e.g., YYYY-MM-DD, where YYYY is the year (all digits, e.g., 2012), MM is the month (01 for January through 12 for December), and DD is the day (01 to 31).
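
For example, a dated upload location might look like the sketch below; whether a given uploader parameter supports date patterns is connector-specific, so check the uploader documentation:

export my_transformed_csv_data to {
    type: "google_sheets";
    key: "my_google_oauth_key";
    // YYYY-MM-DD is replaced at run time, e.g., switchboard_uploads/2012-01-31
    folder_name: "switchboard_uploads/YYYY-MM-DD";
    spreadsheet_name: "Sample Upload"; // Required
};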

For full details see the documentation for the uploader or our guide to Date Formats in Switchboard.

Data Architecture Best Practices

Raw Data

You will need to decide whether or not you wish to retain your raw (original) data from your data source.

On the one hand, retaining your raw data means it can be used for validation and re-processing after the data retention window from your original data source has closed.

However, it can increase data storage costs depending on data size.

If you do decide to retain your raw data, keep in mind:

  • “Raw” data can include some clean up based on data warehouse requirements
    • For example, clean up could include properly formatting the data. An integer value might need trailing whitespace stripped off before it can be migrated into a data warehouse.
  • Leaving field names as close to the vendor’s names as possible will be helpful for vendor troubleshooting as needed

Normalized Data

  • Are your business rules stable? You can build them directly before upload to the data warehouse
  • Are you expecting changes? Apply business rules once in the data warehouse
  • Data retention
    • Dependent on data sources - can be anywhere from days to multiple years

Backfills

Backfills allow you to download/upload data from an earlier date or date range (e.g., when you have added a new transformation that needs to be applied to past data).