extract

extract( expr, pattern )

The extract function will extract an array or struct from the string using capture groups specified in the given regular expression.

Non-greedy Matching Syntax

Patterns used by the extract function must account for the entire string being evaluated. This means that if the pertinent data lies within a longer string, that content needs to be accounted for in the pattern.

For example, if “awesome” was the only portion of the string to be extracted, the pattern to utilize would be ‘.*=(\w+$)’

Using Unnamed Capture Groups

The regular expression (\w+)=(\w+) has three unnamed capture groups.

When applied to the input “switchboard=awesome”, it will produce an array:

["switchboard=awesome", "switchboard", "awesome"]

The first element of the array will always be the entire input.

import t1 from rows {
    "switchboard=awesome"
} using {
    c: string
};

table t2 is select
        extract(c, literal "(\w+)=(\w+)") as c1
        extract(c, literal "(?<key>\w+)=(?<value>\w+)") as c2
from t1;

Using Named Capture Groups

The regular expression (?<key>\w+)=(?<value>\w+) contains two named capture groups

When this expression applied to the same input it will yield a structure:

{key: "switchboard", value: "awesome"}

You may extract individual array elements and structure fields in subsequent select statements.

import t1 from rows {
    "switchboard=awesome"
} using {
    c: string
};

table t2 is select
        extract(c, literal "(?<key>\w+)=(?<value>\w+)") as c2
from t1;

Example: Extracting Number From Within a String

The regular expression syntax used by Switchboard supports the typical set of regex meta character classes.

Below is an example using \d which matches any decimal digit 0-9

// Source data: "adunit_1234567_September_video"
// Returns: "1234567"
extract(source_column, literal '.*_(\d{7})_.*')[1]

Example: Extracting First N characters of a string

The following example demonstrates how to specify the number of character to match using the { and } mata character.

// Source data: "adunit_1234567_September_video"
// Returns: "adunit"
extract(source_column, literal '^(.{6}).*')[1]

Example: Multiple Matches

This example demonstrates how to reference matches when multiple matches are expected based on the given expression.

In the following example, the given expression matches both the numeric portion of the string as well as the month name that comes after it.

// Source data: "adunit_1234567_September_video"
// Returns: "1234567"
extract(source_column, literal '.*_(.*)_.*')[1]

// Source data: "adunit_1234567_September_video"
// Returns: "September"
extract(source_column, literal '.*_(.*)_.*')[2]