extract
extract( expr, pattern )
The extract function will extract an array or struct from the string using capture groups specified in the given regular expression.
Non-greedy Matching Syntax
Patterns used by the extract function must account for the entire string being evaluated. This means that if the pertinent data lies within a longer string, that content needs to be accounted for in the pattern.
For example, if “awesome” was the only portion of the string to be extracted, the pattern to utilize would be ‘.*=(\w+$)’
Using Unnamed Capture Groups
The regular expression (\w+)=(\w+)
has three unnamed capture groups.
When applied to the input “switchboard=awesome”, it will produce an array:
["switchboard=awesome", "switchboard", "awesome"]
The first element of the array will always be the entire input.
import t1 from rows {
"switchboard=awesome"
} using {
c: string
};
table t2 is select
extract(c, literal "(\w+)=(\w+)") as c1
extract(c, literal "(?<key>\w+)=(?<value>\w+)") as c2
from t1;
Using Named Capture Groups
The regular expression (?<key>\w+)=(?<value>\w+)
contains two named capture groups
When this expression applied to the same input it will yield a structure:
{key: "switchboard", value: "awesome"}
You may extract individual array elements and structure fields in subsequent select statements.
import t1 from rows {
"switchboard=awesome"
} using {
c: string
};
table t2 is select
extract(c, literal "(?<key>\w+)=(?<value>\w+)") as c2
from t1;
Example: Extracting Number From Within a String
The regular expression syntax used by Switchboard supports the typical set of regex meta character classes.
Below is an example using \d
which matches any decimal digit 0-9
// Source data: "adunit_1234567_September_video"
// Returns: "1234567"
extract(source_column, literal '.*_(\d{7})_.*')[1]
Example: Extracting First N characters of a string
The following example demonstrates how to specify the number of character to match using the {
and }
mata character.
// Source data: "adunit_1234567_September_video"
// Returns: "adunit"
extract(source_column, literal '^(.{6}).*')[1]
Example: Multiple Matches
This example demonstrates how to reference matches when multiple matches are expected based on the given expression.
In the following example, the given expression matches both the numeric portion of the string as well as the month name that comes after it.
// Source data: "adunit_1234567_September_video"
// Returns: "1234567"
extract(source_column, literal '.*_(.*)_.*')[1]
// Source data: "adunit_1234567_September_video"
// Returns: "September"
extract(source_column, literal '.*_(.*)_.*')[2]