Cleaning and filtering data can sometimes go beyond the basic find and replace or "equals", "greater than" or "less than" statement. In these instances, there are Regular Expressions or REGEX for short. The NinjaCat Template Builder includes support for the use of REGEX for the purpose of text manipulation to aid in data cleaning. This article is a brief resource with some general REGEX knowledge, tips for use within NinjaCat, and links to external resources to broaden your REGEX knowledge.
Section Links
Resources
Regex Basics
Regex Filter Examples
Find and Replace
Extra Examples
Resources
Regex Basics
What it does |
Regex can look for dynamic variables, any sequence of characters that we specify, rather than having to match an exact string Regex operates left to right, character followed by another character |
|
|
Flags |
|
g |
Global search |
i |
Ignore case, can be capital or lower case |
m |
Multiline search, continues after carriage returns |
|
|
Characters |
|
// |
Slashes are delimiters that indicate the beginning and end of your regular expression |
$ |
End of the string |
\ |
“Escape it” Using a backslash before a reserved character actually looks for the actual character rather than functioning as a reserved character |
. |
Period means look for any character |
+ |
When using + this looks at the previous character to say look for one or more of the previous character |
* |
Asterisk or star, matches as many times as possible, matches 0 or more of the preceding character |
[ ] |
Braces match anything within the braces Example: [A-Z] matches any character A-Z Example: [A-Z]+ matches any character A-Z any number of times Example: [ENAM] matches any character E or N or A or M Example: [ENAM]+ matches any character E or N or A or M any number of times |
^ |
Start of String when used outside of Brackets Inverse when used in brackets match anything that is not Example: [^ENAM] matches any character thats that are not E or N or A or M |
\s |
Representation of a “space” you can use a “space” or the representation of a space to indicate a space Example: [^\s] matches anything that is not a space one time Example: [^\s]+ matches anything that is not a space until a space |
\S |
Represents any non-whitespace, excludes a space, tab, or new line |
{ } |
Brackets used to specify number of matches you want for the preceding character or set Example: a{3} matches “a” 3 times Example: a{3,} matches “a” 3 or more times Example: a{3,6} matches “a” between 3 and 6 times |
( ) |
Parentheses, creates a capture group, match anything between these parentheses Using question mark, colon (?: ) will indicate capture what i want in the parentheses but don’t create a group |
$ |
Used for returning a capture group Example: $1 will return whatever is between the first set of ( ) |
? |
Question mark, look for 0 - 1 of the preceding character |
Regex Filter Examples
Contains This but NOT That |
^(?=.*contains this)(?!.*but does not contain that).* |
Any string that contains apples, pears or peaches but NOT peas, onions or carrots: |
^(?=.*(apples|pears|peaches).*)(?!.*(peas|onions|carrots).*).*$ |
Contains This AND That |
^(?=.*contains this)(?=.*and contains that).* |
Contains This OR That |
(this|that) |
Contains This OR This OR This |
(contains this|or this|or this) |
NOT-Contains This OR That |
^(?!.*not contains this|.*that).* ^(?!.*not contains this)(?!.*and not contains that).* |
NOT This OR That |
^(?!this|that).* |
Include Special Characters that Regex would recognize as a command in a search string by using a backslash before the special character |
ex: source medium filter Google / cpc | facebook / paidsocial (google \/ cpc|facebook \/ paidsocial) |
^New\s\w*$ |
Starts with ‘New’, followed by a space, followed by any word, followed by nothing else. For example New Nissan New Kia Would EXCLUDE ‘New Kia Soul’ |
^New\s\w*\s.* |
Starts with ‘New’, followed by a space, followed by any word, followed by a space, followed by anything. Examples: New Kia Soul New Kia Honda New Kia Soul Is So Cool New Honda Pilot is what my husband drives Would EXCLUDE ‘New Kia’ or ‘New Honda’ |
Find and Replace (make sure to check mark Regex)
Tactic |
Find |
Replace |
Custom Dimension Rename Regex Take the entire string and replace it, when it contains This |
^.*(This).*$ |
With whatever string you type |
Custom Dimension Rename Regex Take the entire string and replace it, when it contains This OR That |
^.*(This|That).*$ |
With whatever string you type |
Find Groups of text within a string separated by a dash and Replace the Name with just one Group of the String |
^(.*)-(.*)-(.*)-(.*)-(.*)-(.*)$ |
$1 $2 $3 |
Find the first string of contiguous digits and replace with only the same string of digits |
(\d+) matches contiguous strings of digits. |
$1 replaces any match with the result of the first capturing group. |
The "Find" will match any character that is not a number, and the empty replace will cause all non-numbers to be replaced with nothingness |
[^0-9] |
|
Find the first group of numbers and replace with only the first group of numbers. That will match the whole thing, and capture the first group of numbers. Then replace the match with the capture. |
.*?([0-9]+).* |
$1 |
Remove one blank at the end of a string of character. Note this will only remove ONE blank at end - not all blanks.
|
^(.*)\s+$ |
$1 |
In a string like, “Darwin Family Dental Care | FL” remove all characters through to and including the space after the “|” leaving just FL. |
^(.*)\s+$ |
$1 |
Creative Use for Dimension Rename Filtering using Replacement (credit to Aidan) |
Use Case: |
Example: |
Specifically in charts only one Dimension Can be selected Filters can be applied but the filter will be applied after the Replacement |
Campaign name looks like: “WR - Search - LocationName” |
|
Filter “Starts with” “WR - Search - “ This returns all campaign names that start with “WR - Search - “ |
|
"Dimension Replacement: Find: WR - Search - Replace: “Blank”" |
Result: No data is returned because we are trying to filter on campaign names starting with “WR - Search - “ but the campaign names no longer start with “WR - Search - “ Because we re-named them trying to shorten them to just the “locationName”
|
-
-
We take a reverse approach by excluding and re-naming
-
We use a Regex to find all results that dont match “WR - Search - “ and re-name all the non matching campaign names to “A”
-
So we are keeping the campaign names we want to return intact in their original names
-
We add a second replacement to remove “WR - Search - “
-
Then we apply a “campaign name” filter
-
The result is campaign names showing just the Location name with the prefix “WR - Search - “removed and all non “WR - Search - “ campaigns filtered out
-
#HASHTAGS
-
Isolating #HASHTAGS within a Post string to combine and count them
-
regex101: build, test, and debug regex
-
#pavement The lake is a long way from here.
-
#maximum A song can make or ruin a person’s day if they let it get to them.
-
#migration My Mum tries to be cool by saying that she likes all
-
Find: ^(#\w*)?.*$
-
Replace: $1w
-
Result:
-
#pavement
-
#maximum
-
#migration
Jays EXAMPLES:
Find = ^(.*) - (.*)$
Replace = $1
Result = Dedicated Hosting
Bucketing/Grouping Numbers:
-
Example: Creating custom age group buckets to display in charts and tables. The client wants to bucket ages into custom groups. 22-31, 32-41, 42-51, 52-61, 62-71, 72-81, 112-121
Comments
0 comments
Article is closed for comments.