Do you ever wish you could search for multiple strings in a Google Analytics report? Or set up one trigger to fire on multiple pages in Google Tag Manager? Are you constantly having to create complex filters? Or running out of slots in your Segments or advanced searches?
If so, I’m here to introduce you to your new best friend: regular expressions.
What are Regular Expressions (RegEx)?
A regular expression (written as RegEx, regex, or regexp) is a coded text string that describes a pattern or set of patterns in order to search text. Some variation of RegEx is in most scripting languages, but you can also use it in all Google Marketing Platform products. It sounds complex and it might be a little intimidating at first, but once you get the hang of a few key characters, you’re on your way to reporting like a pro!
Basic Characters
RegEx has special characters that each mean something different, and by combining them you can create very powerful patterns. The list below is not comprehensive; it covers the characters that I use often and recommend starting with.
Character | Definition | Example | Example Pattern |
| | A bar/pipe is used to mark an “or” | infotrust|InfoTrust | Contains: infotrust or InfoTrust |
( ) | Parentheses can be used to group different text together | (Info)Trust | Contains: InfoTrust |
[ ] | Square brackets can be used to express that anything within them is interchangeable | b[aeiou]bble | Contains: babble, bebble, bibble, bobble, or bubble |
[a-b] [0-9] | A hyphen between digits or letters with square brackets around it can be used to designate a range of letters or digits/numbers | [b-f]at | Contains: bat, cat, dat, eat, or fat |
? | A question mark is used to declare that the previous character is optional | favou?rite | Contains: favorite or favourite |
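* | An asterisk is used to indicate that the previous character can appear zero or more times (it is optional, or can be repeated an unlimited number of times) | go*gle | Contains: ggle, gogle, google, gooogle, etc |
+ | A plus sign is used to say that the previous character must appear at least once and can be repeated an unlimited number of times | go+gle | Contains: gogle, google, gooogle, etc |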
{ } | Curly brackets with a number inside can be used to create multiples of the previous character (or group of characters) | b{3} | Contains: bbb |
{ , } | Curly brackets with numbers separated by a comma are used to designate a specific range of multiples for the previous character (or group of characters) | a{3,6} | Contains: aaa, aaaa, aaaaa, or aaaaaa |
\ | Escape any of the special characters | InfoTrust is the Sh\*t | Contains: InfoTrust is the Sh*t |
\d | A single digit from 0 to 9 (a short hand for range [0-9]) | \d | Contains: any single digit, such as 0, 5, or 9 |
\n | Designates a new line in the text | Hello\nWorld | Contains: Hello, then a line break, then World |
. | A period represents any single character (digit, letter, or symbol) | inf.trust | Contains: infotrust, inf*trust, or inf8trust |
^ | A caret marks the beginning of a string | ^cat | Begins with: cat |
$ | A dollar sign marks the end of a string | dog$ | Ends with: dog |
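To see the rows above in action, here is a minimal sketch that runs each example pattern against one string it should match and one it should not. It assumes Python's built-in `re` module purely for illustration (the tool you use may run a slightly different RegEx flavor, though these basic characters should behave the same way), and every sample string is made up for this example.

```python
import re

# Each entry: (pattern, text that should match, text that should not match).
# The sample texts are hypothetical and only illustrate the table above.
BASIC_EXAMPLES = [
    (r"infotrust|InfoTrust", "Welcome to InfoTrust", "INFOTRUST"),
    (r"b[aeiou]bble", "bubble wrap", "bbble"),
    (r"[b-f]at", "a cat nap", "a hat"),
    (r"favou?rite", "my favourite page", "my favrite page"),
    (r"go*gle", "ggle", "gxgle"),
    (r"go+gle", "gooogle", "ggle"),
    (r"b{3}", "abbbc", "abbc"),
    (r"a{3,6}", "baaad", "baad"),
    (r"Sh\*t", "InfoTrust is the Sh*t", "Shot"),
    (r"\d", "page 2", "page two"),
    (r"Hello\nWorld", "Hello\nWorld", "Hello World"),
    (r"inf.trust", "inf8trust", "inftrust"),
    (r"^cat", "catalog", "bobcat"),
    (r"dog$", "hotdog", "dogma"),
]


def contains(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, good, bad in BASIC_EXAMPLES:
    print(f"{pattern!r}: matches {good!r} -> {contains(pattern, good)}, "
          f"matches {bad!r} -> {contains(pattern, bad)}")
```

Note that matching is case-sensitive by default here, which is why `INFOTRUST` is not matched by `infotrust|InfoTrust`.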
Common Combinations
Using these common combinations, along with the basic characters, you can quickly start using RegEx in your reporting.
Combo | Definition | Example | Pattern |
.* | Technically this combination is any character + any multiple of previous character. Effectively this becomes any collection of characters. | .*\.example\.com | Contains: www.example.com, sub1.example.com, or sub2.example.com (Will count all subdomains of example.com. The bare hostname example.com is not matched, because the pattern requires a period before “example”) |
(( )) | Nested parentheses are used to group different actions together especially when you want another RegEx character to act on a whole set of characters. It always reminds me of PEMDAS in middle school! | rege(x(es)?|xps?) | Contains: regex, regexes, regexp, or regexps |
( )? ( )+ | Using parentheses with a character like a question mark or plus sign means that the action applies to the whole group within the parentheses, not just the previous character. | g(oog)+le | Contains: google, googoogle, googoogoogle, googoogoogoogle, etc |
\d{ } | \d and curly brackets can help with any number patterns like for phone numbers or SSN. | \d{5}(-\d{4})? | Contains: 03948-4758 (aka a zip code) |
\d+ | \d and the plus sign help with numbers of any length, for example when you have a range of values from 0 – 100000 and want to account for all possible values. | \d+(\.\d\d)? | Contains: X, XX, X.XX, XX.XX, XXX.XX, XXXX.XX, etc (A non-negative integer, optionally followed by a decimal point and exactly two digits. X is a digit [0-9]) |
\? \. \/ | The backslash character with any of the RegEx characters turns the RegEx character back into a regular character. | www\.example\.com\/test\?p=xtest | Contains: www.example.com/test?p=xtest |
^ $ | When a caret and dollar sign are used together, you are saying that the string is EXACTLY whatever is in between these two characters. | ^InfoTrust$ | Is exactly: InfoTrust |
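A few of these combinations are easier to trust once you see where they do and do not match. The sketch below again assumes Python's `re` module for illustration only, and the test strings are invented. It also shows one thing worth keeping in mind: without `^` and `$`, a pattern behaves like "contains", so a ZIP code pattern will happily match the first five digits of a longer number.

```python
import re


def contains(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


SUBDOMAIN = r".*\.example\.com"
ZIP_CODE = r"\d{5}(-\d{4})?"
ANCHORED_ZIP = r"^\d{5}(-\d{4})?$"   # same pattern wrapped in ^ and $
ESCAPED_URL = r"www\.example\.com\/test\?p=xtest"
UNESCAPED_URL = r"www.example.com/test?p=xtest"  # special characters left active

results = {
    "subdomain": contains(SUBDOMAIN, "sub1.example.com"),
    "bare domain": contains(SUBDOMAIN, "example.com"),
    "zip inside long number": contains(ZIP_CODE, "1234567"),
    "anchored zip vs long number": contains(ANCHORED_ZIP, "1234567"),
    "anchored zip vs zip+4": contains(ANCHORED_ZIP, "03948-4758"),
    "escaped url": contains(ESCAPED_URL, "www.example.com/test?p=xtest"),
    # Unescaped, "t?" means "an optional t", so the literal "?" is never matched.
    "unescaped url": contains(UNESCAPED_URL, "www.example.com/test?p=xtest"),
}

for label, matched in results.items():
    print(f"{label}: {matched}")
```

The last two entries are the reason to escape special characters in URLs: the unescaped version fails on the very URL it was copied from.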
Tools to Help
The only downside of RegEx is that it takes a little while to get the hang of, so you should always test your expressions. This matters most when you are starting out and want to be sure you're using the characters correctly, but it is also a good way to check your skills when you try a new combination of characters. To help me learn (and continually use) RegEx, I have two different types of RegEx tools: one for pattern visualization and one for matching.
Pattern Visualization:
Pattern visualizations can be used to work out RegEx and make sure you’re creating the correct pattern (especially since it’s easy to forget a parenthesis or character when writing exceptionally long expressions). There are multiple tools that are free to use, but I like the simplicity of regexper.com. I recommend using something like this tool as you’re starting out to make sure you are getting the hang of the new “language” of RegEx. I found it really helped me visualize nested groups, since a long string can quickly get messy.
Example RegEx: .*(This Is A Tool (That Helps (Visualize|Simplify) a Complex (|or nested )Expression)).*
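If you want to double-check what a diagram is telling you, you can also spell out the branches yourself. This is a rough sketch, assuming Python's `re` module and its `itertools.product` helper (neither is part of the visualization tool), that builds every sentence the example RegEx allows and confirms each one matches.

```python
import itertools
import re

PATTERN = (
    r".*(This Is A Tool (That Helps (Visualize|Simplify) "
    r"a Complex (|or nested )Expression)).*"
)

# The two places where the pattern branches.
verbs = ["Visualize", "Simplify"]
optional_words = ["", "or nested "]

# Every combination of the two branches: 2 x 2 = 4 sentences.
variants = [
    f"This Is A Tool That Helps {verb} a Complex {extra}Expression"
    for verb, extra in itertools.product(verbs, optional_words)
]

for sentence in variants:
    match = re.search(PATTERN, sentence)
    # Group 3 is the (Visualize|Simplify) choice, group 4 the optional words.
    print(sentence, "->", match.group(3), repr(match.group(4)))
```

Counting the branches this way (two choices times two choices) is a handy sanity check against the picture the visualizer draws.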
Matching:
Matching tools are useful when you have specific strings that you want to match, but also have others that you want to avoid. I’ve used them most commonly when I have multiple URLs that I want to include, but also need to make sure I avoid other URLs. Regextester.com is another free tool worth trying out, specifically for matching. (Side note: Be aware that a lot of RegEx matching tools are meant for developers, so when you’re working with RegEx for Google Marketing Platform products, you will want to make sure the tool uses JavaScript’s version of RegEx, as RegEx differs slightly between languages.)
Example RegEx: .*(This Is A Tool (That Helps (Visualize|Simplify) a Complex (|or nested )Expression)).*
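The same "match these, avoid those" check can be sketched in a few lines of code. The example below assumes Python's `re` module, whose flavor is not identical to JavaScript's, so treat it as an illustration and confirm the final pattern in a tool that matches your product. The URLs, the two patterns, and the `check_pattern` helper are all hypothetical.

```python
import re


def check_pattern(pattern, should_match, should_avoid):
    """Return a list of problems; an empty list means the pattern passed."""
    problems = []
    for url in should_match:
        if re.search(pattern, url) is None:
            problems.append(f"missed: {url}")
    for url in should_avoid:
        if re.search(pattern, url) is not None:
            problems.append(f"wrongly matched: {url}")
    return problems


# Hypothetical page paths for this example.
SHOULD_MATCH = ["/products/shoes/123", "/products/hats/7"]
SHOULD_AVOID = ["/products/socks/123", "/products/shoes/", "/blog/products/shoes/123"]

LOOSE = r"/products/(shoes|hats)/\d+"     # no anchors, so it acts like "contains"
STRICT = r"^/products/(shoes|hats)/\d+$"  # anchored at both ends

loose_problems = check_pattern(LOOSE, SHOULD_MATCH, SHOULD_AVOID)
strict_problems = check_pattern(STRICT, SHOULD_MATCH, SHOULD_AVOID)

print("loose:", loose_problems)
print("strict:", strict_problems)
```

Keeping the "avoid" list next to the "match" list is what catches the loose pattern here: it matches everything it should, but it also picks up the blog URL until the anchors are added.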
Conclusion
You now have a good idea of how to start using RegEx in your filters, segments, Data Studio reports, or even GTM triggers. There are literally millions of possibilities for what you need, and only you will know the patterns that suit your data set. Hopefully this is enough to get you started, but if you do have questions, feel free to reach out to the analytics consultants and engineers at InfoTrust.
Now, young padawan, go out into the world and start feeling like a technical reporting genius and impress all your coworkers with the power of RegEx!