Intro to Regular Expressions, and How to Use Them

June 3, 2020

Do you ever wish you could search for multiple strings in a Google Analytics report? Or set up one trigger to fire on multiple pages in Google Tag Manager? Are you constantly having to create complex filters? Or running out of slots in your Segments or advanced searches?

If so, I’m here to introduce you to your new best friend: regular expressions.

What are Regular Expressions (RegEx)?

A regular expression (written as RegEx, regex, or regexp) is a coded text string that describes a pattern, or set of patterns, to search for in text. Some variation of RegEx is built into most programming and scripting languages, and you can also use it across Google Marketing Platform products such as Google Analytics and Google Tag Manager. It sounds complex and it might be a little intimidating at first, but once you get the hang of a few key characters, you’re on your way to reporting like a pro!

Basic Characters

RegEx has a set of special characters that each mean something different, and by combining them you can create very powerful patterns. The list below is not comprehensive; it covers the characters I use most often and recommend starting with.

Character: |
Definition: A bar/pipe is used to mark an “or”.
Example: infotrust|InfoTrust
Contains: infotrust or InfoTrust

Character: ( )
Definition: Parentheses can be used to group text together.
Example: (Info)Trust
Contains: InfoTrust

Character: [ ]
Definition: Square brackets match any one of the characters inside them.
Example: b[aeiou]bble
Contains: babble, bebble, bibble, bobble, or bubble

Character: [a-b], [0-9]
Definition: A hyphen between two letters or digits inside square brackets designates a range of letters or digits.
Example: [b-f]at
Contains: bat, cat, dat, eat, or fat

Character: ?
Definition: A question mark declares that the previous character is optional.
Example: favou?rite
Contains: favorite or favourite

Character: *
Definition: An asterisk indicates that the previous character can appear zero or more times (it is optional and can repeat without limit).
Example: go*gle
Contains: ggle, gogle, google, gooogle, goooogle, etc.

Character: +
Definition: A plus sign indicates that the previous character must appear at least once and can repeat without limit.
Example: go+gle
Contains: gogle, google, gooogle, goooogle, etc.

Character: { }
Definition: Curly brackets with a number inside repeat the previous character (or group of characters) exactly that many times.
Example: b{3}
Contains: bbb

Character: { , }
Definition: Curly brackets with two numbers separated by a comma designate a range of repetitions for the previous character (or group of characters).
Example: a{3,6}
Contains: aaa, aaaa, aaaaa, or aaaaaa

Character: \
Definition: A backslash escapes any of the special characters so that it is treated as a regular character.
Example: InfoTrust is the Sh\*t
Contains: InfoTrust is the Sh*t

Character: \d
Definition: A single digit from 0 to 9 (shorthand for the range [0-9]).
Example: \d
Contains: 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9

Character: \n
Definition: Designates a new line in the text.
Example: Hello\nWorld
Contains: Hello and World on two consecutive lines

Character: .
Definition: A period represents any single character (letter, digit, or symbol).
Example: inf.trust
Contains: infotrust, inf*trust, or inf8trust

Character: ^
Definition: A caret marks the beginning of a string.
Example: ^cat
Begins with: cat

Character: $
Definition: A dollar sign marks the end of a string.
Example: dog$
Ends with: dog
Common Combinations

Using these common combinations, along with the basic characters, you can quickly start using RegEx in your reporting.

Combo: .*
Definition: Technically this combination is any character followed by zero or more repetitions of it. Effectively this becomes any collection of characters.
Example: .*\.example\.com
Contains: www.example.com, sub1.example.com, or sub2.example.com
(Will count all subdomains as long as the hostname contains “.example.com”. The bare hostname example.com does not match, because the pattern requires a period before “example”.)

Combo: (( ))
Definition: Nested parentheses are used to group different actions together, especially when you want another RegEx character to act on a whole set of characters. It always reminds me of PEMDAS in middle school!
Example: rege(x(es)?|xps?)
Contains: regex, regexes, regexp, or regexps

Combo: ( )? and ( )+
Definition: Using parentheses with an action like a question mark or a plus sign means that the whole group within the parentheses is subject to that action. In this example, the plus sign repeats the whole group.
Example: g(oog)+le
Contains: google, googoogle, googoogoogle, googoogoogoogle, etc.

Combo: \d{ }
Definition: \d and curly brackets can help with number patterns such as phone numbers or SSNs.
Example: \d{5}(-\d{4})?
Contains: 03948 or 03948-4758 (a ZIP code, with or without the four-digit extension)

Combo: \d+
Definition: \d and the plus sign can help with numbers of any length, for example when you have a range of values from 0 to 100000 and want to account for all possible values.
Example: \d+(\.\d\d)?
Contains: X, XX, X.XX, XX.XX, XXX.XX, XXXX.XX, etc.
(A whole number, or a decimal number with exactly two digits after the decimal point. X is a digit [0-9].)

Combo: \? \. \/
Definition: The backslash character with any of the RegEx characters turns the RegEx character back into a regular character.
Example: www\.example\.com\/test\?p=xtest
Contains: www.example.com/test?p=xtest

Combo: ^ $
Definition: When a caret and a dollar sign are used together, you are saying that the string is EXACTLY whatever is in between these two characters.
Example: ^InfoTrust$
Is exactly: InfoTrust
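The combinations above can be checked the same way. The sketch below is illustrative only and again assumes Python's built-in `re` module rather than the engine inside any particular reporting tool. `re.search` gives the "contains" behavior, while `re.fullmatch` requires the whole string to fit the pattern. The `COMBOS` dictionary, the helper names, and the test strings are hypothetical.

```python
import re

# Patterns copied from the common combinations; the dictionary keys are
# hypothetical labels added for readability.
COMBOS = {
    "subdomain": r".*\.example\.com",
    "regex_words": r"rege(x(es)?|xps?)",
    "repeated_group": r"g(oog)+le",
    "zip_code": r"\d{5}(-\d{4})?",
    "amount": r"\d+(\.\d\d)?",
    "escaped_url": r"www\.example\.com\/test\?p=xtest",
    "exact": r"^InfoTrust$",
}


def contains(name: str, text: str) -> bool:
    """True if the named pattern matches anywhere in the text."""
    return re.search(COMBOS[name], text) is not None


def is_whole(name: str, text: str) -> bool:
    """True only if the named pattern matches the entire text."""
    return re.fullmatch(COMBOS[name], text) is not None


# Subdomains match, but the bare domain has no period before "example".
print(contains("subdomain", "sub1.example.com"), contains("subdomain", "example.com"))

# All four spellings are accepted by the nested groups.
print([is_whole("regex_words", w) for w in ("regex", "regexes", "regexp", "regexps")])

# The optional group makes the four-digit extension optional.
print(is_whole("zip_code", "03948"), is_whole("zip_code", "03948-4758"))

# Exactly two digits are required after the decimal point.
print(is_whole("amount", "100"), is_whole("amount", "19.99"), is_whole("amount", "19.9"))
```

One design point worth noticing: with "contains" matching, `\d+(\.\d\d)?` would still find the `19` inside `19.9`, so wrap a pattern in `^` and `$` (or use a whole-string match) when the entire value has to fit.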

Tools to Help

The only downside of RegEx is that it takes a little while to get the hang of, so you should always test your expressions. Testing is especially important when you’re starting out and want to confirm you’re using the characters correctly, but it also helps when you’re trying a new combination of characters. To help me learn (and continually use) RegEx, I have two different types of RegEx tools: one for pattern visualization and one for matching.

Pattern Visualization:

Pattern visualizations can be used to work through a RegEx and make sure you’re creating the correct pattern (especially since it’s easy to forget a parenthesis or character when writing exceptionally long expressions). There are multiple tools that are free to use, but I like the simplicity of regexper.com. I recommend using a tool like this as you’re starting out, to make sure you are getting the hang of the new “language” of RegEx. I found it really helped me visualize nested groups, since a long string can quickly get messy.

Example RegEx: .*(This Is A Tool (That Helps (Visualize|Simplify) a Complex (|or nested )Expression)).*
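One way to read a nested expression like this without a diagram is to expand its alternatives by hand. The sketch below is an illustration using Python's built-in `re` module (an assumption, not the visualization tool itself): it lists the four sentences the example can match and confirms each one. The `expand` helper and the non-matching sentence are hypothetical.

```python
import re
from itertools import product

# The example expression from the text, unchanged.
PATTERN = (
    r".*(This Is A Tool (That Helps (Visualize|Simplify) "
    r"a Complex (|or nested )Expression)).*"
)


def expand():
    """Spell out every sentence the two alternations can produce."""
    verbs = ("Visualize", "Simplify")   # from (Visualize|Simplify)
    extras = ("", "or nested ")         # from (|or nested )
    return [
        f"This Is A Tool That Helps {verb} a Complex {extra}Expression"
        for verb, extra in product(verbs, extras)
    ]


for sentence in expand():
    # Each expanded sentence should be matched by the pattern.
    print(bool(re.search(PATTERN, sentence)), sentence)

# A verb outside the alternation is not matched.
print(bool(re.search(PATTERN, "This Is A Tool That Helps Clarify a Complex Expression")))
```

The empty alternative in `(|or nested )` is what makes the phrase optional; `(or nested )?` would express the same thing.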

Matching:

Matching tools are useful when you have specific strings that you want to match, but also have others that you want to avoid. I’ve used them most commonly when I have multiple URLs that I want to include, while making sure to exclude other URLs. Regextester.com is another free tool worth trying out, specifically for matching. (Side note: Be aware that a lot of RegEx matching tools are meant for developers, so when you’re working with RegEx for Google Marketing Platform products, you will want to make sure the tool is set to JavaScript’s version of RegEx, as the syntax is slightly different in different languages.)

Example RegEx: .*(This Is A Tool (That Helps (Visualize|Simplify) a Complex (|or nested )Expression)).*
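The include-and-exclude check that a matching tool performs can also be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses Python's built-in `re` module, and the page paths, both patterns, and the `audit` helper are hypothetical, not taken from a real site.

```python
import re

# Hypothetical goal: match blog articles and the contact page, nothing else.
STRICT = r"^/(blog/.+|contact/?)$"
# A looser first draft, kept here to show what it lets through.
LOOSE = r"blog|contact"

SHOULD_MATCH = ["/blog/intro-to-regex", "/contact", "/contact/"]
SHOULD_NOT_MATCH = ["/blog/", "/careers", "/about/contact"]


def audit(pattern, should_match, should_not_match):
    """Return (missed, leaked): wanted strings the pattern skipped,
    and unwanted strings the pattern matched."""
    regex = re.compile(pattern)
    missed = [s for s in should_match if not regex.search(s)]
    leaked = [s for s in should_not_match if regex.search(s)]
    return missed, leaked


print("strict:", audit(STRICT, SHOULD_MATCH, SHOULD_NOT_MATCH))
print("loose: ", audit(LOOSE, SHOULD_MATCH, SHOULD_NOT_MATCH))
```

Keeping a list of strings that must not match is the important part: the loose draft matches everything you want, and only the second list shows that it also matches pages you meant to leave out.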

Conclusion

You now have a good idea of how to start using RegEx in your filters, segments, Data Studio reports, or even GTM triggers. There are literally millions of possibilities for what you need, and only you will know the patterns that suit your data set. Hopefully this is enough to get you started, but if you do have questions, feel free to reach out to the analytics consultants and engineers at InfoTrust.

Now, young padawan, go out into the world and start feeling like a technical reporting genius and impress all your coworkers with the power of RegEx!


Last Updated: June 4, 2020
