What’s the Best Way to Extract Google Analytics Data?

Estimated Reading Time: 7 minutes

It’s true: there are many ways to extract data from Google Analytics (Universal Analytics) into other platforms to power ad-hoc analysis, dashboards, data science models, and much more. While we know a great implementation speaks for itself, unlocking additional value for your organization often means extracting your Google Analytics data and using it elsewhere. In this post, I’ll cover some of the most common ways to extract your data and share insights from my years of working with organizations of all shapes and sizes, to hopefully save you some headaches and time as you think about what’s best for your needs. Here are a few questions to consider before we hop into the different methods:

Does this data already exist?

I have seen a number of organizations sponsor various projects that maintain similar data sets from Google Analytics, which is a challenge to manage from a governance, cost, and data quality perspective. If you can use a previously developed dataset, we’d encourage it; if not, think about how your dataset can be reused by others in the future. In either case, the fewer datasets you have, the easier it is to manage data quality and consistency.

How quickly and frequently do you need data?

Some projects may need real-time data, while others may only need a monthly summary delivered three business days after the close of the preceding month. Think through what you will need and write it down, as this will help guide your choices.

How detailed is your reporting?

While this may seem like a difficult question to answer, I recommend first documenting every dimension and metric you’d like to report on. This matters because summarized data is easier to work with, but not all metrics can be summarized cleanly, so hit-level data may be required. Likely culprits are ratios and time-based metrics; similarly, if you use moderate-to-complex segments for reporting, the data can be far easier to work with at its most granular level.
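To see why ratio metrics resist summarization, consider a minimal sketch (the numbers are illustrative): averaging daily bounce rates weights every day equally, while the correct monthly rate must be rebuilt from the raw components, which is only possible if you kept the granular data.

```python
# Two days of illustrative traffic: a busy day and a quiet, high-bounce day.
days = [
    {"sessions": 1000, "bounces": 400},   # 40.0% bounce rate
    {"sessions": 100,  "bounces": 90},    # 90.0% bounce rate
]

# Naive approach: average the pre-computed daily rates.
avg_of_rates = sum(d["bounces"] / d["sessions"] for d in days) / len(days)

# Correct approach: recompute the ratio from summed components.
true_rate = sum(d["bounces"] for d in days) / sum(d["sessions"] for d in days)

print(f"{avg_of_rates:.1%}")  # 65.0% -- wrong
print(f"{true_rate:.1%}")     # 44.5% -- correct
```

The same trap applies to averages of averages (e.g., time on page), which is why hit- or session-level data is often worth the extra storage.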

How perfect does the data need to be?

There are often tradeoffs between velocity, accuracy, and cost. Google Analytics will sometimes sample data to deliver reports in certain scenarios. This may be perfectly fine for you or the end of the world, so be mindful of your requirements and strike a good balance.

Are there geographic restrictions to consider?

GDPR is a hot topic for many of our clients, and if you’re a global organization you will need to be mindful of where your data sits and is processed. Under GDPR, personal data generally may not be transferred outside the EU without appropriate safeguards; if applicable, what implications could this have for you? Additionally, time zones and currencies are often painful to align in reporting and should be considered if relevant to your project.

Which platforms will be consuming this data?

Ensuring compatibility between your platforms is also a key consideration. More often than not, organizations have multiple cloud environments and a mix of visualization, ETL, and modeling platforms. We always encourage clients to make optimal use of their platforms and to ensure that wherever Google Analytics data is stored and transformed is compatible end to end.

Now that we have some requirements as to how the data will be used, let’s cover some of the ways to extract it. In most cases, you will use the API or the data transfer service within BigQuery to load your data. However, don’t forget that you can pull unsampled or custom reports from Google Analytics directly, or use the Google Sheets add-on. There are also third-party connectors for extracting data; they rely on the services outlined below, with the advantage that you don’t have to write code to extract your data. If simpler solutions meet your needs, fantastic! Below are high-level considerations and a comparison between the BigQuery Data Transfer Service (DTS) and the Google Analytics 360 API.

| | BigQuery Data Transfer Service | Google Analytics 360 API |
| --- | --- | --- |
| Granularity | Raw session- and hit-level data | Aggregated data |
| Data availability | Within 24 hours; one file for the previous day, generally by morning. Streaming (intraday) data is also an option: it is updated continuously but is generally incomplete for the four hours preceding the pull | Intraday reporting is available, as is historical |
| Backfilling | 13 months or 10 billion hits (whichever is smaller) of historical data to BigQuery | Historical data is limited to 14 months |
| Data freshness | Data is updated with up to an 8-hour delay | Data is updated with up to a 4-hour delay |
| Sampling | N/A | Both sampled and unsampled data are available (refer to the table below) |
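To make the granularity difference concrete, here is a minimal sketch of querying the daily `ga_sessions_YYYYMMDD` tables that the BigQuery export produces. The project and dataset names are hypothetical; substitute your own project and view ID.

```python
# Sketch: building a query over the GA 360 BigQuery export's daily tables.
from datetime import date

def sessions_query(dataset: str, start: date, end: date) -> str:
    """Build a query over the daily ga_sessions_YYYYMMDD export tables."""
    return f"""
        SELECT
          date,
          COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS sessions,
          SUM(totals.pageviews) AS pageviews
        FROM `{dataset}.ga_sessions_*`
        WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'
        GROUP BY date
        ORDER BY date
    """

q = sessions_query("my-project.1234567", date(2023, 1, 1), date(2023, 1, 7))
# Once credentials are configured, run it with google-cloud-bigquery:
# from google.cloud import bigquery
# rows = bigquery.Client().query(q).result()
```

Because you are aggregating raw hits yourself, there is no sampling to worry about; the tradeoff is that you own the metric definitions (such as how sessions are counted above).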

Additionally, if you’re using the API, the following sampling limitations will apply:

| | Sampled | Unsampled |
| --- | --- | --- |
| Availability | Available in both the standard and premium (360) versions of Google Analytics | Only available in the premium Google Analytics 360 version |
| Parameters | Provides 7 parameters | Provides only 4 parameters |
| Session limits | Analytics Standard: 500k sessions; Analytics 360: 1M sessions; Analytics 360 using resource-based quota: 100M sessions | No session limits |

In addition, the following API quotas apply:

- Requests per day per project: 50,000
- Requests per view (profile) per day: 10,000 (cannot be increased)
- Requests per 100 seconds per project: 2,000
- Requests per 100 seconds per user per project: 100 (can be increased to 1,000 in the Google API Console)
- A maximum of 3 million unique dimension values (i.e., rows) will be reported

The summary tables above speak to some of the limitations, but our experience has taught us to be mindful of the following:

Data Availability and SLAs

There is nothing worse than working on a dashboard, perfecting it, and training your stakeholders, only to have it disrupted by inconsistent data arrival. When working with the BigQuery DTS tables, Google Analytics data shows up when it wants to: in many cases your data will arrive before noon for the preceding day, but not always. The API is the only surefire way to ensure your data is consistently available at a specific time, so if you need a consistent delivery time, use the API. We have in the past used both the API and BigQuery DTS in the same project to balance data quality and availability.

Data Sampling Using the API

This goes without saying, but stakeholders generally expect perfect data that matches the Google Analytics console at all times. As you work with your data, check whether sampling is happening and to what degree, and communicate it to stakeholders. In some cases you’ll have more flexibility, but perfect data is never guaranteed when working with the API unless you stay within the limits noted above.
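The Reporting API v4 exposes sampling metadata on each report’s `data` object via `samplesReadCounts` and `samplingSpaceSizes` (both absent when the report is unsampled), so the check can be automated. A minimal sketch, with an illustrative response shape:

```python
# Sketch: detecting sampling in a Google Analytics Reporting API v4 response.
def sampling_rate(report: dict) -> "float | None":
    """Return the fraction of sessions read, or None if the report is unsampled."""
    data = report.get("data", {})
    reads = data.get("samplesReadCounts")
    space = data.get("samplingSpaceSizes")
    if not reads or not space:
        return None  # no sampling metadata -> unsampled report
    return int(reads[0]) / int(space[0])

# Illustrative shape of a sampled report (values are strings in the API):
sampled = {"data": {"samplesReadCounts": ["499", "499"],
                    "samplingSpaceSizes": ["15000", "15000"]}}
print(sampling_rate(sampled))  # roughly 0.033: about 3% of sessions were read
```

Logging this rate alongside each extract gives you a concrete number to share with stakeholders instead of a vague "the data may be sampled."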

Governance

It is sad to say, but we’ve seen organizations report different numbers for the same metric across dashboards because the metric was calculated with different methodologies. I touched on this earlier, but if you can reuse a dataset, that is ideal for maintaining consistency in reporting. Likewise, adding new fields to an existing dataset isn’t a bad idea and could save a ton of time in the long run. I’ve also seen departments hampered by API quotas when data requests are not optimized and orchestrated: reports that worked well for years begin to fail because someone else started running extracts at the same time, causing outright failure or sampling. This can happen out of the blue if there’s no governance over how data is pulled from Google Analytics, so be mindful of how many folks in your organization are using the API and make sure you’re sticking to the limits.

Tips and Tricks (BONUS)

If you’re working with the BigQuery DTS, consider triggering your ETL process based on when the data actually arrives, as recorded in your log files. Some organizations schedule their ETL processes to run at fixed times, only to find the data is missing and the job needs to be re-run later. Triggering on arrival is a foolproof way of ensuring your data is there before processing begins.
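One way to implement this, sketched below: since the export writes one `ga_sessions_YYYYMMDD` table per day, poll for the table’s existence rather than trusting a fixed schedule. The dataset ID and the `run_etl` hook are hypothetical placeholders.

```python
# Sketch: kick off downstream ETL only once the daily export table exists.
from datetime import date, timedelta

def daily_table(dataset: str, day: date) -> str:
    """Fully qualified name of the GA export table for a given day."""
    return f"{dataset}.ga_sessions_{day:%Y%m%d}"

yesterday = date.today() - timedelta(days=1)
table = daily_table("my-project.1234567", yesterday)  # hypothetical dataset

# Poll with google-cloud-bigquery before starting the ETL (needs credentials):
# from google.cloud import bigquery
# from google.api_core.exceptions import NotFound
# try:
#     bigquery.Client().get_table(table)  # raises NotFound until data lands
#     run_etl(table)                      # your pipeline entry point
# except NotFound:
#     pass  # data hasn't arrived yet; try again later
```

In production you would run the poll from a scheduler (Cloud Functions, Airflow sensors, and similar tools all work) rather than a bare loop.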

When using the API, also make use of the “isDataGolden” field to ensure your reporting is complete. Retry extracting data until this field is set to “true” to ensure the highest data quality.
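In the Reporting API v4, “isDataGolden” appears on each report’s `data` object and is true once Google will no longer revise the numbers for that date range. A minimal sketch of the retry check, with illustrative response shapes:

```python
# Sketch: gate on isDataGolden before accepting a Reporting API v4 extract.
def is_golden(response: dict) -> bool:
    """True only if every report in the response is marked golden."""
    reports = response.get("reports", [])
    return bool(reports) and all(
        r.get("data", {}).get("isDataGolden", False) for r in reports
    )

# Illustrative response shapes:
fresh = {"reports": [{"data": {"isDataGolden": False, "totals": []}}]}
final = {"reports": [{"data": {"isDataGolden": True, "totals": []}}]}
print(is_golden(fresh), is_golden(final))  # False True
```

A retry loop around your extract can then simply re-request the same report until `is_golden` returns true, with a sensible backoff to stay within the API quotas above.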

Stakeholders will appreciate knowing when your data was last refreshed. We typically capture this timestamp when an ETL process completes and embed it into the report or dashboard frontend. Ultimately, it will create fewer questions for you, so we highly recommend this little trick to help manage stakeholder expectations.

Technical debt is the arch nemesis of a well-run ETL process. When something breaks do you have folks available and documentation to support fixing it? Make time to document how things work today—your future self will thank you.

Struggling with your Google Analytics data extraction?

Our team of experts is here to help whenever you need us.
Last Updated: January 17, 2023

