“Do you suppose we’ll meet any wild animals?” Dorothy timidly asks the Tin Man, shortly before the Lion startles them on their trip down the yellow brick road in the classic movie The Wizard of Oz. As you begin to use Google Analytics 4 (GA4) for reporting, you may feel a similar anxiety about what you’ll run into that could limit the data you expect to see in your reports. Unlike Dorothy, though, we know up front what we may come across: sampling, limits due to cardinality, and thresholding. Let’s get familiar with these topics so that, just as Dorothy and the Tin Man accepted the Lion on their journey, you too can embrace and trust your GA4 reporting.
Sampling
Data sampling in Google Analytics reporting isn’t new; those of us who use Universal Analytics are already familiar with it. In GA4, however, sampling is based on event volume. You are most likely to run into it in explore reports, because these reports query raw event- and user-level data, and a query may need to process more events than the GA4 quota allows (10 million events per query for GA4 standard vs. 1 billion events for GA4 360).
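To get a feel for when an exploration might be sampled, here is a rough Python sketch. It assumes, as a simplification rather than GA4’s documented algorithm, that a query touches every event in the date range and that sampling reads exactly the quota quoted above. The `estimated_sample_fraction` helper and the traffic figures are hypothetical.

```python
# Exploration quotas quoted in the article (events per query).
QUOTA_STANDARD = 10_000_000
QUOTA_360 = 1_000_000_000


def estimated_sample_fraction(events_in_range: int, is_360: bool = False) -> float:
    """Rough share of events an exploration would read.

    Assumes the query touches every event in the date range and that sampling
    reads exactly the quota. Returns 1.0 when no sampling is expected.
    """
    quota = QUOTA_360 if is_360 else QUOTA_STANDARD
    if events_in_range <= quota:
        return 1.0
    return quota / events_in_range


if __name__ == "__main__":
    daily_events = 500_000  # hypothetical traffic level
    for days in (7, 30, 90):
        total = daily_events * days
        frac = estimated_sample_fraction(total)
        print(f"{days:>2} days, {total:>11,} events -> ~{frac:.0%} of events read")
```

With these made-up numbers, a 7-day range stays under the standard quota, while 30- and 90-day ranges would be sampled, which is why shortening the date range can help.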
What you can do …
- If sampling does occur, the larger the sample (as a percentage of the total data), the more accurate your results. If the sample is too small, try shortening the date range so that the query covers fewer events and a larger share of them is read.
- If sampling occurs in an Explore report and you are a GA4 360 client with Analyst or higher permissions, you can choose Request unsampled results (beta).
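The point that a larger sample gives more accurate results can be illustrated with the textbook margin of error for a proportion, such as a conversion rate. GA4 does not document its sampling method in this much detail, so this Python sketch assumes simple random sampling. The `margin_of_error` helper and the population size are hypothetical; the finite population correction shows why reading a larger share of the data narrows the error.

```python
import math


def margin_of_error(p: float, n: int, population: int, z: float = 1.96) -> float:
    """Approximate margin of error (95% confidence for z=1.96) for a proportion p
    estimated from a simple random sample of n items out of `population` items."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if n >= population:
        return 0.0  # everything was read, so there is no sampling error
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: shrinks the error as n approaches the population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


if __name__ == "__main__":
    population = 1_000_000  # hypothetical number of users in the date range
    for n in (10_000, 100_000, 500_000):
        moe = margin_of_error(0.05, n, population)
        print(f"sample {n / population:.0%}: 5.0% +/- {moe * 100:.2f} points")
```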
Cardinality (and row limits)
Now that we’ve covered sampling, let’s explore why an “(other)” row may appear in your reports. It appears when the number of unique combinations of dimension values in your report exceeds the report’s row limit. The maximum possible number of combinations is the cardinality of each dimension in the report multiplied together.
The number of unique values for a dimension is referred to as its cardinality. Some dimensions, such as Client ID, User ID, or Page path, can have thousands of unique values or more, whereas Device category has only three (desktop, mobile, tablet). A dimension that can have more than 500 unique values in a single day is considered high-cardinality. High-cardinality dimensions increase the number of rows in a report, and because reports have row limits, they make the “(other)” row more likely to appear.
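Because the number of possible rows is bounded by the product of each dimension’s cardinality, you can sanity-check a report design before building it. This is a minimal Python sketch: the 500-values-per-day cutoff and the 50,000-row limit come from this article, while the helper names and example cardinalities are hypothetical. Real tables usually hold fewer rows than this bound, since not every combination of values actually occurs.

```python
from math import prod

HIGH_CARDINALITY_PER_DAY = 500  # "more than 500 unique values in a single day"
STANDARD_REPORT_ROW_LIMIT = 50_000


def is_high_cardinality(unique_values_per_day: int) -> bool:
    """True if a dimension exceeds the article's high-cardinality cutoff."""
    return unique_values_per_day > HIGH_CARDINALITY_PER_DAY


def max_possible_rows(cardinalities: dict[str, int]) -> int:
    """Upper bound on rows: one row per combination of dimension values."""
    return prod(cardinalities.values())


if __name__ == "__main__":
    # Hypothetical daily cardinalities for a three-dimension report.
    report = {"Device category": 3, "Country": 150, "Page path": 2_000}
    for name, count in report.items():
        label = "high-cardinality" if is_high_cardinality(count) else "low-cardinality"
        print(f"{name}: {count} values ({label})")
    bound = max_possible_rows(report)
    print(f"Up to {bound:,} rows vs. a {STANDARD_REPORT_ROW_LIMIT:,}-row limit")
```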
Standard reports that do not have a secondary dimension or comparison applied have a row limit of 50,000 for the table, and their tables include only the data needed for the report. Explore reports, and standard reports that have a secondary dimension or comparison applied, have a row limit of 2 million for the table. Importantly, these reports draw on a table that includes all of the dimensions in your property for the date range, so data that isn’t even shown in your report still counts toward the row limit. If your report includes high-cardinality dimensions, the chances that you’ll see the “(other)” row are very high.
GA4 360 properties can also use automatic expanded datasets for both standard and explore reports; these likewise have a row limit of 2 million for the table. The key difference is that an expanded dataset does not include all of the dimensions collected for the property, only those needed for the report, so far fewer unique combinations count toward the limit.
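The “(other)” behavior, and why a table that carries every property dimension fills up faster than one limited to the report’s dimensions, can be modeled with a toy Python sketch. It assumes a simplified rule (keep the largest rows up to the limit and fold the rest into “(other)”) and a scaled-down row limit; GA4’s internal logic is not published, and the function names and the `ticket` field are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

OTHER = ("(other)",)


def aggregate(events: Iterable[Mapping[str, str]], dims: Sequence[str]) -> Counter:
    """Count events per unique combination of the given dimensions."""
    return Counter(tuple(e[d] for d in dims) for e in events)


def apply_row_limit(rows: Counter, row_limit: int) -> Counter:
    """Keep the (row_limit - 1) largest rows; fold everything else into (other)."""
    if len(rows) <= row_limit:
        return Counter(rows)
    kept = Counter(dict(rows.most_common(row_limit - 1)))
    kept[OTHER] = sum(rows.values()) - sum(kept.values())
    return kept


def report_from_table(events, table_dims, report_dims, row_limit):
    """Row-limit a table keyed on table_dims, then roll it up to report_dims."""
    table = apply_row_limit(aggregate(events, table_dims), row_limit)
    report = Counter()
    for key, count in table.items():
        if key == OTHER:
            report[OTHER] += count
        else:
            values = dict(zip(table_dims, key))
            report[tuple(values[d] for d in report_dims)] += count
    return report


if __name__ == "__main__":
    # 1,000 hypothetical events; "ticket" is unique per event (high cardinality).
    events = [
        {"page": f"/p{i % 5}", "device": "mobile" if i % 2 else "desktop", "ticket": str(i)}
        for i in range(1_000)
    ]
    # Table holds only the report's dimension (like an expanded dataset).
    print(report_from_table(events, ["page"], ["page"], row_limit=50))
    # Table holds every collected dimension, so the limit is hit first.
    print(report_from_table(events, ["page", "device", "ticket"], ["page"], row_limit=50))
```

In the first call, five page rows fit easily. In the second, the unique `ticket` values push the underlying table past the limit, so most events end up under “(other)” even though the report only shows pages.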
What you can do …
- Use standard reports when possible.
- Use an explore report if you see the “(other)” row in a standard report, since explorations have a higher row limit (2 million vs. 50,000). If you’re a 360 client, you can also request unsampled exploration results.
- When setting up your property and event parameters, use predefined dimensions if possible before setting up custom dimensions for the same data.
- If possible, limit your collection of high-cardinality values in event parameters and user properties.
- If possible, don’t use a custom dimension that is a unique identifier for your users; instead, use User-ID.
- If you currently have only GA4 Standard, consider upgrading to GA4 360, where expanded datasets are enabled automatically when the “(other)” row appears, so data is aggregated under that row far less frequently.
- Export your data to BigQuery, where you can query the raw event-level data directly.
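The advice to limit high-cardinality values and to use User-ID can be applied before data ever reaches GA4. This Python sketch builds (but does not send) a payload in the shape used by the GA4 Measurement Protocol, which accepts a top-level `user_id` field alongside `client_id` and an `events` list. The event name `content_view`, the parameter `content_path`, and the `normalize_path` helper are hypothetical; stripping query strings and fragments is just one example of reducing a parameter’s cardinality.

```python
import json
from typing import Optional
from urllib.parse import urlsplit


def normalize_path(url: str) -> str:
    """Drop query strings, fragments, and trailing slashes so that
    '/shoes/?id=1' and '/shoes?id=2' collapse into a single value, '/shoes'."""
    path = urlsplit(url).path
    return path.rstrip("/") or "/"


def build_payload(client_id: str, user_id: Optional[str], url: str) -> dict:
    """Build a Measurement Protocol-style body with a low-cardinality parameter."""
    payload = {
        "client_id": client_id,
        "events": [
            {
                "name": "content_view",  # hypothetical custom event name
                "params": {"content_path": normalize_path(url)},  # hypothetical parameter
            }
        ],
    }
    if user_id:
        # Your own user identifier goes in the dedicated user_id field,
        # not in a custom dimension.
        payload["user_id"] = user_id
    return payload


if __name__ == "__main__":
    body = build_payload("123.456", "u-42", "https://example.com/shoes/?id=9#reviews")
    print(json.dumps(body, indent=2))
```

Sending it would mean POSTing the JSON to the Measurement Protocol collect endpoint with your own measurement ID and API secret, which this sketch deliberately leaves out.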
Thresholding
Finally, let’s look at what thresholding is in GA4 and when you may encounter it. If thresholding has been applied to your report, a data quality indicator at the top of the report will say so.
Thresholding is a privacy protection measure that Google applies to prevent anyone from identifying individual users by their demographics or interests. Google doesn’t want what happened to Oz (his identity was discovered!) to happen to your users. Thresholding may be applied to your report for a few reasons:
- If Google Signals is enabled (so that you can track users more accurately across devices and platforms and remarket to them across devices) and the report you ran has a low volume of users or events
- If your report includes user-identifying information such as demographic info along with user identifiers, custom dimensions, and some user-generated content fields
- If the report has a low user or event count (perhaps because the date range is too narrow)
What you can do …
- Expand the date range of your report to increase the user/event count, which may reveal data that was previously thresholded.
- If your property has Google Signals enabled, consider setting up an identical GA4 property that does not have Google Signals enabled.
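Google does not publish the exact thresholds or logic it uses, but the effect of widening the date range can be illustrated with a toy model. In this Python sketch, `HYPOTHETICAL_MIN_USERS` is a made-up value and `apply_threshold` is a hypothetical helper: rows whose unique-user count falls below the threshold are withheld, and the same dimension value passes once the date range covers enough users.

```python
from collections import defaultdict
from typing import Iterable, Tuple

HYPOTHETICAL_MIN_USERS = 50  # made-up value; Google does not publish the real one


def apply_threshold(hits: Iterable[Tuple[int, str, str]], first_day: int, last_day: int,
                    min_users: int = HYPOTHETICAL_MIN_USERS):
    """hits are (day, dimension_value, user_id) tuples.
    Returns (shown, withheld): user counts for rows that pass, and the
    dimension values whose rows were withheld."""
    users = defaultdict(set)
    for day, value, user_id in hits:
        if first_day <= day <= last_day:
            users[value].add(user_id)
    shown = {value: len(ids) for value, ids in users.items() if len(ids) >= min_users}
    withheld = {value for value, ids in users.items() if len(ids) < min_users}
    return shown, withheld


if __name__ == "__main__":
    # Hypothetical data: three new users per day in the "65+" age bracket.
    hits = [(day, "65+", f"user-{day}-{k}") for day in range(1, 31) for k in range(3)]
    print(apply_threshold(hits, 1, 7))   # 21 users: row withheld
    print(apply_threshold(hits, 1, 30))  # 90 users: row shown
```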
If you have enabled Google Signals because you need cross-device reporting and audiences, but you don’t want thresholding to affect your daily reporting, you have another option: change your reporting identity to Device-based. (This setting is retroactive and aligns more closely with the reporting you would see in UA and BigQuery.)
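To see why switching reporting identities can change user totals, here is a simplified Python comparison. It is not GA4’s actual Blended or Observed logic: it simply contrasts counting distinct device (client) IDs with counting a signed-in user ID once across devices. The `count_users` helper and the sample hits are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def count_users(hits: Iterable[Tuple[str, Optional[str]]], identity: str) -> int:
    """hits are (client_id, user_id) pairs; user_id is None when signed out.
    identity="device" counts distinct client IDs only.
    identity="user_id" counts a signed-in user once, falling back to client ID."""
    if identity not in ("device", "user_id"):
        raise ValueError(f"unknown identity: {identity}")
    keys = set()
    for client_id, user_id in hits:
        if identity == "user_id" and user_id:
            keys.add(("user", user_id))
        else:
            keys.add(("device", client_id))
    return len(keys)


if __name__ == "__main__":
    hits = [
        ("phone-1", "alice"),   # Alice on her phone
        ("laptop-7", "alice"),  # Alice on her laptop
        ("tablet-3", None),     # an anonymous visitor
    ]
    print(count_users(hits, "device"))   # 3 devices
    print(count_users(hits, "user_id"))  # 2 people
```

With device-based counting, one person on two devices counts as two users, which is the trade-off for setting aside the other identity signals.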
Now that you have a better understanding of how GA4 handles data in its reports, you can skip your way down the yellow brick road toward the sunset of Universal Analytics. Need a travel guide down that road? Reach out to us!