Unveiling the Secrets of the Google Analytics 4 BigQuery Schema: A Guide for Marketers

Estimated Reading Time: 8 minutes
May 7, 2024
Unveiling the Secrets of the Google Analytics 4 BigQuery Schema: A Guide for Marketers

Understanding how your users interact with your app or website is crucial for any marketer. Google Analytics 4 (GA4) offers a powerful tool for collecting this data, but its BigQuery schema can seem complex‌. This guide will unveil the mysteries of the GA4 schema, making it easier for you to unlock valuable user insights.

Core Event Data: Capturing User Interactions

The GA4 BigQuery schema centers around the events_YYYYMMDD table, where each row represents a single user interaction (event) logged on a specific date. This table captures essential details about the event, including:

  • Event details: Date, timestamp, name, and associated value (if applicable)
  • Event parameters: Additional information about the event, such as campaign details or custom metrics

By effectively utilizing event-level data and its associated parameters, you can gain a comprehensive picture of user interactions.

Extracting Meaning with Event Parameters

The power of event data lies in its ability to capture additional context through event_params (nested record). This nested record functions like a key-value dictionary, allowing you to store a wealth of information specific to each event. 

Here’s a breakdown of the key components:

Key (string): Name of the event parameter, which you define to capture specific details about the event. Examples include:

  • campaign_medium (string): Medium associated with the event (e.g., “email”, “organic_search”).
  • product_category (string): Category of the product interacted with (e.g., “clothing”, “electronics”).
  • content_type (string): Type of content viewed (e.g., “article”, “video”).

Value (nested record): Contains the actual value for the event parameter. The data type can be:

  • string_value (string): Textual value for the parameter (e.g., “summer_sale” for a campaign name).
  • int_value (INTEGER): Integer value (e.g., number of items added to cart).
  • double_value (FLOAT): Double-precision floating-point value (e.g., average time spent on a video).

Try the following SQL queries to extract this data from any parameters:

  • With UNNEST in FROM clause 

SELECT 
event_name,
event_params.key as key,
event_params.value.string_value,
event_params.value.double_value,
event_params.value.int_value,
event_params.value.float_value
FROM
`infotrust-discovery-sandbox.analytics_207462216.events_20240123` ,UNNEST(event_params) as event_params

  • With UNNEST in SELECT clause 

SELECT
   event_name,
   user_pseudo_id,
   (select value.string_value from unnest(user_properties) where      key = ‘preferred_lang’) AS preferred_lang,
   (select value.int_value from unnest(event_params) where key = ‘ga_session_id’) as session_start,
FROM
`infotrust-discovery-sandbox.analytics_207462216.events_20240123`

Event-Related Fields in the GA4 Schema 

  • event_date: Date the event was logged (YYYYMMDD)
  • event_timestamp: Time (microseconds, UTC) of the event
  • event_name: Name of the event
  • event_value_in_usd: Currency-converted value (USD) of the event (if applicable)
  • event_bundle_sequence_id: Sequential ID of the bundle containing the event
  • event_server_timestamp_offset: Timestamp offset between collection and upload time in microseconds

Diving Deeper: Nested Records for Richer Data

The GA4 schema utilizes nested records to store complex data structures. There are three places in the schema where these records allow for detailed information about various aspects of user interactions:

  • Event parameters: This nested record provides a flexible way to store campaign data, contextual details, and even user-defined parameters associated with an event. It uses a key-value structure, where the key identifies the parameter and the value can be a string, integer, double, or float.
  • Items (for purchases and refunds): This record stores information about items involved in purchase or refund events, including unique identifiers, names, and various attributes like price, quantity, and revenue. Additionally, it allows for user-defined custom item parameters.
  • User information: This record stores data about the user associated with the event, including a unique user ID, a pseudonymous ID, and timestamps related to the user’s first interaction and daily activity status. It also includes user properties set by the developer and, for non-intraday tables, lifetime value information.

Traffic Source Demystified: Understanding User Acquisition

The schema incorporates two distinct traffic source fields to provide insights into user acquisition:

  • collected_traffic_source: This record captures traffic source data associated with a specific event, offering a snapshot of the immediate source that triggered the user’s action. This is particularly helpful for analyzing the effectiveness of marketing campaigns.
  • traffic_source: This record provides historical data about the channel that first acquired the user. It includes details like campaign name, acquisition medium (e.g., paid search, organic search), and source network. This information remains constant in daily tables, reflecting the user’s initial touchpoint.

Fields in GA4 Schema 

  • collected_traffic_source (nested record): This nested record captures traffic source data collected at the time of the event. This includes:
    • manual_campaign_ (string):* These fields capture various manual campaign parameters like utm_campaign, utm_source, utm_medium, utm_term, and utm_content. They can also include parsed parameters from referral links, not just UTM values.
    • gclid (string): This field stores the Google click identifier (if available).
    • dclid (string): This field stores the Google Marketing Platform (GMP) identifier (if available).
    • srsltid (string): This field stores the Google Merchant Center identifier (if available).
  • traffic_source (nested record): This nested record captures information about the traffic source that first acquired the user. This includes:
    • name (string): Name of the marketing campaign that first acquired the user (daily tables only).
    • medium (string): Name of the medium (paid search, organic search, email, etc.) that first acquired the user (daily tables only).
    • source (string): Name of the network that first acquired the user (daily tables only).

Key Points

  • collected_traffic_source: Captures dynamic data associated with the specific event, providing insights into the immediate source that triggered the user’s action.
    • You can ladder up and create session-level traffic source by using this field.
  • traffic_source: Captures historical data about the user’s initial acquisition channel, offering a long-term perspective. 
    • This information remains constant in daily tables because it’s determined at the first interaction, and not updated by subsequent events.
  • Please note that channel grouping is not available in the GA4 schema at the moment but can be calculated in BigQuery. You can find definitions in this Google support article.

By analyzing both traffic source fields, you can gain a comprehensive understanding of how users discover your app or website and what factors influence their long-term engagement.

Privacy Information

This record reflects user consent status for ad targeting and analytics storage. It also indicates whether measurement without cookies is enabled.

  • privacy_info (nested record): Consent status information (if enabled)
    • ads_storage (string): Whether ad targeting is enabled
    • analytics_storage (string): Whether Analytics storage is enabled
    • uses_transient_token (string): Whether measurement without cookies is enabled

Key Points

  • GA4 uses a user_pseudo_id to anonymously identify users across hits (events).
  • When the user denies consent at any point within the session, user_pseudo_id and session_id will become null.
  • Denying consent initially will not record a user_pseudo_id but can be overridden if consent is granted later in the same session.
  • Data nested recorded before and after granting consent in the same session will be combined under the same user_pseudo_id in GA4.
  • Universal Analytics relies on cookies and won’t record any data if consent is denied.

Overall, GA4 attempts to maintain some level of user behavior data collection even with consent restrictions by using temporary tokens and userID overrides, while Universal Analytics relies on cookies and stops data collection entirely when consent is denied.

Additional Data Points for a Holistic View

The GA4 schema captures various other data points to provide a well-rounded picture of user behavior:

  • Device information: This record stores details about the device used for the event, including category (mobile, tablet, etc.), brand, operating system, and other relevant attributes. Additionally, it captures web browser information if applicable.
  • Geolocation information: This record provides the user’s approximate geographic location based on IP address.
  • App information: This record offers details about the app itself, such as app ID, Firebase App ID, installation source, and app version.

Unlocking the Power of GA4 Data

Understanding the GA4 BigQuery schema empowers you to leverage its rich data for insightful marketing analysis. By dissecting event details, traffic sources, and other data points, you can:

  • Sessionize the data to derive insights on the session level.
  • Measure the effectiveness of marketing campaigns and optimize them for better user acquisition and engagement.
  • Identify the most successful acquisition channels to focus your marketing efforts.
  • Gain insights into user behavior trends over time to make informed decisions.

With a solid grasp of the GA4 schema, you can unlock the full potential of this powerful analytics platform and gain a deeper understanding of your users, ultimately driving better marketing strategies and business growth.

Do you have questions about Google Analytics 4 and BigQuery?

Our team of experts is here to help whenever you need us.

Author

Last Updated: May 7, 2024

Get Your Assessment

Thank you! We will be in touch with your results soon.
{{ field.placeholder }}
{{ option.name }}

Talk To Us

Talk To Us

Receive Book Updates

Fill out this form to receive email announcements about Crawl, Walk, Run: Advancing Analytics Maturity with Google Marketing Platform. This includes pre-sale dates, official publishing dates, and more.

Search InfoTrust

Leave Us A Review

Leave a review and let us know how we’re doing. Only actual clients, please.