Overcoming Thresholding in Google Analytics 4: A Guide
An orange exclamation mark in a Google Analytics 4 (GA4) report, labeled "Thresholding applied," can be confusing. The report may say it is unsampled, yet the word "threshold" suggests that something is being held back, much like sampling. In this article, we'll explain what thresholding is, why it appears, and how to deal with it.
Understanding the Cause:
Thresholds in GA4 are linked to a feature called Google Signals. The feature is disabled by default; once you enable it, GA4 can start applying thresholds to your reports.
Why Enable Google Signals?
Google Signals facilitates cross-device and cross-platform user tracking. By collecting data from users signed in to a Google account, it provides insights into demographics, interests, and more. Enabling Google Signals enhances data collection and unlocks features, making it appealing for two main reasons:
Populating demographic data in GA4.
Reusing GA audiences for retargeting in Google Ads.
However, this feature comes with a caveat: thresholding.
The Impact of Thresholding:
When thresholding is triggered in GA4, the interface hides report rows with small counts, likely fewer than 50 users or events per row. For instance, in a Traffic Acquisition report, if some traffic sources generated fewer than 50 users in the selected timeframe, GA4 conceals those rows. The data is still stored; it is just not displayed in the report.
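To make the effect concrete, here is a minimal Python sketch of how a report might look before and after thresholding. It is an illustration only: the cutoff of 50 users is the approximate figure mentioned above, not an official value, and the function name, row layout, and traffic sources are hypothetical. GA4's real thresholds are system-defined and not published.

```python
# Illustration only: GA4 does not publish its exact thresholds.
# 50 is the approximate figure from the text, not an official value.
ASSUMED_MIN_USERS = 50


def apply_threshold(rows, min_users=ASSUMED_MIN_USERS):
    """Return only the rows that would stay visible in the report."""
    return [row for row in rows if row["users"] >= min_users]


# Hypothetical Traffic Acquisition rows for one date range.
report = [
    {"source": "google / organic", "users": 1200},
    {"source": "newsletter / email", "users": 75},
    {"source": "partner-site / referral", "users": 12},
]

visible = apply_threshold(report)
for row in visible:
    print(row["source"], row["users"])
```

In this sketch the referral row disappears from the output even though it is still present in the underlying list, which mirrors how the hidden data remains stored in GA4.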
Why Google Implements Thresholding:
Officially, Google states that thresholding prevents GA users from identifying individual users based on the additional data Google Signals contributes to reports, such as age and gender. Despite the uncertainty about the identification risk, Google maintains its position, and GA users have limited control over these system-defined thresholds.
Strategies to Avoid Thresholding:
Don't Enable Google Signals:
Trade-off: Avoids thresholding, but you give up the demographic data and Google Ads audience reuse that Google Signals provides.
Enable Google Signals but Disable Reporting Identity Option:
Trade-off: Retains demographic data while minimizing the impact of thresholding.
Steps: Admin > Data Settings > Data Collection > Disable "Include Google signals in reporting identity."
Change Reporting Identity to "Device-based":
Trade-off: Eliminates thresholding, but user counts may be less accurate.
Steps: Admin > Reporting Identity > Show all > Select "Device-based."
Dealing with Existing Thresholding Issues:
If you're already facing thresholding issues, consider changing the default reporting identity:
Admin > Reporting Identity > Show all > Choose "Device-based."
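Remember that a change of reporting identity applies retroactively to your reports and does not affect stored data, so you can switch back at any time. There is an accuracy trade-off, however, especially with the device-based identity: the same person visiting from two devices is counted as two users.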
Final Considerations:
Rows with small numbers usually account for a small percentage of traffic, but smaller websites can lose a much larger share of their reports. Switching between reporting identity settings from time to time lets you assess the impact. Unfortunately, thresholding remains part of GA4, and you can only work around it with informed strategies. Thresholding in GA4 is distinct from sampling, and knowing when it applies is crucial for interpreting and using your data effectively.
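As a rough way to gauge how much a site is exposed, the Python sketch below estimates the share of users that sit in rows below an assumed cutoff. Everything here is hypothetical: the function name, the cutoff of 50, and the two sets of per-row user counts, which stand in for numbers you might read from a report under the device-based identity. Summing users across rows can also double-count people who appear in several rows, so treat the result as an approximation.

```python
def hidden_share(user_counts, min_users=50):
    """Fraction of users in rows that fall below the assumed cutoff."""
    total = sum(user_counts)
    if total == 0:
        return 0.0
    hidden = sum(n for n in user_counts if n < min_users)
    return hidden / total


# Hypothetical per-row user counts for a large and a small website.
large_site = [12000, 3400, 900, 40, 25, 10]
small_site = [180, 60, 45, 30, 20, 15]

print(f"large site: {hidden_share(large_site):.1%}")  # large site: 0.5%
print(f"small site: {hidden_share(small_site):.1%}")  # small site: 31.4%
```

With these made-up numbers, the large site would lose well under 1% of its users to hidden rows, while the small site would lose nearly a third.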