Blast Analytics and Marketing

Category: Digital Analytics

The Best Way to Filter Bot Traffic in Google Analytics

August 25, 2020

It’s really tough to keep up with all the changes happening in the world of digital analytics. Between the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) privacy laws, Intelligent Tracking Prevention (ITP) 2.0, and Chrome 80 cookie changes, collecting quality data is becoming increasingly difficult. And on top of it all, Google has announced that Network Domain and Service Provider data is no longer supported in Google Analytics.

For those who manage websites that receive a lot of bot traffic, both the Network Domain and Service Provider fields are often used to filter bot traffic out of reporting. For large sites with a flexible budget, this data can still be collected with the help of a third-party tool and a couple of custom dimensions. But paying for a tool to capture Network Domain and Service Provider data isn’t always an option, so is there a way to block and filter bot traffic without paying for an additional service? Yes, but it’s complicated.

If you’re looking for a quick fix to continue capturing Network Domain and Service Provider for free, you’re out of luck. However, with a bit of customTask magic, we can implement a decent solution.

Note: If you’re unfamiliar with the customTask field in Google Analytics, stop now and read Simo Ahava’s definitive blog post on customTask.

At this point, I want to highlight that the following solution is an adaptation of the “High-Hit Visit Processing” rule that Adobe Analytics uses to help detect bot traffic. Adobe documents the rule as follows:

High-Hit Visit Processing: If more than 100 hits occur in a visit, reporting determines if the time of the visit in seconds is less than or equal to the number of hits in the visit. In this situation … reporting starts over with a new visit. High-hit visits are typically caused by bot attacks and are not considered normal visitor browsing. (Full details here)

Simply put, if a visit has more than 100 hits and the number of hits is greater than or equal to the visit duration in seconds, the hits and user are considered bot traffic. That seems reasonable, but we now need to convert this logic into something we can implement via our tag manager.
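To make the rule concrete, here is a minimal sketch of the check as a plain function. The function name and its parameters are hypothetical and are not part of any Adobe or Google API; the sketch only restates the documented condition.

```javascript
// Hypothetical helper restating Adobe's documented rule; not an Adobe API.
function isHighHitVisit(hitCount, visitSeconds) {
  // More than 100 hits, and the visit time in seconds is
  // less than or equal to the number of hits in the visit.
  return hitCount > 100 && visitSeconds <= hitCount;
}

// 150 hits in one minute looks automated; 150 hits over ten minutes does not.
var fastVisit = isHighHitVisit(150, 60);   // true
var slowVisit = isHighHitVisit(150, 600);  // false
```

Note that this check needs the whole visit, which is exactly what we do not have on the page.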

Step One: Defining the Goal

Let’s define our goal: use the number of hits over a given time to detect bot traffic and block it from being sent to Google Analytics. To achieve this, we first need to determine what number of hits over what period of time would constitute bot traffic. Luckily, Google does some hit limiting of its own, which gives us a great place to start.

Google’s developer guide, Google Analytics Collection Limits and Quotas, lists these limits for gtag.js and analytics.js tracker implementations in its Client library / SDK specific rate limits section:

“Each gtag.js and analytics.js tracker object starts with 20 hits that are replenished at a rate of 2 hits per second. This limit applies to all hits except for ecommerce (item or transaction).”

For example, the maximum number of non-ecommerce hits a user could send in 10 seconds would be 38. That’s 20 hits in the first second and 2 hits per second for the remaining nine seconds. Where n equals duration in seconds, the formula for calculating the maximum number of hits Google Analytics accepts in a given duration would be:

Max # Hits = 20 + ((n - 1) * 2)
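As a quick check of the formula, it can be written as a small function. The function name is illustrative only.

```javascript
// Maximum non-ecommerce hits accepted in n seconds, per the formula above:
// 20 hits in the first second, then 2 more per additional second.
function maxHits(seconds) {
  return 20 + (seconds - 1) * 2;
}

var tenSecondCap = maxHits(10);  // 38, matching the worked example
var fiveSecondCap = maxHits(5);  // 28
```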

Note that this limit applies to each tracker object and is unlikely to be reached unless your site sends a large number of calls all at once. It’s unlikely that any bot would trigger this limit either, so we know that our bot limit should be stricter than this. But how strict?

Because we’re working on the page at the hit level, this is very difficult to answer. We don’t have the luxury of seeing the entire session in post-processing, so we have to decide on the spot whether a hit and all subsequent hits should be categorized as bot traffic. Too strict, and normal traffic is removed; not strict enough, and bots will continue to dilute reporting.

Ultimately the choice is down to how you and your organization want to classify a bot, but I’d recommend a limit of 11 hits in 5 seconds. I arrived at these numbers under these assumptions:

      • User changes pages every 2 seconds
      • Pageview event and a single event are fired on each page

Even if we assumed the user triggered 3 events per page (9 hits in 5 seconds), or separately assumed the user viewed a new page every second (10 hits in 5 seconds), we’d still be under the limit. With a quality tracking strategy and an optimized implementation, there are very few reasons you should be sending more than a handful of hits in a 5-second window.
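The arithmetic behind these scenarios can be sketched as follows. This is one reading of the numbers above: it assumes a page load lands at the very start of the window and counts every page that begins inside it. The function name and parameters are hypothetical.

```javascript
// Hits expected inside a window, assuming a page load at t = 0 and a new
// page every `secondsPerPage` seconds, each page firing `hitsPerPage` hits.
function hitsInWindow(secondsPerPage, hitsPerPage, windowSeconds) {
  var pages = Math.ceil(windowSeconds / secondsPerPage);
  return pages * hitsPerPage;
}

var baseline = hitsInWindow(2, 2, 5);     // 6: pageview + 1 event, page every 2s
var busyPages = hitsInWindow(2, 3, 5);    // 9: 3 hits per page, page every 2s
var fastBrowser = hitsInWindow(1, 2, 5);  // 10: pageview + 1 event, page every 1s
```

All three stay at or below 11 hits, which is why that limit leaves some headroom for real users.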

All this said, I strongly suggest reviewing your site prior to making these changes. It’s a fine line between filtering bots and excluding legitimate users.

Step Two: Building a Foundation

Now that we’ve determined our limits, we can jump into the code. To do this, we’re going to capture the timestamp of each hit in an array that’s stored in a session cookie. Each time a hit goes out, we’ll add the timestamp to the array and remove any timestamps that fall outside the 5-second window we set. The length of the array then tells us the number of hits that have fired during our 5-second window.
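A minimal sketch of that sliding window, written in ES5 style so it could sit inside a tag manager variable. The function name and the millisecond parameters are assumptions; only hitStampArray is a name from this article.

```javascript
// Adds the current hit's timestamp and drops stamps older than the window.
// Returns a new array; its length is the number of hits inside the window.
function recordHit(hitStampArray, nowMs, windowMs) {
  return hitStampArray.concat([nowMs]).filter(function (stamp) {
    return nowMs - stamp < windowMs;
  });
}

// Three earlier hits, then a new one at t = 6000 ms with a 5000 ms window:
// the stamps at 0 and 1000 have aged out, leaving two hits in the window.
var stamps = recordHit([0, 1000, 4000], 6000, 5000);
var hitsNow = stamps.length; // 2
```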

Step Three: Add Blocking Logic

We start by generating a timestamp that we’ll immediately push to our hitStampArray. After we push our newest timestamp, we check the array for hits that no longer fall within our limit time frame. Now we need to add logic to start blocking hits over our limit.
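One way to sketch the blocking decision is below. The constants follow the 11-hits-in-5-seconds recommendation; the function name and the choice to allow the 11th hit and block from the 12th onward are assumptions, and you may prefer the stricter reading.

```javascript
var WINDOW_MS = 5000; // 5-second window
var HIT_LIMIT = 11;   // hits allowed inside the window

// Records the hit, prunes old stamps, and reports whether this hit
// pushes the visitor over the limit.
function processHit(hitStampArray, nowMs) {
  var updated = hitStampArray.concat([nowMs]).filter(function (stamp) {
    return nowMs - stamp < WINDOW_MS;
  });
  return { stamps: updated, blocked: updated.length > HIT_LIMIT };
}

// Simulate a burst: one hit every 100 ms. Record which hit is blocked first.
var burstStamps = [];
var firstBlockedHit = null;
for (var i = 1; i <= 15; i++) {
  var result = processHit(burstStamps, i * 100);
  burstStamps = result.stamps;
  if (result.blocked && firstBlockedHit === null) {
    firstBlockedHit = i; // the 12th hit is the first over the limit
  }
}
```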

It’s important to remember that customTask runs every time a hit is constructed. This means our entire function runs from scratch on each hit, so any values we want to carry from one hit to the next must be stored globally or in a cookie. We address this by writing our hitStampArray to a cookie.
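The cookie round trip could look something like this sketch. The cookie name, the dot-separated encoding, and both helper names are assumptions. The document object is passed in as a parameter so the helpers can be exercised outside a browser; on a page you would pass document itself.

```javascript
var COOKIE_NAME = 'hitStamps'; // hypothetical cookie name

// Parses the timestamp array out of a cookie header string
// (the format returned by document.cookie).
function readStamps(cookieString) {
  var pairs = cookieString ? cookieString.split('; ') : [];
  for (var i = 0; i < pairs.length; i++) {
    var idx = pairs[i].indexOf('=');
    if (idx === -1) { continue; }
    if (pairs[i].slice(0, idx) === COOKIE_NAME) {
      var raw = pairs[i].slice(idx + 1);
      if (!raw) { return []; }
      return raw.split('.').map(Number).filter(function (n) {
        return isFinite(n); // discard anything that is not a number
      });
    }
  }
  return [];
}

// Writes the array back. With no expires or max-age attribute,
// browsers treat this as a session cookie.
function writeStamps(doc, stamps) {
  doc.cookie = COOKIE_NAME + '=' + stamps.join('.') + '; path=/';
}
```

In the browser this would be used as readStamps(document.cookie) at the start of the task and writeStamps(document, updatedArray) at the end.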

Step Four: Label and Block Bot Traffic

With this additional code, we’ll now be able to store and utilize our hitStampArray any time a hit fires. With this in place, we can start labeling visitors as bots and blocking their traffic.
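The labeling and blocking step might be sketched like this. Both helpers and the string flag are hypothetical; the idea is that once a visitor is flagged, the flag is persisted (for example in a cookie) so every later hit is treated the same way.

```javascript
// botFlag is whatever was persisted for this visitor earlier in the session,
// assumed here to be the string 'true' once the visitor has been flagged.
function isBotVisitor(botFlag, hitsInWindow, hitLimit) {
  return botFlag === 'true' || hitsInWindow > hitLimit;
}

// Wraps a send function so that hits from flagged visitors are dropped.
function guardSend(originalSend, isBot) {
  return function (model) {
    if (isBot) {
      return; // drop the hit instead of sending it
    }
    originalSend(model);
  };
}
```

Inside customTask, one possible wiring is to read the current sendHitTask with model.get('sendHitTask') and replace it with the guarded version via model.set('sendHitTask', ...). Treat that wiring as an assumption to verify against your own setup.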

Step Five: Adjustments for Easy Maintenance

At this point, we should add in the customTask framework and some variables to make adjusting the script a bit easier.
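A sketch of what that framework could look like, with the tunable values collected in one config object. Every name here is an assumption. To keep the example short and runnable outside a browser, the timestamps live in memory and the clock is injected; the approach described in this article persists the array in a cookie instead.

```javascript
// Tunable settings in one place (illustrative values).
var botFilterConfig = {
  hitLimit: 11,    // hits allowed inside the window
  windowMs: 5000,  // window length in milliseconds
  blockHits: true  // set to false to let flagged hits through
};

// Returns a function with the customTask signature: it receives the hit model.
function createBotFilterTask(config, now) {
  var hitStampArray = []; // in memory here; persist it in practice
  var isBot = false;

  return function (model) {
    var t = now();
    hitStampArray = hitStampArray.concat([t]).filter(function (stamp) {
      return t - stamp < config.windowMs;
    });
    if (hitStampArray.length > config.hitLimit) {
      isBot = true; // once flagged, stay flagged
    }
    if (isBot && config.blockHits) {
      // Swap the send step for a no-op so the hit is never dispatched.
      model.set('sendHitTask', function () {});
    }
  };
}
```

In a Google Tag Manager Custom JavaScript variable, the outer wrapper would be an anonymous function that returns the task, for example by returning createBotFilterTask(botFilterConfig, function () { return new Date().getTime(); }).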

Step Six: Final Touches

For the finishing touches, we want to set a dimension to “true” if the visitor (and their hits) should be recorded as bot traffic.
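Setting the dimension could be sketched as below. The helper name and the dimension index are placeholders; use the index of a custom dimension you have actually created in your property.

```javascript
var BOT_DIMENSION_INDEX = 1; // placeholder: replace with your own index

// Marks the hit with a custom dimension when the visitor is flagged.
function labelBotHit(model, dimensionIndex, isBot) {
  if (isBot) {
    model.set('dimension' + dimensionIndex, 'true');
  }
}
```

Labeling hits rather than dropping them keeps the raw data available, and the labeled traffic could then be excluded in reporting based on that dimension.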

You’re All Set

And with that, we have our customTask that can effectively block detected bot traffic. Simply apply the customTask script to your Google Analytics tag, and you’re all set.

Disclaimer: This isn’t a perfect solution. If a bot is jumping around your site clearing cookies between each page, this solution won’t detect or be able to block it. Likewise, if your site has a large amount of custom event tracking, you may block legitimate user traffic accidentally. This solution should be carefully considered before being applied to your site.

Nik Earnest
About the Author

On the leading edge of advanced and cognitive analytics, Nik specializes in evaluating, developing, and implementing analytics solutions for operations, risk, compliance, and financial reporting across multiple sectors and industries.
