Are Rogue Sites Influencing Your Google Analytics Data?
Did you know that if someone puts your Google Analytics tracking code on their site (the same UA-#), visits to their site will show up in your Google Analytics profiles?
It is true, but thankfully, there is a way to fix this issue.
We won’t get deep into why someone would do this, but it generally stems from someone lifting your design or embedding your content within their site, both of which are nefarious. The people who do this are often unaware of the tracking code or too lazy to remove it.
In this Google Analytics Tips article, you’ll learn two valuable lessons:
- How to identify external sites and URLs that contain your own site’s tracking code
- How to filter out these visits so they do not impact your data analysis
How to Identify Rogue Sites
Let’s create a custom ‘Hostname’ report with ‘Hostname’ as its dimension and a drill-down to ‘Landing Page.’ On this custom report, we’ll show the ‘Visits’ and ‘Unique Visitors’ metrics. Feel free to add additional metrics such as bounce rate, goal conversions, and revenue. Goals and revenue are useful indicators that help make sure you aren’t about to filter out traffic that is valuable to you.
To make things easy, just click this custom report share link to add it to your GA login (or follow the instructions below to set this up yourself).
- Load up GA and go into your website’s profile (in the new version).
- Click on the ‘Custom Reports’ tab at the top.
- From the ‘Overview’ option on the left, click on ‘+ New Custom Report’.
- Enter a report name, name the report tab and metric group.
- Add a dimension of ‘Hostname’ and a drilldown dimension of ‘Landing Page’.
- Add the ‘Visits’ and ‘Unique Visitors’ metrics.
- Optionally, add a context filter to the custom report to exclude any hostnames that include your domain name. This step is not required, and you may prefer to keep your own domain in the report, since that gives a better picture of what percentage of traffic the rogue site is contributing.
- Save the report and run it.
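If you export the report, the share of traffic each hostname contributes can be worked out with a few lines of code. The following is a minimal sketch, assuming the export has been reduced to (hostname, visits) pairs; the function name and the sample numbers are made up for illustration and are not part of Google Analytics.

```python
from collections import Counter


def visit_share(rows):
    """Return {hostname: fraction of total visits} for (hostname, visits) rows."""
    totals = Counter()
    for hostname, visits in rows:
        # Hostnames are case-insensitive, so normalise before summing.
        totals[hostname.lower()] += visits
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {host: count / grand_total for host, count in totals.items()}


# Made-up example rows, standing in for an exported 'Hostname' report.
sample_rows = [
    ("www.yourwebsite.com", 900),
    ("www.istoleyourcontent.com", 75),
    ("translate.googleusercontent.com", 25),
]

shares = visit_share(sample_rows)
for host, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{host}: {share:.1%}")
```

Keeping your own domain in the rows is what makes the percentages meaningful, which is why the optional exclusion step above is worth skipping for this purpose.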
Hopefully, this report shows no hostnames that you don’t recognize. One word of caution: you’ll likely see two Google-related hostnames, translate.googleusercontent.com and webcache.googleusercontent.com. Neither of these should be treated as rogue. The translate hostname shows up when someone comes to your site and uses the Google Translate service. The webcache hostname shows up when someone clicks on the ‘Cached’ option on the lower left of your organic search result. This can be an indication that your website was experiencing downtime for that visitor (not always, but it is a good indicator).
Now, if you do see some hostnames that you don’t recognize, click on the hostname to drill down to the visitor’s landing page. This shows the landing page URI on that hostname. From here, you can paste the hostname and landing page URI into a new browser tab to see which site is using your tracking code. You should also view the page source code and hit ctrl/cmd+f to find your UA-#.
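The manual ctrl/cmd+f check can also be scripted once you have saved a page’s source. This is a rough sketch, assuming tracking IDs follow the usual UA-XXXXX-Y shape; the function names and the sample ID UA-12345-1 are placeholders, not real values.

```python
import re

# Assumed shape of a tracking ID: "UA-", an account number, "-", a property index.
UA_PATTERN = re.compile(r"\bUA-\d+-\d+\b")


def find_tracking_ids(page_source):
    """Return the distinct UA-style IDs found in a page's source, sorted."""
    return sorted(set(UA_PATTERN.findall(page_source)))


def uses_my_tracking_id(page_source, my_id):
    """True if the page source contains exactly the given tracking ID."""
    return my_id in find_tracking_ids(page_source)


# Placeholder page source with a placeholder tracking ID.
sample_source = """
<script>
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-12345-1']);
  _gaq.push(['_trackPageview']);
</script>
"""

print(find_tracking_ids(sample_source))
print(uses_my_tracking_id(sample_source, "UA-12345-1"))
```

Comparing whole IDs rather than searching for a substring avoids a false match between, say, UA-12345-1 and UA-12345-10.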
If you are convinced that this domain is rogue, you can of course take action as appropriate against them, but who knows how long it will take them to remove your content/tracking code. Instead, let’s go ahead and filter them out now.
How to Filter Against Rogue Sites
There are two methods of filtering out rogue sites that use your Google Analytics tracking code. One is proactive: an ‘Include’ filter that keeps only the hostnames you trust. The other is reactive: an ‘Exclude’ filter that drops specific hostnames once you have spotted them. We personally prefer the proactive approach, but we’ll share both. Both approaches are accomplished by using profile-level filters.
To add a filter in Google Analytics, you need to go into your profile settings and add a new filter. You can find additional help on adding filters by reading this Google Analytics help article.
For the proactive approach, if your website domain is ‘www.yourwebsite.com’, we can set up a filter to ONLY include visits where the hostname matches the following regular expression: ‘yourwebsite|googleusercontent’. The | character denotes an ‘OR’ condition. We like to keep hostnames that contain googleusercontent because that shows how many people visit from cache and how many people visit from the translate service.
It is worth mentioning that the profile filter type we are using here is an ‘Include’ filter. The include filter will ONLY keep data that matches the expression you enter. If the hostname does not match, you won’t be seeing this data in your reports.
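To see what this include pattern would keep before applying it, you can try it against a list of hostnames from your report. The sketch below assumes the filter pattern behaves like an unanchored (partial-match) regular expression; the helper name is ours and the hostnames are examples.

```python
import re

# The include pattern from the article; '|' means OR.
INCLUDE_PATTERN = re.compile(r"yourwebsite|googleusercontent")


def passes_include_filter(hostname):
    """True if the hostname contains a match for the include pattern."""
    return INCLUDE_PATTERN.search(hostname) is not None


# Example hostnames as they might appear in the custom 'Hostname' report.
hostnames = [
    "www.yourwebsite.com",
    "translate.googleusercontent.com",
    "webcache.googleusercontent.com",
    "www.istoleyourcontent.com",
]

kept = [h for h in hostnames if passes_include_filter(h)]
dropped = [h for h in hostnames if not passes_include_filter(h)]
print("kept:", kept)
print("dropped:", dropped)
```

Under that assumption, any hostname that merely contains ‘yourwebsite’ would also pass, so a stricter pattern may be worth considering if that matters for your site.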
For the reactive approach, analyze your new custom hostname report and make a list of the hostnames you want to block. Then either add several hostname exclusion filters or add a single one that uses the | character as an OR condition.
For example, let’s say you wanted to exclude the following (hopefully fictitious) hostnames from your profile: www.istoleyourcontent.com and www.iknowishouldntsteal.com. You would create an ‘Exclude’ filter on the ‘Hostname’ filter field with a filter pattern of ‘istoleyourcontent|iknowishouldntsteal’ (don’t type the single quotes around the pattern). We leave out both the www and the .com on purpose: if they also own the .net or other domains, we still don’t want them to show up.
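If the block list grows, the exclude pattern can be assembled from the list rather than typed by hand. This is only a sketch under the assumption that the pattern is matched anywhere in the hostname; the helper names are hypothetical.

```python
import re


def build_exclude_pattern(names):
    """Join the blocked name fragments into one pattern using '|' as OR."""
    # re.escape guards against fragments that contain regex metacharacters.
    return "|".join(re.escape(name) for name in names)


pattern = build_exclude_pattern(["istoleyourcontent", "iknowishouldntsteal"])
exclude_re = re.compile(pattern)


def survives_exclude_filter(hostname):
    """True if the hostname does NOT match the exclude pattern."""
    return exclude_re.search(hostname) is None


print(pattern)
for host in ["www.yourwebsite.com",
             "www.istoleyourcontent.com",
             "www.istoleyourcontent.net"]:
    print(host, "->", "kept" if survives_exclude_filter(host) else "excluded")
```

Because the fragments omit the www and the top-level domain, the .net variant is excluded along with the .com one.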
Important Notes About Google Analytics Filters
- When you apply a filter to your Google Analytics profile, it only filters NEW data. Your historical data will not be re-processed. By the same token, if you incorrectly apply a filter that ends up excluding all of your traffic, you can’t undo it, and you are stuck with your mistake. For this reason, we STRONGLY recommend that you first apply any filter to a new test profile, monitor the data for a few days, and only apply the filter to your master profile when you are comfortable with your decision. If you are nervous about applying this or any type of Google Analytics filter, we do offer Google Analytics consulting services to ease your mind.
- A Google Analytics best practice is to always create an additional profile that remains unfiltered. That way, if you did mess up, you’ll at least have data from the affected time period.
- Be very careful when applying an ‘Include’ filter — if you enter the wrong filter pattern, you can easily end up excluding ALL traffic. Another general tip about include filters is that you can use multiple include filters if the filter fields are mutually exclusive. There is a great blog post about using multiple include filters from Lunametrics (another Google Analytics Certified Partner).
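The warning about multiple include filters is easier to see with a small simulation. The sketch below assumes that filters are applied one after another, so a visit must pass every include filter to survive; the ‘medium’ field, the visit records, and the function name are illustrative only.

```python
import re


def apply_include_filters(visits, filters):
    """Apply (field, pattern) include filters in order; keep visits passing all."""
    kept = visits
    for field, pattern in filters:
        regex = re.compile(pattern)
        kept = [visit for visit in kept if regex.search(visit[field])]
    return kept


# Illustrative visit records.
visits = [
    {"hostname": "www.yourwebsite.com", "medium": "organic"},
    {"hostname": "www.yourwebsite.com", "medium": "referral"},
    {"hostname": "www.istoleyourcontent.com", "medium": "organic"},
]

# Two include filters on different fields narrow the data as intended.
different_fields = apply_include_filters(
    visits, [("hostname", "yourwebsite"), ("medium", "organic")]
)

# Two include filters on the same field can cancel each other out entirely.
same_field = apply_include_filters(
    visits, [("hostname", r"^www\.yourwebsite\.com$"), ("hostname", "googleusercontent")]
)

print(len(different_fields), len(same_field))
```

When two alternatives belong to the same field, combining them in one pattern with | (as in the hostname filter above) avoids the empty result.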