How to Calculate Statistical Significance for Session-Based Metrics in A/B Tests
Customer Experience Optimization teams are charged with improving the digital customer experience. Most of the time, the A/B testing strategy focuses on impacting the bottom line or primary key performance indicators (KPIs) for a business.
However, there will be times when other metrics are important, and it’s critical to have a strategy in place to ensure these A/B testing KPIs are accurately tracked and that the correct statistical significance analysis is being used to evaluate performance.
At Blast, our Customer Experience Optimization team came across this very circumstance for one of our clients. The typical binomial A/B testing metrics, such as lead completions, were no longer the goal. Instead, a greater emphasis was placed on continuous metrics (defined below), such as average page per session and other session-based metrics. Tracking continuous metrics can pose several challenges, mainly centered around the difficulty in calculating statistical significance for A/B test results.
In this blog post, we’ll describe what steps need to be taken to resolve these issues, provide an example of how we put this method to work for one of our own clients, and finally, we’ll introduce our new and improved Blast Statistical Significance Calculator.
Binomial vs Continuous Metrics — What’s the Difference?
More often than not, KPIs, such as transactions, cart adds and lead completions, are the primary metrics for A/B testing. These KPIs are known as binomial metrics because they result in only two outcomes (e.g. transaction vs no transaction, lead completion vs no lead completion). It’s easy to determine whether a metric is binomial.
The general rule is that if you can refer to it as a “rate” (e.g. transaction rate, lead completion rate, add to cart rate), then it is a binomial metric. Using these types of metrics as KPIs for A/B testing doesn’t usually pose many challenges. The available testing platforms are well equipped to handle this type of data and can report results along with their statistical significance.
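To make the binomial case concrete, here is a minimal sketch of the kind of calculation a significance calculator performs for rate metrics: a two-proportion z-test. The conversion counts and traffic figures below are made up for illustration.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# e.g. Original: 200 leads from 5,000 visitors; Variation: 250 leads from 5,000 visitors
z, p = two_proportion_ztest(200, 5000, 250, 5000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Because the metric has only two outcomes per visitor, the overall counts per variation are all that’s needed, which is exactly why standard calculators only ask for traffic and conversion volume.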
However, there will likely be times when the Customer Experience Optimization team or a client wants to focus on other metrics that look at averages instead of rates (e.g. average pages per session, average events per session). These metrics are considered non-binomial (or continuous) because there are more than two possible outcomes; it’s not simply a matter of whether the conversion happened or not.
The Challenge of Using Continuous Metrics as Goals for A/B Testing
Anyone who’s had to figure out how to calculate statistical significance for continuous metrics knows that testing platforms are not always well suited to the task, particularly if the metric is session-based. For example, a number of testing platforms use a visitor-based counting methodology, not a session-based one. As a result, their results dashboards express the traffic for each variation as “Unique Visitors” instead of Sessions.
The workaround here is to integrate the analytics platform with the testing platform so you can pull in test data and analyze performance in an analytics report, such as a Google Analytics custom report.
Having access to overall performance for session-based metrics is the first step. However, if a team is attempting to analyze performance in analytics, a big challenge remains: determining whether the results you are seeing reflect a real impact, or in other words, how to calculate statistical significance for such metrics.
Overall Results Won’t Do! We Need Session-Level Data
Standard A/B testing significance calculators are built for binomial data, where one can simply enter overall traffic and conversion volume per variation. However, these statistical significance calculators don’t work well when the data is continuous. In other words, you can’t just enter overall sessions for the Original vs overall sessions for the Variation to accurately determine statistical significance for “average” session-based metrics.
Instead, your team needs to obtain session-level data from your A/B tests in order to perform a proper statistical significance calculation.
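For intuition on why session-level data matters, here is a minimal sketch of one common approach to continuous metrics: a Welch’s t-test on per-session values. The data is made up, and the p-value uses a normal approximation (reasonable at typical A/B sample sizes); this is an illustrative sketch, not the Blast calculator’s exact implementation.

```python
import math
from statistics import mean, variance

def welch_ttest(sample_a, sample_b):
    """Welch's t-test on session-level values (e.g. pages per session).
    With session-sized samples the t distribution is close to normal,
    so the p-value is approximated with the normal CDF."""
    n_a, n_b = len(sample_a), len(sample_b)
    se = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t = (mean(sample_b) - mean(sample_a)) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p_value

# hypothetical pages-per-session values, one number per session
original  = [1, 3, 2, 4, 1, 2, 2, 3, 1, 2] * 50
variation = [2, 4, 3, 4, 2, 3, 2, 4, 1, 3] * 50
t, p = welch_ttest(original, variation)
print(f"t = {t:.2f}, p = {p:.4f}")
```

Notice that the test needs every individual session value to estimate the variance of each group; an overall average per variation simply doesn’t carry enough information.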
It is possible to get this session-level data for an A/B test in your team’s analytics platform, although your team will need to implement a custom dimension for Session ID (e.g. using Google Tag Manager) to start tracking this data. Please check out our guide for detailed step-by-step instructions on how to implement this custom dimension.
The New Blast Calculator Tackles Session-Level Data
Even after taking the necessary steps to obtain session-level data, challenges still remain. As stated above, most A/B test significance calculators are not built to handle continuous metrics. Specifically, there is no option to add or upload session-level data, which is necessary to do a proper statistical significance calculation for these types of metrics.
Recognizing that there was a need to have a readily available statistical significance test calculator to handle these types of metrics, Blast decided to create its own and make it available to everyone! Specifically, we created a statistical significance A/B test calculator that has the ability to handle the various types of metrics a Customer Experience Optimization team may need to analyze, including the continuous metrics described in this blog post.
Further, the new Blast Statistical Significance Calculator also includes an option for calculating statistical significance for typical binomial metrics, such as transaction rate, add to cart rate and lead completion rate.
Putting the Blast Statistical Significance Calculator to the Test
Blast had to tackle the challenges described above for one of our own optimization clients, who wanted to run a few tests focused more on on-site engagement than on the usual primary KPIs. In order to meet their needs, our analytics implementation team used the step-by-step instructions in the above-mentioned guide to implement the Session ID custom dimension in this client’s Google Analytics account.
As a best practice, their testing platform was already integrated with their analytics platform (Google Analytics). This allowed the Customer Experience Optimization team to access the newly built custom dimension in Google Analytics (GA) to obtain session-level data for the Original and Variation treatment in our test.
Without taking the steps to create this custom dimension, we still would have been able to view test performance in GA but only at the aggregate level, making it difficult to do a proper statistical significance calculation.
To conduct a meaningful analysis of our A/B test results, we took the following steps to get the results ready for use in the Blast Statistical Significance Calculator:
1. Create a Custom Report in Analytics — Including the Custom Dimensions for the Test Integration, Session ID and Targeted Metric
2. Export the Custom Report to a CSV File
Please note that if your team is not using Google Analytics 360, your data is likely to be sampled, and you’ll need to ensure you are exporting all data from the test, not just sampled data. If you need a way around the sampling issue, one option is to link your Google Analytics account to Unsampler.
With Unsampler, you’ll be able to create a similar report (as described above) that includes all of your data, and you can export the report directly to a CSV file.
3. Format the CSV File for Upload
With the CSV file, your team will need to filter the data by treatment (Original or Variation), then copy and paste the metric data into a new tab.
Save the new tab as a separate CSV file; this is what you’ll use with the Blast Statistical Significance Calculator.
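If your team prefers, the filter-and-split step can also be scripted instead of done by hand in a spreadsheet. Below is a minimal sketch using Python’s standard csv module; the column names (“Treatment”, “Session ID”, “Pages / Session”) are hypothetical stand-ins for whatever your custom report actually exports.

```python
import csv
import io

# a tiny stand-in for the exported custom report (column names are hypothetical)
export = """Treatment,Session ID,Pages / Session
Original,s1,3
Variation,s2,5
Original,s3,2
Variation,s4,4
"""

rows = list(csv.DictReader(io.StringIO(export)))

# collect the session-level metric values for each treatment
by_treatment = {"Original": [], "Variation": []}
for row in rows:
    by_treatment[row["Treatment"]].append(row["Pages / Session"])

# write one CSV per treatment, each holding just the metric column
for treatment, values in by_treatment.items():
    with open(f"{treatment.lower()}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Pages / Session"])
        for v in values:
            writer.writerow([v])
```

In practice you would read the exported report from disk rather than an inline string; the key point is that each output file contains one column of session-level values for a single treatment.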
4. Upload the CSV File to the Calculator. Once the CSV file is properly formatted, go to the Blast Statistical Significance Calculator, select “Continuous” from the Test Type dropdown, set your preferred significance threshold, and upload the CSV file.
By taking these steps, we were able to calculate statistical significance and properly analyze our A/B test results to see if there was a significant impact.
The process outlined above for calculating statistical significance for A/B tests is one your team can immediately put into practice. While this post discussed continuous metrics in terms of session-based metrics, the same approach can also be used to measure A/B test results for other metrics, such as avg. transactions per user, avg. pages per user, or other user-level data.
Instead of creating a custom dimension for Session ID, your team would create a custom dimension to obtain user-level data (e.g. Client ID #1).
The new Blast calculator is meant to provide teams the flexibility to perform various statistical significance calculations depending on their needs, including:
- a binomial calculation for typical primary KPIs (e.g. Transaction rate, add to cart rate, lead completion rate),
- a calculation for non-binomial metrics (e.g. “average” metrics) with a t-test approach, and
- a nonparametric calculation for non-binomial metrics where a team wants all data points to be considered.
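For intuition on the nonparametric option, a common choice for this kind of comparison is the Mann-Whitney U test, which ranks every data point rather than comparing means, making it robust to the skewed distributions session-level data often has. Below is a minimal pure-Python sketch with made-up data; it uses a normal approximation for the p-value and is illustrative, not the calculator’s exact implementation.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U test with a normal approximation for the p-value.
    Counts pairwise wins directly, which is O(n*m) but simple and handles ties."""
    n_a, n_b = len(sample_a), len(sample_b)
    # U = number of (a, b) pairs where b wins, counting ties as half a win
    u = sum((a < b) + 0.5 * (a == b) for a in sample_a for b in sample_b)
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u - mu) / sigma
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u, p_value

# hypothetical pages-per-session values, one number per session
original  = [1, 2, 2, 3, 1, 2] * 40
variation = [2, 3, 3, 4, 2, 3] * 40
u, p = mann_whitney_u(original, variation)
print(f"U = {u:.0f}, p = {p:.4g}")
```

Because every session’s value contributes to the ranking, this approach considers all data points, which is exactly the property that makes a nonparametric calculation attractive when averages alone would be misleading.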