How to Calculate Statistical Significance for Session-Based Metrics in A/B Tests

August 6, 2019

Customer Experience Optimization teams are charged with improving the digital customer experience. Most of the time, the A/B testing strategy focuses on impacting the bottom line or primary key performance indicators (KPIs) for a business.

However, there will be times when other metrics take priority, and it’s critical to have a strategy in place to ensure these A/B testing KPIs are accurately tracked and that the correct statistical significance analysis is used to evaluate performance.

At Blast, our Customer Experience Optimization team came across this very circumstance for one of our clients. The typical binomial A/B testing metrics, such as lead completions, were no longer the goal. Instead, a greater emphasis was placed on continuous metrics (defined below), such as average page per session and other session-based metrics. Tracking continuous metrics can pose several challenges, mainly centered around the difficulty in calculating statistical significance for A/B test results.

In this blog post, we’ll describe what steps need to be taken to resolve these issues, provide an example of how we put this method to work for one of our own clients, and finally, we’ll introduce our new and improved Blast Statistical Significance Calculator.

Binomial vs Continuous Metrics — What’s the Difference?

[Image: the difference between binomial and continuous metrics]

More often than not, KPIs such as transactions, cart adds, and lead completions are the primary metrics for A/B testing. These KPIs are known as binomial metrics because they result in only two outcomes (e.g. transaction vs no transaction, lead completion vs no lead completion). It’s easy to determine whether a metric is binomial.

The general rule is that if you can refer to it as a “rate” (e.g. transaction rate, lead completion rate, add to cart rate), then it is a binomial metric. Using these types of metrics as a KPI for A/B testing doesn’t usually pose a lot of challenges. The available testing platforms are well-equipped to handle this type of data and can report results along with their statistical significance.
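
Under the hood, a binomial comparison like this is a two-proportion test. Here’s a minimal sketch in Python of how such a calculation typically works (the session and conversion counts below are made up for illustration):

```python
from math import sqrt

from scipy.stats import norm

# Hypothetical counts for illustration: sessions and lead completions
control_sessions, control_conversions = 10000, 420
variant_sessions, variant_conversions = 10000, 480

p1 = control_conversions / control_sessions  # control conversion rate
p2 = variant_conversions / variant_sessions  # variant conversion rate

# Pooled two-proportion z-test
pooled = (control_conversions + variant_conversions) / (control_sessions + variant_sessions)
se = sqrt(pooled * (1 - pooled) * (1 / control_sessions + 1 / variant_sessions))
z = (p2 - p1) / se
p_value = 2 * norm.sf(abs(z))  # two-tailed p-value

print(f"control = {p1:.2%}, variant = {p2:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```

At a 95% significance threshold, a p-value below 0.05 would indicate the difference in rates is unlikely to be due to chance alone.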

However, there will likely be times when the Customer Experience Optimization team or a client will want to focus on other metrics that look at averages instead of rates (e.g. average pages per session, average events per session, etc.). These metrics are considered non-binomial (or continuous) because there are more than two possible outcomes. It’s not simply a matter of whether the conversion happened or not.

The Challenge of Using Continuous Metrics as Goals for A/B Testing

Anyone who’s had to figure out how to calculate statistical significance for continuous metrics knows testing platforms are not always well-suited to handle this task, particularly if the metric is session-based. For example, many testing platforms’ counting methodology is visitor-based, not session-based. Therefore, in their results dashboards, the traffic for each variation is expressed as “Unique Visitors” instead of sessions:

[Image: traffic for each variation expressed as Unique Visitors instead of sessions]

The workaround here is to integrate the analytics platform with the testing platform so you can pull in test data and analyze performance in an analytics report, such as a Google Analytics custom report.

[Image: integrating the analytics platform with the testing platform]

Having access to overall performance for session-based metrics is the first step. However, for a team attempting to analyze performance in analytics, a big challenge remains: determining whether the results you are seeing reflect a real impact, or in other words, how to calculate statistical significance for such metrics.

Overall Results Won’t Do! We Need Session-Level Data

Standard A/B testing significance calculators (as shown below) are designed for binomial data, where you can simply enter overall traffic and conversion volume per variation. However, these statistical significance calculators don’t work well when the data is continuous. In other words, you can’t just enter overall sessions for the Original vs overall sessions for the Variation to accurately determine statistical significance for “average” session-based metrics.

[Image: test data entered in a standard significance calculator]

Instead, your team needs to obtain session-level data from your A/B tests in order to perform a proper statistical significance calculation.
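
To see why the raw session values matter, here’s a minimal sketch of the kind of calculation involved, assuming each treatment’s pages-per-session values have already been exported (the numbers below are hypothetical):

```python
from scipy import stats

# Hypothetical pages-per-session values, one entry per session
original = [3, 1, 4, 2, 6, 1, 2, 5, 3, 2, 1, 4]
variation = [4, 2, 5, 3, 7, 2, 3, 6, 4, 3, 2, 5]

# Welch's t-test compares the two means without assuming equal variances.
# It needs every session's value, not just the overall averages, because
# the p-value depends on the variance within each group.
t_stat, p_value = stats.ttest_ind(variation, original, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The variance in each group comes from the individual session values, so two tests with identical averages but different spreads can produce very different p-values. That’s exactly the information aggregate totals throw away.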

It is possible to get this session-level data for an A/B test in your team’s analytics platform, although your team will need to implement a custom dimension for Session ID (e.g. using Google Tag Manager) to start tracking this data. Please check out our guide for detailed step-by-step instructions on how to implement this custom dimension.

The New Blast Calculator Tackles Session-Level Data

Even after taking the necessary steps to obtain session-level data, challenges still remain. As stated above, most A/B test significance calculators are not built to handle continuous metrics. Specifically, there is no option to add or upload session-level data, which is necessary to do a proper statistical significance calculation for these types of metrics.

Recognizing the need for a readily available statistical significance calculator that can handle these types of metrics, Blast decided to create its own and make it available to everyone! Specifically, we created a statistical significance A/B test calculator that can handle the various types of metrics a Customer Experience Optimization team may need to analyze, including the continuous metrics described in this blog post.

Further, the new Blast Statistical Significance Calculator also includes an option for calculating statistical significance for typical binomial metrics, such as transaction rate, add to cart rate, and lead completion rate.

Putting the Blast Statistical Significance Calculator to the Test

Blast had to tackle the challenges described above for one of our own optimization clients, who wanted to run a few tests focused more on on-site engagement than on the usual primary KPIs. To meet their needs, our analytics implementation team used the step-by-step instructions outlined in the above-mentioned guide to implement the Session ID custom dimension in this client’s Google Analytics account.

[Image: the script used to implement the Session ID custom dimension]

As a best practice, their testing platform was already integrated with their analytics platform (Google Analytics). This allowed the Customer Experience Optimization team to access the newly built custom dimension in Google Analytics (GA) to obtain session-level data for the Original and Variation treatments in our test.

Without taking the steps to create this custom dimension, we still would have been able to view test performance in GA but only at the aggregate level, making it difficult to do a proper statistical significance calculation.

To conduct a meaningful analysis of our A/B test results, we took the following steps to get the results ready for use in the Blast Statistical Significance Calculator:

1. Create a Custom Report in Analytics — Including the Custom Dimensions for the Test Integration and Session ID, Plus the Targeted Metric

[Image: creating a custom report in Google Analytics]
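
If your team prefers pulling the same report programmatically rather than through the GA interface, a sketch against the Google Analytics Reporting API v4 might look like the following. The view ID, key file, and custom dimension indexes here are placeholders; yours will differ:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder values; substitute your own view ID, key file, and the
# custom dimension indexes for your test integration and Session ID.
VIEW_ID = "XXXXXXXX"
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)

analytics = build("analyticsreporting", "v4", credentials=creds)
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": VIEW_ID,
        "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
        # Assumed indexes: ga:dimension1 = test variation, ga:dimension2 = Session ID
        "dimensions": [{"name": "ga:dimension1"}, {"name": "ga:dimension2"}],
        # Targeted metric: pageviews per Session ID row = pages per session
        "metrics": [{"expression": "ga:pageviews"}],
        "pageSize": 100000,
    }]
}).execute()

for row in response["reports"][0]["data"].get("rows", []):
    variation, session_id = row["dimensions"]
    pageviews = row["metrics"][0]["values"][0]
    print(variation, session_id, pageviews)
```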

2. Export the Custom Report to a CSV File

Please note that if your team is not using Google Analytics 360, your data is likely to be sampled; you’ll need to ensure you are exporting all data from the test and not just a sampled subset. If you need a way around the sampling issue, one option is to link your Google Analytics account to Unsampler.

With Unsampler, you’ll be able to create a similar report (as described above) that includes all of your data, and you can export the report directly to a CSV file.

3. Format the CSV File for Upload

[Image: formatting the CSV file for upload to the statistical significance calculator]

With the CSV, your team will need to filter the data by treatment (Original or Variation), then copy and paste each treatment’s metric data into a new tab (a scripted alternative is sketched after this step).

[Image: filtering data from the CSV file by treatment]

Save the new tab as a separate CSV; this is the file you’ll use with the Blast Statistical Significance Calculator.
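
If the spreadsheet filtering becomes tedious, the same split can be scripted. Here’s a short sketch using pandas; the file and column names are assumptions, so match them to your actual export:

```python
import pandas as pd

# Column names here are assumptions; adjust them to your actual export.
df = pd.read_csv("ga_export.csv")

for treatment in ["Original", "Variation"]:
    # Keep only the metric values for one treatment at a time
    subset = df.loc[df["Treatment"] == treatment, "Pageviews"]
    subset.to_csv(f"{treatment.lower()}_sessions.csv", index=False)
```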

4. Upload the CSV File to the Calculator — Once the CSV file is properly formatted, go to the Blast Statistical Significance Calculator, select “Continuous” from the Test Type dropdown, set your preferred significance threshold, and upload the CSV file.

[Image: selecting the test type in the statistical significance calculator]

By taking the steps above, we were able to calculate statistical significance and properly analyze our A/B test results to see whether there was a significant impact.

[Image: analysis of A/B test results for statistical significance]

Conclusion

The process outlined above for calculating statistical significance for A/B tests is one that your team can immediately put into practice. While this post discussed continuous metrics in terms of session-based metrics, the same approach can also be used to measure A/B test results for other metrics, such as average transactions per user, average pages per user, or other user-level data.

Instead of creating a custom dimension for Session ID, your team would create a custom dimension to obtain user-level data (e.g. Client ID).

The new Blast calculator is meant to give teams the flexibility to perform various statistical significance calculations depending on their needs, including:

  1. a binomial calculation for typical primary KPIs (e.g. transaction rate, add to cart rate, lead completion rate),
  2. a calculation for non-binomial metrics (e.g. "average" metrics) using a t-test approach, and
  3. a nonparametric calculation for non-binomial metrics where a team wants all data points to be considered (a sketch of this approach follows below).
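
For the third option, a common nonparametric choice is the Mann-Whitney U test, which is useful when session-level data is heavily skewed (as pages-per-session distributions often are). A minimal sketch with hypothetical data:

```python
from scipy.stats import mannwhitneyu

# Hypothetical pages-per-session values; real data is often right-skewed
original = [1, 1, 2, 2, 3, 3, 4, 5, 12, 20]
variation = [1, 2, 2, 3, 3, 4, 5, 6, 14, 25]

# Mann-Whitney U compares the two groups by rank rather than by mean,
# so every data point contributes without any normality assumption.
u_stat, p_value = mannwhitneyu(variation, original, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```
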
About the Author

Roopa Carpenter is Vice President of Digital Experience (DX) at Blast. She leads a team of talented DX consultants responsible for helping brands better understand, optimize, and measurably impact their digital experience. With many years of experience, Roopa offers a high level of knowledge and guidance to clients across a variety of industries regarding testing and personalization strategy and execution, user experience research, and closing the empathy gap through Voice of Customer. Her data-driven approach focuses on impacting customer conversion and driving desired business outcomes.

Connect with Roopa on LinkedIn.