Tableau Python Forecasting: Increase Your Accuracy!
With Tableau’s rise to prominence came a growing demand for data science integration. R functionality was introduced back in Tableau 8, and more recently, with Tableau 10, Python has finally made its way into the space through TabPy, which makes Tableau Python forecasting possible. For the uninitiated, Python is an incredibly powerful programming language that can solve nearly any data-related problem. It is one of the most actively used and developed languages, and while it is used by top industry professionals and academics, getting polished visualizations out of it has always taken real effort. As a Data Scientist and Tableau Consultant, I’m so excited to say that has changed!
This blog post will share some information to help you get started using TabPy for non-trivial tasks like ARIMA modeling, and to take advantage of scikit-learn’s various machine learning models. Since we can now implement things like Random Forests and neural networks from within Tableau, we can both model and visualize without leaving the Tableau environment. This means we can do things like track and predict Customer Lifetime Value (CLV), figure out churn rate and predict its drivers, or in this case, apply highly dependable forecasting models and then turn around and graph those results with a few clicks.
A Resounding Cheer for TabPy
Python has some great packages for visualization, yet I know I’ve spent hours tuning a graph to make it look how I need it to. With the introduction of TabPy, many of my troubles have been put to rest (at least from a visualization standpoint).
Traditional Forecasting vs. Tableau Python Examples
Forecasting is an integral part of the goal-setting process, so it is important that it is done correctly. Traditional forecasting is usually carried out by simply drawing a line in the general direction of the graph’s points. Some more advanced techniques use moving averages, but even these are based on assumptions around a relatively recent collection of points. With Python’s forecasting, you can capture not only general trends, but also things like seasonality, correlation between recent points, and growth trends that would otherwise hurt the accuracy of your forecast. With these more accurate forecasts, it becomes feasible to set data-driven goals and make business decisions with more confidence.
As a quick aside, Tableau does offer native prediction and forecasting, but in my experience its accuracy is low, and its implementation is more or less a black box. When comparing Tableau’s forecast against actual values, its performance pales in comparison to the results produced by R and Python, which is consistent with the relatively simple exponential smoothing models that Tableau’s documentation describes. In my experience, Python can be up to 20x better than Tableau’s native forecasting tool, even when the Tableau model is well optimized.
What is ARIMA?
According to Wikipedia, ARIMA is defined as, “in statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model which is used to better understand the data or to predict future points in the series.” The parameters for ARIMA break down like this:
AR = Autoregressive
I = Integrated
MA = Moving Average
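These three parts are usually written as an order triple (p, d, q): p is the number of autoregressive lags, d is the number of times the series is differenced (the “integrated” part), and q is the number of lagged forecast errors used in the moving-average part. As a purely illustrative sketch of the middle term, here is what differencing does to a series with a steady trend. The `difference` helper and the sample numbers are my own inventions for this example, not part of any library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the "I" in ARIMA)."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with the change between neighbours.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# A series with a steady upward trend of +3 per step...
trend = [10, 13, 16, 19, 22]
# ...becomes constant after one round of differencing.
print(difference(trend, d=1))  # [3, 3, 3, 3]
```

Removing the trend like this is what lets the AR and MA parts work on a series that behaves consistently over time.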
Setting Up the ARIMA Model
To give credit where credit is due, I am going to outline how to make a simplified, yet equally effective, version of the program made here. The linked page can also get you set up with downloading and installing TabPy. In this Tableau Python tutorial, I am going to focus on actually running the analysis, and then I’ll address some of the issues I encountered when following the linked example.
I will be making a customizable ARIMA model that is not only easy to interpret, but also easy to adjust and tune.
We’ll be using the Dairy Production Dataset (download page), as it is already cleaned up and contains things like seasonality and trends, which emulate business data. We’ll use past dairy production data to make future predictions. Instead of dairy, your business may be interested in using something like revenue data to see what kind of numbers you can expect next quarter; conveniently enough, this model will work in both contexts.
Steps to Set Up This Model
Step 1: Format the Data
We’re first going to load in and format the data. Select “Text File” in Tableau’s connect menu, and find your file.
Step 2: Python Compatibility
Once connected, we’re going to quickly reformat our dates to be Python compatible. What’s that? They’re already Python compatible!? Yes: Python’s datetime, like most dates in the data world, uses the year-month-day format. If our dates did need fixing, we could quickly do so by right (control) clicking on the “Date Dimension,” changing its type to Date, then right clicking on the date object and selecting Default Properties > Date Format > Custom. Here we can format the date to be something like “yyyy-MM”. I’m also going to quickly rename our values column so it isn’t astronomically long; we’ll call it “milk”.
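To see why that format matters on the Python side, here is a minimal sketch of parsing a “yyyy-MM” value with the standard library. The `parse_month` name and the sample value are made up for illustration; the only real API used is `datetime.strptime`.

```python
from datetime import datetime


def parse_month(text):
    """Parse a 'yyyy-MM' string into a Python datetime."""
    # Tableau's "yyyy-MM" corresponds to "%Y-%m" in strptime syntax.
    # With no day given, strptime defaults to the first of the month.
    return datetime.strptime(text, "%Y-%m")


first = parse_month("1962-01")
print(first.year, first.month, first.day)  # 1962 1 1
```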
Step 3: Customize your Model
Now we’re going to make some parameters that’ll serve as our model customizers. ARIMA modeling is a fancy term for using the past to try to figure out the future. It uses past data, along with adjustments for stationarity, to make a prediction about the (near) future. You could try to use this for a really far-out projection, but as with any forecast, uncertainty grows quickly the further out you look. As mentioned above, AR, I, and MA are the parameters for the ARIMA model.
Step 4: Create Dimensions
Once we’ve made those parameters, we’re going to make some dimensions that will be handy for graphing our Python projections. Essentially, what we’re doing is shifting our dates by the amount we want to forecast. So if our original calendar says the first day is Tuesday, and we’re forecasting 2 days, now it says our first day is Thursday. Not quite as cool as a DeLorean, but still pretty neat. This gives our prediction data a date column to identify with.
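In Tableau itself this shift lives in a calculated field, but the idea is easy to see in plain Python. The sketch below assumes monthly data dated on the first of each month; `add_months` and the sample dates are hypothetical names and values chosen for illustration.

```python
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`.

    Assumes `d` is the first of a month, so no day-of-month clamping is needed.
    """
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


dates = [date(1962, 11, 1), date(1962, 12, 1), date(1963, 1, 1)]
# Shift every date forward by the number of months we want to forecast.
shifted = [add_months(d, 2) for d in dates]
print(shifted[0])  # 1963-01-01
```

After the shift, the last rows carry dates beyond the end of the original data, which is where the forecast values will be plotted.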
The TabPy Tutorial
Now for the fun stuff. What Tableau does with Python is pretty cool. You give it a string of Python code in a calculated field along with some measures and parameters; it sends those over to your local Python server, runs the string, and returns an output. The output, at least from what I’ve gathered, needs to be a single object, and this is where many, including myself, get tripped up. No use crying over spilt milk. What we can do is have Python send back a list, which makes both parties happy.
Note: You may have noticed some weird MIN()s in the code. These are present because Tableau’s script functions only accept aggregated values, so a parameter has to be wrapped in an aggregation such as MIN() before it can be passed. Since the parameter has the same value on every row, MIN() simply returns that value.
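To make the hand-off concrete, here is a rough simulation in plain Python of what the script sees. TabPy exposes the arguments as `_arg1`, `_arg2`, and so on, each arriving as a list with one entry per row. The `tabpy_script` function name, the toy calculation, and the sample numbers are all assumptions for illustration, not the script from this post.

```python
def tabpy_script(_arg1, _arg2):
    """Stand-in for the Python string Tableau would send to TabPy.

    _arg1: the measure, one value per row in the view.
    _arg2: a numeric parameter, repeated once per row.
    """
    scale = min(_arg2)  # every entry is identical, so min() just unwraps it
    # Return a list with one result per input row, as Tableau expects.
    return [value * scale for value in _arg1]


milk = [589.0, 561.0, 640.0]  # made-up sample values
scale = [2, 2, 2]             # how a parameter set to 2 arrives in Python
result = tabpy_script(milk, scale)
print(result)  # [1178.0, 1122.0, 1280.0]
```

The point to notice is that the returned list has exactly as many entries as the incoming measure.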
I’ve typed out the script for this TabPy tutorial that should work if you did all the previous steps. It’s crucial that your variables have the same names as mine, or else you’ll have to do some tinkering. Open up a new “Calculated Field” by clicking “Analysis” on the top toolbar, and navigating to “Create Calculated Field.” You can then copy and paste this code into the calculation box.
Breaking Down the Code
Here’s a basic run-through of this code in layman’s terms.
- We bring in our dates and amount of milk (as “milk”) and make a dataframe out of them.
- We make “milk” a float, and make our “dates” datetime (Python’s native time format).
- We’ll then make our row indexes the dates and get rid of any NaNs.
We then instantiate our ARIMA model and plug in the parameters we passed into this Python instance, using the min() function to get the single value out of each parameter list. I then create a new list that relies on our model to predict the values for every date in the original list, plus the number of periods we asked it to forecast. Voila, this is now a working ARIMA model.
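The steps above can be sketched roughly as follows. This is a simplified approximation under stated assumptions, not the exact script from this post: it skips the pandas dataframe and NaN handling, assumes statsmodels is installed, and the function names and the `_arg2`..`_arg5` ordering are hypothetical.

```python
def arima_predictions(milk, order, n_forecast):
    """Fit ARIMA and predict every original point plus n_forecast extra ones."""
    # Imported here so the helper below also works without statsmodels installed.
    from statsmodels.tsa.arima.model import ARIMA

    values = [float(v) for v in milk]  # assumes no missing values
    fitted = ARIMA(values, order=order).fit()
    # In-sample predictions from index 0, then n_forecast steps past the end.
    predictions = fitted.predict(start=0, end=len(values) - 1 + n_forecast)
    return predictions.tolist()


def align_to_rows(predictions, n_rows):
    """Keep the last n_rows predictions so the list matches Tableau's row count."""
    return predictions[len(predictions) - n_rows:]


# Inside Tableau, the body of the script might then look like this
# (argument order is an assumption):
#   order = (min(_arg2), min(_arg3), min(_arg4))
#   preds = arima_predictions(_arg1, order, min(_arg5))
#   return align_to_rows(preds, len(_arg1))
print(align_to_rows([1, 2, 3, 4, 5], 3))  # [3, 4, 5]
```

Dropping the earliest predictions is what pairs with the shifted date dimension from Step 4: the returned list is the same length as the input, but it now ends with the forecast.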
Note: If you do try to tinker with this code at all, pay close attention to the length of the list you return, since Tableau expects it to match the number of rows it sent. I’m not going to go into too much detail here, but just be mindful of that.
Drag your calculated fields over to the shelves, and watch things happen. You can do some nifty comparisons by using dual axes and such.
As a little bonus, we can add our AIC (Akaike information criterion) score, one of the metrics these models are commonly compared on; lower is better. We’ll basically copy all of our code from the ARIMA part, but change a couple of lines at the end.
And then add [AIC Score] to our title and our measures shelf.
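For reference, AIC trades goodness of fit against model complexity: it is 2k minus twice the log-likelihood, where k is the number of estimated parameters. The small sketch below computes it by hand with made-up numbers; in practice a fitted statsmodels model exposes this value as its `aic` attribute, so the `aic` function here is only for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Two hypothetical fits of the same data: the second fits slightly better
# but spends two extra parameters to do so.
simple = aic(log_likelihood=-100.0, n_params=3)
complex_ = aic(log_likelihood=-99.5, n_params=5)
print(simple, complex_)   # 206.0 209.0
print(simple < complex_)  # True
```

This is why the score is handy when tuning the AR, I, and MA parameters: a higher-order model only wins if its better fit outweighs the extra parameters.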
Your Tableau Forecast Accuracy Has Improved!
As you can see from the graph above, when all is said and done, our model does a really good job at simulating the data series. In blue, the actual values are present, and in green we have data generated completely from our model, as well as a forecast past the original data’s values.
While Tableau does provide a very easy three-click solution to forecasting, if you are relying on forecasts to drive business decisions, it is a good idea to use something that captures more of the nuances that exist in time series data; an ARIMA model is just the model for that job.
This new integration has boundless potential, and I think it will really bring Tableau and Data Science together for the better. Thanks for reading, and feel free to reach out with questions in the comments!