Tableau Tips: Review of New Cross Database Joins
If you’re like me, you’ve been waiting a long time for Tableau to add the Holy Grail of functionality, cross database joins. And with Tableau 10 just around the corner, it is on the threshold of arrival. Get ready for a shorter, more simple path to a holistic picture of your data.
A cross database join allows you to join data from two different types of databases as if they were in the same database. The ability to execute a Tableau cross database join will save you time and money, as it allows users to perform tasks that normally require IT resources to be involved; specifically, creating a project where IT extracts the needed data from various databases and consolidates them into a single database (i.e. a data warehouse).
Instead of being dependent on IT, you can join data to create reports and visualizations to answer your own questions.
The new Tableau cross database join functionality enables:
- Rapid prototyping and deployment of reports and visualizations joining data from multiple databases.
- Prototyping how data should be modeled and brought into a data warehouse in order to meet report and visualization needs.
- Creation and publication of data sources that join data across multiple databases, which other Tableau users can use to create their own reports and visualizations.
Tableau Cross Database Join Examples
Suppose you have Order data in a Redshift database which contains Order information and Customer Information. And in Oracle you have Regional Sales information, such as region, zip code, and salesperson.
You want to know how many orders were placed in what Region, and what sales person is responsible for sales. Since the data is in two different databases, you previously couldn’t answer the question without either:
- moving the data into a common database;
- blending the data in Tableau; or
- some other manual effort.
However, in Tableau 10, you can join these two tables, one from Redshift, the other from Oracle, as if they were in the same database. Essentially the resulting SQL would look something like:
Sales Region Oracle
Redshift.Customer Zip = Oracle.Zip Code
This is just a simple example, as you can do much more complex joins using multiple sources and types of databases.
For instance, in Tableau 10.0 beta, using data from a Redshift database and four CSV files which I unioned (using Tableau’s text file Union functionality), I created a cross database join with Redshift data to the unioned CSV files:
CLICK IMAGE BELOW FOR FULL SIZE VIEW
Another example shows data from Google Sheets (yes, in Tableau 10 you will be able to use Google Sheets as a datasource just like Excel or text files) being joined with data from a Redshift database (joining the two “databases” using a common field found in both data sources of Lead ID to Id):
CLICK IMAGE BELOW FOR FULL SIZE VIEW
What About Tableau Data Blending?
Previously, if you had a report that required data from multiple databases, you had to do one of the following:
(1) Submit a project request to IT and wait until they could put the required data into a data warehouse before the reports could be created. (2) Use Tableau data blending to combine the data sources as needed to create the reports. Although data blending is quite powerful and useful (I once created a dashboard which required nine different data sources to be blended) it presents a few challenges:
- Data blending can be difficult and complex to implement correctly;
- it isn’t very flexible; and
- it can’t always deliver the solution needed.
Why Cross Database Joins are Better
The process of creating a Tableau cross database join becomes a prototype for how the data should be brought into the data warehouse in order to meet reporting needs. How many times have you gone through the process of implementing a project to get data added to the data warehouse only to find out, after the work is done:
- The report the user wanted has changed and they need additional or completely different data?
- The visualizations you create drive new insights, changing the user needs and driving new data needs, which have to go back through the IT project lifecycle?
- The data needed wasn’t modeled correctly making it difficult (if not impossible) to use effectively?
With cross database join functionality, you can spend time with the end-user prototyping with actual data, rapidly adding and changing data as new insights occur, and changing requirements as needed.
Once the solution is delivered, the results can be used as the requirements for the data warehouse project,
thus eliminating unsuccessful projects and inefficient re-work.
Here’s a great example of a data model:
The resulting data source can be published to Tableau Server for others to use, or an extract created, and if published it can be scheduled for refreshes. Most things you can do with a data source (with a couple of caveats listed below) you can do with the resulting cross database join data source.
Tableau Tips for Troubleshooting Joins
Like most things, as I’ve tested Tableau 10.0 beta, it doesn’t give me everything I hoped for. Tableau’s development team says additional functionality will be added in later releases, but below are some caveats on what can be done:
- You cannot use a data source from Tableau Server in a cross database join. Even though a data source on Tableau Server is essentially a connection to a data source (be it Oracle, Redshift, Excel, etc.), currently you cannot use it in a database cross join.
- You cannot use a Tableau Extract as a source for cross database joins. There are some complexities with extracts, but Tableau has said they hope to allow cross database joins to extracts in a future release. (Note: Since this post, this feature has been added and will be released with Tableau 10.0)
- The fields used in the cross database joins must be of the same data type. For example, if the Customer Zip Code in Redshift has a datatype of Number, and Zip Code in Oracle has a datatype of String, you cannot do a cross database join on the data. However, you can create a Custom SQL on one of the data sources to change the datatype. As an example, in Tableau you could create Custom SQL for the Oracle database, changing Zip Code to a Number, and then do a cross database join of the Redshift data to the Oracle Custom SQL data. Also, if the data you want to do a cross database join with is from Excel, CSV, or Text, Tableau allows you to modify the data type of a field in these types of files, thus allowing you to do a cross database join using the modified datatype.
- You cannot create a Custom SQL cross database join; you must use the framework/interface that Tableau has provided to create the joins (not much different than the existing interface for joining tables from the same database).
- The performance of rendering visualizations could be slow, depending on the number of databases joined, the amount of data, the complexity of the join, etc. You can create an extract of the data from the cross database join (not a bad thing) to improve performance, and as mentioned previously, the extract can be published to Tableau Server as a scheduled extract to be refreshed as specified.
Some things that I want to see to make this a Holier Grail (is that even possible, holier?) would be the ability to perform cross database joins on extracts and Tableau Server data sources, including extracts on Tableau Server. Why? Let me ask you a question first.
How often have you said, “If I could just get my hands on the data, I could do the analysis and create visualizations?”
If Tableau allows cross database joins with extracts and data sources on Tableau Server, including extracts, you can become the creator of your own data warehouse without having to rely on IT! You can access the databases you need, publish the data to Tableau Server (as a scheduled extract or live) and then use Tableau Server as a data warehouse, doing cross database joins with any of the sources you have published. If you don’t have Tableau Server, you can still do this by creating a set of extract files for all sources and joining what you need.
Of course, with great power comes great responsibility. You would want to ensure you have processes in place to manage the data, keep accurate documentation, and ensure you don’t needlessly replicate data. But if done properly and wisely, the results can be powerful and they can be used for data warehouse requirements when IT resources become available, with the resulting additions to the data warehouse already proven to meet reporting, visualization and analysis needs!
So after this, what’s next on my checklist?
Eh, maybe not… too much responsibility.
Meet Nick and his fellow data visualization enthusiasts during the next Tableau User Group meeting at the Blast HQ in the Sacramento area.