Data relationships vs. data joins in Tableau

Find out which problem data relationships solve, and why they’re so useful in the example below.

In the past, whenever you combined data in the Data Source tab, Tableau would automatically create a join. You could specify what type of join, and which fields should be used to perform the join.

However, you may have noticed that since version 2020.2, Tableau creates a data relationship by default. You’ll see something like this when you drag in additional data sources.

You’ll notice that a join hasn’t been defined, and no data has been combined yet.

❓ Did you know?

Data linked with relationships are only combined when the data is used in a view. At which point, Tableau will combine the data, and aggregate at the same level of detail as the original tables.


This will make more sense once you look at the example below!

When you click on the relationship between two datasets, Tableau lets you specify how the data is related. In the case below, the tables customers and orders are linked together using customer_id.

❓ Did you know?

Tableau considers data combined using joins to be in the physical layer of the data model, as the data is physically combined in the database.


Data combined using relationships is located in the logical layer, as the combined data doesn’t exist anywhere in our database. We can only see this combined data when we build views.

What are the benefits of using relationships?

This is best shown using an example. Let’s take our customers and orders tables from above - this is what the individual data tables look like:

Customers

customer_id customer_name mambership_status customer_years
1 Michael non-member 3
2 Jim member 5
3 Pam member 12
4 Dwight non-member 4
5 Kevin member 7
6 Angela member 8

customer_years indicates how long someone has been a customer

Orders

order_id customer_id amount_usd
1111 2 233
1222 2 521
1422 2 41
5111 3 400
6663 3 389
5522 4 56
8222 4 85
4422 6 22

When using an outer join to combine this data on customer_id, it looks like this:

You’ll see that some customer data is now duplicated due to the customer having multiple orders. For example, Jim’s membership_status and customer_years data are here three times. This can cause some unexpected results when aggregating the data as you’ll see below.

Also, you’ll see that we have null values where we had a customer record, but no orders from those customers - Michael and Kevin above.

Click here to clone and experiment with the Tableau workbook

Tableau logo

Aggregating data from a join

If we build a view using the joined data above, it runs all aggregations on this joined data. This means that it can throw out some unexpected results for users who aren’t familiar with data joins.

For example, taking the simple average of the number of years a person has been a customer, customer_years by membership_status gives us the incorrect answer - this is because customers with multiple orders are duplicated in the joined dataset.

Aggregating data from a relationship

When the data combination is done using a relationship, Tableau aggregates customer_years to the level of detail in the original table, i.e. customer level - which results in the correct aggregation.

That’s right, Tableau combines this data on the fly whenever you build a new view. And the aggregation is performed to the level of detail of the original table the data came from. Which results in the correct average by membership status.

We could still get to the same result when using a join by defining a calculated field with a LOD (Level of Detail) calculation, but this requires more effort, and novice Tableau users may not understand why they’re seeing “incorrect” aggregations.

Relationships also mean that you don’t need to create custom joins for different types of views.

That’s why Tableau introduced this feature, and I’m glad they did, as it will save a lot of time when explaining to clients why they’re not getting the results they expected.