Joins let you connect different models to each other so that you can explore more than one model at the same time in Lightdash and see how different parts of your data relate to each other.
fields
to include all metrics and dimensions defined in the joined table.sql_on
statement can include references to user attributes for row-level filtering.label
tag, for example on the messages
model it makes sense to rename the joined users
table as “Sender”:
alias
to distinguish between the different tables. A common use case is joining a user table multiple times to another table depending on the type of user. For example this messages
model has both a sender and a recipient:
${sender.user_id}
rather than ${users.user_id}
.${users.*}
will fail to compile. Be careful of aliasing tables that are used in the base model.label:
field as above.type
field in your YAML configuration file. Set its value to one of the following: inner
, left
, right
, or full
. This will determine how the tables are joined in your query, aligning with SQL’s inner join, left outer join, right outer join, or full outer join respectively.
Here’s an example of how to specify a join type:
type
is specified, Lightdash will use a left join
.Join Type | Generated SQL | Description |
---|---|---|
inner | inner join | Returns rows that have matching values in both tables. |
left | left outer join | Returns all rows from the left table, and the matching rows from the right table. Non-matching rows will have NULL for right table’s columns. |
right | right outer join | Returns all rows from the right table, and the matching rows from the left table. Non-matching rows will have NULL for left table’s columns. |
full | full outer join | Returns all rows when there is a match in either the left or right table records. Non-matching rows will have NULL for columns of the table that lacks a match. |
users
table and a subscriptions
table, an inner join would return only the users who have a subscription.
Here’s an example of how to specify an inner join:
join
to), and the matching rows from the right table (i.e. the model you’ve specified in - join:
). Non-matching rows will have NULL
for right table’s columns. For example, if you have a users
table and a subscriptions
table, a left join would return all users, and the subscription information for users who have a subscription.
Here’s an example of how to specify a left join:
- join:
), and the matching rows from the left table (i.e. the model where you’re adding the join
to). Non-matching rows will have NULL
for left table’s columns. For example, if you have a users
table and a subscriptions
table, a right join would return all subscriptions, and the user information for users who have a subscription.
Here’s an example of how to specify a right join:
NULL
for columns of the table that lacks a match. For example, if you have a users
table and a subscriptions
table, a full join would return all users and all subscriptions, and the subscription information for users who have a subscription.
relationship
field to your join configuration:
one-to-many
many-to-one
one-to-one
always
field to true
.
fields
tag to select a subset of fields from a join. This is useful if you want to join a model but only a few of its fields are useful in the joined context. For example this messages
model only needs the name
andemail
fields from the users
model. Note we must also include the user_id
field since it’s needed for the join.
deals.unique_enterprise_deals
by using a joined field in a filter: ${accounts.segment} = 'Enterprise'
.Then later you need to join deals
to a marketing_attribution
model.The unique_enterprise_deals
metric must be excluded from the join unless you also join accounts
to get the accounts.segment
field.Check out our dimensions and metrics reference docs to see all of the other configurations you can use with your joined fields.hidden: true
tag to joins, like this to hide these intermediate models:
hidden
tag in the join and changing nothing else will remove the joined table from the sidebar of metrics/dimensions in users
, but it will still appear in the Tables list as a table to explore on its own.
min(order_total)
and max(order_total)
give you the same answer whether you have duplicates or not, because they’re looking for the extreme values, not counting or adding things up. The maximum value in a set doesn’t change just because that value appears multiple times.Aggregate functions with distinct:
count(distinct user_id)
ensures each user is counted only once, regardless of how many times they appear in the joined result.sum
, count
, avg
without using distinctorganization
table to a users
table, each organization can have multiple users.
In this scenario:
✅ User metrics remain accurate Since each user appears exactly once in the joined result, any metrics calculated on the user level (like average user age or total user count) will be correct.
❌ Organization metrics become inflated - Each organization will appear multiple times in the result set (once for each associated user). This means organization-level metrics will be counted multiple times, leading to inflated results.
For example, if “ABC Corp” has 2 users, its org_total_users metric cannot be summed because it will be duplicated twice.
Raw tables
organizations
table
organization_id | organization_name | org_total_users |
---|---|---|
1 | ABC Corp | 2 |
2 | XYZ Ltd | 3 |
users
table
organization_id | user_id | user_age |
---|---|---|
1 | 100 | 57 |
1 | 200 | 13 |
2 | 300 | 20 |
2 | 400 | 30 |
2 | 500 | 19 |
user_id | organization_id | org_total_users | user_age |
---|---|---|---|
100 | 1 | 2 | 57 |
200 | 1 | 2 | 13 |
300 | 2 | 3 | 20 |
400 | 2 | 3 | 30 |
500 | 2 | 3 | 19 |
users
to their orders
, and then orders
to their support tickets
.
In this scenario:
✅ Ticket metrics remain accurate - The final table (tickets
) in the chain will have accurate metrics because each ticket appears exactly once in the result set.
❌ Upstream tables experience compounding inflation - The further up the chain you go, the worse the inflation gets:
users
→ orders
→ tickets
, the result would look like:
Raw tables
users
table
user_id | user_name | user_age |
---|---|---|
1 | Alice | 23 |
orders
table
order_id | user_id | order_total_items |
---|---|---|
1001 | 1 | 5 |
2001 | 1 | 2 |
tickets
Table
order_id | ticket_id | ticket_time_to_first_response_mins |
---|---|---|
1001 | A | 8 |
1001 | B | 62 |
1001 | C | 47 |
2001 | D | 5 |
2001 | E | 1 |
2001 | F | 89 |
user_name | user_age | order_id | order_total_items |
---|---|---|---|
Alice | 23 | 1001 | 5 |
Alice | 23 | 2001 | 2 |
user_name | order_id | ticket_id | user_age | order_total_items | ticket_time_to_first_response_mins |
---|---|---|---|---|---|
Alice | 1001 | A | 23 | 5 | 8 |
Alice | 1001 | B | 23 | 5 | 62 |
Alice | 1001 | C | 23 | 5 | 47 |
Alice | 2001 | D | 23 | 2 | 5 |
Alice | 2001 | E | 23 | 2 | 1 |
Alice | 2001 | F | 23 | 2 | 89 |
users
and orders
are duplicated but tickets
are not, which means we can only safely apply aggregates to metrics on the tickets
table.users
→ orders
AND users
→ tickets
(both orders
and tickets
relate to users
, but not directly to each other)
This creates the most problematic data situation, when you join multiple one-to-many relationships that branch from the same table but aren’t directly related to each other.
In this scenario:
❌ EVERY aggregate metric gets inflated due to the cartesian product effect
orders
× tickets
. If a user has 5 orders and 10 support tickets, each user-level metric will be counted 50 times!users
to both orders
and tickets
separately, the result would look like:
Raw tables
users
table
user_id | user_name | user_credit_amount |
---|---|---|
1 | Alice | 100 |
orders
table
user_id | order_id | order_total_items |
---|---|---|
1 | 1001 | 5 |
1 | 2001 | 2 |
tickets
Table
user_id | ticket_id | ticket_time_to_first_response_mins |
---|---|---|
1 | A | 8 |
1 | B | 62 |
1 | C | 47 |
user_name | order_id | ticket_id | user_credit_amount | order_total_items | ticket_time_to_first_response_mins |
---|---|---|---|---|---|
Alice | 1001 | A | 100 | 5 | 8 |
Alice | 1001 | B | 100 | 5 | 62 |
Alice | 1001 | C | 100 | 5 | 47 |
Alice | 2001 | A | 100 | 2 | 8 |
Alice | 2001 | B | 100 | 2 | 62 |
Alice | 2001 | C | 100 | 2 | 47 |
users
, orders
and tickets
records. We cannot apply aggregates on any metrics without deduplicating records.sum
, count
, avg
) is safe without explicit deduplication techniques.
primary_key
and the join relationship
allows Lightdash to:
account_id | account_name |
---|---|
1 | TechCorp |
2 | StartupXYZ |
user_id | account_id | user_name |
---|---|---|
101 | 1 | Alice |
102 | 1 | Bob |
201 | 2 | David |
account_id | fee_type | per_user_amount |
---|---|---|
1 | Support Fee | 80.00 |
1 | Training Fee | 120.00 |
2 | Support Fee | 50.00 |
2 | Training Fee | 75.00 |
account_name | user_name | fee_type | per_user_amount |
---|---|---|---|
TechCorp | Alice | Support Fee | 80.00 |
TechCorp | Alice | Training Fee | 120.00 |
TechCorp | Bob | Support Fee | 80.00 |
TechCorp | Bob | Training Fee | 120.00 |
StartupXYZ | David | Support Fee | 50.00 |
StartupXYZ | David | Training Fee | 75.00 |