Update your project connection
To setup your Lightdash connection you'll need to:
- Connect to your data warehouse (bigquery, postgres, redshift, snowflake, databricks)
- Connect to your dbt project
Open up your Lightdash instance to get started.
To update an existing connection, head to:
Settings
Project management
Settings
for your project
(see steps 1-3 in the image below)
To create a new project connection, head to:
Settings
Project management
Create New
(see steps 1, 2 & 4 in the image below)
1. Connect to a warehouse
We currently support:
We always recommend giving read-only permissions to Lightdash, that way you ensure than no data can be manipulated. See each section below for warehouse specific details.
Since we host your Lightdash for you, you may need to add our static IP addresses to the allow-list for your security group or firewall rules so they don’t block calls from your Lightdash Cloud to your connected databases.
If you login at:
- app.lightdash.cloud use 35.245.81.252
- eu1.lightdash.cloud use 34.79.239.130
If you login at a different domain, look for the IP in the project settings page (see image below).
Bigquery
Project
This is project ID from Google Cloud Platform for the data that you want to connect Lightdash to.
To find your project ID, head to the BigQuery Cloud console.
Once you're in there, click on the project name in the top bar.
Then, you should see all of the projects and their project IDs in your organization (that you have access to).
For the project you want to connect Lightdash to, just copy its id
and pop it into the project
field in the Warehouse Connection form in Lightdash.
Data set
This is the default dataset used by dbt to compile and run your dbt project. You can find this in the dbt cloud IDE
or your local profiles.yml
file.
If you're a dbt cloud user you can find this under your profile in the dbt cloud IDE:
If you work with dbt locally, then check your profiles.yml
file at ~/.dbt/profiles.yml
and look for a field
named dataset
:
my-profile:
target: dev
outputs:
dev:
type: bigquery
method: oauth
project: abc-123
dataset: my_dataset # look for this one!
...
Location
The data location of the dataset in BigQuery where the output of your dbt models is written to.
location
may be either a multi-regional location (e.g. EU
, US
), or a regional location (e.g. us-west2
). Check out the BigQuery documentation for more information on dataset locations.
You can find the location of the dataset you're using for your dbt project in your dbt profiles.yml
file, or in your BigQuery console.
Key File
To connect to BigQuery, you'll need to have a service account that you can use with Lightdash. You can read more about creating and managing service accounts with Google BigQuery in their docs.
If you're creating a new service account, make sure that you save your JSON key file somewhere safe! You'll need it for connecting to Lightdash.
The service account you use with Lightdash will need to have the following roles in your GCP project:
roles/bigquery.dataViewer
(to see data in your project)roles/bigquery.jobUser
(to run queries in your project)
If you need to provide access to data across multiple BigQuery projects, the service account will need to be granted roles/bigquery.dataViewer
on each additional BigQuery project.
Once you have a service account all ready to go, you'll need to add its JSON key file to Lightdash in the key file
section of the Warehouse Connection page.
Timeout in seconds
BigQuery supports query timeouts. By default, the timeout is set to 300 seconds. If a query run by Lightdash takes longer than this timeout to complete, then BigQuery may cancel the query and issue the following error:
Operation did not complete within the designated timeout.
To change this timeout, use the Timeout in seconds
configuration.
Priority
The priority for the BigQuery jobs that Lightdash executes can be configured with the priority
configuration in your Warehouse Connection settings. The priority
field can be set to one of batch
or interactive
.
For more information on query priority, check out the BigQuery documentation.
Retries
The retries
configuration specifies the number of times Lightdash should retry queries that result in unhandled server errors.
For example, setting retries
to 5 means that Lightdash will retry BigQuery queries 5 times with a delay. If the query does not succeed after the fifth attempt, then Lightdash will raise an error.
By default, the number of retries is set to 3.
Maximum bytes billed
If a value for the Maximum bytes billed
is set, then queries executed by Lightdash will fail if they exceed the configured maximum bytes threshhold. This configuration should be supplied as an integer number of bytes.
For example, setting this to 1000000000
means if a query would bill more than a gigabyte of data (e.g. 2Gb), then BigQuery will reject the query and you'd get an error like this:
Query exceeded limit for bytes billed: 1000000000. 2000000000 or higher required.
Start of week
This controls what day is the start of the week in Lightdash. Auto
sets it to whatever the default is for your data warehouse. Or, you can customize it and select the day of the week from the drop-down menu. This will be taken into account when using 'WEEK' time interval in Lightdash.
Execution project
Here you can specify an execution project to bill for query execution, instead of using the project where your dbt resources are materialized. If you leave this blank, all costs get applied to the project from the top of the connection details.
Postgres
You can see more details in dbt documentation.
Host
This is the host where the database is running.
User
This is the database user name.
Password
This is the database user password.
DB name
This is the database name.
Schema
This is the default schema used by dbt to compile and run your dbt project. You can find this in the dbt cloud IDE
or your local profiles.yml
file.
If you're a dbt cloud user you can find this under your profile in the dbt cloud IDE:
If you work with dbt locally, then check your profiles.yml
file at ~/.dbt/profiles.yml
and look for a field
named schema
:
company-name:
target: dev
outputs:
dev:
type: postgres
host: [hostname]
user: [username]
password: [password]
port: [port]
dbname: [database name]
schema: [dbt schema] # look for this one!
Port
This is the port where the database is running.
Keep alive idle (seconds)
This specifies the amount of seconds with no network activity after which the operating system should send a TCP keepalive message to the client. You can see more details in postgresqlco documentation.
Search path
This controls the Postgres "search path". You can see more details in dbt documentation.
SSL mode
This controls how dbt connects to Postgres databases using SSL. You can see more details in dbt documentation.
Start of week
This controls what day is the start of the week in Lightdash. Auto
sets it to whatever the default is for your data warehouse. Or, you can customize it and select the day of the week from the drop-down menu. This will be taken into account when using 'WEEK' time interval in Lightdash.
Use SSH tunnel
Enable to input your SSH Remote Host, SSH Remote Port, SSH Username, and to generate a public SSH key.
Redshift
You can see more details in dbt documentation.
Host
This is the host where the database is running.
User
This is the database user name.
Password
This is the database user password.
DB name
This is the database name.
Schema
This is the default schema used by dbt to compile and run your dbt project. You can find this in the dbt cloud IDE
or your local profiles.yml
file.
If you're a dbt cloud user you can find this under your profile in the dbt cloud IDE:
If you work with dbt locally, then check your profiles.yml
file at ~/.dbt/profiles.yml
and look for a field
named schema
:
company-name:
target: dev
outputs:
dev:
type: redshift
host: hostname.region.redshift.amazonaws.com
user: username
password: password1
port: 5439
dbname: analytics
schema: analytics # look for this one!
Port
This is the port where the database is running.
Keep alive idle (seconds)
This specifies the amount of seconds with no network activity after which the operating system should send a TCP keepalive message to the client.
If the database closes its connection while Lightdash is waiting for data, you may see the error SSL SYSCALL error: EOF detected
. Lowering the keepalives_idle
value may prevent this, because the server will send a ping to keep the connection active more frequently.
By default, this value is set to 240 seconds, but can be configured lower (perhaps 120 or 60), at the cost of a chattier network connection.
SSL mode
This controls how dbt connects to Postgres databases using SSL.
RA3 Node
Allow dbt to use cross-database-resources
Start of week
This controls what day is the start of the week in Lightdash. Auto
sets it to whatever the default is for your data warehouse. Or, you can customize it and select the day of the week from the drop-down menu. This will be taken into account when using 'WEEK' time interval in Lightdash.
Use SSH tunnel
Enable to input your SSH Remote Host, SSH Remote Port, SSH Username, and to generate a public SSH key.
Snowflake
You can see more details in dbt documentation.
Account
This is your Snowflake account identifer.
The easiest way to find this is to login to snowflake in your browser and your account identifier is whatever comes
before snowflakecomputing.com
.
For example in the image below, the user logs in via https://aaa99827.snowflakecomputing.com/console/login#/
so
the account identifier is aaa99827
.
If you don't have access via the browser, you can use the following format <organization_name>-<account_name>
where
organization_name
and account_name
can be found by following any of the methods listed in
Managing accounts in your organization.
User
This is the login name for your Snowflake user. This is usually the same username you use to login to Snowflake.
Alternatively, you can create a new user through the snowflake console with a username and password specifically for Lightdash to use.
If you're a snowflake admin you can list all users available in the snowflake console:
Password
This is the password your Snowflake user. This is usually the same password you use to login to Snowflake:
Alternatively, you can create a new user through the snowflake console with a username and password specifically for Lightdash to use.
If you're a Snowflake admin you can list all users available in the snowflake console:
Role
This is the security role that you would like to use when running queries as the specified user. The role must have access to any warehouses, databases, schemas, and tables you want to use.
If you're a Snowflake admin you can list all roles available in the snowflake console:
You can configure your role to allow read access to all warehouses, databases, schemas, and tables by following the guide for Creating custom read-only roles.
Database
This is the name of your database. The specified user must be granted access to this database. You can see a list of databases available in the snowflake console:
Warehouse
This is the name of the warehouse you would like to use for running queries. The specified user must be grantend access to use this warehouse. You can see al list of warehouses available in the snowflake console:
If “Always use this warehouse” is set to yes, this warehouse will be used for all queries, even if the dbt configuration specifies a different warehouse using snowflake_warehouse.
Schema
This is the default schema used by dbt to compile and run your dbt project. You can find this in the dbt cloud IDE
or your local profiles.yml
file.
If you're a dbt cloud user you can find this under your profile in the dbt cloud IDE:
If you work with dbt locally, then check your profiles.yml
file at ~/.dbt/profiles.yml
and look for a field
named schema
:
my-snowflake-db:
target: dev
outputs:
dev:
type: snowflake
account: [account id]
user: [username]
password: [password]
role: [user role]
database: [database name]
warehouse: [warehouse name]
schema: [dbt schema] # Look for this one!
Keep client session alive
This is intended to keep Snowflake sessions alive beyond the typical 4 hour timeout limit. You can see more details in dbt documentation.
Query tag
A value with which to tag all queries, for later searching in QUERY_HISTORY view)
Start of week
This controls what day is the start of the week in Lightdash. Auto
sets it to whatever the default is for your data warehouse. Or, you can customize it and select the day of the week from the drop-down menu. This will be taken into account when using 'WEEK' time interval in Lightdash.
Databricks
The credentials needed to connect to your cluster can be found in the ODBC options in your databricks account:
- Go to the
Compute
tab in the sidebar. - Click the configuration tab for the cluster that you're connecting to Lightdash.
- Expand the
Advanced options
tab - Open the
JDBC/ODBC
tab
Server hostname
Follow the instructions above to find your ODBC connection instructions.
HTTP Path
Follow the instructions above to find your ODBC connection instructions.
Port
Follow the instructions above to find your ODBC connection instructions.
Personal Access Token
Your personal access token can be found in your user settings in databricks:
- Open
Settings
by clicking the cog ⚙️ in the sidebar and selectUser settings
- Click
Generate token
. You'll be asked to enter a name and expiry. - Copy the token
Database
The default database name used by dbt for this connection.
Start of week
This controls what day is the start of the week in Lightdash. Auto
sets it to whatever the default is for your data warehouse. Or, you can customize it and select the day of the week from the drop-down menu. This will be taken into account when using 'WEEK' time interval in Lightdash.
Trino
We only support LDAP authentication with Trino. You can see more details in dbt's documentation.
Host
The hostname of your cluster. E.g. mycluster.mydomain.com
Don't include the http:// or https:// prefix.
User
The username (of the account) to log in to your cluster. When connecting to Starburst Galaxy clusters, you must include the role of the user as a suffix to the username.
Format for Starburst Enterprise or Trino:
user.name
user.name@mydomain.com
Format for Starburst Galaxy:
user.name@mydomain.com/role
Password
This is the password for authentication.
DB name
Specify the name of the database that your dbt models are built into. This is the name of a catalog in your cluster.
e.g. my_postgres_catalog
Port
The port to connect to your cluster. By default, it's 443 for TLS enabled clusters.
e.g. 443
SSL mode
This controls how dbt connects to Trino database using SSL.
Start of week
This controls what day is the start of the week in Lightdash. Auto
sets it to whatever the default is for your data warehouse. Or, you can customize it and select the day of the week from the drop-down menu. This will be taken into account when using 'WEEK' time interval in Lightdash.
2. Import a dbt project
Connecting Lightdash to a hosted dbt project means that you'll be able to keep your Lightdash instance in sync with the changes in your dbt project.
To connect your dbt project, just head to your project connection settings in Lightdash:
Then scroll down to your dbt project connection:
Pick your repository type and follow the guide below:
We don't support connecting Lightdash to dbt Cloud managed repositories. If you're using dbt Cloud and you'd like to use Lightdash, you'll need to have a remote repository for your dbt project in one of the services listed above.
GitHub
Personal access token
You can create a personal access token (classic), or a fine-grained access token (beta).
Personal access token (classic)
This is used to access your repo. See the instructions for creating a personal access token here.
Select repo
scope when you're creating the token.
Fine-grained access token (beta)
Fine-grained access tokens are new special tokens that can only give access to individual repositories on your github account. You can read more about it on the Github docs.
- Go to settings > Developer access > Personal access tokens > fine-grained token
- Add the name, expiration, description and owner (we'll need their username later)
- Add repository access, you might want to give access only to your
dbt
repository
- On repository permissions, select
Contents
- Click
Generate token
and copy the token.
You could also replace your old tokens with new fine-grained tokens on project settings.
Repository
This should be in the format my-org/my-repo
. e.g. lightdash/lightdash-analytics
Branch
This is the branch in your GitHub repo that Lightdash should sync to. e.g. main
, master
or dev
By default, we've set this to main
but you can change it to whatever you'd like.
Project directory path
This is the folder where your dbt_project.yml
file is found in the GitHub repository you entered above.
- Put
/
if yourdbt_project.yml
file is in the main folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml) - Include the path to the sub-folder where your dbt project is if your dbt project is in a sub-folder in your repo. For example, if my project was in lightdash/lightdash-analytics/dbt/dbt_project.yml, I'd write
/dbt
in this field.
Host domain
If you've customized the domain for your GitHub pages, you can add the custom domain for your project in here.
By default, this is github.com
Target name
target
contains information about your dbt connection to your warehouse.
It's the dataset/schema in your data warehouse that Lightdash will look for your dbt models. By default, we set this to be the same value as you have as the default in your profiles.yml
file.
If you want to change this to be something other than the default target
defined in dbt, you can enter the target of your choice here (for example dbt_khindson
.)
To read more about dbt targets, check out the dbt docs here.
dbt selector
You can filter out models in your dbt project that you don't want to connect to Lightdash. This is useful if you have a large dbt project and you want to speed up the sync process. Unlike table selection, this selector is applied to the dbt models, so it will skip the entire compilation process for the models that you don't want to connect to Lightdash.
To do this, you can add a dbt_selector
to your dbt project. This is a JSON object that contains the models you want to include in Lightdash.
For example, if you only want to include the my_model
and all models with the lightdash
tag in Lightdash, you can add the following to your dbt project settings:
my_model tag:lightdash
We support all the dbt selectors, you can read more about them on the dbt docs.
Environment variables
If you've used environment variables in your dbt profiles.yml
file, you can add these to Lightdash here.
For each environment variable, you'll need to add the key
+ value
pair for the item.
You'll normally find these values in a file called .env
in your dbt project directory.
For example, I might have something like:
profile:
target: prod
outputs:
prod:
type: postgres
host: 127.0.0.1
user: "{{ env_var('DBT_USER') }}"
....
Then a .env
file like:
export DBT_USER="myspecialuserkey123"
So, in Lightdash, I'd add a new environment variable and put key
as DBT_USER
and value
as myspecialuserkey123
.
GitLab
Personal access token
This is used to access your repo. See the instructions for creating a personal access token here.
Select read_repository
scope when you're creating the token. The token, if using a project access token, or the user, when using a personal access token, needs to have permission to download the code. Normally this would be the Reporter
role.
Repository
You can find this in the GitLab URL when you're in your repo. This should be in the format my-org/my-repo
. e.g. if my browser had https://gitlab.com/lightdash/lightdash-analytics.gitlab.io
, I'd put in: lightdash/lightdash-analytics
as my repository in Lightdash.
Branch
This is the branch in your GitLab repo that Lightdash should sync to. e.g. main
, master
or dev
By default, we've set this to main
but you can change it to whatever you'd like.
Project directory path
This is the folder where your dbt_project.yml
file is found in the GitLab repository you entered above.
If your dbt_project.yml
file is in the main folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml
),
then you don't need to change anything in here. You can just leave the default value we've put in.
If your dbt project is in a sub-folder in your repo (e.g. lightdash/lightdash-analytics/dbt/dbt_project.yml
), then
you'll need to include the path to the sub-folder where your dbt project is (e.g. /dbt
).
Host domain
If you've customized the domain for your GitLab pages, you can add the custom domain for your project in here.
By default, this is gitlab.io
.
Target name
target
contains information about your dbt connection to your warehouse.
It's the dataset/schema in your data warehouse that Lightdash will look for your dbt models. By default, we set this to be the same value as you have as the default in your profiles.yml
file.
If you want to change this to be something other than the default target
defined in dbt, you can enter the target of your choice here (for example dbt_khindson
.)
To read more about dbt targets, check out the dbt docs here.
dbt selector
You can filter out models in your dbt project that you don't want to see in Lightdash. This is useful if you have a large dbt project and you want to speed up the sync process. Unlike table selection, this selector is applied to the dbt models, so it will skip the entire compilation process for the models that you don't want to see in Lightdash.
To do this, you can add a dbt_selector
to your dbt project. This is a JSON object that contains the models you want to include in Lightdash.
For example, if you only want to include the my_model
and all models with the lightdash
tag in Lightdash, you can add the following to your dbt project settings:
my_model tag:lightdash
We support all the dbt selectors, you can read more about them on the dbt docs.
Environment variables
If you've used environment variables in your dbt profiles.yml
file, you can add these to Lightdash here.
For each environment variable, you'll need to add the key
+ value
pair for the item.
You'll normally find these values in a file called .env
in your dbt project directory.
For example, I might have something like:
profile:
target: prod
outputs:
prod:
type: postgres
host: 127.0.0.1
user: "{{ env_var('DBT_USER') }}"
....
Then a .env
file like:
export DBT_USER="myspecialuserkey123"
So, in Lightdash, I'd add a new environment variable and put key
as DBT_USER
and value
as myspecialuserkey123
.
Azure DevOps
Personal access token
This is your secret token used to access Azure Devops. See the instructions to create a personal access token You must specify at least the Repo:Read scope.
Organization
This is the name of the organization that owns your repository
Project
This is the name of the project that owns your repository
Repository
This is the name of the repository. For many projects, this is the same as your project name above.
Branch
This is the branch in your repository that Lightdash should sync to. e.g. main
, master
or dev
By default, we've set this to main
but you can change it to whatever you'd like.
Project directory path
This is the folder where your dbt_project.yml
file is found in the repository you entered above.
If your dbt_project.yml
file is in the main folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml
),
then you don't need to change anything in here. You can just leave the default value we've put in.
If your dbt project is in a sub-folder in your repo (e.g. lightdash/lightdash-analytics/dbt/dbt_project.yml
), then
you'll need to include the path to the sub-folder where your dbt project is (e.g. /dbt
).
Target name
target
contains information about your dbt connection to your warehouse.
It's the dataset/schema in your data warehouse that Lightdash will look for your dbt models. By default, we set this to be the same value as you have as the default in your profiles.yml
file.
If you want to change this to be something other than the default target
defined in dbt, you can enter the target of your choice here (for example dbt_khindson
.)
To read more about dbt targets, check out the dbt docs here.
dbt selector
You can filter out models in your dbt project that you don't want to see in Lightdash. This is useful if you have a large dbt project and you want to speed up the sync process. Unlike table selection, this selector is applied to the dbt models, so it will skip the entire compilation process for the models that you don't want to see in Lightdash.
To do this, you can add a dbt_selector
to your dbt project. This is a JSON object that contains the models you want to include in Lightdash.
For example, if you only want to include the my_model
and all models with the lightdash
tag in Lightdash, you can add the following to your dbt project settings:
my_model tag:lightdash
We support all the dbt selectors, you can read more about them on the dbt docs.
Environment variables
If you've used environment variables in your dbt profiles.yml
file, you can add these to Lightdash here.
For each environment variable, you'll need to add the key
+ value
pair for the item.
You'll normally find these values in a file called .env
in your dbt project directory.
For example, I might have something like:
profile:
target: prod
outputs:
prod:
type: postgres
host: 127.0.0.1
user: "{{ env_var('DBT_USER') }}"
....
Then a .env
file like:
export DBT_USER="myspecialuserkey123"
So, in Lightdash, I'd add a new environment variable and put key
as DBT_USER
and value
as myspecialuserkey123
.
Local dbt project
Unsuitable for production and only available for Lightdash instances installed on your local machine
To start Lightdash with the option to connect to a local dbt project, you must specify the directory of the dbt project when you start docker compose:
# Specify the absolute path to your dbt project
# e.g. export DBT_PROJECT_DIR=/Users/elonmusk/mydbtproject
export DBT_PROJECT_DIR= # Enter your path here!
docker compose start
Bitbucket
Username
This is the login name for your Bitbucket user. This is usually the same username you use to login to Bitbucket. You can find your username in Bitbucket by:
- Going to your avatar in the bottom left and click Personal settings.
- From the Account settings page, check the value next to the Username
Alternatively, you can create a new user through the Bitbucket console with a username and password specifically for Lightdash to use.
Http access token
Getting a token depends on whether you use Bitbucket Cloud or Bitbucket server:
- Bitbucket Cloud users should follow instructions for creating an App Password
- Bitbucket Server users should follow instructions for creating a HTTP Access Token
Select Project read
and Repository read
scope when you're creating the token.
Repository
This should be in the format my-org/my-repo
. e.g. lightdash/lightdash-analytics
Branch
This is the branch in your Bitbucket repo that Lightdash should sync to. e.g. main
, master
or dev
By default, we've set this to main
but you can change it to whatever you'd like.
Project directory path
This is the folder where your dbt_project.yml
file is found in the Bitbucket repository you entered above.
- Put
/
if yourdbt_project.yml
file is in the main folder of your repo (e.g. lightdash/lightdash-analytics/dbt_project.yml) - Include the path to the sub-folder where your dbt project is if your dbt project is in a sub-folder in your repo. For example, if my project was in lightdash/lightdash-analytics/dbt/dbt_project.yml, I'd write
/dbt
in this field.
Host domain
If you've customized the domain for your Bitbucket server, you can add the custom domain for your project in here.
Target name
target
contains information about your dbt connection to your warehouse.
It's the dataset/schema in your data warehouse that Lightdash will look for your dbt models. By default, we set this to be the same value as you have as the default in your profiles.yml
file.
If you want to change this to be something other than the default target
defined in dbt, you can enter the target of your choice here (for example dbt_khindson
.)
To read more about dbt targets, check out the dbt docs here.
dbt selector
You can filter out models in your dbt project that you don't want to see in Lightdash. This is useful if you have a large dbt project and you want to speed up the sync process. Unlike table selection, this selector is applied to the dbt models, so it will skip the entire compilation process for the models that you don't want to see in Lightdash.
To do this, you can add a dbt_selector
to your dbt project. This is a JSON object that contains the models you want to include in Lightdash.
For example, if you only want to include the my_model
and all models with the lightdash
tag in Lightdash, you can add the following to your dbt project settings:
my_model tag:lightdash
We support all the dbt selectors, you can read more about them on the dbt docs.
Environment variables
If you've used environment variables in your dbt profiles.yml
file, you can add these to Lightdash here.
For each environment variable, you'll need to add the key
+ value
pair for the item.
You'll normally find these values in a file called .env
in your dbt project directory.
For example, I might have something like:
profile:
target: prod
outputs:
prod:
type: postgres
host: 127.0.0.1
user: "{{ env_var('DBT_USER') }}"
....
Then a .env
file like:
export DBT_USER="myspecialuserkey123"
So, in Lightdash, I'd add a new environment variable and put key
as DBT_USER
and value
as myspecialuserkey123
.