Running dbt with Github Actions

dbt is one of those tools that once you get it, you don't want to go back to how you were doing things in the past. It's hard to convey the benefits of dbt unless you have first-hand experience of transforming raw data in your database to a format that your users or visualization tools need. Suffice it to say that managing and scheduling SQL queries and transformations can become quite complex. I prefer to use BigQuery for all my datawarehousing needs, but I often end up with tons of saved SQL queries that I can't always tell apart or how they are used in a project.

In a nutshell, dbt is a command-line tool that allows you to write code that dbt then compiles into SQL and runs against your database. This code allows you to express dependencies between tables and views, e.g. if table B depends on table A, dbt knows to generate table A first. And because it's just code, your dbt models can be version-controlled, tested and deployed as part of a CI/CD workflow.

Speaking of workflows. Since I keep my dbt projects in Github, I thought I'd try to automate my dbt executions using the amazing Github Actions CI/CD workflow solution. A workflow is just a text file where you list the steps you want to execute when an event is triggered in your repo.

In my case, I want to run dbt when I update the model code and push it to the repo. Because I have to update the data in my database every day, I also want to run dbt automatically every day. When dbt has finished I want to capture the output and send the results by email. This workflow can be described in a simple yaml file:

name: Schedule dbt

on:
  push:  # 1. run when new code is pushed
  schedule: # 2. also run once every day at 8am
   - cron:  '0 8 * * *'

jobs:
  action:
    runs-on: ubuntu-latest

    steps:
      - name: checkout # 3. check out models
      - uses: actions/checkout@v2 

      - name: dbt-run # 4. run dbt
        uses: mwhitaker/dbt-action@master  
        with:
          dbt_command: "dbt run --profiles-dir ." # the actual dbt command!
          dbt_project_folder: "dbt_project"
        env:
          DBT_BIGQUERY_TOKEN: $ # credentials to access this service

      - name: SendGrid # 5. send email
        uses: peter-evans/sendgrid-action@v1 
        env:
          SENDGRID_API_KEY: $ # credentials to access this service
          SENDGRID_FROM: $
          SENDGRID_TO: $

That's all that's needed to put together a workflow, and when it's done I get this email:

dbt output email screenshot

There are 1000s of actions you can add to your workflow. I use the awesome SendGrid Action for the email part, but you could use another email service, send by SMS or post to Slack. If nothing quite fits the bill you can just write your own action. I didn't find an existing action for dbt so I ended up writing my own, called dbt-action.

If you use dbt, please take a look at dbt-action and at the sample workflow I used in this post.