Let's build together: towards an open-source community-driven dbt package for the GA4 BigQuery export
Although it can be great fun to explore your GA4 data in BigQuery, it can also be frustrating to have the feeling that you are reinventing the wheel. As you probably noticed by now, building queries from scratch can be very time consuming.
If only there was a standard - and automated - way of modeling the GA4 export data, so you could actually spent your valuable time consuming the data instead of having to worry about syntax, transformation and definitions, before you can move on to analysis or reporting. When combining GA4, BigQuery, dbt (core or Cloud) and some help from the community this can become a reality.
Community backed standard
While browsing Measure Slack channels I saw Adam Ribaudo mentioning that he started an open source dbt package for GA4. When I read that he wants this project to become a community backed standard for GA4 data modeling, he definitely got my attention.
I asked him what his plans are. Adam:
I created dbt-ga4 to be a home for communal knowledge around GA4 and BigQuery. Most commonly, best practices are captured in one-off blog posts and Slack messages. With this repository, we have the opportunity to incrementally build up a library of tools and techniques that accelerate projects and save practitioners hundreds of hours. That won't happen without contributors! So please reach out and contribute however you can (open a PR, open an issue, or just send ideas).
As you can see below this project is aiming at doing a lot of the heavy lifting for you, such as flattening, sessionization, mapping default channel grouping, excluding parameters from urls, and so on.
Here's the to do list. Let's help Adam (and ourselves) out! You can reach Adam via Github, LinkedIn or email.
GA4 dbt package features
This package connects to an exported GA4 dataset and provides useful transformations as well as report-ready dimensional models that can be used to build reports or blend GA4 data with exported GA3 data.
Features include:
- Flattened models to access common events and event parameters such as
page_view
,session_start
, andpurchase
- Conversion of sharded event tables into a single partitioned table
- Incremental loading of GA4 data into your staging tables
- Session and user dimensional models with conversion counts
- Easy access to query parameters such as GCLID and UTM params
- Support for custom event parameters & custom user properties
- Mapping from source/medium to default channel grouping
- Ability to exclude query parameters (like
fbclid
) from page paths
Useful dbt resources
If you don't have a clue about dbt (data build tool), make sure you check some of these useful resources to get you going:
- Introduction to dbt (data build tool)
- Introduction to dbt and GA4
- dbt Fundamentals course (free)
- dbt Docs
- dbt Core (free)
- dbt Cloud (free for one developer, pricing for teams)
Now it's your turn!
I hope you've enjoyed this and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Drop a line in the comments if you have any questions, feedback or suggestions related to this article.