Tutorial: How to query and calculate goals based on urls and/or events
In this tutorial I will show you - step by step - how to calculate Google Analytics goals in BigQuery.
When you measure your ecommerce performance in Google Analytics, you can just use the transaction data in the BigQuery export schema to see how your store performs:
select
ifnull(sum(totals.transactions),0) as transactions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_20160801`
Transactions are easy. But what if you use goals, based on urls or events, as defined in the Google Analytics user interface? They are nowhere to be found in the BigQuery export.
Preparations
In this article we will use the Google Analytics Sample dataset for BigQuery, which contains analytics data from the Google Merchandise Store. I recommend you use your own data set instead.
Goal completions
You'll have to calculate them from scratch. We'll use a case when
clause to set the condition(s) for the goal and count the amount of sessions in which the conditions are met. As you probably know: goals are counted only once per session.
Let's assume we have two goals defined in Google Analytics. Goal 1 is based on a destination url and goal 2 is based on an event. As a bonus we'll create goal 3 as well, which is based on a combination of a destination url and an event (something that is not possible in GA).
Basic query
We start with the framework of our query. First let's define a time frame. In this case I'll take a static range with a start and end date. If you need a dynamic date range, read this.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
Next, add a source for our data. To be able to query multiple day tables at once we use a wildcard *
in the table name.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions
The query above will return an error. To make the wildcard work we need to add a _table_suffix
and refer to the tables we want to query, using a cross join
(in short:,
) after our sessions table.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
As we are calculating pages as well as events, we have to unnest our session data so that we have access to the hit level data (again we use a cross join
).
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
Now BigQuery will look at all hits. But as we only need the hits that contain pageviews and events, we can add a condition to the filter accordingly.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
Finally, we need to add another condition. We only want to count visits that contain interaction: totals.visits = 1
. By default and opposing to the Google Analytics user interface, the BigQuery export tables contains all sessions, including the ones without interactions.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
and totals.visits = 1
Add measure: goal 1
Now we are ready to add our first goal. Let's say that the goal we have configured in Google Analytics tracks the amount of orders placed. The destination url of the goal is /ordercompleted
. I use a regex_contains
function to be flexible in the case we need multiple pages. If you are not comfortable with using regex, you can also go for hits.page.pagepath = '/ordercompleted'
.
When the condition is met we get a session id concat(cast(fullvisitorid as string),cast(visitstarttime as string))
and the count(distinct...)
makes sure that the goal is not counted multiple times per session.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
count(distinct case when regexp_contains(hits.page.pagepath,'/ordercompleted') then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_1_completions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
Add measure: goal 2
Next up: goal 2. I use a specific combination of event category, event action and event label for this example. Then we count all unique session ids again that match this condition, like we do with goal 1. Note that I use the lower()
function in combination with lowercase strings to match all events, whether it's upper or lower case. If your event data is already lowercase by using a lowercase filter in Google Analytics, this is not necessary.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
count(distinct case when regexp_contains(hits.page.pagepath,'/ordercompleted') then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_1_completions,
count(distinct case when lower(hits.eventinfo.eventcategory) = 'contact us' and lower(hits.eventinfo.eventaction) = 'onsite click' and lower(hits.eventinfo.eventlabel) = 'phone' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_2_completions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
Add measure: goal 3
To show you the magic of utilising the BigQuery export data, I also made up a goal with page and event data. This is not possible using Google Analytics only. As an example we're looking for add to cart
events on page urls that contain the string women
.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
count(distinct case when regexp_contains(hits.page.pagepath,'/ordercompleted') then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_1_completions,
count(distinct case when lower(hits.eventinfo.eventcategory) = 'contact us' and lower(hits.eventinfo.eventaction) = 'onsite click' and lower(hits.eventinfo.eventlabel) = 'phone' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_2_completions,
count(distinct case when regexp_contains(hits.page.pagepath,'women') and lower(hits.eventinfo.eventcategory) = 'enhanced ecommerce' and lower(hits.eventinfo.eventaction) = 'add to cart' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_3_completions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
Add dimension: goal completion location
If you want to know on what page(s) your goal was counted, like in Google Analytics Conversions | Goals | Goal URLs report, you can add the goal completion location to your query. This makes most sense for goals that are based on events.
To get the goal completion location we add the dimension hits.page.pagepath
to the query and group our calculated measures by this dimension.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
hits.page.pagepath as goal_completion_location,
count(distinct case when regexp_contains(hits.page.pagepath,'/ordercompleted') then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_1_completions,
count(distinct case when lower(hits.eventinfo.eventcategory) = 'contact us' and lower(hits.eventinfo.eventaction) = 'onsite click' and lower(hits.eventinfo.eventlabel) = 'phone' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_2_completions,
count(distinct case when regexp_contains(hits.page.pagepath,'women') and lower(hits.eventinfo.eventcategory) = 'enhanced ecommerce' and lower(hits.eventinfo.eventaction) = 'add to cart' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_3_completions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
group by
goal_completion_location
As you will see the query will now count our goals for all pages, so we have to add a filter to only get back results for pages that where a goal is counted at least once. When you have one measure, you can use having goal_x_completion >= 1
. In our case we have three goals, so we want to see pages where at least one of the goals is counted once or more.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
hits.page.pagepath as goal_completion_location,
count(distinct case when regexp_contains(hits.page.pagepath,'/ordercompleted') then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_1_completions,
count(distinct case when lower(hits.eventinfo.eventcategory) = 'contact us' and lower(hits.eventinfo.eventaction) = 'onsite click' and lower(hits.eventinfo.eventlabel) = 'phone' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_2_completions,
count(distinct case when regexp_contains(hits.page.pagepath,'women') and lower(hits.eventinfo.eventcategory) = 'enhanced ecommerce' and lower(hits.eventinfo.eventaction) = 'add to cart' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_3_completions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
group by
goal_completion_location
having
goal_1_completions >= 1
or goal_2_completions >= 1
or goal_3_completions >= 1
Add measure: goal conversion rate
Just like the ecommerce conversion rate for transactions we can calculate a conversion rate for goals too. The formula: goal completions divided by all sessions. For the sake of visibility I'll remove the page dimension from the query and add the conversion rate calculations.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
count(distinct case when regexp_contains(hits.page.pagepath,'/ordercompleted') then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_1_completions,
count(distinct case when regexp_contains(hits.page.pagepath,'/ordercompleted') then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) / count(distinct concat(cast(fullvisitorid as string),cast(visitstarttime as string))) as goal_1_conversion_rate,
count(distinct case when lower(hits.eventinfo.eventcategory) = 'contact us' and lower(hits.eventinfo.eventaction) = 'onsite click' and lower(hits.eventinfo.eventlabel) = 'phone' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_2_completions,
count(distinct case when lower(hits.eventinfo.eventcategory) = 'contact us' and lower(hits.eventinfo.eventaction) = 'onsite click' and lower(hits.eventinfo.eventlabel) = 'phone' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) / count(distinct concat(cast(fullvisitorid as string),cast(visitstarttime as string))) as goal_2_conversion_rate,
count(distinct case when regexp_contains(hits.page.pagepath,'women') and lower(hits.eventinfo.eventcategory) = 'enhanced ecommerce' and lower(hits.eventinfo.eventaction) = 'add to cart' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) as goal_3_completions,
count(distinct case when regexp_contains(hits.page.pagepath,'women') and lower(hits.eventinfo.eventcategory) = 'enhanced ecommerce' and lower(hits.eventinfo.eventaction) = 'add to cart' then concat(cast(fullvisitorid as string),cast(visitstarttime as string)) else null end) / count(distinct concat(cast(fullvisitorid as string),cast(visitstarttime as string))) as goal_3_conversion_rate
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type in ('PAGE','EVENT')
Now it's your turn!
I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Please let me know if you have any feedback or suggestions to improve the quality of the content.