Tutorial: How to create your own custom channel grouping
In this tutorial I will show you - step by step - how to mirror the default channel grouping according to the Google Analytics definitions and set up a custom channel grouping, all in BigQuery.
If you need the Default Channel Grouping, just use the channel grouping dimension. However, there can be various reasons to build your own Channel Grouping.
When you use intraday data, for instance, as the channel grouping dimension is not available there. Or when you need to βrepairβ data quality issues on historical data, as this is not possible with the Default Channel Grouping dimension in the Google Analytics UI.
I will show you how you can mirror the standard Google Analytics definitions of the Default Channel Grouping in BigQuery. If you use this query as a starting point itβs not so difficult anymore to create your own custom or Default Channel Grouping.
Preparations
In this article we will use the Google Analytics Sample dataset for BigQuery, which contains analytics data from the Google Merchandise Store. I recommend you use your own data set instead.
Basic query
We start with the framework of our query. First let's define a time frame. In this case I'll take a static range with a start and end date. If you need a dynamic date range, read this.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
Next, add a source for our data. To be able to query multiple day tables at once we use a wildcard *
in the table name.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions
The query above will return an error. To make the wildcard work we need to add a _table_suffix
and refer to the tables we want to query, using a cross join
(in short:,
) after our sessions table.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
Finally, we need to add another condition. We only want to count visits that contain interaction: totals.visits = 1
. By default and opposing to the Google Analytics user interface, the BigQuery export tables contains all sessions, including the ones without interactions.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and totals.visits = 1
Add default channel grouping
As a reference we add the channelgrouping
dimension and a measure for sessions. We group the sessions by the alias default_channel_grouping
and order by sessions descending.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
channelgrouping as default_channel_grouping,
count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and totals.visits = 1
group by
default_channel_grouping
order by
sessions desc
Mirror default channel grouping
Next, we start mirroring the default channel grouping according to the documentation Google provides us. Basically we add a monstrous case when
statement with the exact definitions that Google Analytics uses by default to create the channel grouping. Don't forget to add ,unnest(hits) as hits
because the logic needs some hit level data too.
To add flexibility and efficiency in your query, you can use regular expression.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
channelgrouping as default_channel_grouping,
case
when trafficsource.source = '(direct)' and (trafficsource.medium = '(not set)' or trafficsource.medium = '(none)') then 'Direct'
when trafficsource.medium = 'organic' then 'Organic Search'
when hits.social.hassocialsourcereferral = 'Yes' or regexp_contains(trafficsource.medium, '^(social|social-network|social-media|sm|social network|social media)$') then 'Social'
when trafficsource.medium = 'email' then 'Email'
when trafficsource.medium = 'affiliate' then 'Affiliates'
when trafficsource.medium = 'referral' then 'Referral'
when regexp_contains(trafficsource.medium, '^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'Content' then 'Paid Search'
when regexp_contains(trafficsource.medium, '^(cpv|cpa|cpp|content-text)$') then 'Other Advertising'
when regexp_contains(trafficsource.medium, '^(display|cpm|banner)$') or trafficsource.adwordsclickinfo.adnetworktype = 'Content' then 'Display'
else
'(Other)'
end
as custom_channel_grouping,
count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and totals.visits = 1
group by
default_channel_grouping,
custom_channel_grouping
order by
sessions desc
The query above will probably generate some slight differences, even when the default channel grouping is not already adjusted in the Google Analytics user interface. You can check this in Admin > Channel Settings > Channel Grouping. Whenever you see System defined
just follow the default definition for this channel.
If the default channel grouping in your Google Analytics view is already customised for some channels, you'll have to add all additions on top of the default logic.
Here you are not limited by the building blocks that are already in the query: you can add any combination of conditions you want.
Example: add custom definition for Branded and Generic Paid Search
In the Google Analytics user interface there is a possibility to make a distinction between Generic and Branded Paid Search traffic. Although the query type
is not available in our BigQuery export, we can use some custom logic if the campaign names are set up in a consistent manner. In this example branded paid search always has the word brand
in the campaign title:
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
channelgrouping as default_channel_grouping,
case
when trafficsource.source = '(direct)' and (trafficsource.medium = '(not set)' or trafficsource.medium = '(none)') then 'Direct'
when trafficsource.medium = 'organic' then 'Organic Search'
when hits.social.hassocialsourcereferral = 'Yes' or regexp_contains(trafficsource.medium, '^(social|social-network|social-media|sm|social network|social media)$') then 'Social'
when trafficsource.medium = 'email' then 'Email'
when trafficsource.medium = 'affiliate' then 'Affiliates'
when trafficsource.medium = 'referral' then 'Referral'
when regexp_contains(trafficsource.medium, '^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'Content' and lower(trafficsource.campaign) like '%brand%' then 'Branded Paid Search'
when regexp_contains(trafficsource.medium, '^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'Content' and lower(trafficsource.campaign) not like '%brand%' then 'Generic Paid Search'
when regexp_contains(trafficsource.medium, '^(cpv|cpa|cpp|content-text)$') then 'Other Advertising'
when regexp_contains(trafficsource.medium, '^(display|cpm|banner)$') or trafficsource.adwordsclickinfo.adnetworktype = 'Content' then 'Display'
else
'(Other)'
end
as custom_channel_grouping,
count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and totals.visits = 1
group by
default_channel_grouping,
custom_channel_grouping
order by
sessions desc
Now it's your turn!
I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Please let me know if you have any feedback or suggestions to improve the quality of the content.