Tutorial: How to replicate the 'Behavior | Site content | All pages' report
In this tutorial I will show you - step by step - how to replicate the Google Analytics All pages report in BigQuery.
In this tutorial I will show you - step by step - how to replicate the Google Analytics All pages report in BigQuery.
All pages report
In the Behaviour | Site content | All pages report you'll find data about the performance of your site content, segmented by page
and page title
.
Preparations
In this article we will use the Google Analytics Sample dataset for BigQuery, which contains analytics data from the Google Merchandise Store. I recommend you use your own data set instead.
Now, let's take a look at the primary dimensions and measures we need to replicate this report. As a reference I'll use the definitions in the Google Analytics user interface.
Primary dimensions:
- page
- page title
Measures
- pageviews
- unique pageviews
- avg. time on page
- entrances
- bounce rate
- exit rate
- page value
Basic query
We start with the framework of our query. First let's define a time frame. In this case I'll take a static range with a start and end date. If you need a dynamic date range, read this.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
Next, add a source for our data. To be able to query multiple day tables at once we use a wildcard *
in the table name.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions
The query above will return an error. To make the wildcard work we need to add a _table_suffix
and refer to the tables we want to query, using a cross join
(in short:,
) after our sessions table.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
As we are replicating the All pages report, we have to unnest our session data so that we have access to the hit level data (again we use a cross join
).
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
Now BigQuery will look at all hits. But as we only need the hits that contain pageviews, we can add a condition to the filter accordingly.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type = 'PAGE'
Finally, we need to add another condition. We only want to count visits that contain interaction: totals.visits = 1
. By default and opposing to the Google Analytics user interface, the BigQuery export tables contains all sessions, including the ones without interactions.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type = 'PAGE'
and totals.visits = 1
Add dimensions
Next, we add the dimensions we need to replicate this report. As we can see in the BigQuery export schema we need:
hits.page.pagepath
(page)hits.page.pagetitle
(page title)
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type = 'PAGE'
and totals.visits = 1
Add measure: pageviews
Pageviews is the total number of pages viewed. Repeated views of a single page are counted.
To count the pageviews for our dimensions we just have to count all rows and group them by our dimensions. Remember, we can only use this approach in combination with the hits.type = 'PAGE'
condition.
By default the All pages report is sorted on pageviews (descending). We add that too.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
count(*) as pageviews
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type = 'PAGE'
and totals.visits = 1
group by
page,
page_title
order by
pageviews desc
Add measure: unique pageviews
Unique Pageviews is the number of sessions during which the specified page was viewed at least once. A unique pageview is counted for each page URL + page title combination.
This last sentence from the definition is the tricky part here. There are multiple ways to calculate this measure, depending on the dimensions you choose to include in your report.
- if you have page and page title as dimensions, you can just use a count of sessions:
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews
- if you only have the dimension page in your report, you can use the same code, but it will be a quick and dirty calculation (as it will only be counted for each page url, and not for a combination with page title)
- if you only have the dimension page in your report and strictly follow Google Analytics' definition, then you'll need a subquery (
up
) to achieve this and join the result (unique pageviews) in with aleft join
:
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date),
up as (
select
page,
page_title,
sum(unique_pageviews) as unique_pageviews
from (
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
concat(hits.page.pagepath,hits.page.pagetitle) as page_concat,
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and totals.visits = 1
and hits.type = 'PAGE'
group by
page,
page_title,
page_concat)
group by
page,
page_title)
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
count(*) as pageviews,
up.unique_pageviews
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
left join up on hits.page.pagepath = up.page
and hits.page.pagetitle = up.page_title
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type = 'PAGE'
and totals.visits = 1
group by
page,
page_title,
up.unique_pageviews
order by
pageviews desc
In this tutorial we use both dimensions page and page value, so we can proceed without the subquery and count sessions as unique pageviews.
with date_range as (
select
'20160801' as start_date,
'20160802' as end_date)
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
count(*) as pageviews,
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
where
_table_suffix between date_range.start_date and date_range.end_date
and hits.type = 'PAGE'
and totals.visits = 1
group by
page,
page_title
order by
pageviews desc
Add measure: average time on page
The average amount of time users spent viewing a specified page or screen, or set of pages or screens.
Next up, average time on page. For this measure we cannot escape to add a subquery, as it is quite complicated.
Again, we start with the building blocks of our subquery: we're going in deep here: for every visit(or) we look at the timestamp of every pageview and identify the last pageview in the session and for every page we set the timestamp of the next pageview.
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1
Next step: calculate the time on page for each page for each session.
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1)
Then we group pageviews, exits and total time on page by page and page title.
select
pagepath,
pagetitle,
count(*) as pageviews,
countif(isexit is not null) as exits,
sum(time_on_page) as total_time_on_page
from (
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1))
group by
pagepath,
pagetitle
Finally we calculate the average time on page, based on the total time on page and the amount of pageviews minus the exits.
select
pagepath as page,
pagetitle,
case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
select
pagepath,
pagetitle,
count(*) as pageviews,
countif(isexit is not null) as exits,
sum(time_on_page) as total_time_on_page
from (
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1))
group by
pagepath,
pagetitle)
Now we are ready to insert this subquery into our main query. We do this by wrapping it in a with
subquery and inserting it by a left join
on page and page title. This will result in quite a large query.
with date_range as (
select
'20160801' as start_date,
'20160801' as end_date),
avg_time as (
select
pagepath as page,
pagetitle,
case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
select
pagepath,
pagetitle,
count(*) as pageviews,
countif(isexit is not null) as exits,
sum(time_on_page) as total_time_on_page
from (
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1))
group by
pagepath,
pagetitle))
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
count(*) as pageviews,
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
avg_time_on_page
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
left join avg_time on hits.page.pagepath = avg_time.page
and hits.page.pagetitle = avg_time.pagetitle
where
_table_suffix between start_date and end_date
and totals.visits = 1
and hits.type = 'PAGE'
group by
page,
page_title,
avg_time_on_page
order by
pageviews desc
Add measure: entrances
Entrances is the number of times visitors entered your site through a specified page or set of pages.
The next few measures are relatively easy. Let's start with adding entrances. There is a column on hit level that indicates whether a hit is an entrance, we only have to count the cases.
with date_range as (
select
'20160801' as start_date,
'20160801' as end_date),
avg_time as (
select
pagepath as page,
pagetitle,
case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
select
pagepath,
pagetitle,
count(*) as pageviews,
countif(isexit is not null) as exits,
sum(time_on_page) as total_time_on_page
from (
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1))
group by
pagepath,
pagetitle))
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
count(*) as pageviews,
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
avg_time_on_page,
countif(hits.isentrance = true) as entrances
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
left join avg_time on hits.page.pagepath = avg_time.page
and hits.page.pagetitle = avg_time.pagetitle
where
_table_suffix between start_date and end_date
and totals.visits = 1
and hits.type = 'PAGE'
group by
page,
page_title,
avg_time_on_page
order by
pageviews desc
Add measure: bounce rate
The percentage of single-page sessions in which there was no interaction with the page. A bounced session has a duration of 0 seconds.
Luckily, we have data on the amount of bounces, so we are able to calculate the bounce rate by dividing them by the amount of sessions that included a certain page.
with date_range as (
select
'20160801' as start_date,
'20160801' as end_date),
avg_time as (
select
pagepath as page,
pagetitle,
case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
select
pagepath,
pagetitle,
count(*) as pageviews,
countif(isexit is not null) as exits,
sum(time_on_page) as total_time_on_page
from (
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1))
group by
pagepath,
pagetitle))
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
count(*) as pageviews,
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
avg_time_on_page,
countif(hits.isentrance = true) as entrances,
countif(totals.bounces = 1) / count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as bounce_rate
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
left join avg_time on hits.page.pagepath = avg_time.page
and hits.page.pagetitle = avg_time.pagetitle
where
_table_suffix between start_date and end_date
and totals.visits = 1
and hits.type = 'PAGE'
group by
page,
page_title,
avg_time_on_page
order by
pageviews desc
Add measure: exit rate
%Exit is (number of exits) / (number of pageviews) for the page or set of pages. It indicates how often users exit from that page or set of pages when they view the page(s).
Same as with entrances: we check if a hit is an exit. If yes, then we divide the total amount of exits per page by the amount of sessions for that page.
with date_range as (
select
'20160801' as start_date,
'20160801' as end_date),
avg_time as (
select
pagepath as page,
pagetitle,
case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
select
pagepath,
pagetitle,
count(*) as pageviews,
countif(isexit is not null) as exits,
sum(time_on_page) as total_time_on_page
from (
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1))
group by
pagepath,
pagetitle))
select
hits.page.pagepath as page,
hits.page.pagetitle as page_title,
count(*) as pageviews,
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
avg_time_on_page,
countif(hits.isentrance = true) as entrances,
countif(totals.bounces = 1) / count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as bounce_rate,
countif(hits.isexit = true) / count(*) as exit_rate
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
left join avg_time on hits.page.pagepath = avg_time.page
and hits.page.pagetitle = avg_time.pagetitle
where
_table_suffix between start_date and end_date
and totals.visits = 1
and hits.type = 'PAGE'
group by
page,
page_title,
avg_time_on_page
order by
pageviews desc
Add measure: page value
The average value of this page or set of pages. Page Value is ((Transaction Revenue + Total Goal Value) divided by Unique Pageviews for the page or set of pages)).
Our last measure from the All pages report is a bit less well known. In the documentation Google is more detailed than the description in the user interface:
Page Value is the average value for a page that a user visited before landing on the goal page or completing an Ecommerce transaction (or both).
As this measure is custom for every Google Analytics user, it is hard to include this in a standardised way. When I stumble upon a good use case and example data I may write a separate article about it.
Final query
with date_range as (
select
'20160801' as start_date,
'20160801' as end_date),
avg_time as (
select
pagepath as page,
pagetitle,
case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
select
pagepath,
pagetitle,
count(*) as pageviews,
countif(isexit is not null) as exits,
sum(time_on_page) as total_time_on_page
from (
select
fullvisitorid,
visitstarttime,
pagepath,
pagetitle,
hit_time,
type,
isexit,
case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
select
fullvisitorid,
visitstarttime,
hits.page.pagepath,
hits.page.pagetitle,
hits.time / 1000 as hit_time,
hits.type,
hits.isexit,
max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
unnest(hits) as hits,
date_range
where
_table_suffix between start_date and end_date
and hits.type = 'PAGE'
and totals.visits = 1))
group by
pagepath,
pagetitle))
select
hits.page.pagepath as page,
-- hits.page.pagetitle as page_title,
count(*) as pageviews,
count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
avg_time_on_page,
countif(hits.isentrance = true) as entrances,
countif(totals.bounces = 1) / count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as bounce_rate,
countif(hits.isexit = true) / count(*) as exit_rate
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
unnest(hits) as hits,
date_range
left join avg_time on hits.page.pagepath = avg_time.page
and hits.page.pagetitle = avg_time.pagetitle
where
_table_suffix between start_date and end_date
and totals.visits = 1
and hits.type = 'PAGE'
group by
page,
avg_time_on_page
-- ,page_title
order by
pageviews desc
Now it's your turn!
I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Please let me know if you have any feedback or suggestions to improve the quality of the content.