How to query realtime tables and views (and combine it with daily tables)
How to query a ga_realtime_sessions_ streaming data table and combine it with daily tables. Some example queries to get started.
If you donβt see an intraday table, but realtime tables and view, streaming export is enabled for your Google Analytics view. To query this data and join this with historical data from the ga_sessions_
tables, another approach is needed.
For each day, streaming export creates 1 new table and 1 (BigQuery) view of that table: Table: ga_realtime_sessions_YYYYMMDD is an internal staging table that includes all records of sessions for all activity that took place during the day. Data is exported continuously approximately every 15 minutes. Within this table are multiple records of a session when the session spans multiple export operations. The ga_realtime_sessions_YYYYMMDD tables should not be used (and are not supported by Google Analytics technical support) for queries. Queries on these tables may yield unexpected results as they may contain duplicate records of some sessions. Query the ga_realtime_sessions_view_YYYYMMDD view instead. View: ga_realtime_sessions_view_YYYYMMDD sits on top of the exported tables and is there to deduplicate multiple records of repeated sessions that exist across export boundaries. Query this table for deduplicated streaming data. (source)
At the time of writing this daily generated realtime view is only queryable with Legacy SQL. Any Standard SQL query will result in this error:
To be able to use Standard SQL, we have to create our own realtime view (source). Change the project and dataset in the following query and run it:
-- this query is not valid as there is no realtime data available in the sample data set
create view `project.dataset.ga_realtime_view` as
select
parse_date('%y%m%d',_table_suffix) as suffix,
visitkey,
array_agg(
(select as struct t.* except (visitkey))
order by exporttimeusec desc limit 1)[offset(0)].*
from
`project.dataset.ga_realtime_sessions_20*` as t
group by
suffix,
visitkey
Now you have a deduplicated realtime view called ga_realtime_view
. Next, you can query this view with Standard SQL:
-- this query is not valid as there is no realtime data available in the sample data set
select
date,
count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
`project.dataset.ga_realtime_view`
where
suffix = current_date()
and totals.visits = 1
group by
date
Apart from this workaround, there is another thing about realtime data: not all data fields are available in the realtime view, such as ads data and the Default Channel Grouping dimension.
In the next example query we will combine data from our realtime view with historical data from the ga_sessions_
tables and deal with the absence of the Default Channel Grouping in the realtime view as a bonus (also see this tutorial). To do this weβll use union all
.
-- this query is not valid as there is no realtime data available in the sample data set
select
date,
channelgrouping as default_channel_grouping,
sessions
from (
select
date,
case
when trafficsource.source = '(direct)' and (trafficsource.medium = '(not set)' or trafficsource.medium = '(none)') then 'Direct'
when trafficsource.medium = 'organic' then 'Organic Search'
when hits.social.hassocialsourcereferral = 'yes' or regexp_contains(trafficsource.medium,'^(social|social-network|social-media|sm|social network|social media)$') then 'Social'
when trafficsource.medium = 'email' then 'Email'
when trafficsource.medium = 'affiliate' then 'Affiliates'
when trafficsource.medium = 'referral' then 'Referral'
when regexp_contains(trafficsource.medium,'^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'content' then 'Paid Search'
when regexp_contains(trafficsource.medium,' ^(cpv|cpa|cpp|content-text)$') then 'Other Advertising'
when regexp_contains(trafficsource.medium,'^(display|cpm|banner)$') or trafficsource.adwordsclickinfo.adnetworktype = 'content' then 'Display'
else
'(Other)'
end
as channelgrouping,
count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
`project.dataset.ga_realtime_view`,
unnest(hits) as hits
where
suffix = current_date()
and totals.visits = 1
group by
date,
default_channel_grouping
union all
select
date,
channelgrouping,
count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
_table_suffix between '20160801'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
and totals.visits = 1
group by
date,
channelgrouping)
order by
date desc