Tutorial: How to query multiple 'ga_session' tables with '_table_suffix'
Every day of Google Analytics data in BigQuery is stored in a separate table. This is how to query multiple days at once.
Every day of Google Analytics data in BigQuery is stored in a separate table. If you only need data from one day (let's say August 1st, 2016) the from
clause in your query will look something like this:
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_20160801
But in most cases you will need to query a larger period of time. Enter _table_suffix
. More details on the general usage here, but to be able to query multiple tables of Google Analytics data you only need to look at the example queries I present you here.
Note that you can use static dates, dynamic dates for a rolling period (the last 90 days) or combine static and dynamic dates. It is also possible to include the intraday table in your query.
All tables in data set
This is the easiest one. Just use the asterisk as a wildcard to match any table in the data set that starts with ga_sessions_
. Note: this will also include the intraday table (more on this below).
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
Static date range
When you perform an analysis on a static data range you should use fixed start and end dates. In this example we select August 1st 2016 to August 1st 2017.
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
_table_suffix between '20160801' and '20170801'
Dynamic date range
In this example we select a rolling last 30 day period: today minus 30 days until and including yesterday.
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
_table_suffix between format_date('%Y%m%d',date_sub(current_date(), interval 30 day))
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
Combination of static and dynamic dates
We know we have 366 day tables in our sample data set, so we could use a fixed end date here (20170801
), but usually I prefer a combination of a fixed start date and a dynamic end date (in this case: today minus one).
If new data is added to our data set it will automatically included in our query. In this example we select the data range August 1st 2016 up to and including yesterday. BigQuery will figure out if the tables actually exist and will only return data for tables that match the conditions and do exist.
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
_table_suffix between '20160801'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
Intraday tables
For every Google Analytics view that is exported to BigQuery, a ga_sessions_intraday_
table will be exported multiple times a day as well. Notice the different format for historical data (yesterday and beyond):
ga_sessions_yyyymmdd
versus intraday:
ga_sessions_intraday_yyyymmdd
The easiest way to include intraday data in your query as well as historical data is use the wildcard in combination with a wider _table_suffix
filter. With this condition it will include any table in the data set that starts with ga_sessions_
and contains a date with the format yyyymmdd
.
select
*
from
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
where
regexp_extract(_table_suffix, r'[0-9]+') between format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
and format_date('%Y%m%d', current_date())
Note: make sure your data set doesnβt contain any other tables with a title that starts with ga_sessions_
!