User dimensions & metrics (GA4)
How to query default and custom user dimensions and metrics from the GA4 BigQuery export, like (new) users, active users, user id, user pseudo id, user lifetime revenue and user properties key and values.
A lot of Google Analytics 4 dimensions and metrics can be derived straight from the tables without performing calculations on the data. With the help of unnest
and parse
for dates or timestamps we can do the trick.
Other dimensions and metrics that you might need are a bit harder to access. You will have to (re)calculate them, which can require some serious SQL-skills.
While designing the course Query GA4 Data In Google BigQuery I've learned a lot about best practices to calculate dimensions and metrics. To share this knowledge with you I will provide a combined query for default dimensions and metrics, and a single example query for every non-default dimension or metric.
As always, drop a line in the comments if you have any questions, feedback or suggestions related to this article.
Default user dimensions
- user_id
- user_pseudo_id
- is_active_user (available from
2023-07-17
) - user_first_touch_timestamp
- user_properties.key
- user_properties.value.string_value
- user_properties.value.set_timestamp_micros
- user_ltv.currency
Default user metrics
- user_properties.value.int_value
- user_properties.value.float_value
- user_properties.value.double_value
- user_ltv.revenue
-- comments
in the example query below for names and definitions and copy the part you need from the select
clause. Make sure that you also add any additional conditions (i.e. with
, from
, where
, group by
, having
and order by
) that are necessary to calculate the results correctly. E.g, the query below is ungrouped, so every row corresponds to an event and may contain duplicate rows.select
-- user_id (dimension | the user id set via the setUserId api)
user_id,
-- user_pseudo_id (dimension | the pseudonymous id (e.g., app instance id) for the user)
user_pseudo_id,
-- is_active_user (dimension | whether the user was active (true) or inactive (false) at any point in the calendar day)
is_active_user,
-- user_first_touch_timestamp (dimension | the time (in microseconds) at which the user first opened the app/website)
timestamp_micros(user_first_touch_timestamp) as user_first_touch_timestamp,
-- user_properties.key (dimension | the name of the user property | replace <insert key> with a parameter key or delete where clause to select all)
(select key from unnest(user_properties) where key = '<insert key>') as user_properties_key,
-- user_properties.value.string_value (dimension | the string value of the user property | replace <insert key> with a parameter key or delete where clause to select all)
(select value.string_value from unnest(user_properties) where key = '<insert key>') as user_string_value,
-- user_properties.value.int_value (metric | the integer value of the user property | replace <insert key> with a parameter key or delete where clause to select all)
(select value.int_value from unnest(user_properties) where key = '<insert key>') as user_int_value,
-- user_properties.value.float_value (metric | the float value of the user property | replace <insert key> with a parameter key or delete where clause to select all)
(select value.float_value from unnest(user_properties) where key = '<insert key>') as user_float_value,
-- user_properties.value.double_value (metric | the double value of the user property | replace <insert key> with a parameter key or delete where clause to select all)
(select value.double_value from unnest(user_properties) where key = '<insert key>') as user_double_value,
-- user_properties.value.set_timestamp_micros (dimension | the time (in microseconds) at which the user property was last set | replace <insert key> with a parameter key or delete where clause to select all)
timestamp_micros((select value.set_timestamp_micros from unnest(user_properties) where key = '<insert key>')) as user_set_timestamp_micros,
-- user_ltv.revenue (metric | the lifetime value (revenue) of the user)
user_ltv.revenue as user_ltv_revenue,
-- user_ltv.currency (dimension | the lifetime value (currency) of the user)
user_ltv.currency as user_ltv_currency
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
Calculated user dimensions
- user type
select
-- user type (dimension | either new visitor or returning visitor, indicating if the users are new or returning)
case
when (select value.int_value from unnest(event_params) where event_name = 'session_start' and key = 'ga_session_number') = 1 then 'new visitor'
when (select value.int_value from unnest(event_params) where event_name = 'session_start' and key = 'ga_session_number') > 1 then 'returning visitor'
else null end as user_type,
count(distinct user_pseudo_id) as users
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
group by
user_type
having
user_type is not null
- count of sessions
with prep as (
select
user_pseudo_id,
max((select value.int_value from unnest(event_params) where key = 'ga_session_number')) over (partition by user_pseudo_id rows between unbounded preceding and unbounded following) as max_session_number
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day)))
select
-- count of sessions (dimension | the session index for a user, each session from a unique user will get its own incremental index starting from 1 for the first session)
max_session_number as count_of_sessions,
count(distinct user_pseudo_id) as users
from
prep
group by
max_session_number
order by
max_session_number
Calculated user metrics
- total users, based on
user_pseudo_id
- active users, based on
user_pseudo_id
and an engagement threshold- users based on
user_id
For every query that involves users you have to ask yourself which user concept you want to apply. To not overcomplicate things I have a slight preference for total users, unless insight in engagement is key.
- total users
select
-- users (metric | the total number of users)
count(distinct user_pseudo_id) as users
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- active users
is_active_user
. If you started collecting data after this month, you can use this field to count active users, instead of the more complicated version in the query below. Use this line: count(distinct case when is_active_user is true then user_pseudo_id end) as active_users
.select
count(distinct case when (select value.int_value from unnest(event_params) where key = 'engagement_time_msec') > 0 or (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then user_pseudo_id else null end) as active_users
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- new users (based on total users)
select
-- new users (metric | the number of users who interacted with your site or launched your app for the first time)
count(distinct case when (select value.int_value from unnest(event_params) where key = 'ga_session_number') = 1 then user_pseudo_id else null end) as new_users
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- % new users (based on total users)
select
count(distinct case when (select value.int_value from unnest(event_params) where key = 'ga_session_number') = 1 then user_pseudo_id else null end) / count(distinct user_pseudo_id) as percentage_new_users
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- new sessions
select
count(distinct case when (select value.int_value from unnest(event_params) where key = 'ga_session_number') = 1 then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) else null end) as new_sessions
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- % new sessions
select
-- % new sessions (metric | the percentage of sessions by users who had never visited before)
count(distinct case when (select value.int_value from unnest(event_params) where key = 'ga_session_number') = 1 then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) else null end) / count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'))) as percentage_new_sessions
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- number of sessions per user (based on total users)
select
-- number of sessions per user (metric | the total number of sessions divided by the total number of users)
count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'))) / count(distinct user_pseudo_id) as number_of_sessions_per_user
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- event count per user (based on total users)
select
-- event count per user (metric | the number of times an individual event (change event_name 'page_view' to event that needs to be counted) was triggered divided by amount of users)
countif(event_name = 'page_view') / count(distinct user_pseudo_id) as event_count_per_user
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
- engaged sessions per user (based on total users)
select
count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end) / count(distinct user_pseudo_id) as engaged_sessions_per_user
from
-- change this to your google analytics 4 export location in bigquery
`ga4bigquery.analytics_250794857.events_*`
where
-- define static and/or dynamic start and end date
_table_suffix between '20240701'
and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))