Extract Multiple User Journey Paths in BigQuery
- Aayush Maggo
- Apr 21, 2022
- 2 min read
Updated: Aug 7, 2023
Designing a great user experience involves understanding how users move through your website or app. This helps us unlock user behavior patterns and find areas to improve.
In this article, I'll show you how to extract user journey paths in Google Analytics (GA) using BigQuery. We'll also add a running percentage to identify the most popular paths.

Note that while you can view similar flows within the GA interface (Audience < User Flow), it may be sampled for large datasets and give you fewer sessions for analysis. Additionally, the navigation summary reports in the Behavior Section have limitations.
The advantage of using BigQuery is the ability to analyze page and event flows together, beyond what is possible in Google Analytics.
Let's go through the steps using the publicly available Google Analytics data set in BigQuery - bigquery-public-data:google_analytics_sample.
Step 1: Creating a Common Table Expression (CTE) We'll use a CTE to create unique session IDs and extract the first five pages visited by users in each session.
WITH
first_five_pages AS ( --- CTE
SELECT
CONCAT(fullVisitorId,"-",CAST(visitstarttime AS string)) AS sessionId,
h.hitNumber,
h.page.pagePath AS landing_page,
LEAD(h.page.pagePath) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS second_page,LEAD(h.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS third_page,LEAD(h.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS fourth_page,LEAD(h.page.pagePath,4) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS fifth_page,CONCAT(trafficsource.source," / ",trafficsource.medium) AS source_medium,FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,UNNEST(hits) AS hWHERE_table_suffix BETWEEN '20170101'AND '20170631'AND totals.visits = 1AND h.type="PAGE" )
Step 2: Adding Filters We'll create another table to filter GA traffic based on conditions we set. For example, we can filter for users who landed organically via Google.
FILTER AS (SELECT *FROM first_five_pagesWHERE hitNumber = 1AND REGEXP_CONTAINS(source_medium, 'google / organic')
)
Step 3: Extracting and Analyzing Data Finally, we'll pull all the page paths in the journey, count the total sessions, and calculate the percentage.
SELECT landing_page, second_page, third_page, fourth_page, fifth_page,
COUNT(sessionId) AS Sessions,
ROUND(COUNT(sessionId) / SUM(COUNT(sessionId)) OVER (),2) AS Percentage_of_SessionsFROM FILTERGROUP BY 1, 2, 3, 4, 5ORDER BY 6 DESCLIMIT 10
Once you run the query, you'll get a table with the page paths, session counts, and percentages. You can analyze the findings by asking questions like:
What are the popular landing pages?
How many users drop off after visiting the landing page?
What are the popular second pages in the journey?
Where are most of the users leaving from?
BigQuery allows you to analyze your website data at a granular level, providing valuable insights into user interactions and behavior. With GA4 offering free BigQuery integrations for all analytics accounts, it's a great time to take your analysis to the next level.
So, let's leverage the power of BigQuery and unlock meaningful insights from your website data!
Comments