top of page
Search

Extract Multiple User Journey Paths in BigQuery

  • Writer: Aayush Maggo
    Aayush Maggo
  • Apr 21, 2022
  • 2 min read

Updated: Aug 7, 2023

Designing a great user experience involves understanding how users move through your website or app. This helps us unlock user behavior patterns and find areas to improve.

In this article, I'll show you how to extract user journey paths in Google Analytics (GA) using BigQuery. We'll also add a running percentage to identify the most popular paths.




Note that while you can view similar flows within the GA interface (Audience < User Flow), it may be sampled for large datasets and give you fewer sessions for analysis. Additionally, the navigation summary reports in the Behavior Section have limitations.

The advantage of using BigQuery is the ability to analyze page and event flows together, beyond what is possible in Google Analytics.

Let's go through the steps using the publicly available Google Analytics data set in BigQuery - bigquery-public-data:google_analytics_sample.

Step 1: Creating a Common Table Expression (CTE) We'll use a CTE to create unique session IDs and extract the first five pages visited by users in each session.

 
WITH
  
first_five_pages AS ( --- CTE
  
SELECT
    
CONCAT(fullVisitorId,"-",CAST(visitstarttime AS string)) AS sessionId,
    
h.hitNumber,
    
h.page.pagePath AS landing_page,
    
LEAD(h.page.pagePath) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS second_page,LEAD(h.page.pagePath,2) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS third_page,LEAD(h.page.pagePath,3) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS fourth_page,LEAD(h.page.pagePath,4) OVER (PARTITION BY fullVisitorId, visitstarttime ORDER BY h.hitNumber) AS fifth_page,CONCAT(trafficsource.source," / ",trafficsource.medium) AS source_medium,FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,UNNEST(hits) AS hWHERE_table_suffix BETWEEN '20170101'AND '20170631'AND totals.visits = 1AND h.type="PAGE" )

Step 2: Adding Filters We'll create another table to filter GA traffic based on conditions we set. For example, we can filter for users who landed organically via Google.


 
FILTER AS (SELECT *FROM first_five_pagesWHERE hitNumber = 1AND REGEXP_CONTAINS(source_medium, 'google / organic')
)

Step 3: Extracting and Analyzing Data Finally, we'll pull all the page paths in the journey, count the total sessions, and calculate the percentage.


 
SELECT landing_page, second_page, third_page, fourth_page, fifth_page,
  
COUNT(sessionId) AS Sessions,
  
ROUND(COUNT(sessionId) / SUM(COUNT(sessionId)) OVER (),2) AS Percentage_of_SessionsFROM FILTERGROUP BY 1, 2, 3, 4, 5ORDER BY 6 DESCLIMIT  10

Once you run the query, you'll get a table with the page paths, session counts, and percentages. You can analyze the findings by asking questions like:

  • What are the popular landing pages?

  • How many users drop off after visiting the landing page?

  • What are the popular second pages in the journey?

  • Where are most of the users leaving from?

BigQuery allows you to analyze your website data at a granular level, providing valuable insights into user interactions and behavior. With GA4 offering free BigQuery integrations for all analytics accounts, it's a great time to take your analysis to the next level.

So, let's leverage the power of BigQuery and unlock meaningful insights from your website data!

Comments


Never Miss a Post. Subscribe Now!

Subscribe now to receive the latest tips, techniques, and insights straight to your inbox.

© 2023 by Aayush Maggo

  • Grey Twitter Icon
bottom of page