Earlier this year our non-profit ran a plant a tree campaign. We had an awesome response. We made two different campaign pages to see which was more effective in converting the website visit to a donation. We wanted to make the A/B analysis real-time in order to improve the campaign descriptions to drive more donations.
The web session data was streamed to Snowflake. Note: web visitor is assigned a Session ID.
The data looks as following (PII fields are redacted):
We had two campaign pages designed for the A/B analysis. The conversion success criteria was
VISIT -> CAMPAIGN_PAGE -> DONATION_PAGE.
The end goal of the A/B analysis was to see which CAMPAIGN page performed better i.e. plant_a_tree_campaign_a.html vs. plant_a_tree_campaign_b.html.
While we usually perform these types of Analysis using Knime, we wanted to see if we can do this in Snowflake in real-time. For this purpose I used the
MATCH_RECOGNIZE query in Snowflake.
Here is the query we ended up using. It is fairly easy to understand and easily extendible for other use cases with more than 3 conversion funnel steps.
select session_id, donated from webstream match_recognize( partition by session_id order by visit_timestamp measures iff(count(*)=3, TRUE, FALSE) as donated ONE ROW PER MATCH PATTERN (VISIT READ DONATE) DEFINE VISIT as iff(URI='index.html', TRUE, FALSE), READ as iff(URI='plant_a_tree_campaign_a.html', TRUE, FALSE), DONATE as iff(URI='donate.html', TRUE, FALSE) );
MATCH_RECOGNIZE Query Structure
- We partitioned the web data by the SESSION_ID, and ORDERed it by the visit_timestamp
- We defined the VISIT as the value of index.html in URI.
- READ as the value of the campaign page. plant_a_tree_campaign_a.html OR plant_a_tree_campaign_b.html
- we defined DONATE as the visit to “donate.html”
- Finally we wanted to match the PATTERN VISIT -> READ (campaign page) -> DONATE
As you can see the query explains itself.
Since we were comparing plant_a_tree_campaign_a.html vs. plant_a_tree_campaign_b.html, we just had to replace that in the READ definition:
select session_id, donated from webstream match_recognize( partition by session_id order by visit_timestamp measures iff(count(*)=3, TRUE, FALSE) as donated ONE ROW PER MATCH PATTERN (VISIT READ DONATE) DEFINE VISIT as iff(URI='index.html', TRUE, FALSE), READ as iff(URI='plant_a_tree_campaign_b.html', TRUE, FALSE), DONATE as iff(URI='donate.html', TRUE, FALSE) );
Now ran these queries on 2 mins worth of streaming data.
We see that we got 4 donations in 2 mins for 5 visitors that saw plant_a_tree_campaign_a.html
whereas we only got 1 donation in 2 mins for 5 visitors that saw plant_a_tree_campaign_b.html
We were able to determine that plant_a_tree_campaign_b.html was not performing that well, and updated it accordingly, and kept performing the A/B analysis on the streaming data.
MATCH_RECOGNIZE was instrumental in performing the real-time A/B analysis of the campaign descriptions and improving them as needed.