Blog

Why, when and how to use a first-touch attribution model

Or: this blog post has made $0

Attribution hero

This blog post has made $0 (as of 7/17/23).

Depending on how long ago it was published, that figure should serve as either commentary on the unpredictable futility of content marketing, or as an incredible proof point of the success of content marketing. Hopefully the latter.

Because this took a long time to write! And even though I mostly wrote it because I hope that you will read it, enjoy it, and learn something valuable from it…

I also sort of wrote it because I hope that after doing all that, you might try out Hex, and maybe some day even start paying us some money.

But without the ability to calculate exactly how many people paid for Hex as a result of this blog, we cannot answer the two most important questions to ask of any marketing activity: “Was it worth it?” and “What should we do differently next time?”

This blog post details the efficient way we built a first-touch marketing attribution model at Hex, why we decided to do it this way, and how you can build your own.

What even is marketing attribution?

Marketing attribution is the process of assigning credit to the activities that drive a customer to convert. There are lots of tools and services you can purchase to do marketing attribution, but if you want the most control over the solution, developing one on your own with SQL is probably the right approach.

There are many different ways to do marketing attribution, but most of them fall into two different categories; single touch and multi touch attribution. Within those two categories, there are many different ways to actually attribute credit in your model.

Here are a few common ways:

  • First-touch attribution - 100% of the attribution goes to the first touch

  • Last-touch attribution - 100% of the attribution goes to the last touch

  • Linear attribution - The attribution is split evenly between all touches

  • Weighted multi-touch attribution - The attribution is split between all the touches, but each touch is weighted differently.

And no, it’s not fake. Although I suppose we did kind of just make it up.

How we landed on first-touch attribution

Attribution is a highly debated topic, filled with many spicy takes. Often times building an attribution model can turn into a huge time sink, so as an analyst it is very important to prioritize speed-to-value with your resources.

If you are building your own attribution model, the method of attribution you choose will be dictated by your current data ecosystem. One of the foundational pieces to most of those attribution models is something called web sessions.

  • Web sessions represent a period in time, that a user interacts with your web site or app.

  • They are typically defined by a series of page views or other actions.

At Hex, we had not developed a model that turned our web traffic into sessions.

That brought us to an important inflection point in the process. We needed to decide if we were going to shift gears and build a sessions model, or find a way to continue to move quickly to provide value.

So we decided to go with a first-touch model. It aligned with our top-of-funnel focus, and allowed us to quickly provide some value and insight into where our conversions are coming.

Okay, let’s build that model!

1. Categorize traffic into channels

Our site traffic comes from many sources, so it is important to group traffic into categories. This is done by using a combination of referrer information and utm attributes to determine these channels.

This article has a great overview of the different types of channels https://support.similarweb.com/hc/en-us/articles/360000807449-Marketing-Channels

You will likely end up doing something like this in your ETL process for each of channel:

when utm_medium in ('social', 'social media')
	or (utm_medium is null
		and (contains(referrer_host, 't.co')
			or contains(referrer_host, 'twitter')
		  or contains(referrer_host, 'linkedin')
		  or contains(referrer_host, 'facebook')
		 ) 
	)
then 'organic social'

2. Match users to their traffic

Once our traffic is categorized into channels, we need to be able to match users to their pre-signup visits. This process is commonly referred to as user stitching, identity stitching or member resolution. I will just refer to it as user stitching.

User stitching is the process of enriching your web traffic or user actions with user data once you know the identity of that user.

Most modern companies have a tool they use for event tracking. Those tools typically issue a UUID for each unique visitor. For converted users, we have our own internal UUID as well. Once a conversion event happens, we can map those UUIDs to each other.

We turned this into a model that can be consumed by our web page models, and we can then use this pairing to match a user to their traffic that predates their signups

select distinct
  anonymous_id -- id from your event tracking tool
  , hex_user_id
from events

anonymous_id

hex_user_id

3e4d9a53-e195-4814-bba2-c9a7ba46c893

c777d892-29cf-4833-9e9d-5d2af1c9f1fa

9ec5e004-04cc-4ef7-88bd-adc488ad68e0

a7869801-fb19-4278-a6f1-b4639d909100

7e16066f-996e-49b6-bf18-8b6c18363b01

a8432a43-f49d-4176-af8a-b044ec04e3de

In the SQL that builds our web page models, we can then join in this mapping model to create a new field that we can later use in the attribution model.

, coalesce(user_mapping.hex_user_id, page_view.anonymous_id) as individual_identifier

We have now created this field called individual_identifier on all of our page views that will be a users Hex user id if we have matched them with their traffic. Once you have done this you have laid the ground work for your first-touch attribution, and all that’s left is driving off into the sunset

3. Create the final model

In the Hex product, multiple users can be in the same workspace, but for this use case we want to limit our analysis to just users who created a new workspace. We can do this by getting the first user in each workspace.

with workspace_creating_user as (
	select 
	  user_id
		, user_created_at
		, workspace_id
	from users
	qualify row_number() over(partition by workspace_id asc) = 1
)

Then, we can do something very similar to get the first page view for a user,

first_touch_page_view as (
	select 
		individual_identifier
		, page_viewed_at
		, cat_channel_group
  from page_views
	qualify row_number() over(partition by individual_identifier asc) = 1
)

Finally we can join these two together, and that will give us a first-touch attribution model that we can use to quantify the most impactful sources of our customers

select 
	user_id
	, user_created_at
	, workspace_id
	, page_viewed_at as first_touch_at
	, cat_channel_group as first_touch_channel
from workspace_creating_user
join first_touch_page_view
	on workspace_creating_user.user_id 
		 = first_touch_page_view.individual_identifier

user_id

user_created_at

workspace_id

first_touch_at

first_touch_channel

3e4d9a53-e195-4814-bba2-c9a7ba46c893

2023-06-06 17:11:32

37e7fd8d-fb27-48e6-b4b3-68fe5c4d02bb

2023-05-31 17:11:32

Paid Search

e07ba14c-45d9-4b91-a489-061376307cd3

2023-06-05 6:18:38

6c5ae5dd-02c9-49a8-9fcf-e8b6ce57c026

2023-05-30 6:18:38

Paid Search

7e16066f-996e-49b6-bf18-8b6c18363b01

2023-06-07 13:59:21

a7869801-fb19-4278-a6f1-b4639d909100

2023-06-01 13:59:21

Organic Search

c777d892-29cf-4833-9e9d-5d2af1c9f1fa

2023-06-08 17:41:42

242e6b97-962f-485a-bfcb-f4bcf980dba1

2023-06-02 17:41:42

Direct Traffic

9ec5e004-04cc-4ef7-88bd-adc488ad68e0

2023-06-09 3:01:59

a8432a43-f49d-4176-af8a-b044ec04e3de

2023-06-03 3:01:59

Paid Social

Now it is time to put your new model to use!

In our case, we wanted to know which channels most of our signups were coming from, and we also wanted to set up some reporting to monitor that over time. The quickest way to get an idea of this is by doing something like signups over time by first-touch channel.

Another metric that many people measure is Conversion to Signup by these different channels. If you are sinking money into one channel, but it is not resulting in many conversions, resources may need to be reallocated to target users who typically come in through a different flow

There can be a lot of nuance in conversion rates that people don’t always consider. I could probably write an entire article about that. But for now some things to keep top of mind are the volumes of your denominators and time to conversion.

One simple rate to look at is a forward looking conversion rate. Something like this:

When doing forward looking conversions, I like to time bound that story. I do this typically by adding some context like a “Conversion Rate in 30 Days” line. You can still see the all time conversion rate, but conversions that have really long tails are hard to predict and action against. So the 30 day time bound serves as a leading indicator of the trend.

Backwards looking conversion rates are another way to do conversion, but I will save that for another day.

Final thoughts

Building a first-touch attribution model is not something an analyst would likely consider their magnum opus, but it may be what best serves your stakeholders in the interim until you can build a more sophisticated model.

To recap:

  • If you are asked to build a marketing attribution model, you first have to decide which type of model you can build with the resources you have.

  • If you do settle on first-touch attribution, remember this is a huge leap from nothing to something very actionable.

  • Remember; you are the one closest to the data, and it is your responsibility to deliver the best speed-to-value solution you can with the resources you have.

This is exactly the kind of analysis that's delightful to do in Hex, with our flexible and polyglot environment for working with SQL and Python. So what are you waiting for? Click "Get Started" to try it out (for free!) and maybe... just maybe... help change the subtitle of this post.