Guide to model training: Part 5 - Reliable remarketing
Nathaniel Tjandra
Growth
TLDR
Got data? Build your own AI model for remarketing with machine learning!
Outline
Recap
Before we begin
What is remarketing?
Predicting success
Model training
Results
Recap
We finished all the processing of our data, so now we’ve made it to the final step of the model training series and will be training data. Based on our needs, we’ll continue with the remarketing campaign to get ready for the holiday rush! Tons of our customers will be on the lookout for holiday gifts and relevant sales, so now it’s up to us to connect them with their wishlist.
We know what gifts you want
Before we begin
For those following along the guide, we’re going to take the resulting data from when we scaled the numbers, categories, and imputed data. Then, we’ll be doing a merge on the data. Read our data preparation guide if you’re not familiar with joining and merging datasets.
What is remarketing?
A lot of developers, myself included, weren't familiar with the term remarketing. In fact, it’s often confused with or synonymously used with
retargeting
which is not
remarketing
. The core difference here is the audience. Retargeting looks for a new audience, while remarketing focuses on the old, or existing users. To define remarketing in the context of this guide, we’ll be looking at data of retired users of our product. By using the data to understand the previous users, we’ll focus our marketing campaign efforts to select individuals who will most likely come back, given a certain promo that meets their needs.
The most common form of day to day remarketing you observe but rarely think about are push notifications. Everyone has a phone and you’re often reminded to open an app you haven’t opened in a while. There are also cases where apps will entice you to come back with promotions like free gifts or credits for joining.
I’ll miss you. Come back!
In this guide, since we have a list of customer emails and email_ids, we’ll be focusing on the more traditional form of remarketing. Email campaigning is essentially the same idea, but with different mediums. We’ll be focusing on using remarketing via email to increase customer activity.
Predicting success
The holidays are coming up, and the campaign is almost here. Everyone will be wanting to purchase luxury items such as wine or chocolates as gifts! We’ll be using machine learning to create more interaction and engagement with customers, to boost already active customers and aim to gain back inactive customers that have taken a hiatus.
To begin, we’ll start by combining finalized information from prior parts and putting them together. For more details on how to join data, read our
guide on combining data. To simplify, I’ve prepared all the datasets from our previous parts into
remarketing_campaign.csv, which originate from these files,
categorical_campaign.csv,
numerical_campaign.csv,
encoder_campaign.csv,
impute_campaign.csv,and
datetime_campaign.csv. This contains everything from what a customer’s household is like, purchase history, and interactions with prior campaigns.
We’ll be offering sales on party items that make great gifts, such as wine and sweets for the holidays. Our campaign is a rerun of last year’s holiday sales. With an understanding of what’s in the campaign, we need to think about which portions of data influence a customer to action on an email campaign offering a sale on gifts. During the holidays, the sale of luxuries is hot, looking to sell wine and sweets. This is
MntWine
and
MntSweet
in our dataset.
Model training
With the data prepared, we can begin slicing them into portions for training and testing. These sets will be used to train data using itself and determine the metrics for model evaluation in the next part.
To start, we’ll identify the data we’ll want to use as our feature vector, aka
input
. Then we’ll determine what to look for as a result, label feature, or
output
. Here, we’ll use everything in our dataset, to try to predict whether they will act on another notification from us. This training set will contain the data, and confirm that the predictions will be correct.
The test set is data that will be tested with the output from training to determine how good a model is. For our model we’ll be using the
Logistic Regression
function from scikit-learn.
Results
When training a classification model, we take note of the metrics at the end—accuracy, precision, recall. To learn more about what these metrics mean, check out our
guide to accuracy, precision and recall. In our next series, we’ll explore what these and more metrics are, how they’re calculated, and what their values tell us about the dataset.
Is this model good or not? We’ll find out in the next series.
Start building for free
No need for a credit card to get started.
Trying out Mage to build ranking models won’t cost a cent.
No need for a credit card to get started. Trying out Mage to build ranking models won’t cost a cent.