Guide to model training: Part 5 - Reliable remarketing

December 7, 2021 · 6 minute read

Nathaniel Tjandra



Got data? Build your own AI model for remarketing with machine learning!


  • Recap

  • Before we begin

  • What is remarketing?

  • Predicting success

  • Model training

  • Results


We finished all the processing of our data, so now we’ve made it to the final step of the model training series and will be training data. Based on our needs, we’ll continue with the remarketing campaign to get ready for the holiday rush! Tons of our customers will be on the lookout for holiday gifts and relevant sales, so now it’s up to us to connect them with their wishlist.

We know what gifts you want




Before we begin

For those following along the guide, we’re going to take the resulting data from when we scaled the numbers, categories, and imputed data. Then, we’ll be doing a merge on the data. Read our data preparation guide if you’re not familiar with joining and merging datasets.

What is remarketing?

A lot of developers, myself included, weren't familiar with the term remarketing. In fact, it’s often confused with or synonymously used with 


which is not 


. The core difference here is the audience. Retargeting looks for a new audience, while remarketing focuses on the old, or existing users. To define remarketing in the context of this guide, we’ll be looking at data of retired users of our product. By using the data to understand the previous users, we’ll focus our marketing campaign efforts to select individuals who will most likely come back, given a certain promo that meets their needs.

The most common form of day to day remarketing you observe but rarely think about are push notifications. Everyone has a phone and you’re often reminded to open an app you haven’t opened in a while. There are also cases where apps will entice you to come back with promotions like free gifts or credits for joining.

I’ll miss you. Come back!




In this guide, since we have a list of customer emails and email_ids, we’ll be focusing on the more traditional form of remarketing. Email campaigning is essentially the same idea, but with different mediums. We’ll be focusing on using remarketing via email to increase customer activity.

Predicting success

The holidays are coming up, and the campaign is almost here. Everyone will be wanting to purchase luxury items such as wine or chocolates as gifts! We’ll be using machine learning to create more interaction and engagement with customers, to boost already active customers and aim to gain back inactive customers that have taken a hiatus.

To begin, we’ll start by combining finalized information from prior parts and putting them together. For more details on how to join data, read our 

guide on combining data

. To simplify, I’ve prepared all the datasets from our previous parts into 


, which originate from these files, 







. This contains everything from what a customer’s household is like, purchase history, and interactions with prior campaigns.

We’ll be offering sales on party items that make great gifts, such as wine and sweets for the holidays. Our campaign is  a rerun of last year’s holiday sales. With an understanding of what’s in the campaign, we need to think about which portions of data influence a customer to action on an email campaign offering a sale on gifts. During the holidays, the sale of luxuries is hot, looking to sell wine and sweets. This is 




in our dataset. 

Model training

With the data prepared, we can begin slicing them into portions for training and testing. These sets will be used to train data using itself and determine the metrics for model evaluation in the next part.

To start, we’ll identify the data we’ll want to use as our feature vector, aka 


. Then we’ll determine what to look for as a result, label feature, or 


. Here, we’ll use everything in our dataset, to try to predict whether they will act on another notification from us. This training set will contain the data, and confirm that the predictions will be correct.

The test set is data that will be tested with the output from training to determine how good a model is. For our model we’ll be using the 

Logistic Regression

function from scikit-learn.


When training a classification model, we take note of the metrics at the end—accuracy, precision, recall. To learn more about what these metrics mean, check out our 

guide to accuracy, precision and recall

. In our next series, we’ll explore what these and more metrics are, how they’re calculated, and what their values tell us about the dataset.

Is this model good or not? We’ll find out in the next series.

Start building for free

No need for a credit card to get started.
Trying out Mage to build ranking models won’t cost a cent.

No need for a credit card to get started. Trying out Mage to build ranking models won’t cost a cent.

 2024 Mage Technologies, Inc.
 2024 Mage Technologies, Inc.