Recommendation Engines for Beginners

Kaitlin Browne
Feb 18, 2021

A recommendation engine makes suggestions to a user based on the user’s history or on the behavior of similar users. The user’s history is easiest to catalog when the user rates their experience with a given item: if a user rates a movie five stars, we can easily store and reference that rating. Similar users can be identified by finding common interests. A recommendation system is a powerful tool for putting exactly what the user wants in front of them.

Recommendation engines must collect and catalog data to be effective. Several data points come directly from the user: activity, ratings, reviews, comments, profile demographics, and so on. Other data points are collected implicitly, such as device type, link clicks, location, and date and time.

There are several types of recommendation systems, and one of the most popular is collaborative filtering. This method makes recommendations based on other users with similar tastes. To calculate how similar two users are, we compare their ratings of the items they have both rated.

A famous use case is Netflix. Netflix splits users into thousands of ‘taste groups’ and keeps track of what users watch and when. The company has carefully tagged every piece of content in its library, describing each title across a range of data points. Using this data, machine learning algorithms then calculate the weight of certain metrics. For example, the algorithms decide which is more important: the shows you watched last week, or the shows you watched for a few minutes and abandoned. These and countless other data points are weighed against one another. Once their significance is measured, another algorithm uses these values to place the viewer in a taste group. Viewers in each taste group are then recommended shows and movies of specific types. Both the suggested genres and the order of the rows are personalized for each viewer. Netflix uses both explicit and implicit data in these calculations, counting ratings as well as signals like minutes watched.

Netflix’s recommendation system uses a hybrid model featuring both collaborative filtering and content-based filtering. This, along with the classic technique of showing new users the ‘most popular’ content to prompt initial engagement, helps overcome ‘cold start’ issues (making recommendations for new users or titles with little or no data). Collaborative filtering is a popular way of selecting recommendations because it doesn’t require the meticulous content tagging that content-based filtering demands. Here we’ll implement a collaborative filtering algorithm to suggest movies based on a user’s current ratings. See it in action on repl.it.
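The ‘most popular’ fallback for cold starts can be sketched in a few lines. This is a minimal illustration, not Netflix’s method: it assumes ‘popular’ means ‘rated by the most users’, breaks ties by average rating, and uses only the two users shown in this article’s sample data. `getMostPopular` is a hypothetical helper name.

```javascript
// Sample ratings: the two users shown in this article's dataset.
const review_data = {
  'Jacob': {
    'The Dark Knight': 5.00,
    'The Lord of the Rings: Return of the King': 4.29,
    'Star Wars: The Last Jedi': 5.00,
    'Avengers: Endgame': 1.0
  },
  'Emily': {
    'Black Panther': 4.89,
    'Toy Story 3': 4.93,
    'The Hunger Games: Catching Fire': 4.87,
    'The Dark Knight': 1.33
  }
}

// Hypothetical cold-start fallback: rank every title by how many users rated it,
// breaking ties by average rating, and return the top `n` titles.
function getMostPopular(data, n) {
  const stats = {} // title -> { count, total }
  for (const ratings of Object.values(data)) {
    for (const [title, rating] of Object.entries(ratings)) {
      stats[title] = stats[title] || { count: 0, total: 0 }
      stats[title].count += 1
      stats[title].total += rating
    }
  }
  return Object.entries(stats)
    .map(([title, { count, total }]) => ({ title, count, average: total / count }))
    .sort((a, b) => b.count - a.count || b.average - a.average)
    .slice(0, n)
    .map(entry => entry.title)
}

// A brand-new user has no ratings yet, so show them the most popular titles.
console.log(getMostPopular(review_data, 3).join(', '))
// The Dark Knight, Star Wars: The Last Jedi, Toy Story 3
```

The Dark Knight ranks first only because it is the one title both sample users rated; with the full dataset the ranking would reflect all 16 users.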

First we store the data points:

const review_data = {
  'Jacob': {
    'The Dark Knight': 5.00,
    'The Lord of the Rings: Return of the King': 4.29,
    'Star Wars: The Last Jedi': 5.00,
    'Avengers: Endgame': 1.0
  },
  'Emily': {
    'Black Panther': 4.89,
    'Toy Story 3': 4.93,
    'The Hunger Games: Catching Fire': 4.87,
    'The Dark Knight': 1.33
  },
  //...many more user objects
}

We have 16 users who have rated 14 movies on a scale of one to five. Each user has rated up to seven of the movies, and most have rated around four.
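Counts like these are easy to check in code. The sketch below assumes the `review_data` shape shown above and, because the full dataset is not reproduced in this article, runs on just Jacob and Emily. `summarizeReviews` is a hypothetical helper name.

```javascript
const review_data = {
  'Jacob': {
    'The Dark Knight': 5.00,
    'The Lord of the Rings: Return of the King': 4.29,
    'Star Wars: The Last Jedi': 5.00,
    'Avengers: Endgame': 1.0
  },
  'Emily': {
    'Black Panther': 4.89,
    'Toy Story 3': 4.93,
    'The Hunger Games: Catching Fire': 4.87,
    'The Dark Knight': 1.33
  }
}

// Hypothetical helper: count users, distinct titles, and ratings per user.
function summarizeReviews(data) {
  const users = Object.keys(data)
  const titles = new Set() // a Set ignores titles rated by more than one user
  const ratingsPerUser = {}
  for (const user of users) {
    const userTitles = Object.keys(data[user])
    userTitles.forEach(title => titles.add(title))
    ratingsPerUser[user] = userTitles.length
  }
  return { userCount: users.length, movieCount: titles.size, ratingsPerUser }
}

const summary = summarizeReviews(review_data)
console.log(summary.userCount, summary.movieCount) // 2 7
console.log(summary.ratingsPerUser) // { Jacob: 4, Emily: 4 }
```

Run against the full dataset, the same helper should report 16 users and 14 movies.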

How do we decide what to recommend? First, we need a way to measure how similar two users are.

How do we find similarity between users? We check which movies they have both rated and compare their ratings.

function getCommonMovies(userA, userB) {
  const titlesA = Object.keys(review_data[userA])
  const titlesB = Object.keys(review_data[userB])
  // keep only the titles that appear in both users' reviews
  const common = titlesA.filter(title => titlesB.includes(title))
  return common // we get back an array with the titles that they have both rated
}

Now we want to get each user’s rating for every one of those movies.

Note that JavaScript has no built-in tuple type, so we represent each pair of ratings as a two-element array.

Recommendation engines like this one are more commonly written in Python, which does have tuples.

function getReviews(userA, userB) {
  const common = getCommonMovies(userA, userB)
  // pair up both users' ratings for each shared title: [ratingA, ratingB]
  const data = common.map(movie => [review_data[userA][movie], review_data[userB][movie]])
  return data // now we have an array of tuples: each user's rating for every common movie
}

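Now we’re going to use Euclidean distance to measure how close these ratings are. Think of each movie both users have rated as an axis: each user is a point whose coordinate on that axis is their rating for that movie, and each tuple holds the two users’ coordinates along one axis. The closer the two points, the more alike the users’ tastes.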
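Written as a formula, with C the set of movies both users A and B have rated and r(A, m) user A’s rating of movie m (notation introduced here for illustration), the distance computed by the function below is:

```latex
d(A, B) = \sqrt{\sum_{m \in C} \left( r_{A,m} - r_{B,m} \right)^{2}}
```

When the users share a single movie, this reduces to the absolute difference between their two ratings.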

function euclideanDistance(points) {
  let sumSquared = 0
  for (const [ratingA, ratingB] of points) {
    // squared difference between the two users' ratings for one movie
    sumSquared += (ratingA - ratingB) ** 2
  }
  const distance = Math.sqrt(sumSquared)
  return distance
}

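We convert the distance into a similarity score by computing 1 / (1 + distance). Adding 1 prevents division by zero when two users’ ratings are identical: a distance of 0 gives a similarity of 1, and the further apart the ratings, the closer the similarity gets to 0.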
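A worked example with the sample data: Jacob and Emily share only one rated title, The Dark Knight (5.00 vs 1.33), so their distance is simply 3.67 and their similarity is 1 / (1 + 3.67) ≈ 0.214. The self-contained sketch below reproduces that calculation; `sharedRatings` and `distance` are condensed, hypothetical stand-ins for the article’s helpers, and only these two users are included.

```javascript
const review_data = {
  'Jacob': {
    'The Dark Knight': 5.00,
    'The Lord of the Rings: Return of the King': 4.29,
    'Star Wars: The Last Jedi': 5.00,
    'Avengers: Endgame': 1.0
  },
  'Emily': {
    'Black Panther': 4.89,
    'Toy Story 3': 4.93,
    'The Hunger Games: Catching Fire': 4.87,
    'The Dark Knight': 1.33
  }
}

// [ratingA, ratingB] pairs for every title both users rated (same idea as getReviews)
function sharedRatings(userA, userB) {
  return Object.keys(review_data[userA])
    .filter(title => title in review_data[userB])
    .map(title => [review_data[userA][title], review_data[userB][title]])
}

// Euclidean distance over the shared movies (same idea as euclideanDistance)
function distance(pairs) {
  return Math.sqrt(pairs.reduce((sum, [a, b]) => sum + (a - b) ** 2, 0))
}

const pairs = sharedRatings('Jacob', 'Emily')
const d = distance(pairs)
const sim = 1 / (1 + d) // distance -> similarity in (0, 1]

console.log(pairs)          // [ [ 5, 1.33 ] ]
console.log(d.toFixed(2))   // 3.67
console.log(sim.toFixed(4)) // 0.2141
```

A low score like this matches intuition: the one movie they both rated is one Jacob loved and Emily disliked.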

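function similarity(reviews) {
  // 1 / (1 + d) maps a distance in [0, ∞) to a similarity in (0, 1]
  return 1 / (1 + euclideanDistance(reviews))
}

function getUserSimilarity(userA, userB) {
  const reviews = getReviews(userA, userB)
  // with no movies in common the distance would be 0 and the similarity 1,
  // so treat users with no shared ratings as not similar at all
  if (reviews.length === 0) return 0
  return similarity(reviews)
}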

Our final function is where we put everything together. Follow the comments to see what happens line by line.
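One common way to put the pieces together in user-based collaborative filtering, sketched here as an assumption rather than the author’s exact code, is to score every movie the target user hasn’t rated by a similarity-weighted average of other users’ ratings, then sort from highest to lowest. `getRecommendations` is a hypothetical name, the helpers are condensed versions of the ones above, and with only the two sample users each movie has just one neighbor contributing to its score.

```javascript
const review_data = {
  'Jacob': {
    'The Dark Knight': 5.00,
    'The Lord of the Rings: Return of the King': 4.29,
    'Star Wars: The Last Jedi': 5.00,
    'Avengers: Endgame': 1.0
  },
  'Emily': {
    'Black Panther': 4.89,
    'Toy Story 3': 4.93,
    'The Hunger Games: Catching Fire': 4.87,
    'The Dark Knight': 1.33
  }
}

// Condensed version of getUserSimilarity: 1 / (1 + Euclidean distance over shared movies)
function getUserSimilarity(userA, userB) {
  const pairs = Object.keys(review_data[userA])
    .filter(title => title in review_data[userB])
    .map(title => [review_data[userA][title], review_data[userB][title]])
  if (pairs.length === 0) return 0 // nothing in common: no evidence of similarity
  const distance = Math.sqrt(pairs.reduce((sum, [a, b]) => sum + (a - b) ** 2, 0))
  return 1 / (1 + distance)
}

// Hypothetical final step: predict a score for each movie `user` hasn't rated as the
// similarity-weighted average of other users' ratings, then rank highest first.
function getRecommendations(user) {
  const weightedTotals = {} // title -> sum of (similarity * rating)
  const similaritySums = {} // title -> sum of similarity
  for (const other of Object.keys(review_data)) {
    if (other === user) continue
    const sim = getUserSimilarity(user, other)
    if (sim <= 0) continue // skip users with no shared ratings
    for (const [title, rating] of Object.entries(review_data[other])) {
      if (title in review_data[user]) continue // already rated: don't recommend it
      weightedTotals[title] = (weightedTotals[title] || 0) + sim * rating
      similaritySums[title] = (similaritySums[title] || 0) + sim
    }
  }
  return Object.keys(weightedTotals)
    .map(title => ({ title, score: weightedTotals[title] / similaritySums[title] }))
    .sort((a, b) => b.score - a.score)
}

for (const { title, score } of getRecommendations('Jacob')) {
  console.log(`${title}: ${score.toFixed(2)}`)
}
// Toy Story 3: 4.93
// Black Panther: 4.89
// The Hunger Games: Catching Fire: 4.87
```

Dividing by the sum of similarities rather than by the number of raters keeps each predicted score on the original one-to-five scale, while letting the ratings of more similar users count for more.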

This algorithm is based on the one examined in “How to Build A Recommendation Engine In Python” on Udemy, and was ported to JavaScript by the author. You can check out the code on repl.it.
