It is partly because of the success of many e-commerce businesses and applications such as Spotify that recommendation systems are a frequent topic of articles and discussions on websites devoted to machine learning.
To put it simply, recommendation is an attempt to look into the future, e.g. to guess which product will be purchased or viewed within an assumed time horizon. “Traditional” recommendation systems are aimed at end users (customers) and mostly concern products such as movies, music or other goods offered by online stores.
In this article, I will present a recommendation system for a B2B sales team. That system is meant to support the sales process through a mechanism of suggesting products which should be offered to a specific company.
Recommendation algorithms are usually divided into three main groups:
In the majority of popular recommendation methods, the forecast depends linearly on users’ attributes and behaviors. This stems from the desire to keep deployment simple on the one hand, and from performance requirements on the other. In our case, however, we do not require an immediate response from the system, so we can apply algorithms which would be too time-consuming in a traditional recommendation system.
We can safely assume that when buying products, companies are guided by more objective criteria than a movie rental user who selects yet another movie for a Saturday evening. Those criteria may include certain permanent attributes, such as:
But things do not stop here. You can also take into account events related to the specifics of a given company or those which affect its business, e.g. winning a new contract, hiring employees, restructuring, a technology change, the purchase or deployment of a new IT system etc. However, we usually do not have access to most of that information. That is why the ability to forecast customer needs depends on the degree of correlation between the data we hold and the customers’ purchase decisions.
For customers who already have a purchase history, we have important information: the amounts they have spent on particular products in a given period. We assume that those amounts are correlated with the customer’s business needs and with product prices.
If two similar products meet a customer need in the same manner, it is more beneficial for the seller to recommend the more profitable one. As a result, we recommend products which not only meet customer needs but will also bring us the highest revenue or profit.
This method consists of training a regression model which forecasts how much a customer with particular attributes spends, on average, on specific products. We then calculate that forecast for all products in the catalog and recommend the products with the highest forecast spending.
The success of that method depends on high correlation between known customer attributes and particular products’ purchase volume. If such clear correlation does not exist, the regression model will have a very high prediction error.
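The regression approach described above can be sketched as follows. This is only an illustration under stated assumptions: the single customer attribute (employee count), the product names and all amounts are hypothetical, and a one-variable least-squares line stands in for whatever regression model is actually used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


# Hypothetical history: employee count of four customers and their
# average yearly spend on two hypothetical products.
employees = [10, 50, 100, 200]
spend = {
    "course_basic": [100.0, 500.0, 1000.0, 2000.0],
    "course_pro": [900.0, 1100.0, 1350.0, 1850.0],
}


def recommend_by_forecast(n_employees, top_n=1):
    """Forecast spend per product and return the top products."""
    forecasts = {}
    for product, amounts in spend.items():
        intercept, slope = fit_line(employees, amounts)
        forecasts[product] = intercept + slope * n_employees
    ranked = sorted(forecasts, key=forecasts.get, reverse=True)
    return ranked[:top_n], forecasts
```

With this made-up data, a company with 20 employees gets "course_pro" as the top recommendation, while a company with 300 employees gets "course_basic".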
In such a case, classification may be a better option. Instead of forecasting the precise revenue from selling a particular product, you can forecast a specific value range. This usually leads to a better fit to the data, but it also makes the forecast less useful.
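Turning exact amounts into value ranges is the first step of that classification variant. The snippet below is a minimal sketch; the range boundaries and labels are assumptions chosen purely for illustration.

```python
from bisect import bisect_right

# Hypothetical range boundaries (in currency units) and their labels.
EDGES = [1000, 5000]
LABELS = ["low", "medium", "high"]


def spend_range(amount):
    """Map an exact spend amount to the label of the range it falls into."""
    return LABELS[bisect_right(EDGES, amount)]
```

A classifier would then be trained to predict these labels instead of the exact amount.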
In practice, the approach presented above will not prove useful in many cases. In their purchase decisions, customers use purchase criteria which are hidden – we do not have access to those data. In addition, the dispersion of values is too high to apply regression or classification.
If we do not know which criteria the customer uses when shopping, it might be a good idea to check the purchases of customers with a similar purchase history. Since they have a similar purchase history, we can expect them to behave similarly also in the future.
To calculate similarity between customers, we create so-called purchase vectors for them, in which each coordinate corresponds to one product and takes the value 1 or 0, depending on whether the customer has purchased that product or not. Instead of binary notation, you can also use the total number of purchased items or the total or average amount spent. The choice of the type of data used in purchase vectors will have an impact on the recommendation system’s effectiveness and should be preceded by a comparison of the different options.
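Building purchase vectors from raw order data could look roughly like this. The order format (customer, product, amount) and the example names are assumptions made for this sketch.

```python
def purchase_vectors(orders, catalog, binary=True):
    """Build one purchase vector per customer.

    orders:  iterable of (customer, product, amount) tuples
    catalog: list of products; its order defines the vector coordinates
    binary:  True -> 1/0 flags, False -> total amount spent per product
    """
    index = {product: i for i, product in enumerate(catalog)}
    vectors = {}
    for customer, product, amount in orders:
        vec = vectors.setdefault(customer, [0.0] * len(catalog))
        if binary:
            vec[index[product]] = 1.0
        else:
            vec[index[product]] += amount
    return vectors


orders = [
    ("acme", "A", 100.0),
    ("acme", "B", 50.0),
    ("acme", "A", 25.0),
    ("globex", "C", 10.0),
]
catalog = ["A", "B", "C"]
```

Calling `purchase_vectors(orders, catalog)` gives the binary variant, while `binary=False` gives the variant based on amounts, so both options can be compared on the same data.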
With purchase vectors, you can calculate similarity between customers. The most frequently used measure is cosine similarity, where the similarity between two customers is the normalized scalar product of their purchase vectors. It ranges from 1 for identical vectors to 0 for vectors which have no purchased product in common.
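As a minimal sketch, cosine similarity of two purchase vectors can be computed like this (the function name is an assumption; returning 0 for a customer with no purchases is a convention chosen here, not something the measure itself defines):

```python
from math import sqrt


def cosine_similarity(u, v):
    """Scalar product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: an empty purchase vector is similar to nothing
    return dot / (norm_u * norm_v)
```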
The Pearson correlation coefficient also yields good results: the greater the correlation between purchase vectors, the more similar the customers.
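A plain-Python sketch of the Pearson correlation between two purchase vectors is shown below. Returning 0 when one of the vectors is constant (for example, a customer who bought everything) is an assumption of this sketch, since the coefficient is undefined in that case.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    denom = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    if denom == 0:
        return 0.0  # undefined for constant vectors; treated as "no correlation"
    return sum(a * b for a, b in zip(du, dv)) / denom
```

Unlike cosine similarity on non-negative vectors, this value can be negative, which happens when two customers tend to buy opposite sets of products.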
The measure of the degree to which we should recommend a specific product to a specific customer is the weighted average of that product’s column in the purchase matrix, where the weights are the similarities between that customer and the other customers. If many similar customers have purchased that product, a binary purchase matrix gives a result approaching one; otherwise, the result will be lower.
A collaborative filtering system may be improved in two aspects: in terms of the manner of calculating similarity between customers and the manner of calculating the recommendation.
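The weighted average can be sketched as follows. The similarities are passed in as ready-made numbers, and all names and values are hypothetical.

```python
def score_products(neighbours):
    """Similarity-weighted average of the neighbours' purchase vectors.

    neighbours: list of (similarity, purchase_vector) pairs for the other
                customers, with similarity measured against the target.
    """
    total = sum(sim for sim, _ in neighbours)
    n_products = len(neighbours[0][1])
    if total == 0:
        return [0.0] * n_products
    return [
        sum(sim * vec[j] for sim, vec in neighbours) / total
        for j in range(n_products)
    ]


def recommend(target, scores, top_n=1):
    """Rank products the target customer has not purchased yet."""
    candidates = [j for j, bought in enumerate(target) if not bought]
    return sorted(candidates, key=lambda j: scores[j], reverse=True)[:top_n]


# Hypothetical example: three neighbours, three products.
neighbours = [(0.9, [1, 1, 0]), (0.8, [1, 1, 1]), (0.1, [0, 0, 1])]
target = [1, 0, 0]
scores = score_products(neighbours)
```

Here the second product (index 1) scores about 0.94 because both highly similar customers bought it, while the third one scores 0.5, so `recommend(target, scores)` returns `[1]`.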
Strengthening the similarity between customers who have purchased many identical products usually yields good results.
Let us assume that customer “AB” purchased two products: A and B.
Customer “A” purchased only product A.
Customer “ABCD” purchased four products: A, B, C and D.
Cosine similarity between customers “AB” and “A” amounts to: 1 ÷ (√2 × √1) = 1 ÷ √2 ≈ 0.7071
Cosine similarity between customers “AB” and “ABCD” is the same: 2 ÷ (√2 × √4) ≈ 0.7071
It seems, however, that customers “AB” and “ABCD”, who share two products, are slightly more similar than customers “AB” and “A”, who share only one. We should take such situations into account.
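The example can be reproduced in a few lines, together with one possible correction. Scaling the similarity by the number of shared products up to a cap is an assumption of this sketch (it is just one common way to do this, and the cap of 5 is arbitrary), not necessarily the adjustment used in the system described here.

```python
from math import sqrt


def cosine_binary(a, b):
    """Cosine similarity of binary purchase vectors given as product sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def boosted_similarity(a, b, cap=5):
    """Cosine similarity scaled by the number of shared products (up to cap)."""
    shared = len(a & b)
    return cosine_binary(a, b) * min(shared, cap) / cap


ab = {"A", "B"}
a_only = {"A"}
abcd = {"A", "B", "C", "D"}
```

Plain cosine similarity gives the same value for both pairs, whereas `boosted_similarity(ab, abcd)` is twice as large as `boosted_similarity(ab, a_only)` because the first pair shares two products instead of one.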
Products which are frequently purchased by large groups of customers are a weaker indication of similarity between customers than rarely purchased products. It is thus a good idea to introduce weights which are inversely proportional to the number of customers who have purchased a given product.
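A sketch of such weighting is shown below: each product gets the weight 1 / (number of customers who bought it), and the weights are applied inside the cosine similarity. The data is hypothetical, and applying the weights in exactly this place is an assumption of the sketch.

```python
from math import sqrt


def product_weights(vectors):
    """Weight of each product = 1 / number of customers who purchased it."""
    n_products = len(vectors[0])
    counts = [sum(1 for v in vectors if v[j] > 0) for j in range(n_products)]
    return [1.0 / c if c else 0.0 for c in counts]


def weighted_cosine(u, v, w):
    """Cosine similarity in which coordinate j counts with weight w[j]."""
    dot = sum(wj * a * b for wj, a, b in zip(w, u, v))
    norm_u = sqrt(sum(wj * a * a for wj, a in zip(w, u)))
    norm_v = sqrt(sum(wj * b * b for wj, b in zip(w, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Product 0 is popular (4 buyers); products 1 and 2 are rarer (2 buyers each).
t = [1, 1, 0]  # target customer
x = [1, 0, 1]  # shares only the popular product with t
y = [0, 1, 1]  # shares only a rare product with t
others = [[1, 0, 0], [1, 0, 0]]
weights = product_weights([t, x, y] + others)
```

Without weights, `x` and `y` are equally similar to `t` (0.5 each). With the weights, `y` comes out ahead (about 0.58 versus 0.33), because the product it shares with `t` is bought less often.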
In some situations it makes sense to introduce different weights for individual products so that specific products are favored in line with the sales policy.
It might be a good idea to use industry knowledge and introduce default values for empty cells in the purchase matrix. For that purpose, you may use statistical analysis and, e.g., determine the distribution of the number of product purchases depending on the customer’s industry. With that distribution, you can fill in the gaps in the purchase matrix, which should improve the recommendation effectiveness.
You can also fill in the empty cells without industry knowledge, e.g. using appropriate mean values (predictive mean matching) or using a classifier trained on data from the purchase matrix (imputation-boosted collaborative filtering).
Another idea is to assign higher values to real data than to empty cells which have been filled in.
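The two ideas, industry-based defaults and lower values for filled-in cells than for real purchases, can be combined in a short sketch. The per-industry purchase rate as the default value, the discount factor of 0.5 and the example data are all assumptions made for illustration.

```python
def fill_gaps(matrix, industries, discount=0.5):
    """Fill empty cells with a discounted industry purchase rate.

    matrix:     dict customer -> binary purchase vector
    industries: dict customer -> industry name
    discount:   factor < 1 so that filled-in cells weigh less than real data
    """
    groups = {}
    for customer, vec in matrix.items():
        groups.setdefault(industries[customer], []).append(vec)
    # Share of customers in each industry who bought each product.
    rates = {
        industry: [sum(column) / len(vecs) for column in zip(*vecs)]
        for industry, vecs in groups.items()
    }
    filled = {}
    for customer, vec in matrix.items():
        rate = rates[industries[customer]]
        filled[customer] = [
            1.0 if bought else discount * r for bought, r in zip(vec, rate)
        ]
    return filled


matrix = {"c1": [1, 0], "c2": [1, 1], "c3": [0, 0]}
industries = {"c1": "bank", "c2": "bank", "c3": "retail"}
filled = fill_gaps(matrix, industries)
```

Customer `c1` did not buy the second product, but half of the customers in its industry did, so the empty cell becomes 0.5 × 0.5 = 0.25, while real purchases keep the value 1.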
Weights calculated in a standard way have a quasi-linear nature. In many cases, increasing the weights’ contrast by additionally decreasing low and increasing high weights may yield good results.
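One simple way to increase the contrast is to raise each weight to a power greater than one and then renormalize. The exponent of 2 and the example weights are assumptions; the sketch also assumes non-negative weights.

```python
def amplify(weights, rho=2.0):
    """Raise non-negative weights to the power rho and renormalize to sum 1."""
    powered = [w ** rho for w in weights]
    total = sum(powered)
    return [p / total for p in powered] if total else powered


similarities = [0.9, 0.5, 0.2]
plain_shares = [s / sum(similarities) for s in similarities]
sharp_shares = amplify(similarities)
```

With these numbers, the share of the most similar customer in the weighted average grows from about 0.56 to about 0.74, while the share of the least similar one drops from about 0.13 to about 0.04.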
Traditional methods of measuring recommendation accuracy are based on the difference between the predicted product rating and the actual rating given by the customer. They include Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
However, in our case we do not have product ratings but only zero-one information about whether the product has been purchased. What matters is not whether we have predicted a rating correctly but whether the customer followed the recommendation and purchased the product. This is best measured with indicators which are usually used for classification: precision and recall. Precision specifies what percentage of recommended products has actually been purchased, while recall tells us what percentage of purchased products has been recommended by us.
Calculating precision and recall is easy if a given recommendation system has already been used for some time. Unfortunately, at the design stage those indicators, calculated from historical sales data, are subject to a large and unpredictable error, as we only have data from sales processes which did not use our recommendation system. The fact that a recommended product has not been purchased does not prove that the recommendation would have been ineffective had it actually been made. We do not know what customers’ purchases would look like if the sales team had used that or another recommendation system.
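Both indicators follow directly from the two definitions above. A minimal sketch, with hypothetical product names:

```python
def precision_recall(recommended, purchased):
    """Return (precision, recall) for one customer's recommendation list."""
    recommended, purchased = set(recommended), set(purchased)
    hits = len(recommended & purchased)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    return precision, recall


precision, recall = precision_recall(["A", "B", "C", "D"], ["B", "D", "E"])
```

Two of the four recommended products were purchased, so precision is 0.5. Two of the three purchased products had been recommended, so recall is 2/3.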
For these reasons, it is impossible to precisely assess the quality of a recommendation system before it is implemented.
When examining the potential effectiveness of a recommendation system, you also have to assume some time horizon. A reasonable compromise is necessary here, because the longer the time horizon, the better the results – over an endless period of time there is a great chance that the customer will buy practically everything we have to offer.
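When testing on historical data, the time horizon can be applied by splitting orders at a cutoff date: everything before it is the history used to build recommendations, and only purchases within the horizon count as hits. The order format, dates and horizon length below are assumptions of this sketch.

```python
from datetime import date, timedelta


def split_by_horizon(orders, cutoff, horizon_days):
    """Split (order_date, customer, product) tuples around a cutoff date.

    Returns (history, future): orders before the cutoff, and orders
    placed within horizon_days after it.
    """
    end = cutoff + timedelta(days=horizon_days)
    history = [o for o in orders if o[0] < cutoff]
    future = [o for o in orders if cutoff <= o[0] < end]
    return history, future


orders = [
    (date(2023, 1, 10), "acme", "A"),
    (date(2023, 3, 5), "acme", "B"),
    (date(2023, 9, 1), "acme", "C"),
]
history, future = split_by_horizon(orders, date(2023, 3, 1), horizon_days=90)
```

With a 90-day horizon only the purchase of product B counts as a future purchase. Extending the horizon to a year would also include product C and make any recommendation list look better.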
The recommendation system which we have built for a training company has a hybrid structure. Its core is a collaborative filtering algorithm, complemented by a series of improvements which use selected customer and product attributes.
The above article is a point of departure for further publications describing this case. I invite you to follow our blog.
If you are interested in the practical application of machine learning in business, I also invite you to read an article in which we show how to use an artificial neural network to classify goods.
Author: Mariusz Surma, Senior Analyst, Software House ASC