Automobile Options Association Rule Mining (Data Mining)

In this exercise, I analyze the purchases of automobile options and infer association rules to provide better recommendations for future purchases.

Association Rule Mining is a rule-based machine learning method that looks to uncover associations and/or correlations within the available data. This type of analysis is also sometimes called affinity analysis or market basket analysis.

Think of when you purchase something on Amazon and, below the item, there is another section that says something like: customers who bought this item also bought… (the frequently bought together items). That, very simply, is Association Rule Mining.


The data from the client included information regarding automobile options within various categories, including Engine, Warranty, Wheels, Sound System and Other options. For example, within the Engine category, customers could select a slower engine, a faster engine, or the fastest engine. The client was mainly interested in the following:

What are the winning combinations of those options that I can offer to clients and can guarantee their satisfaction?

Answer: This question was answered in the visualization below!
Please note that there are further explanations below. I also included important information about association rule mining and what the various outputs mean.


Association Rule Mining Concepts

What’s important to understand is that we are not exactly looking to classify something; we are looking for potential relationships. We are trying to predict which options are most likely to be purchased together. This analysis can allow our business owner or manufacturer to anticipate demand and meet it faster.

Association Rule Mining is based on previous data and follows a simple If-Then statement logic, except in slightly more sophisticated nomenclature.

  • Antecedent: The If part of the statement.
  • Consequent: The Then part of the statement.
    • For example, if you are examining a laptop on Amazon (antecedent), you are likely to see a laptop case (consequent) in the recommendation bar.

These rules rely on several constraints and measures of significance, including the following:

  • Support: How often do those items appear together?
    • support = P(antecedent AND consequent)
    • i.e. what is the probability that the antecedent AND the consequent are purchased together?
  • Confidence: How often is the rule true?
    • Confidence = P(antecedent AND consequent) / P(antecedent)
    •            = P(consequent | antecedent)
    • i.e. given that the antecedent is present in a purchase, what is the probability that the consequent is present too?
  • Lift: What is the strength of this association rule?
    • Provides information about the increase in probability of the consequent given the presence of the antecedent.
    • Lift = Confidence / Expected Confidence
    • Expected Confidence = the number of purchases that include the consequent / the total number of purchases
    • A lift ratio above 1 tells us how much more likely we are to see the consequent selected when the antecedent items are present than we would expect from the consequent’s overall popularity. A lift of exactly 1 means the antecedent tells us nothing about the consequent.
  • Note: I tried to provide the simplest explanation here. If you’re curious and would like to learn more, read this excellent post on it.
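
To make the three measures concrete, here is a minimal Python sketch that computes them directly from a list of purchases. The transactions are a made-up toy sample (not the client's data), and the function names are my own for illustration. The analysis in this post was run in XLMiner, not with this code.

```python
# Hypothetical toy purchases: each set is the options one customer chose.
# These are NOT the client's data; they only illustrate the arithmetic.
transactions = [
    {"fastest engine", "16 inch wheels", "traction control"},
    {"fastest engine", "16 inch wheels", "5 year warranty"},
    {"slower engine", "traction control"},
    {"fastest engine", "16 inch wheels", "traction control", "sunroof"},
    {"slower engine", "sunroof"},
]


def support(itemset, transactions):
    """Fraction of purchases that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(both) / support(antecedent).

    Assumes the antecedent appears in at least one purchase.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by expected confidence, i.e. support(consequent)."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


antecedent = {"fastest engine", "16 inch wheels"}
consequent = {"traction control"}

print("support   :", support(antecedent | consequent, transactions))
print("confidence:", confidence(antecedent, consequent, transactions))
print("lift      :", lift(antecedent, consequent, transactions))
```

In this toy sample, 2 of the 5 purchases contain all three options (support 0.4), 3 purchases contain the antecedent (confidence 2/3), and 3 contain traction control (expected confidence 0.6), so the lift is about 1.11.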

Findings

As always, we begin by examining the data that we have. In the visualization below, we can clearly see the percentage of customers that purchased various automobile options in several categories.

Now that we see the data, we run the analysis. I chose my input parameters to require a minimum support of 6 purchases/transactions and a minimum confidence of 80%. The visualization below shows the resulting rules.
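
For readers who want to see how those two thresholds act as filters, here is a rough Python sketch. It is a brute-force enumeration that I wrote for illustration, not the algorithm XLMiner runs, and it assumes every rule has a single-item consequent. The function name `mine_rules`, the toy purchases, and the lowered support threshold of 2 (needed because the toy sample is tiny) are all my own assumptions.

```python
from itertools import combinations


def mine_rules(transactions, min_support_count=6, min_confidence=0.8):
    """Brute-force rule search with single-item consequents.

    Illustrative only: checks every item combination, which is fine for a
    handful of options but does not scale the way Apriori-style pruning does.
    Assumes min_support_count >= 1 so no antecedent count is ever zero.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        # Number of purchases that contain every item in `itemset`.
        return sum(itemset <= t for t in transactions)

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            both = count(itemset)
            if both < min_support_count:
                continue  # the combination is too rare to trust
            for consequent in combo:
                antecedent = itemset - {consequent}
                conf = both / count(antecedent)
                if conf >= min_confidence:
                    expected = count(frozenset([consequent])) / n
                    rules.append(
                        (sorted(antecedent), consequent, both, conf, conf / expected)
                    )
    return rules


# Hypothetical toy purchases, not the client's data.
toy_transactions = [
    {"fastest engine", "16 inch wheels", "traction control"},
    {"fastest engine", "16 inch wheels", "traction control"},
    {"fastest engine", "16 inch wheels"},
    {"traction control"},
]

# Threshold lowered to 2 because the toy sample has only 4 purchases.
rules = mine_rules(toy_transactions, min_support_count=2, min_confidence=0.8)
for antecedent, consequent, n_both, conf, lift in rules:
    print(f"{antecedent} -> {consequent}: support={n_both}, "
          f"confidence={conf:.2%}, lift={lift:.2f}")
```

Raising the minimum support drops rules that rest on too few purchases, while raising the minimum confidence drops rules that fail too often. The two knobs are independent.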


Winning Combinations for Automobile Options Purchases

Our client asked what the winning combinations are. In the visualization below, we can see that our best combinations are (fastest engine & 16 inch wheels & 5 year warranty) as well as (fastest engine & 16 inch wheels & traction control). This tells our client and their marketing team that a customer who wants the fastest engine and 16 inch wheels is equally likely to add a 5 year warranty or traction control: each rule holds with 85.71% confidence, meaning 85.71% of the past purchases that included the fastest engine and 16 inch wheels also included that option.

We can also note that traction control seems to be the most popular consequent option that our customers purchase. 

In the visualization below, we see the lift ratio of those transactions. The lift ratio is confidence over expected confidence and shows us how much more likely we are to see the selection of the consequent item for purchase, given the items present in the antecedent.

For example, below we see that customers who purchased a slower engine and traction control were 1.33 times as likely to also buy the amfm/dvd sound system as customers overall.

Another example: customers who bought the 3 year warranty and sunroof were 1.85 times as likely to also buy the 16 inch wheels as customers overall.
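
To see why the baseline for lift is the overall purchase rate of the consequent, it helps to write the ratio out. The symbols A (antecedent) and C (consequent) are my own shorthand here:

```latex
\[
\text{Lift}(A \Rightarrow C)
  = \frac{\text{Confidence}}{\text{Expected Confidence}}
  = \frac{P(C \mid A)}{P(C)}
  = \frac{P(A \text{ and } C)}{P(A)\,P(C)}
\]
```

So a lift of 1.85 says that P(C | A) is 1.85 times P(C), the share of all purchases that include the consequent. If A and C were independent, the numerator would equal P(A) P(C) and the lift would be exactly 1.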


Overall, this was a very interesting exercise and shows us the power of association rule mining. Now, our client can forge ahead armed with the power of reading their customers’ minds!


Software 

The software used here is the Analytic Solver Platform for Education (XLMiner), a comprehensive data mining add-in for Excel. (Here is the online guide for how to use it.)

The information above is from the Graduate Certificate in Business Analytics: Descriptive Analytics course at Penn State University.