Sales Data Exploration and Reduction (Data Mining)

In this post, I examine sales data and apply descriptive data mining techniques to analyze the data set and describe its patterns. The main reason for conducting such an analysis is to help business leaders anticipate the future by informing their decision-making process.

I imagined that our client came to us with this question: 

Question: I have a data set of all my recent sales. Do you see anything there that can help us better target our customers?

Answer: This question is answered in the visualization below!
Each finding is explained in more detail further down. I also included my methodology and some notes on cluster analysis towards the end.



To answer this question properly, the client and I would need a more robust conversation in which we define the question(s) in more detail. However, for the sake of this exercise, I assumed that the client wanted us to look for any patterns or similarities in the data. Therefore, I decided to employ my trusty hierarchical cluster analysis algorithm (more on this towards the end of the article).


Software and Data

The software used here is the Analytic Solver Platform for Education (XLMiner), a comprehensive data mining add-in for Excel. (Here is the online guide for how to use it.)

The data is fictitious and contains information about customers, gross profits, gross sales, the industry code of each customer, and the competitive rating associated with each industry (1 = least competitive, 5 = most competitive).


Question: I have a data set of all my recent sales. Do you see anything there that can help us better target our customers?

Our cluster analysis produced 4 clusters, and the best approach here was to employ a pivot table and a couple of visualizations to help us see the information more clearly and, hopefully, be of real use to our client. Here are some findings:

Our analysis produced 4 clusters that vary widely in percent gross profit and competitive rating.

As we can see in the visualization and table below, Cluster 2 has the highest competitive rating, with an average of 3.13, and a seemingly small average percent gross profit of 20%.

However, it is worth noting that clusters 1, 3, and 4 each consist of a single customer! This makes it very difficult to draw any statistically meaningful conclusions from those clusters. Although the average percent gross profit in clusters 1 and 4 is higher than in cluster 2, we should remember that those two figures come from only 2 customers. Cluster 2, which holds all the remaining customers, merits further analysis in the future.
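To make the pivot-table step concrete, here is a minimal Python sketch of the same kind of summary: the number of customers and the averages per cluster. The rows are invented for illustration and are not the data used in this post, and `summarize` is a hypothetical helper name, not something XLMiner provides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (cluster label, percent gross profit, competitive rating).
# These values are made up for illustration; they are not the post's data.
rows = [
    (1, 0.45, 1),
    (2, 0.18, 3),
    (2, 0.22, 4),
    (2, 0.20, 2),
    (3, 0.10, 5),
    (4, 0.40, 2),
]


def summarize(rows):
    """Return {cluster: (customer count, mean pct profit, mean rating)}."""
    groups = defaultdict(list)
    for cluster, pct_profit, rating in rows:
        groups[cluster].append((pct_profit, rating))
    return {
        cluster: (
            len(members),
            mean(p for p, _ in members),
            mean(r for _, r in members),
        )
        for cluster, members in sorted(groups.items())
    }


summary = summarize(rows)
for cluster, (n, avg_profit, avg_rating) in summary.items():
    # Flag clusters whose "average" is really just one customer's value.
    flag = "  <- single customer, treat with caution" if n == 1 else ""
    print(f"cluster {cluster}: n={n}, profit={avg_profit:.0%}, "
          f"rating={avg_rating:.2f}{flag}")
```

Printing the customer count next to each average makes it harder to over-read a cluster whose average is one customer's value.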



Cluster 2 generates the most sales and the most profit on average.

We can see in the visualization below that Cluster 2, on average, generates more revenue than all the other clusters combined. This information can be of utmost importance to any sales manager, because they can now employ specific targeting strategies that save time and money.



Cluster 2 also generates the most sales and the most profit in total.

Here we used the raw values and summed up the total sales and profits generated by all the clusters. We can clearly see that Cluster 2 generates the most revenue for the company.



I separated Cluster 2 from the rest of the graph to make the difference easier to see and to keep its scale from flattening the relatively low values in clusters 1, 3, and 4.



Methodology and Approach to the Question

Although there are several approaches to data mining, including classification, association, and cause-and-effect modeling, in this analysis I used the data exploration and reduction approach, specifically cluster analysis. This approach allows us to identify groups whose members share similar characteristics. This form of segmentation can truly help us craft better and more thoughtful targeting campaigns.

Because such analyses can be costly and/or time-consuming on big data, it is generally recommended that we work with a sample. However, given that the data set here had only about 60 data points, I decided to use the data as is.

Additionally, given that we didn’t know how many clusters would be ideal, I decided to go with hierarchical clustering, because this method provided me with more flexibility. 

Unlike the K-means clustering algorithm, the hierarchical clustering algorithm does not rely on a predetermined number of clusters. Once the incremental clustering produces a dendrogram (a tree diagram), we can peel back the layers and pick the cluster formation that makes the most sense for our analysis.
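Here is a small Python sketch of what "peeling back the layers" can look like. It assumes we already have the merge history behind a dendrogram, recorded as pairs of point indices in the order they were joined. The `cut_tree` helper and the merge list are hypothetical and only illustrate the idea; they are not XLMiner output.

```python
def cut_tree(n_points, merges, k):
    """Replay the first n_points - k merges; return a cluster label per point."""
    parent = list(range(n_points))

    def find(i):
        # Follow parent links up to the representative of i's cluster.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # shorten the path as we go
            i = parent[i]
        return i

    # Each merge joins two clusters, so n_points - k merges leave k clusters.
    for a, b in merges[: n_points - k]:
        parent[find(b)] = find(a)

    # Relabel clusters 1, 2, 3, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(n_points)]


# Hypothetical merge history for 6 points, closest pair first.
merges = [(0, 1), (2, 3), (0, 2), (4, 5), (0, 4)]

for k in (4, 3, 2):
    print(k, "clusters:", cut_tree(6, merges, k))
```

The same merge history serves every value of k, which is the flexibility I was after: the tree is built once, and the number of clusters is chosen afterwards.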

A word of caution to those who may listen:

As opposed to many other data-mining techniques, cluster analysis is primarily descriptive, and we cannot draw statistical inferences about a sample using it. In addition, the clusters identified are not unique and depend on the specific procedure used; therefore, it does not result in a definitive answer but only provides new ways of looking at data. Nevertheless, it is a widely used technique.

Evans, James R. (2016). Business Analytics: Methods, Models, and Decisions. Boston: Pearson.

After deciding to go with hierarchical clustering, I chose the single linkage agglomerative clustering method, also known as the nearest neighbor method, to analyze the data in the Gross Profit, Industry Code, and Competitive Rating columns. The results and associated visualizations are explained above.
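For readers curious about what single linkage does under the hood, here is a minimal pure-Python sketch. It is not XLMiner's implementation: the `single_linkage` function is a hypothetical helper, the rows are invented, and it uses plain Euclidean distance on raw values because the analysis above does not record whether the inputs were rescaled.

```python
from math import dist


def single_linkage(points, k):
    """Merge the two nearest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start: one cluster per row

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Pick the pair of clusters with the smallest single-linkage distance.
        x, y = min(
            ((x, y)
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return [sorted(c) for c in clusters]


# Hypothetical rows: (percent gross profit, industry code, competitive rating).
points = [
    (0.20, 3, 3),
    (0.22, 3, 3),
    (0.18, 4, 3),
    (0.45, 9, 1),
    (0.10, 1, 5),
    (0.40, 7, 2),
]

print(single_linkage(points, 4))
```

With these toy rows, which were chosen on purpose, the three similar customers end up together and the three distant ones stay on their own. Single linkage is known for leaving outliers in one-member clusters, which may be one reason for the single-customer clusters reported above.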

By the way: Here is the dendrogram produced by the data!


If you have read this far, thank you! This exercise was quite a bit of fun to experiment with. In the future, I’d like to apply this method to a larger data set and hopefully extract more exciting insights.

The information above is from the Graduate Certificate in Business Analytics: Descriptive Analytics course at Penn State University.