Overview
Black Friday is the most significant retail shopping day in
America, and the ability to plan strategy around attracting high-dollar buyers
would be an important advantage for retailers in an increasingly competitive
market. Using a Kaggle dataset and Alteryx, I created three predictive
models to identify shoppers likely to spend more than $10,000 on their Black
Friday shopping.
Description of the Data
The dataset for this project was obtained from Kaggle and
uploaded into Alteryx. I used Alteryx to examine the data for missing
values and outliers. None were found, and the data appeared to be very
clean, so little to no cleaning was necessary. The dataset contained
537,577 records. Given the cleanliness of the data, I proceeded directly to
exploring it.
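For readers without Alteryx, the same kind of check can be sketched in plain Python. This is only an illustration under stated assumptions: the column names (`Gender`, `City_Category`, `Purchase`) and the sample rows are hypothetical, and the 1.5 × IQR rule is one common outlier convention, not necessarily the check Alteryx applied.

```python
import statistics

def count_missing(rows, columns):
    """Count values that are None or an empty string, per column."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Made-up rows for illustration only; not taken from the Kaggle file.
sample = [
    {"Gender": "M", "City_Category": "C", "Purchase": 8370},
    {"Gender": "F", "City_Category": "A", "Purchase": 15200},
    {"Gender": "M", "City_Category": "B", "Purchase": 1422},
    {"Gender": "F", "City_Category": "C", "Purchase": 7969},
]

print(count_missing(sample, ["Gender", "City_Category", "Purchase"]))
print(iqr_outliers([row["Purchase"] for row in sample]))
```

A column whose missing count is zero and whose outlier list is empty would match the "very clean" result described above.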
Model Selection
I created multiple models in Alteryx, including boosted tree,
decision tree, and neural network models. The model results are as follows:
- The boosted tree produced more
accurate results than the decision tree model. The confusion matrices for the decision
tree and boosted tree models are shown below:
Decision Tree:
Boosted Tree:
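To make the comparison concrete, here is a minimal sketch of how a confusion matrix and an accuracy figure are derived from actual and predicted labels. The label lists are invented for illustration and are not the results of the Alteryx models; `True` stands for a high-dollar shopper.

```python
def confusion_matrix(actual, predicted):
    """Tally true/false positives and negatives for boolean labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1      # high-dollar shopper, predicted as such
        elif p:
            counts["fp"] += 1      # predicted high-dollar, but was not
        elif a:
            counts["fn"] += 1      # high-dollar shopper that was missed
        else:
            counts["tn"] += 1      # correctly predicted as not high-dollar
    return counts

def accuracy(cm):
    """Share of all predictions that were correct."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0

# Hypothetical labels, not output from the models in this post.
actual = [True, True, False, False, True]
model_a = [True, False, False, True, True]
model_b = [True, True, False, False, True]

for name, predicted in [("model_a", model_a), ("model_b", model_b)]:
    cm = confusion_matrix(actual, predicted)
    print(name, cm, accuracy(cm))
```

Accuracy alone can flatter a model when high-dollar shoppers are a small share of the data, so the false negative count in each matrix is worth reading alongside it.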
I also used Alteryx to construct a variable importance plot, which can be seen below:
The plot shows that gender, occupation, and city category are the most influential factors in predicting whether a person will be a high-dollar shopper. However, the plot does not indicate the direction of each relationship. To explore these categories further, I created plots of means, which can be found below:
The plots of means indicate that buyers with occupation codes 12, 15, and 17, buyers from category C cities, and male buyers spend more than others. Using this information, a company seeking to maximize its appeal to high-dollar buyers would target these demographic groups.
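The numbers behind a plot of means are simply the average spend within each category. A rough sketch of that calculation follows, assuming one row per shopper with a total spend; the field names (`Gender`, `Total_Spend`) and the values are hypothetical and are not drawn from the dataset.

```python
from collections import defaultdict

def mean_by_group(rows, group_key, value_key):
    """Return the mean of value_key for each distinct value of group_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Invented shoppers for illustration only.
shoppers = [
    {"Gender": "M", "Total_Spend": 12000},
    {"Gender": "M", "Total_Spend": 8000},
    {"Gender": "F", "Total_Spend": 6000},
]

print(mean_by_group(shoppers, "Gender", "Total_Spend"))
```

Running the same function with the occupation or city category field as `group_key` would give the values for the other two plots.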