Improve stock control using cluster analysis

Last updated on Jul 17, 2023

Objectives

The goals of this project are to:

Categorize fabrics by their features
Identify the category with the highest stock
Predict sales

Dataset

The data comes from the previous quarter’s sales results and covers a total of 3000 fabrics. The extracted features and the target are described in the table. Fig. 1 provides an example of a fabric and its features. Because the purchased quantity varies from fabric to fabric, the target is the percentage of sales rather than the total amount of sales for each fabric.

Feature	Unit	Type	Range	Description
HUE	degree	Numeric	[0, 359]	Hue of the dominant color.
PERCENT	-	Numeric	[0, 100]	Percentage of dominant color.
RANGE	degree	Numeric	[0, 359]	Range of colors.
THEME	-	Categorical	Monochrome, analogous, complementary, triadic, none	Color themes.
SIZE	$cm^2$	Numeric	[0, 1247.35]	Pattern size.
CONTRAST	-	Numeric	[0, 127.5]	Standard deviation in grayscale.
SALES	-	Numeric	[0, 100]	Percentage of sales. Total sales at the end of the month divided by total purchases at the beginning of the month.
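
The SALES definition in the table can be written as a one-line calculation. The sketch below is illustrative only: the function name `sales_percentage` and the unit counts are assumptions, not part of the original project.

```python
def sales_percentage(sold_units: float, purchased_units: float) -> float:
    """Return sold units as a percentage of purchased units."""
    if purchased_units <= 0:
        raise ValueError("purchased_units must be positive")
    return 100.0 * sold_units / purchased_units


# Two hypothetical fabrics with very different purchase volumes
# end up with the same target value.
print(sales_percentage(30, 120))    # 25.0
print(sales_percentage(300, 1200))  # 25.0
```

Using the percentage makes fabrics with small and large purchase quantities directly comparable.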

Exploratory data analysis

Distribution

We can perform a descriptive analysis by examining the distribution of each feature and its correlation with sales, as shown in the figure.

Blue and red account for a large share of the main colors (HUE). However, the figure shows that HUE is uncorrelated with SALES, which indicates that our customers have no significant preference for the main color of a fabric. We therefore drop this feature from the subsequent analyses.

The RANGE and SIZE values are relatively small. This is reasonable, as it is difficult to design a fabric with a wide color range or a large pattern size. For THEME, the percentage of fabrics decreases as the theme becomes more intricate. This is also reasonable: as with RANGE, designing becomes more challenging when the theme places more limitations on color usage.

In terms of SALES performance, the mean is 40%, and most of the distribution lies below this mark.

Correlation

The figure shows the correlation between SALES and each feature. Although the correlations are low (all below 0.5), they still give some insight into customer preferences. All features except PERCENT, the monochrome theme, and the non-specific theme (none) are positively correlated with sales.

The result is quite intuitive, and the figure provides an example. Fabric A has a wider range of colors, a larger pattern size, and higher contrast. It uses a triadic theme to balance red, blue, and green, which gives it more visual appeal than fabric B. It is therefore reasonable to expect A to have higher sales than B.
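
A correlation table of this kind can be computed by one-hot encoding THEME and taking the Pearson correlation of each column with SALES. This is a minimal sketch, assuming Pearson correlation and one-hot encoding (the source does not state either). The helper names and the six toy rows are made up for illustration and are not the project's data.

```python
import numpy as np

THEMES = ["monochrome", "analogous", "complementary", "triadic", "none"]


def one_hot(labels, categories):
    """Encode labels as an (n, k) 0/1 matrix, one column per category."""
    return np.array([[float(label == cat) for cat in categories] for label in labels])


def correlation_with_sales(columns, sales):
    """Pearson correlation between each named column and SALES."""
    sales = np.asarray(sales, dtype=float)
    return {
        name: float(np.corrcoef(np.asarray(values, dtype=float), sales)[0, 1])
        for name, values in columns.items()
    }


# Made-up toy rows, for illustration only (not the project's data).
percent = [90, 80, 60, 50, 40, 70]
theme = ["monochrome", "monochrome", "analogous", "complementary", "triadic", "none"]
sales = [10, 20, 45, 55, 70, 25]

columns = {"PERCENT": percent}
encoded = one_hot(theme, THEMES)
for j, cat in enumerate(THEMES):
    columns[f"THEME={cat}"] = encoded[:, j]

corr = correlation_with_sales(columns, sales)
for name, value in corr.items():
    print(f"{name:22s} {value:+.2f}")
```

Every category must appear at least once, otherwise its indicator column is constant and the correlation is undefined.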

Models

Before conducting quantitative analysis, it’s important to split the data into training and test datasets. This is necessary for verification in the final step of the analysis. Additionally, data standardization is required to ensure that all features have a comparable range.
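
The split and standardization steps can be sketched as follows. The 80/20 ratio, the function names, and the synthetic data are assumptions, since the source does not give them. The scaling statistics are taken from the training set only, so no information from the test set leaks into the model.

```python
import numpy as np


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def standardize(train, test):
    """Scale both sets with the mean and standard deviation of the training set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std


# Synthetic stand-in for three numerical features and the SALES target.
rng = np.random.default_rng(42)
X = rng.normal(loc=[50, 300, 60], scale=[20, 200, 30], size=(100, 3))
y = rng.uniform(0, 100, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y)
Z_train, Z_test = standardize(X_train, X_test)
print(Z_train.shape, Z_test.shape)
```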

Dimensionality reduction

Because the data contains both numerical and categorical features, the covariance matrix is constructed differently for each type. For numerical features, we follow the original definition. For categorical features, the covariance matrix is constructed so that the $\chi^2$ distance between features is preserved. Next, we perform singular value decomposition on the covariance matrix to find the principal components. In summary, PCA (principal component analysis) and CA (correspondence analysis) are used to reduce the feature dimensionality.

The results are shown in the figure. Three principal components are enough to preserve 70% of the total variance. The loadings of each component are also shown in the figure.
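
The two decompositions can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the project's implementation: PCA is applied to the numerical features, CA to a 0/1 indicator matrix of THEME, and the source does not say how the two are combined. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def pca(X):
    """PCA via SVD of the covariance matrix of the numerical features."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    U, s, _ = np.linalg.svd(cov)   # cov is symmetric PSD, so s holds its eigenvalues
    return Xc @ U, U, s / s.sum()  # scores, loadings, explained variance ratio


def correspondence_analysis(Z):
    """CA of a 0/1 indicator matrix Z (rows: fabrics, columns: categories)."""
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)     # row and column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * s) / np.sqrt(r)[:, None]
    return row_coords, s ** 2               # row coordinates, principal inertias


# Synthetic stand-in data: 60 fabrics, 4 numerical features, 5 themes.
rng = np.random.default_rng(0)
X_num = rng.normal(size=(60, 4))
labels = np.arange(60) % 5  # every theme appears, so no column of Z is empty
Z = np.eye(5)[labels]

scores, loadings, ratio = pca(X_num)
coords, inertia = correspondence_analysis(Z)
print(ratio.round(3), inertia.round(3))
```

In the CA row coordinates, the Euclidean distance between two fabrics equals the $\chi^2$ distance between their category profiles, which is the property the text refers to.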
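
Choosing the number of components amounts to finding the smallest count whose cumulative explained variance reaches the target. The sketch below assumes a 70% threshold as in the text. The variance ratios are made-up values chosen for illustration, not the project's results.

```python
from itertools import accumulate


def components_for_variance(explained_ratio, threshold=0.70):
    """Smallest number of leading components whose cumulative ratio reaches the threshold."""
    for k, total in enumerate(accumulate(explained_ratio), start=1):
        if total >= threshold:
            return k
    return len(explained_ratio)


# Hypothetical explained-variance ratios, sorted in descending order.
hypothetical_ratio = [0.35, 0.22, 0.14, 0.11, 0.10, 0.08]
n_components = components_for_variance(hypothetical_ratio)
print(n_components)  # 3
```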

Cluster analysis

K-means

After identifying the turning point in the elbow plot, as shown in the figure, we decided to use 3 clusters for further analysis. For each cluster, we calculate the average of its features, as shown in the figure.

The first cluster, named “Colorful”, has higher values in RANGE, CONTRAST, and the multi-color themes. The second cluster, named “Large pattern”, has the highest value in SIZE. Unlike the first two clusters, the third cluster has large positive values only in the main color percentage (PERCENT) and the monochrome color theme. It is therefore named “Plain” and can be considered less visually appealing than the first two clusters. The figure provides examples of fabric for each cluster.
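
An elbow plot is built from the within-cluster sum of squares (inertia) for a range of cluster counts. The following is a minimal NumPy sketch of Lloyd's algorithm with random restarts, run on synthetic blobs. The initialization scheme, the restart count, and the data are assumptions, not the project's setup.

```python
import numpy as np


def squared_distances(X, centers):
    """Squared Euclidean distance from every point to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's algorithm with a few random restarts; returns the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _step in range(n_iter):
            labels = squared_distances(X, centers).argmin(axis=1)
            updated = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(updated, centers):
                break
            centers = updated
        d2 = squared_distances(X, centers)
        inertia = float(d2.min(axis=1).sum())
        if best is None or inertia < best[2]:
            best = (d2.argmin(axis=1), centers, inertia)
    return best  # labels, centers, inertia


# Synthetic data: three well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

# Inertia for k = 1..6; the elbow is where the decrease levels off.
inertias = {k: kmeans(X, k)[2] for k in range(1, 7)}
labels, centers, _ = kmeans(X, 3)
cluster_means = np.array([X[labels == j].mean(axis=0) for j in range(3)])
for k, value in inertias.items():
    print(k, round(value, 1))
```

The last step mirrors the per-cluster feature averages described above: once the labels are fixed, each cluster is summarized by the mean of its members.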

Sales and Stock

The histogram and cumulative curve of the percentage of sales are shown in the figure. The distribution for “Plain” fabric is clearly concentrated near zero. The shares of sales and stock are also shown in the figure: “Plain” fabric takes up 60% of the total stock but contributes only 31.9% of sales. In other words, the root cause of the low sales is the “Plain” fabric. Therefore, reducing the share of “Plain” fabric in every purchase should improve our stock control. The test data leads to the same conclusion (not shown).

Figure: (a) The histogram and cumulative curve, as well as (b) the sales and stock shares of each cluster, indicate that the “Plain” fabric is the root cause of low sales.
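
The stock and sales shares per cluster can be computed as in the sketch below. It assumes that "stock share" means a cluster's share of purchased units and "sales share" its share of sold units, which the source does not spell out. The record layout and all quantities are made up for illustration.

```python
from collections import defaultdict


def cluster_shares(records):
    """Map each cluster to (share of purchased units, share of sold units)."""
    purchased = defaultdict(float)
    sold = defaultdict(float)
    for cluster, n_purchased, n_sold in records:
        purchased[cluster] += n_purchased
        sold[cluster] += n_sold
    total_purchased = sum(purchased.values())
    total_sold = sum(sold.values())
    return {
        cluster: (purchased[cluster] / total_purchased, sold[cluster] / total_sold)
        for cluster in purchased
    }


# Hypothetical (cluster, purchased units, sold units) records.
records = [
    ("Colorful", 100, 70),
    ("Colorful", 150, 80),
    ("Large pattern", 250, 150),
    ("Plain", 300, 60),
    ("Plain", 200, 40),
]

shares = cluster_shares(records)
for cluster, (stock_share, sales_share) in shares.items():
    print(f"{cluster:14s} stock {stock_share:.1%}  sales {sales_share:.1%}")
```

A cluster whose stock share is well above its sales share, like “Plain” in this toy example, is a candidate for smaller purchases.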

Regression

Another goal is to predict sales. To do this, we first examine the correlation between sales and the components obtained from dimensionality reduction. Unfortunately, only one component reaches a correlation of -0.5, which means that the current features are still insufficient for precise predictions.
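
As a rough guide to why a correlation of -0.5 is too weak, assume a simple linear regression of SALES on that single component (an assumption, since the source does not specify a regression model). In that case the coefficient of determination equals the squared Pearson correlation $r$:

$$R^2 = r^2 = (-0.5)^2 = 0.25$$

Under this assumption, the strongest component alone would explain about 25% of the variance in SALES and leave about 75% unexplained.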

Machine Learning

Alice Wu

Professor of Artificial Intelligence

My research interests include distributed robotics, mobile computing and programmable matter.
