Salary Analysis for Data Mining - PA3

Overview

In this app, we explore salary distributions through a histogram (manually computed), boxplots, scatter plots, quantile plots, and a Q-Q plot. We also compare two salary datasets.

Histogram of Salary Data (Dataset 1)

Interpretation:
The histogram shows salaries concentrated at the lower end, with a long tail toward higher values, so the distribution is right-skewed. Most salaries are modest, while a few high values (possibly outliers) stretch the upper tail.
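
As a rough illustration of what "manually computed" can mean here, the sketch below counts values into equal-width bins without calling a library histogram routine. The function name `manual_histogram`, the default bin count, and the sample salaries are hypothetical and are not taken from the app itself.

```python
def manual_histogram(values, num_bins=10):
    """Count values into equal-width bins (a sketch, not the app's actual code)."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical salaries (in thousands) with one large value in the tail.
sample = [30, 32, 35, 36, 38, 40, 41, 45, 52, 110]
edges, counts = manual_histogram(sample, num_bins=4)
print(edges)
print(counts)
```

With data like this, most of the counts fall in the first bin and a single value sits in the last one, which is the right-skewed shape described above.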

Boxplot and Outlier Analysis (Dataset 1)

Discussion:
Removing the outliers reduces the spread of the data and gives a clearer view of its central tendency. The updated boxplot shows a more compact distribution, with quartiles recomputed from the remaining values.
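
The text does not say which rule the app uses to flag outliers. The sketch below assumes the common boxplot convention (Tukey's rule): a value is an outlier if it lies more than 1.5 times the interquartile range below the first quartile or above the third. The function name `remove_outliers_iqr`, the factor `k`, and the sample data are assumptions for illustration.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Split values into (kept, outliers) using the assumed 1.5 * IQR rule."""
    # method="inclusive" treats the data as the full population of interest.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    kept = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return kept, outliers


# Hypothetical salaries (in thousands).
sample = [30, 32, 35, 36, 38, 40, 41, 45, 52, 110]
kept, outliers = remove_outliers_iqr(sample)
print("outliers:", outliers)
print("range before:", max(sample) - min(sample))
print("range after:", max(kept) - min(kept))
```

Comparing the range before and after removal makes the reduction in spread concrete. A different quartile method or fence factor would flag a slightly different set of points.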

Comparison of Two Salary Distributions

Scatter Plot Comparison

Boxplot Comparison

Comparison Interpretation:
The scatter plot and boxplots show that both datasets cover a similar range but differ slightly in central tendency (the median) and in spread (the interquartile range).
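
One way to put numbers on such a comparison is a five-number summary per dataset, which is what a boxplot draws. This is only a sketch: the helper `five_number_summary` and both sample lists are hypothetical and do not come from the app's datasets.

```python
import statistics


def five_number_summary(values):
    """Return min, Q1, median, Q3 and max for one dataset."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Two hypothetical datasets with a similar range but different spread.
dataset_1 = [30, 35, 40, 45, 50]
dataset_2 = [32, 34, 40, 48, 50]

for name, data in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    s = five_number_summary(data)
    print(name, s, "IQR:", s["q3"] - s["q1"])
```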

Quantile Plots for Each Dataset

Observations on the Quantile Plots:
The quantile plots display how the salary values progress across percentiles for each dataset.
- By default, vertical markers are drawn at the 25th, 50th, and 75th percentiles.
- Horizontal markers show the salary values corresponding to these percentiles.
The interactive marker lets you select any other percentile to explore the data further.
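
The markers described above can be sketched as follows. Two conventions are assumed here and may differ from the app: each sorted value x_i is plotted against f_i = (i - 0.5) / n, and percentiles between data points are found by linear interpolation. The function names and the sample data are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value with f_i = (i - 0.5) / n (assumed convention)."""
    data = sorted(values)
    n = len(data)
    # enumerate is zero-based, so (i + 0.5) / n equals (rank - 0.5) / n.
    return [((i + 0.5) / n, x) for i, x in enumerate(data)]


def percentile(values, p):
    """Salary at the p-th percentile (0-100), using linear interpolation."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = (len(data) - 1) * p / 100
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac


# Hypothetical salaries (in thousands).
sample = [45, 30, 50, 35, 40]
print(quantile_plot_points(sample))
for p in (25, 50, 75):  # the default marker positions
    print(p, percentile(sample, p))
```

Each default marker corresponds to one call of `percentile`: the vertical line sits at p and the horizontal line at the returned salary.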

Q-Q Plot Between Datasets

Observation from the Q-Q Plot:
If the plotted points lie close to the 45-degree line (y = x), the two datasets have similar distributions. Deviations from this line indicate differences: a roughly parallel offset suggests a shift in central location, while points that bend away from the line at the ends suggest that one dataset has heavier tails.
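
A minimal sketch of how the points of such a plot can be obtained: compute the same set of quantiles for both datasets and pair them up. The function name `qq_points`, the number of quantiles, and the sample data are assumptions made for illustration, and the app may compute its quantiles differently.

```python
import statistics


def qq_points(data_a, data_b, n=4):
    """Pair the n-1 cut points of data_a with those of data_b."""
    qa = statistics.quantiles(data_a, n=n, method="inclusive")
    qb = statistics.quantiles(data_b, n=n, method="inclusive")
    return list(zip(qa, qb))


# Two hypothetical salary datasets (in thousands).
dataset_1 = [30, 35, 40, 45, 50]
dataset_2 = [32, 34, 40, 48, 50]

points = qq_points(dataset_1, dataset_2)
# Distance of each point from the line y = x.
deviations = [abs(a - b) for a, b in points]
print(points)
print("largest deviation from y = x:", max(deviations))
```

Because quantiles are matched by rank rather than by index, the two datasets do not need to have the same number of observations.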