Normality

Choose Stats > Normality.

../_images/norm_test1.png

A normality test is a statistical procedure used to assess whether a dataset is consistent with a normal distribution. This matters because many statistical methods, such as t-tests, ANOVA, and linear regression, assume normally distributed data (or residuals), and violating this assumption can lead to incorrect conclusions. Several normality tests are available, each with its own methodology and typical use cases.

  • Shapiro-Wilk Test: The Shapiro-Wilk test is one of the most powerful normality tests and is particularly effective for small to moderate sample sizes. It calculates a W statistic, between 0 and 1, that measures how closely the sample data match a normal distribution. Values close to 1 are consistent with normality, while smaller values indicate greater deviation from it. When W is small enough that its p-value falls below the significance level, the null hypothesis that the data are normally distributed is rejected.

  • Anderson-Darling Test: The Anderson-Darling test is another widely used normality test. It is a modification of the Kolmogorov-Smirnov test that gives more weight to the tails of the distribution, which makes it particularly sensitive to deviations in the tails of the data. It calculates an A² statistic, with larger values indicating greater deviation from normality.

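  • p-Value: The p-value is the probability, assuming the null hypothesis is true (that is, the sample comes from a normal distribution), of obtaining a test statistic at least as extreme as the one observed. It is not the probability that the null hypothesis is true. When the p-value is smaller than the significance level (commonly 0.05), the null hypothesis is rejected and the data are considered non-normal. A larger p-value only means there is not enough evidence to reject normality; it does not prove that the data are normal.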
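The Shapiro-Wilk test and the p-value decision rule described above can be reproduced outside the tool for cross-checking. The sketch below assumes NumPy and SciPy are installed and uses `scipy.stats.shapiro`; the helper name `shapiro_wilk_decision` and the default significance level of 0.05 are illustrative choices, not part of this software.

```python
import numpy as np
from scipy import stats


def shapiro_wilk_decision(data, alpha=0.05):
    """Hypothetical helper: run Shapiro-Wilk and apply the p-value decision rule."""
    w_statistic, p_value = stats.shapiro(data)
    # Reject the null hypothesis (data are normal) only when p < alpha.
    reject_normality = p_value < alpha
    return w_statistic, p_value, reject_normality


rng = np.random.default_rng(seed=0)
samples = {
    "normal": rng.normal(loc=10.0, scale=2.0, size=50),
    "exponential": rng.exponential(scale=2.0, size=50),
}

for name, sample in samples.items():
    w, p, reject = shapiro_wilk_decision(sample)
    print(f"{name:12s} W={w:.3f}  p={p:.4f}  reject normality: {reject}")
```

With random samples the exact W and p values depend on the seed; note that even a genuinely normal sample is rejected about 5% of the time at a significance level of 0.05.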

Differences Between Shapiro-Wilk and Anderson-Darling Tests

| Aspect         | Shapiro-Wilk Test                                   | Anderson-Darling Test                                    |
|----------------|-----------------------------------------------------|----------------------------------------------------------|
| Sensitivity    | Effective for small to moderate sample sizes        | Sensitive to deviations in the tails of the distribution |
| Statistic Used | W statistic                                         | A² statistic                                             |
| Power          | Generally exhibits higher statistical power overall | Also powerful but slightly less so than Shapiro-Wilk     |
| Weighting      | Does not specifically emphasize tails               | Places more weight on the tails of the distribution      |
| Use Case       | Preferred for general use with smaller datasets     | Useful when tail behavior is critical                    |
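To see the two tests side by side on the same data, the following sketch runs both through SciPy (assumed installed). One practical difference worth knowing: `scipy.stats.anderson` reports the A² statistic together with critical values for a fixed set of significance levels (15, 10, 5, 2.5 and 1 percent for `dist="norm"`) rather than a p-value, so the sketch compares A² against the critical value instead. The function name `compare_normality_tests` and the heavy-tailed Student t demo data are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats


def compare_normality_tests(data, alpha_percent=5.0):
    """Hypothetical helper: Shapiro-Wilk and Anderson-Darling at one significance level."""
    w_statistic, sw_p = stats.shapiro(data)
    ad = stats.anderson(data, dist="norm")
    # stats.anderson tabulates critical values for fixed significance levels (in percent);
    # list.index raises ValueError if alpha_percent is not one of them.
    idx = list(ad.significance_level).index(alpha_percent)
    critical = ad.critical_values[idx]
    return {
        "W": w_statistic,
        "sw_p": sw_p,
        "sw_reject": sw_p < alpha_percent / 100,
        "A2": ad.statistic,
        "A2_critical": critical,
        # Larger A² means greater deviation; reject when it exceeds the critical value.
        "ad_reject": ad.statistic > critical,
    }


rng = np.random.default_rng(seed=1)
heavy_tailed = rng.standard_t(df=3, size=200)  # roughly symmetric, long tails
for key, value in compare_normality_tests(heavy_tailed).items():
    print(f"{key:12s} {value}")
```

Both tests usually agree on clearly non-normal data; they are most likely to disagree when the departure is concentrated in the tails, which is where the Anderson-Darling weighting matters.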

A sample output.

../_images/norm_test2.png

QQ (quantile-quantile) plots compare the quantiles of the sample data against the quantiles of a theoretical normal distribution, which allows a visual inspection of how closely the data follow normality.

  • Normally Distributed Data: If the data is normally distributed, the points on the QQ plot will roughly form a straight line from the lower left to the upper right corner.

  • Non-Normal Data: Deviations from this straight line indicate departures from normality.
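The points of a normal QQ plot as described above can be computed directly: sort the sample and pair each value with the normal quantile at the same cumulative probability. The sketch below assumes NumPy and SciPy and uses the plotting positions (i - 0.5) / n; other conventions exist (for example, `scipy.stats.probplot` uses Filliben's estimate), and the exact positions this tool uses are not stated, so its points may differ slightly. The helper names `normal_qq_points` and `straightness` are hypothetical. The correlation between the two quantile sets is a rough numeric summary of how straight the line is.

```python
import numpy as np
from scipy import stats


def normal_qq_points(data):
    """Hypothetical helper: (theoretical, ordered_sample) pairs for a normal QQ plot."""
    ordered = np.sort(np.asarray(data, dtype=float))
    n = ordered.size
    # Plotting positions (i - 0.5) / n for i = 1..n.
    probabilities = (np.arange(1, n + 1) - 0.5) / n
    theoretical = stats.norm.ppf(probabilities)
    return theoretical, ordered


def straightness(theoretical, ordered):
    """Correlation of the two quantile sets; values near 1 mean a nearly straight line."""
    return np.corrcoef(theoretical, ordered)[0, 1]


rng = np.random.default_rng(seed=2)
for name, sample in [("normal", rng.normal(size=100)), ("exponential", rng.exponential(size=100))]:
    x, y = normal_qq_points(sample)
    print(f"{name:12s} r = {straightness(x, y):.4f}")
```

If matplotlib is available, `plt.scatter(x, y)` on the returned arrays draws the plot itself, with theoretical quantiles on the horizontal axis.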

../_images/normal_probability_plot_s_curve_pattern.png
../_images/normal_probability_plot_inverted_s_curve_pattern.png
../_images/normal_probability_plot_downward_curve_pattern.png
../_images/normal_probability_plot_outliers.png

Each of these patterns indicates that the data (or, in regression diagnostics, the residuals) are not normally distributed:

  • S-curve implies a distribution with long tails.

  • Inverted S-curve implies a distribution with short tails.

  • Downward curve implies a right-skewed distribution.

  • A few points lying far from the line imply a distribution with outliers.
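Reading these patterns is a visual judgment. As a rough numeric companion, the sketch below maps the tail and skew patterns to moment statistics: sample skewness (positive for a right-skewed distribution) and excess kurtosis (0 for a normal distribution, positive for long tails, negative for short tails). It assumes NumPy and SciPy; the function name `describe_shape` and the 0.5 cutoff are arbitrary, hypothetical choices, not rules used by this tool, and outliers are still best spotted on the plot itself.

```python
import numpy as np
from scipy import stats


def describe_shape(data, threshold=0.5):
    """Hypothetical helper: rough labels for QQ-plot patterns from moment statistics."""
    skewness = stats.skew(data)
    # Fisher definition (SciPy's default): 0 for a normal distribution.
    excess_kurtosis = stats.kurtosis(data)
    labels = []
    if skewness > threshold:
        labels.append("right-skewed")
    elif skewness < -threshold:
        labels.append("left-skewed")
    if excess_kurtosis > threshold:
        labels.append("long tails")
    elif excess_kurtosis < -threshold:
        labels.append("short tails")
    return skewness, excess_kurtosis, labels or ["no strong departure"]


rng = np.random.default_rng(seed=3)
demo = {
    "uniform": rng.uniform(size=500),
    "student t (df=5)": rng.standard_t(df=5, size=500),
    "exponential": rng.exponential(size=500),
}
for name, sample in demo.items():
    skewness, kurt, labels = describe_shape(sample)
    print(f"{name:18s} skew={skewness:+.2f}  excess kurtosis={kurt:+.2f}  -> {', '.join(labels)}")
```

Moment statistics are themselves sensitive to a few extreme values, so a single outlier can push the kurtosis up; treat the labels as a prompt to look at the plot, not as a verdict.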