# How do I determine whether my data are normal?

### From PsychWiki - A Collaborative Psychology Wiki

## Latest revision as of 18:42, 13 December 2016

**How do I determine whether my data are normal?**

- There are several interrelated approaches to determining normality, and all of them should be conducted.
- Look at a histogram with the normal curve superimposed. A histogram provides a useful graphical representation of the data. To provide a rough example of normality and non-normality, see the following histograms. The black line superimposed on the histograms represents the bell-shaped "normal" curve. Notice how the data for variable1 are normal, and the data for variable2 are non-normal. In this case, the non-normality is driven by the presence of an outlier. For more information about outliers, see What are outliers?, How do I detect outliers?, and How do I deal with outliers?. Problem: all samples deviate somewhat from normal, so the question is how much deviation from the black line indicates “non-normality”. Unfortunately, graphical representations like histograms provide no hard-and-fast rules. After you have viewed many (many!) histograms, you will get a sense for the normality of data.

- Look at the values of Skewness. **Skewness** involves the symmetry of the distribution. A normal distribution is perfectly symmetric. A positively skewed distribution has scores clustered to the left, with the tail extending to the right. A negatively skewed distribution has scores clustered to the right, with the tail extending to the left. Skewness is 0 in a normal distribution, so the farther the value is from 0, the more non-normal the distribution. The histogram above for variable1 represents perfect symmetry (skewness) and perfect peakedness (kurtosis), and the descriptive statistics below for variable1 parallel this information by reporting "0" for both skewness and kurtosis. The histogram above for variable2 represents positive skewness (tail extending to the right), and the descriptive statistics below for variable2 parallel this information. Problem: the question is “how much” skew renders the data non-normal. This is an arbitrary determination, and it is sometimes difficult to make from the values of Skewness alone. Luckily, there are more objective tests of normality, described next.

- Look at established tests for normality that take into account both Skewness and Kurtosis simultaneously. The Kolmogorov-Smirnov (K-S) test and the Shapiro-Wilk (S-W) test are designed to test normality by comparing your data to a normal distribution with the same mean and standard deviation as your sample. If the test is NOT significant (p value above .05), the data show no detectable departure from normality and are treated as normal. If the test is significant (p value less than .05), the data are non-normal. See the data below, which indicate that variable1 is normal and variable2 is non-normal. Also, keep in mind one limitation of the normality tests: the larger the sample size, the more likely you are to get significant results. Thus, when sample sizes are large, you may get significant results with only slight deviations from normality.

- Look at normality plots of the data. A “Normal Q-Q Plot” provides a graphical way to judge the level of normality. The black line indicates the values your sample should adhere to if the distribution were normal. The dots are your actual data. If the dots fall on or very close to the black line, then your data are approximately normal. If they deviate markedly from the black line, your data are non-normal. Notice how the data for variable1 fall along the line, whereas the data for variable2 deviate from the line.
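
The Q-Q comparison described above can be sketched with the Python standard library alone. This is only an illustration, not the procedure of any particular statistics package: the function names `qq_points` and `max_abs_gap`, the sample values, and the plotting positions `(i - 0.5) / n` are all assumptions made for the example. Each sorted observation (a "dot") is paired with the value a normal distribution with the sample's mean and standard deviation would predict at that rank (the "line").

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    # Normal distribution with the same mean and standard deviation as the sample.
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))


def max_abs_gap(points):
    """Largest vertical distance between a dot and the reference line y = x."""
    return max(abs(observed - expected) for expected, observed in points)


# Hypothetical samples: one roughly bell-shaped, one with a single outlier.
roughly_normal = [-1.6, -1.0, -0.6, -0.3, 0.0, 0.3, 0.6, 1.0, 1.6]
with_outlier = roughly_normal[:-1] + [9.0]

print("roughly normal:", round(max_abs_gap(qq_points(roughly_normal)), 3))
print("with outlier:  ", round(max_abs_gap(qq_points(with_outlier)), 3))
```

The dots of the first sample stay close to the line, while the outlier in the second sample pulls its largest dot far away from it. As with the plot itself, there is no fixed cutoff for how large a gap counts as non-normal.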

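
To make the skewness and kurtosis values discussed above concrete, here is a minimal sketch using only the Python standard library. It uses the simple population (moment) formulas, which is an assumption: statistics packages usually report bias-corrected versions, so their output will differ slightly, especially in small samples. The function names and the sample values are hypothetical. Kurtosis is reported as excess kurtosis, so a normal distribution scores 0 on both measures.

```python
from statistics import fmean, pstdev


def skewness(data):
    """Mean cubed z-score: 0 for a symmetric sample, positive for a right tail."""
    m, s = fmean(data), pstdev(data)
    return fmean([((x - m) / s) ** 3 for x in data])


def excess_kurtosis(data):
    """Mean fourth-power z-score minus 3, so a normal distribution gives 0."""
    m, s = fmean(data), pstdev(data)
    return fmean([((x - m) / s) ** 4 for x in data]) - 3.0


# Hypothetical samples: a symmetric one, and the same scores plus one outlier.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = symmetric + [40]

for name, sample in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    print(name, round(skewness(sample), 2), round(excess_kurtosis(sample), 2))
```

The symmetric sample has a skewness of 0, and the single high outlier produces a clearly positive skewness (tail extending to the right). The symmetric sample is flat rather than bell-shaped, so its excess kurtosis is negative, which shows why skewness alone does not settle the question. For the formal tests, SciPy provides `scipy.stats.shapiro` and `scipy.stats.kstest`, if that library is available to you.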
