How do I determine whether my data are normal?

From PsychWiki - A Collaborative Psychology Wiki

(Difference between revisions)
Doug (Talk | contribs)
Doug (Talk | contribs)
 
Line 2: Line 2:
*There are four interrelated approaches to determine normality, and all four should be conducted.
*#Look at a histogram with the normal curve superimposed. A histogram provides a useful graphical representation of the data. [[Image:Fe40.png]] - For a rough example of normality and non-normality, see the following histograms. The black line superimposed on the histograms represents the bell-shaped "normal" curve. Notice how the data for variable1 are normal, and the data for variable2 are non-normal. In this case, the non-normality is driven by the presence of an outlier. For more information about outliers, see [[What are outliers?]], [[Detecting Outliers - Univariate | How do I detect outliers?]], and [[Dealing with Outliers | How do I deal with outliers?]]. Problem -- All samples deviate somewhat from normal, so the question is how much deviation from the black line indicates “non-normality”. Unfortunately, graphical representations like histograms provide no hard-and-fast rules. After you have viewed many (many!) histograms, you will get a sense for the normality of data. <center><table><td>[[Image:V1hn0.png|350px]]</td><td>[[Image:V2hnn0.png|350px]]</td></table></center><br><br>
-
*#Look at the values of Skewness and Kurtosis. '''Skewness''' involves the symmetry of the distribution. Skewness that is normal involves a perfectly symmetric distribution. A positively skewed distribution has scores clustered to the left, with the tail extending to the right. A negatively skewed distribution has scores clustered to the right, with the tail extending to the left. '''Kurtosis''' involves the peakedness of the distribution. Kurtosis that is normal involves a distribution that is bell-shaped and not too peaked or flat. Positive kurtosis is indicated by a peak. Negative kurtosis is indicated by a flat distribution. Both Skewness and Kurtosis are 0 in a normal distribution, so the farther away from 0, the more non-normal the distribution. The question is “how much” skew or kurtosis render the data non-normal? This is an arbitrary determination, and sometimes difficult to interpret using the values of Skewness and Kurtosis. [[Image:Fe40.png]] - The histogram above for variable1 represents perfect symmetry (skewness) and perfect peakedness (kurtosis); and the descriptive statistics below for variable1 parallel this information by reporting "0" for both skewness and kurtosis. The histogram above for variable2 represents positive skewness (tail extending to the right) and positive kurtosis (high peak); and the descriptive statistics below for variable2 parallel this information. Problem -- The question is “how much” skew or kurtosis render the data non-normal? This is an arbitrary determination, and sometimes difficult to interpret using the values of Skewness and Kurtosis. Luckily, there are more objective tests of normality, described next. <center><table><td>[[Image:V1d5tm.PNG|350px]]</td><td width=50></td><td>[[Image:V2d5tm.PNG|350px]]</td></table></center><br><br>
+
*#Look at the values of Skewness. '''Skewness''' involves the symmetry of the distribution. A distribution with normal skewness is perfectly symmetric. A positively skewed distribution has scores clustered to the left, with the tail extending to the right. A negatively skewed distribution has scores clustered to the right, with the tail extending to the left. Skewness is 0 in a normal distribution, so the farther the value is from 0, the more non-normal the distribution. [[Image:Fe40.png]] - The histogram above for variable1 represents perfect symmetry (skewness) and perfect peakedness (kurtosis); and the descriptive statistics below for variable1 parallel this information by reporting "0" for both skewness and kurtosis. The histogram above for variable2 represents positive skewness (tail extending to the right); and the descriptive statistics below for variable2 parallel this information. Problem -- The question is “how much” skew renders the data non-normal. This is an arbitrary determination, and it is sometimes difficult to make from the value of Skewness alone. Luckily, there are more objective tests of normality, described next. <center><table><td>[[Image:V1d5tm.PNG|350px]]</td><td width=50></td><td>[[Image:V2d5tm.PNG|350px]]</td></table></center><br><br>
*#Look at established tests for normality that are sensitive to both Skewness and Kurtosis simultaneously. The Kolmogorov-Smirnov (K-S) test and the Shapiro-Wilk (S-W) test are designed to test normality by comparing your data to a normal distribution with the same mean and standard deviation as your sample. If the test is NOT significant (p above .05), the data show no detectable departure from normality and are treated as normal. If the test is significant (p less than .05), then the data are non-normal. [[Image:Fe40.png]] - See the data below, which indicate variable1 is normal and variable2 is non-normal. Also, keep in mind that one limitation of the normality tests is that the larger the sample size, the more likely you are to get significant results. Thus, you may get significant results with only slight deviations from normality when sample sizes are large. <center>[[Image:V1v2tn.PNG]]</center><br><br>
*#Look at normality plots of the data. The “Normal Q-Q Plot” provides a graphical way to determine the level of normality. The black line indicates the values your sample should adhere to if the distribution were normal. The dots are your actual data. If the dots fall on or close to the black line, then your data are normal. If they deviate markedly from the black line, your data are non-normal. [[Image:Fe40.png]] - Notice how the data for variable1 fall along the line, whereas the data for variable2 deviate from the line. <center><table><td>[[Image:V1nqp0.png|350px]]</td><td width=50></td><td>[[Image:V2nqp0.png|350px]]</td></table></center><br><br>
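The numeric side of the checks above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce the output shown on this page. It assumes simple moment-based formulas for skewness and excess kurtosis (statistical packages often apply small-sample corrections, so their values may differ slightly), and it assumes the plotting position (i - 0.5) / n for the Q-Q points, which is one convention among several. The two data sets and all function names are hypothetical. It summarizes the Q-Q plot by a correlation rather than running the K-S or S-W test, for which a statistics package is the better tool.

```python
import math
from statistics import NormalDist, mean, pstdev


def skewness(xs):
    """Moment-based skewness: 0 for a perfectly symmetric sample."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0


def qq_points(xs):
    """Pair each sorted observation with the standard-normal quantile for its rank."""
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting position (i - 0.5) / n; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(xs)))


def qq_correlation(xs):
    """Pearson correlation of the Q-Q points; near 1 when the dots hug a straight line."""
    pts = qq_points(xs)
    tx = [t for t, _ in pts]
    sx = [s for _, s in pts]
    mt, ms = mean(tx), mean(sx)
    cov = sum((t - mt) * (s - ms) for t, s in pts)
    return cov / math.sqrt(sum((t - mt) ** 2 for t in tx) * sum((s - ms) ** 2 for s in sx))


# Hypothetical data: variable1 is symmetric, variable2 contains one extreme outlier.
variable1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
variable2 = [1, 2, 3, 4, 5, 6, 7, 8, 90]

for name, data in (("variable1", variable1), ("variable2", variable2)):
    print(name,
          "skewness:", round(skewness(data), 2),
          "excess kurtosis:", round(excess_kurtosis(data), 2),
          "Q-Q correlation:", round(qq_correlation(data), 3))
```

In this sketch the outlier in the hypothetical variable2 pushes skewness well above 0 and pulls the Q-Q correlation down, which mirrors the pattern described for variable2 in the list above.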

Latest revision as of 18:42, 13 December 2016
