# Should I check for outliers?

### From PsychWiki - A Collaborative Psychology Wiki


## Revision as of 04:45, 16 February 2008

**Should I check for outliers?**

- YES -- Outliers can make your data non-normal. Since normality is an assumption of many of the statistical tests you will conduct, finding and eliminating the influence of outliers may restore normality and thus make your data appropriate for analysis with those tests.
- NO -- Just because a value is extreme compared to the rest of the data does not necessarily mean it is somehow an anomaly, or invalid, or should be removed. The subject chose to respond with that value, so removing that value is arbitrarily throwing away data simply because it does not fit this “assumption” that data should be “normal”. Conducting research is about discovering empirical reality. If the subject chose to respond with that value, then that data is a reflection of reality, so removing the “outlier” is the antithesis of why you conduct research.
- MAYBE -- One solution is to analyze your data both with and without the outliers, because each analysis tells you something different. Imagine a hypothetical study that asks the American public how many sexual partners they have had over their lifetime (see the frequency distribution below). The average number of sexual partners is 45 when you include the "outliers" who reported 100+ partners, but 7 when you exclude them. Analyzing the data with the outliers is informative because some people do have extreme sexual habits, so including them provides a better reflection of reality. Analyzing the data without the outliers is also informative because it yields an average that better fits the data: an average of 45 does not represent the sample, since almost no one is at or around 45, whereas an average of 7 fits most respondents well.

[Figure: histogram of reported lifetime sexual partners in the hypothetical study (Sexpartners_histogram0.png)]
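The "analyze it both ways" approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `responses` list is invented (chosen so the two means come out to 45 and 7, matching the example), and the cutoff of 100, the `summarize` helper, and the "within 5 of the mean" window are all hypothetical choices, not part of the original study.

```python
from statistics import mean

# Hypothetical responses, invented for illustration (not the study's data).
responses = [2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 150, 320]

# Assumed rule: anyone reporting 100 or more partners counts as an "outlier".
CUTOFF = 100


def count_near(values, center, window=5):
    """Count how many values lie within `window` of `center`."""
    return sum(1 for v in values if abs(v - center) <= window)


def summarize(values, cutoff=CUTOFF):
    """Compute the mean with and without outliers, and how well each fits."""
    trimmed = [v for v in values if v < cutoff]
    mean_with = mean(values)
    mean_without = mean(trimmed)
    return {
        "mean_with": mean_with,
        "mean_without": mean_without,
        "n_removed": len(values) - len(trimmed),
        # How many respondents actually sit near each average?
        "near_mean_with": count_near(values, mean_with),
        "near_mean_without": count_near(values, mean_without),
    }


summary = summarize(responses)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With this made-up sample, no respondent falls within 5 of the mean that includes the outliers, while most fall within 5 of the mean that excludes them. Reporting both numbers, along with how many values were set aside, keeps the extreme responses visible without letting them define the "typical" respondent.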
