What are outliers?
From PsychWiki - A Collaborative Psychology Wiki
Revision as of 20:53, 7 September 2009 by Doug
- What are outliers?
- Outliers are extreme values as compared to the rest of the data.
- What does "extreme" mean?
- The determination of values as “outliers” is subjective. While there are a few benchmarks for determining whether a value is an “outlier”, those benchmarks are arbitrarily chosen, similar to how “p<.05” is also arbitrarily chosen.
- One benchmark is to use a boxplot to identify "mild" and "extreme" outliers. Mild outliers are scores more than 1.5*IQR below the first quartile or above the third quartile. Extreme outliers are scores more than 3*IQR below the first quartile or above the third quartile. IQR stands for the interquartile range, which is the distance between the first and third quartiles, that is, the spread of the middle 50% of the scores. In other words, an outlier is determined by comparison to the bulk of the scores in the middle. See Detecting Outliers - Univariate for how to determine if outliers exist using a boxplot.
- There are two categories of outliers: univariate and multivariate.
- Univariate outliers are extreme values on a single variable. For example, if you have 10 survey questions in your study, you would conduct 10 separate univariate outlier analyses, one for each variable. If you also average the 10 questions into a new composite variable, you can conduct a univariate outlier analysis on that composite as well. Another way to conduct univariate analysis is to look at individual variables within different groups. For example, you could analyze those same 10 survey questions separately within each gender (males and females) or within each political group (republican, democrat, other). Likewise, if you are conducting an experiment with more than one condition, such as manipulating happiness and sadness, you would conduct the univariate analysis on those same 10 survey questions separately within each condition.
- The second category of outliers is multivariate outliers. Multivariate outliers are extreme combinations of scores on two or more variables. For example, if you are looking at the relationship between height and weight, there may be a joint value that is extreme compared to the rest of the data, such as someone with very low height but very high weight, or very high height but very low weight, even if neither score is extreme on its own. You first look for univariate outliers, then proceed to look for multivariate outliers.
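
As a rough illustration of the boxplot benchmark for univariate outliers described above, the following Python sketch labels each score as "none", "mild", or "extreme". It assumes the 1.5*IQR and 3*IQR distances are measured outward from the first and third quartiles, which is the usual boxplot convention. The function name `classify_outliers` and the sample scores are made up for this example, and the quartile method (`"inclusive"`) is one choice among several; statistics packages may compute quartiles slightly differently and so flag slightly different scores.

```python
from statistics import quantiles


def classify_outliers(scores):
    """Label each score 'none', 'mild', or 'extreme' using boxplot fences.

    Hypothetical helper for illustration only; not taken from the wiki.
    """
    # Quartiles via the standard library; the method is an assumption.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: spread of the middle 50%

    labels = []
    for x in scores:
        # How far the score lies outside the box (0 if inside Q1..Q3).
        excess = max(q1 - x, x - q3, 0)
        if excess > 3 * iqr:
            labels.append("extreme")
        elif excess > 1.5 * iqr:
            labels.append("mild")
        else:
            labels.append("none")
    return labels


# Made-up scores on a single variable.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 10, 20]
for score, label in zip(scores, classify_outliers(scores)):
    if label != "none":
        print(score, label)
```

To follow the within-group approach described above, the same function could be called once per variable and once per group (for example, once on the scores from each experimental condition) rather than on the pooled scores.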