Detecting Outliers - Univariate

From PsychWiki - A Collaborative Psychology Wiki

Revision as of 04:55, 16 February 2008 by Stenstro (Talk | contribs)
  1. One way is to visually inspect your data with a FREQUENCY DISTRIBUTION. Example: Imagine a study that asks members of the American public how many sexual partners they have had over their lifetime. The frequency distribution below shows the findings from this hypothetical study. The people who reported 100+ sexual partners appear disconnected from the rest of the data.
    [Figure: histogram of reported lifetime sexual partners]
  2. One statistical benchmark is to use a BOXPLOT to determine "mild" and "extreme" outliers. Mild outliers are any score more than 1.5*IQR below the first quartile or above the third quartile (the edges of the box), and are indicated by open dots. Extreme outliers are any score more than 3*IQR beyond those edges. IQR stands for interquartile range, which is the distance between the first and third quartiles, i.e. the span of the middle 50% of the scores. In other words, an outlier is determined by comparison to the bulk of the scores in the middle. Example: The output below is from SPSS for a variable called "system1". A boxplot is a graphical display of the data that shows: (1) the median, which is the middle black line, (2) the middle 50% of scores, which is the shaded region, (3) the top and bottom 25% of scores (excluding outliers), which are the lines extending out of the shaded region, (4) the smallest and largest non-outlier scores, which are the horizontal lines at the top and bottom of the boxplot, and (5) outliers. For this variable, there is 1 mild outlier (subject #52) and 1 extreme outlier (subject #18).
    [Figure: SPSS boxplot of the variable "system1"]
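
The boxplot rule in item 2 can also be applied by hand. The following is a minimal Python sketch, not the SPSS procedure: the function name `classify_outliers` and the sample scores are made up for illustration, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which may differ slightly from the quartiles SPSS uses when it draws a boxplot.

```python
from statistics import quantiles


def classify_outliers(scores):
    """Return {position: 'mild' or 'extreme'} for scores outside the IQR fences.

    Hypothetical helper for illustration; quartile conventions differ
    between statistics packages, so borderline scores may be classified
    differently elsewhere.
    """
    # First quartile, median, third quartile (quantiles() sorts a copy itself).
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1

    # Mild fences sit 1.5*IQR beyond the box edges, extreme fences 3*IQR beyond.
    mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr

    labels = {}
    for position, score in enumerate(scores):
        if score < extreme_low or score > extreme_high:
            labels[position] = "extreme"
        elif score < mild_low or score > mild_high:
            labels[position] = "mild"
    return labels


# Made-up scores, not the "system1" data from the SPSS output.
scores = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 12, 30]
print(classify_outliers(scores))  # {10: 'mild', 11: 'extreme'}
```

With these scores the quartiles are 3.75 and 6.25, so the IQR is 2.5, the mild fences are 0 and 10, and the extreme fences are -3.75 and 13.75. The score of 12 falls between the upper mild and extreme fences (mild outlier), while 30 lies beyond the extreme fence.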
