Dealing with Missing Data

From PsychWiki - A Collaborative Psychology Wiki

(Difference between revisions)
Afrjb (Talk | contribs)
m (fixed typo / grammar error: the word "other" was changed to "order")
 
(3 intermediate revisions not shown)
Line 1: Line 1:
-
'''How do I deal with missing values?'''  Irrespective of whether the missing values are random or non-random, you have three options when dealing with missing values.  
+
'''How do I deal with missing values?'''  You have three basic options when dealing with missing values.  
-
*'''Option 1''' is to do nothing. Leave the data as is, with the missing values in place. This is the most frequent approach, for a few reasons. First, missing values are typically small. Second, missing values are typically non-random. Third, even if there are a few missing values on individual items, you typically create composites of the items by averaging them together into one new variable, and this composite variable will not have missing values because it is an average of the existing data. However, if you chose this option, you must keep in mind how SPSS will treat the missing values. SPSS will either use “listwise deletion” or “pairwise deletion” of the missing values. You can elect either one when conducting each test in SPSS.  
+
*'''Option 1''' is to do nothing. Leave the data as is, with the missing values in place. This is the most frequent approach, for a few reasons. First, the number of missing values is typically small. Second, missing values are typically non-random. Third, even if there are a few missing values on individual items, you typically create composites of the items by averaging them together into one new variable, and this composite variable will not have missing values because it is an average of the existing data. However, if you choose this option, you must keep in mind how SPSS will treat the missing values. SPSS will use either “listwise deletion” or “pairwise deletion” of the missing values. You can elect either one when conducting each test in SPSS.
*#<u>Listwise deletion</u> – SPSS will not include cases (subjects) that have missing values on the variable(s) under analysis. If you are only analyzing one variable, then listwise deletion is simply analyzing the existing data. If you are analyzing multiple variables, then listwise deletion removes cases (subjects) if there is a missing value on any of the variables. The disadvantage is a loss of data, because you are removing all data from subjects who may have answered some of the questions but not others (i.e., the missing data).
*#<u>Pairwise deletion</u> – SPSS will include all available data. Unlike listwise deletion, which removes cases (subjects) that have missing values on any of the variables under analysis, pairwise deletion only removes the specific missing values from the analysis (not the entire case). In other words, all available data is included. [[Image:Fe40.png]] - If you are conducting a correlation on multiple variables, then SPSS will compute each bivariate correlation from all available data points for that pair of variables, ignoring only the missing values on those two variables. In this case, pairwise deletion will result in different sample sizes for each correlation. Pairwise deletion is useful when the sample size is small or the number of missing values is large: there are not many values to begin with, so there is little reason to omit even more with listwise deletion.
-
*#In other to better understand how listwise deletion versus pairwise deletion influences your results, try conducting the same test using both deletion methods. Does the outcome change? Also, its important to keep in mind that for each type of test you conduct, you need to identify if SPSS is using listwise or pairwise deletion. Mosts tests allow you to elect your preference, but you should always check your output for the number of cases used in each analysis to identify if pairwise or listwise deletion was used.
+
*#In order to better understand how listwise deletion versus pairwise deletion influences your results, try conducting the same test using both deletion methods. Does the outcome change? Also, it's important to keep in mind that for each type of test you conduct, you need to identify whether SPSS is using listwise or pairwise deletion. Most tests allow you to elect your preference, but you should always check your output for the number of cases used in each analysis to identify whether pairwise or listwise deletion was used.
-
*'''Option 2''' is to delete cases with missing values. [[Image:Fe40.png]] - For every missing value in the dataset, you can delete the subjects with the missing values. Thus, you are left with complete data for all subjects. The disadvantage to this approach is you reduce the sample size of your data. If you have a large dataset, then it may not be a big disadvantage because you have enough subjects even after you delete the cases with missing values. Another disadvantage to this approach is that the subjects with missing values may be different than the subjects without missing values (e.g., missing values that are non-random), so you have a non-representative sample after removing the cases with missing values. Once situation in which I use Option 2 is when particular subjects have not answered an entire scale or page of the study.  
+
*'''Option 2''' is to delete cases with missing values. [[Image:Fe40.png]] - For every missing value in the dataset, you can delete the subjects with those missing values. Thus, you are left with complete data for all subjects. The disadvantage to this approach is that you reduce the sample size of your data. If you have a large dataset, then it may not be a big disadvantage because you have enough subjects even after you delete the cases with missing values. Another disadvantage to this approach is that the subjects with missing values may be different from the subjects without missing values (e.g., missing values that are non-random), so you have a non-representative sample after removing the cases with missing values. One situation in which I use Option 2 is when particular subjects have not answered an entire scale or page of the study.
*'''Option 3''' is to replace the missing values, called imputation. There is little agreement about whether or not to conduct imputation. There is some agreement, however, about which type of imputation to conduct. [[Image:Fe40.png]] - You typically do NOT conduct mean substitution or regression substitution. Mean substitution replaces the missing value with the mean of the variable. Regression substitution uses regression analysis to replace the missing value. Regression analysis is designed to predict one variable based upon another variable, so it can be used to predict the missing value based upon the subject’s answer to another variable. The favored type of imputation replaces the missing values using different estimation methods. The “Missing Values Analysis” add-on module in SPSS contains the estimation methods.
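
The difference between the options above can be seen on a small example. The following is only an illustrative sketch in plain Python, not SPSS syntax and not how SPSS computes anything internally. The dataset, the variable names (<code>q1</code>, <code>q2</code>, <code>q3</code>), and the helper functions are all hypothetical and chosen for this illustration. It shows that listwise deletion uses the same subjects for every correlation, that pairwise deletion yields a different sample size per correlation, and what mean substitution does to a missing value.

```python
from itertools import combinations
from statistics import mean

# Hypothetical survey data: one dict per subject, None marks a missing value.
data = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 1},
    {"q1": 5, "q2": 4, "q3": None},
    {"q1": 1, "q2": 2, "q3": 2},
    {"q1": 3, "q2": 3, "q3": 4},
]
variables = ["q1", "q2", "q3"]


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def listwise(rows, names):
    """Keep only subjects with no missing value on any of the named variables."""
    return [r for r in rows if all(r[n] is not None for n in names)]


def correlations(rows, names, pairwise):
    """Return {(a, b): (r, n)} for every pair of variables."""
    out = {}
    complete = listwise(rows, names)  # subjects complete on ALL variables
    for a, b in combinations(names, 2):
        # Pairwise: drop a subject only if a or b is missing for this pair.
        used = listwise(rows, [a, b]) if pairwise else complete
        r = pearson_r([s[a] for s in used], [s[b] for s in used])
        out[(a, b)] = (r, len(used))
    return out


def mean_substitute(rows, name):
    """Replace missing values of one variable with that variable's observed mean."""
    m = mean(r[name] for r in rows if r[name] is not None)
    return [dict(r, **{name: m if r[name] is None else r[name]}) for r in rows]


# Sample size behind each correlation under the two deletion methods.
listwise_n = {p: n for p, (_, n) in correlations(data, variables, False).items()}
pairwise_n = {p: n for p, (_, n) in correlations(data, variables, True).items()}
print("listwise n:", listwise_n)
print("pairwise n:", pairwise_n)
```

With this toy data, listwise deletion bases every correlation on the same three complete subjects, while pairwise deletion uses four subjects for two of the three correlations. Comparing the sample sizes in this way mirrors the advice above to check the number of cases reported in your output.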
 +
----
 +
◄ Back to [[Analyzing Data]] page

Latest revision as of 21:13, 11 September 2009

How do I deal with missing values? You have three basic options when dealing with missing values.





