Checking Data Entry

From PsychWiki - A Collaborative Psychology Wiki

Revision as of 03:59, 16 February 2008 by Stenstro
  1. People make mistakes. If your data have been entered by hand, you need to double-check whether any mistakes were made when transferring them into a dataset (in software such as SPSS, SAS, R, or S).
  2. Computers make mistakes too. If you collected your data through an online system (such as SurveyMonkey or your own hosted site), the data *should* transfer into the dataset without error, unless the online system was set up incorrectly.
  3. However a mistake occurs, it misrepresents your true data. The purpose of conducting research is to discover reality, so incorrectly entered data thwart the purpose of research.
  4. Misrepresenting the data you collected can significantly affect your findings. A single incorrectly entered number can create an outlier, reduce the normality of a distribution, or change the conclusions of your study.
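
A small worked example of item 4, written in Python (the page itself uses SPSS; Python is used here only as a portable illustration). The ratings are made up, on a 1 to 11 scale, and the error assumes that a "6" was keyed in as "66". The sketch only shows how far one wrong value can move the mean and standard deviation.

```python
import statistics

# Made-up ratings on a 1-11 scale, as they appear on the paper forms.
correct = [5, 6, 7, 5, 6, 4, 7, 6, 5, 6]

# The same data, except that one "6" was typed as "66" during data entry.
mistyped = correct.copy()
mistyped[1] = 66

for label, data in (("correct", correct), ("mistyped", mistyped)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    print(f"{label:>8}: mean = {mean:.2f}, SD = {sd:.2f}")
```

With these particular numbers, the mean moves from 5.70 to 11.70, which is above the top of the scale, and the standard deviation grows from about 0.95 to about 19.10.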

  1. Have two or more people enter the same data independently and look for discrepancies. Example: if you enter the data in Excel or SPSS, each person can enter the data into a separate file; then merge or compare the files and look for cells where the entries differ.
  2. Have one person enter the data, then double-check it by randomly picking segments of the dataset and comparing them against the original records.
  3. Statistical software (such as SPSS, SAS, R, or S) can run descriptive analyses, such as frequency tables or minimum and maximum values, that reveal out-of-range numbers and other data-entry errors. Example: the SPSS output below for the variable "system1" shows that a subject gave a response of "13" even though the only valid responses were 1 through 11.
[Figure: SPSS output for "system1" showing the out-of-range value 13]
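
A minimal sketch of item 1 in Python, assuming each person's entries were exported from Excel or SPSS to CSV with one row per subject and a subject ID column. The column names (`subject`, `system2`) and the function `find_discrepancies` are hypothetical; `csv.DictReader` and `io.StringIO` come from Python's standard library and stand in here for reading the real files.

```python
import csv
import io

def find_discrepancies(rows_a, rows_b, id_field="subject"):
    """Compare two independent entries of the same data, cell by cell.

    rows_a and rows_b are lists of dicts (e.g., from csv.DictReader), one per
    subject. Returns (subject, variable, value_a, value_b) for every cell that
    differs; a subject missing from one entry shows up with None on that side.
    Values are compared as text, exactly as they were typed.
    """
    a = {row[id_field]: row for row in rows_a}
    b = {row[id_field]: row for row in rows_b}
    problems = []
    for subject in sorted(a.keys() | b.keys()):
        row_a = a.get(subject, {})
        row_b = b.get(subject, {})
        for variable in sorted((row_a.keys() | row_b.keys()) - {id_field}):
            value_a, value_b = row_a.get(variable), row_b.get(variable)
            if value_a != value_b:
                problems.append((subject, variable, value_a, value_b))
    return problems

# Two people's entries of the same three paper forms (hypothetical data).
entry_1 = "subject,system1,system2\n101,4,7\n102,6,3\n103,9,2\n"
entry_2 = "subject,system1,system2\n101,4,7\n102,6,8\n103,9,2\n"
rows_1 = list(csv.DictReader(io.StringIO(entry_1)))
rows_2 = list(csv.DictReader(io.StringIO(entry_2)))

for problem in find_discrepancies(rows_1, rows_2):
    print(problem)
```

Every tuple that comes back is a cell to check against the paper form. Because values are compared as text, an entry of "4" in one file and "4.0" in the other is reported as a discrepancy, which is usually what you want when the goal is to catch typing differences.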
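
Item 2 leaves open how to pick the segments. One possible approach, sketched in Python, is to split the dataset into fixed-size blocks of rows and draw a few blocks at random, seeding the generator so the choice can be reproduced later. The function name `pick_segments`, the block size, and the number of blocks are illustrative assumptions, not recommendations from the original page.

```python
import random

def pick_segments(n_rows, segment_size, n_segments, seed=None):
    """Randomly choose contiguous blocks of rows to re-check against the originals.

    Returns sorted (start, end) pairs: 0-based row indices, end excluded.
    """
    # Start index of every block; the last block may be shorter than segment_size.
    starts = list(range(0, n_rows, segment_size))
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    chosen = rng.sample(starts, k=min(n_segments, len(starts)))
    return [(start, min(start + segment_size, n_rows)) for start in sorted(chosen)]

# Hypothetical example: 200 entered rows, check 3 blocks of 10 rows each.
for start, end in pick_segments(200, segment_size=10, n_segments=3, seed=2008):
    print(f"Check rows {start} to {end - 1}")
```

The rows in each returned range are then compared by hand against the original records; if errors turn up, checking more blocks (or switching to double entry) is a reasonable next step.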
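
Item 3 can be mimicked outside SPSS with a frequency count and a list of valid responses. This Python sketch uses made-up responses to "system1" that include a 13, assumes the valid answers are the integers 1 through 11 as in the example above, and assumes there are no missing values; `frequency_table` and `out_of_range` are hypothetical helper names.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value, like a frequency table."""
    return sorted(Counter(values).items())

def out_of_range(values, valid):
    """Return the distinct values that are not among the valid responses."""
    return sorted(set(values) - set(valid))

# Made-up responses to "system1"; valid answers are 1 through 11.
system1 = [3, 7, 11, 5, 13, 7, 1, 9, 7, 2]
valid = range(1, 12)

for value, count in frequency_table(system1):
    flag = "  <-- out of range" if value not in valid else ""
    print(f"{value:>3}: {count}{flag}")
print("Invalid values:", out_of_range(system1, valid))
```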

  1. The first step is to identify why the value was entered incorrectly. Example: the output above for the variable "system1" shows a "13". Since 13 is not a valid response, you need to find out how it got there. Did the person entering the data make a typing mistake? Or did the subject actually respond with "13" even though the question stated that only the numbers 1 through 11 were valid?
  2. You can identify the source of the error by checking the hard copies of the data. Example: find the subject who gave the "13" by sorting the data by that variable. Then look at that subject's hard copy to see whether the subject actually wrote "13" or gave a different response than the one recorded in the dataset.
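
The "sort by that variable" step in item 2 can be sketched in Python as well. The data below are invented, with one row per subject. Sorting in descending order puts a too-high value such as 13 at the top, as sorting in SPSS would, and the filter then lists only the subjects whose paper forms need to be pulled. `rows_to_check` is a hypothetical helper name.

```python
# Invented data set: one dict per subject.
data = [
    {"subject": 101, "system1": 4},
    {"subject": 102, "system1": 13},
    {"subject": 103, "system1": 9},
    {"subject": 104, "system1": 11},
]

def rows_to_check(rows, variable, valid):
    """Sort rows by `variable` (highest first) and keep those with invalid values."""
    ordered = sorted(rows, key=lambda row: row[variable], reverse=True)
    return [row for row in ordered if row[variable] not in valid]

for row in rows_to_check(data, "system1", range(1, 12)):
    print(f"Pull the paper form for subject {row['subject']} (system1 = {row['system1']})")
```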
