REDCap Database Creation Best Practices

This document provides general guidelines for REDCap database design. When creating your database, optimize for both data entry and data analysis.

 

  1. The record ID has to be the first variable of the first form
  2. Group similar variables together
  3. Keep forms short
    1. Multiple shorter forms are mentally easier to complete, and they offer more opportunities to save your data
    2. Data is not saved until the data enterer clicks “save” or the survey participant clicks “go to next page” or “submit”
  4. Minimize the use of free response fields
    1. Wherever possible, use a multiple choice selection instead of asking data enterers to type out the information. This minimizes non-uniform answers and typos
    2. If you are using free response fields, validate the data type whenever possible
      1. This will help minimize typos
      2. Unvalidated fields are stored as text and imported into statistical analysis programs as text. If you want to analyze the information as a number, your statistician will have to manually convert the variables to a numeric type
      3. If it’s applicable, set maximums and minimums for your number fields to help minimize data entry errors. REDCap will let you continue if you enter numbers outside of this range, but it will let you know the number is outside the range. This will help minimize errors while still allowing for outliers
  5. Field notes
    1. Use field notes to help the data enterers enter information correctly
    2. For example, tell them what units a measurement should be entered in
  6. Minimize switching between fields where you type (text boxes, drop down menus) and fields where you use the mouse (such as radio buttons and check boxes)
  7. Break variables into the key components you want to analyze. For example, if you are collecting addresses and will want to analyze at both the state level and the city level, make state and city separate fields
  8. Be consistent in coding your variables
    1. Avoid yes/no and true/false fields, which tend to cause problems if you need to add a third option later—stick to radio buttons and dropdowns
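    2. Don’t recode variables once you’ve started collecting data. This will corrupt your data, because values already entered keep their old codes and will be matched to the new labels. If you need a new answer choice, give it the next unused number. The numbers do not have to be in order in the list.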
    3. Code “unknowns” as a high number that stands out, like 999
    4. If you have multiple questions that will use the same answer choices, code them all the same
  9. Variable Names
    1. Variable names should be short, alphanumeric, easy to type, and ideally have some level of meaning—these are the labels that REDCap will use to locate your data, and these are what you will be typing over and over when you are doing your analysis
    2. Variable names should not be changed once data collection has begun
    3. If you change a variable name, you will also have to change all piping, calculations, and branching logic associated with it
  10. Calculated fields
    1. REDCap is capable of doing calculations, but it is not designed as a statistical tool—save complex calculations for back end analysis
    2. As a rule of thumb, use calculations in REDCap if you need to see that calculation for the data collection process—otherwise, save calculations for back end analysis
    3. You can export your data from REDCap to SPSS, SAS, R, and Stata
    4. The REDCap FAQ is a great resource for seeing how to write calculated fields
  11. Branching logic
    1. Branching logic is a great way to hide fields that will be irrelevant to certain users and streamline and customize the data entry process
    2. Branching logic goes in the child field—the field that you want to hide
    3. You can either use the drag and drop menu or write out advanced branching logic
    4. The REDCap FAQ is a great resource for seeing how to write branching logic
  12. The Data Dictionary
    1. The data dictionary is an alternate way to build your project (as opposed to the Online Designer)
    2. You can create or view the data dictionary as a CSV file in Excel
    3. The data dictionary is particularly useful when you have to do repetitive work that requires few or small changes
      1. For example: If you need to apply the exact same branching logic to many fields, you can do it in the data dictionary much more quickly with copy and paste
      2. Another example: If you need to have many places to list medications being taken (name, dosage, date), it is much faster to make one set of fields in the Online Designer and then copy and paste the remaining fields from that in the data dictionary. You can use find-replace to change the 1 or 2 characters per row that will need to be changed
  13. Above all, think about how you will enter the data and how you want to analyze it while you’re building your database. If possible, speak with your data enterers and your statistician before you build your database and incorporate their suggestions in your database design
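
The medication example under item 12 (one set of fields repeated many times with only a number changing) can also be scripted instead of copied and pasted by hand. The following is a minimal Python sketch, not an official REDCap tool. The variable names (`med1_name`, `med1_dose`, ...), the form name, the labels, and the field notes are all hypothetical, and the column list is a simplified subset of the data dictionary header. Before uploading anything, compare the header row against the data dictionary downloaded from your own project.

```python
import csv
import io

# Simplified subset of data dictionary columns (assumption): a real upload
# needs the full header row from your project's downloaded data dictionary.
COLUMNS = [
    "Variable / Field Name",
    "Form Name",
    "Field Type",
    "Field Label",
    "Field Note",
    "Text Validation Type OR Show Slider Number",
]

# One template set of medication fields: (suffix, label, note, validation).
# All names, labels, and notes here are hypothetical.
MED_TEMPLATE = [
    ("name", "Medication {n} name", "", ""),
    ("dose", "Medication {n} dosage", "Enter in mg", "number"),
    ("date", "Medication {n} start date", "", "date_ymd"),
]


def medication_rows(count, form_name="medications"):
    """Build data dictionary rows for `count` numbered medication field sets."""
    rows = []
    for n in range(1, count + 1):
        for suffix, label, note, validation in MED_TEMPLATE:
            rows.append({
                "Variable / Field Name": f"med{n}_{suffix}",
                "Form Name": form_name,
                "Field Type": "text",
                "Field Label": label.format(n=n),
                "Field Note": note,
                "Text Validation Type OR Show Slider Number": validation,
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with the header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print rows for three medication sets; paste or merge them into the
    # data dictionary file for your project.
    print(to_csv(medication_rows(3)))
```

The sketch follows several of the guidelines above: the variable names are short and consistent, the dosage and date fields are validated rather than left as free text, and the field note tells the data enterer which units to use.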