REDCap Best Practices
REDCap Data Management
Introduction to REDCap Best Practices
This resource is intended to provide best practices to University of Toronto researchers who use REDCap in their research, including academic, clinical, administrative, and operational data collection.
- For ethics best practices, refer to the University of Toronto Policy on Ethical Conduct in Research
- For storage best practices, refer to the University of Toronto REDCap Storage Best Practices
Plan your Data
Formally map every piece of data you collect to your analytic plan and/or reporting requirements. Ensure that the data collection will supply all required data.
Consider your data in terms of the number of variables you are trying to capture. Collect only as much data as you need to prove or disprove your hypothesis. Be aware of common data collection mistakes:
- Collecting more data than necessary produces a data-set that is tedious to sift through and adds little meaningful insight.
- Managing too many data elements may make it easier to overlook critical errors in the data.
Recommendation: Consult with a statistical consultant well before you implement your data collection plan. Statistical consultants can provide a “fresh set of eyes,” identifying problems that may have been overlooked by the Principal Investigator. Additionally, statistical consultants can propose alternative approaches that can vastly improve the power and quality of your analysis.
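The mapping between collected data and the analytic plan can be checked mechanically. The following is a minimal sketch, assuming a hypothetical set of collected field names and a hypothetical analysis plan that lists the fields each analysis needs. None of these names come from REDCap or from a real study.

```python
# Hypothetical field names and analyses, for illustration only.
collected_fields = {"record_id", "dob", "sex", "hba1c", "favourite_colour"}

analysis_plan = {
    "baseline_characteristics": {"dob", "sex"},
    "primary_outcome": {"hba1c", "diagnosis_date"},
}

# Every field that at least one planned analysis needs.
required_fields = set().union(*analysis_plan.values())

# Needed by the plan but not collected: the analysis cannot be run.
missing_fields = sorted(required_fields - collected_fields)

# Collected but never used: candidates for removal.
# record_id is excluded because it is assumed to be the record key.
unused_fields = sorted(collected_fields - required_fields - {"record_id"})

print("Missing:", missing_fields)
print("Unused:", unused_fields)
```

A non-empty "Missing" list means the data collection will not supply all required data. A non-empty "Unused" list points to data that may not be worth collecting.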
Describe the Input Data
Always use the Field Label option to describe the data you intend to capture in a data capture field. Field Labels ensure that end-users understand the data and the units of measurement, as well as the format of each field. Use Field Notes mainly to supplement the Field Label. For example, Field Notes can indicate the format of a validated data collection field or the units of measurement to the end-user.
Keep a Code Book
Create a code book that describes each variable by name according to the following criteria:
- type of data – numeric, date/time, character
- units of measurement – grams, feet, micro-grams per deciliter
- purpose of collecting the data and its relationship to other data
Tip: Use the REDCap data dictionary as a starting point for your code book.
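As a rough illustration of this tip, the sketch below turns a small data dictionary into code book entries. It assumes the data dictionary is a CSV file whose column headers match those shown in the sample. Check the headers against your own export, because they may differ. The field names and the `build_code_book` helper are hypothetical.

```python
import csv
import io

# A tiny stand-in for an exported data dictionary. The column headers are
# assumed to match your REDCap export; adjust them if yours differ.
DATA_DICTIONARY_CSV = """\
Variable / Field Name,Field Type,Field Label,Text Validation Type OR Show Slider Number,Field Note
weight,text,Weight,number,kilograms
visit_date,text,Date of visit,date_ymd,YYYY-MM-DD
comments,notes,Additional comments,,
"""


def build_code_book(dictionary_csv):
    """Return one code book entry per variable in the data dictionary."""
    entries = []
    for row in csv.DictReader(io.StringIO(dictionary_csv)):
        validation = row["Text Validation Type OR Show Slider Number"]
        entries.append({
            "variable": row["Variable / Field Name"],
            "label": row["Field Label"],
            # Fall back to the field type when no validation is set.
            "data_type": validation or row["Field Type"],
            "units_or_format": row["Field Note"],
            # The purpose is not stored in the dictionary; fill it in by hand.
            "purpose": "",
        })
    return entries


code_book = build_code_book(DATA_DICTIONARY_CSV)
for entry in code_book:
    print(entry)
```

The purpose of each variable and its relationship to other data still have to be written by the research team, because the data dictionary does not record them.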
Use the REDCap Identifiers function
Refer to “Storing Personally Identifying Information in REDCap Projects” in the University of Toronto REDCap Storage Best Practices companion document for guidance on using REDCap’s identifiers to store personally identifying information in the REDCap database.
Avoid Numbering Fields
Do not manually number questions within Field Labels. For example, avoid:
“1. When did you receive your diagnosis?”
Manual numbering conflicts with branching logic, because fields hidden by branching logic leave gaps in the visible sequence. Additionally, moving or deleting fields disrupts the numbering sequence, which breaks the format of your survey.
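Field Labels can be scanned for manual numbering before a survey goes live. The following is a minimal sketch, assuming the labels have already been read into a list of strings. The pattern and the `find_numbered_labels` helper are illustrative and will not catch every numbering style.

```python
import re

# Matches labels that start with a number followed by "." or ")",
# for example "1. " or "12) ".
NUMBERED_LABEL = re.compile(r"^\s*\d+\s*[.)]\s+")


def find_numbered_labels(labels):
    """Return the labels that appear to be manually numbered."""
    return [label for label in labels if NUMBERED_LABEL.match(label)]


labels = [
    "1. When did you receive your diagnosis?",
    "When did you receive your diagnosis?",
    "2) Current weight",
    "24-hour urine volume",
]
flagged = find_numbered_labels(labels)
for label in flagged:
    print("Numbered label:", label)
```

Labels that merely begin with a quantity, such as "24-hour urine volume", are not flagged because the number is not followed by a period or closing parenthesis.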
Use Branching Logic to check data
- Consider using branching logic to reduce the amount of time spent verifying missing data.
- Only use branching logic if you have a thorough understanding of its functionality.
Data Management Plans
A data management plan (DMP) is a formal, research-oriented document that outlines how data will be managed before, during, and after a research project.
Government funding agencies around the world increasingly require a DMP to accompany a researcher’s funding application. A DMP will organize the data collected within your research project and proactively address the following questions:
- What type of data will you collect, create, link to, acquire and/or record?
- What file formats will your data be collected in? Will these formats allow for data reuse, sharing and long-term access to the data?
- What conventions and procedures will you use to structure, name and version-control your files to help you and others better understand how your data are organized?
- What documentation will be needed for the data to be read and interpreted correctly in the future?
- How will you make sure that documentation is created or captured consistently throughout your project?
- If you are using a metadata standard and/or tools to document and describe your data, please provide a list.
- How and where will your data be stored and backed up during your research project?
- Where will you deposit your data for long-term preservation and access at the end of your research project?
- What data will you be sharing and in what form (e.g., raw, processed, analyzed, final)?
- How will responsibilities for managing data activities be handled if substantive changes happen in the personnel overseeing the project’s data, including a change of principal investigator?
- If your research project includes sensitive data, how will you ensure that it is securely managed and accessible only to approved members of the project?
Portage Data Management Plans
Portage is a national, library-based research data management network that coalesces initiatives in research data management to build capacity and to coordinate activities. The aim of Portage is to coordinate and expand existing expertise, services, and infrastructure so that all academic researchers in Canada have access to the support they need for research data management.
Portage is centered around two major components:
- Network of Expertise: Research Data Management (RDM) requires specialized knowledge and expertise, which is often missing within institutions.
- Infrastructure Platforms: Portage is working to connect the various infrastructure and service components needed for data management planning and a national preservation and discovery network.
Portage includes a data management plan assistant, a bilingual tool for preparing data management plans (DMPs). The tool follows best practices in data stewardship and walks researchers step-by-step through key questions about their data management. You can access the website and create your own account at https://assistant.portagenetwork.ca/?locale=en. A data management plan template from Portage can be found here: Data Management Plan Template
University of Toronto Resources for Best Practices in Data Management for Researchers
Refer to the following University of Toronto resources for non-REDCap specific best practices related to data management:
|Best Practice|For Details, See…|
|Create documentation and metadata for your datasets.|https://onesearch.library.utoronto.ca/researchdata/documentation-metadata|
|Use file formats that ensure long-term access.|https://onesearch.library.utoronto.ca/researchdata/file-formats-long-term-access|
|Use descriptive file names.|https://onesearch.library.utoronto.ca/researchdata/file-management|
|Handle sensitive data appropriately.|https://onesearch.library.utoronto.ca/researchdata/sensitive-data|
Capturing Consent Information for REDCap
If you plan to archive, reuse, or share research data, you must obtain permission from the participants who take part in your studies. Informed consent documentation typically has two sections you need to consider with regard to data archiving, reuse, and/or sharing:
- Information Sheet: Describe clearly who has access to the data during and after the project
- Consent Form (signature page): Offer clear choices to participants on whether they agree with archiving and reuse of the data from the project. Note that a participant can opt out of these activities, but still participate in your study.
If relevant, use a form to capture consent information. Capture information such as:
- whether the subject consented
- who received or witnessed the subject’s consent
- the date the subject signed the consent form
- whether the subject received a signed copy of the consent form
Tip: Upload the signed consent form.
Refer to University of Toronto resources on consent forms, including sample forms, available at: https://onesearch.library.utoronto.ca/researchdata/consent-forms.
REDCap Data Collection
Reduce the use of Free-text fields
Minimize the use of free text fields, because these can be difficult to analyze. Use categorical response field types (such as drop-down lists, radio buttons, and check-boxes) instead of free text fields (such as text boxes and notes boxes).
Tip: Using multiple choice field types will improve your data analysis. You can augment a categorical response with a text box or notes box to capture additional information.
Do not Mix Data Types
It is possible to mix data types in data entry fields. However, it is not recommended. For example, a researcher may enter a numerical code followed by a comment on the code such as “147 Patients had a cold.” Both the code and the comment may be informative, but they should be placed in separate data entry fields (with data validation where applicable).
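If mixed entries already exist in a data-set, they can often be separated after the fact. The following is a minimal sketch, assuming each mixed entry starts with an integer code followed by free text. The `split_code_and_comment` helper is hypothetical and not part of REDCap.

```python
import re

# A leading integer code, then optional free text.
MIXED_ENTRY = re.compile(r"^\s*(\d+)\s*(.*)$")


def split_code_and_comment(raw_value):
    """Split '147 Patients had a cold.' into (147, 'Patients had a cold.')."""
    match = MIXED_ENTRY.match(raw_value)
    if match is None:
        # No leading code: keep everything as the comment.
        return None, raw_value.strip()
    code, comment = match.groups()
    return int(code), comment.strip()


code, comment = split_code_and_comment("147 Patients had a cold.")
print(code)
print(comment)
```

The code can then be stored in a validated numeric field and the comment in a separate notes field.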
Use Data Validation in most cases
When using text box fields, use validation types and set minimum and/or maximum values as much as possible for better data accuracy. Always use date validation for dates.
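The same kinds of checks can be applied to exported data as a second line of defence. The sketch below is an illustration only and is not REDCap's validation implementation. It assumes values arrive as strings and that dates use a year-month-day format. The function names and the example bounds are hypothetical.

```python
import math
from datetime import datetime


def is_valid_number(value, minimum=None, maximum=None):
    """Return True if value parses as a finite number within the optional bounds."""
    try:
        number = float(value)
    except ValueError:
        return False
    if not math.isfinite(number):
        # Reject "nan" and "inf", which float() would otherwise accept.
        return False
    if minimum is not None and number < minimum:
        return False
    if maximum is not None and number > maximum:
        return False
    return True


def is_valid_date(value, date_format="%Y-%m-%d"):
    """Return True if value is a real calendar date in the given format."""
    try:
        datetime.strptime(value, date_format)
    except ValueError:
        return False
    return True


print(is_valid_number("72.5", minimum=0, maximum=300))
print(is_valid_date("1986-02-30"))
```

Setting both a minimum and a maximum catches typing errors such as an extra digit, and parsing the date rejects impossible values such as 30 February.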
For inexact dates, enter day, month, and year separately
Most people can tell you their date of birth, but very few can tell you the date they first noticed a specific symptom of a disease. They may, however, be able to tell you the month and year. For this type of data, consider entering the month, day, and year in separate fields. For example, a patient may not be able to tell you the exact date he had measles as a child, but he may be able to narrow it down to March of 1986. You can enter this into a database as month = 3, day = blank, year = 1986.
While the day is missing, the other parts of the date may be useful.
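During analysis, a date stored as separate parts can be handled with a small structure that records which parts are known. The following is a minimal sketch. The `PartialDate` class is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartialDate:
    """A date whose day (or month) may be unknown."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def precision(self):
        """Report how much of the date is known."""
        if self.month is None:
            return "year"
        if self.day is None:
            return "month"
        return "day"

    def as_text(self):
        """Format the known parts, for example '1986-03' when the day is missing."""
        parts = [f"{self.year:04d}"]
        if self.month is not None:
            parts.append(f"{self.month:02d}")
            if self.day is not None:
                parts.append(f"{self.day:02d}")
        return "-".join(parts)


measles_onset = PartialDate(year=1986, month=3)
print(measles_onset.as_text(), measles_onset.precision())
```

Keeping the precision explicit avoids silently treating a missing day as the first of the month.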
Designate Units of Measurement
When creating data-collection instrument fields:
- Clearly identify units of measurement for results such as height and weight.
- Avoid abbreviations of units of measurement.
- Never mix different units in one data entry field. For example, use REDCap Field Notes to ensure everyone knows whether height is measured in feet, inches or meters. In addition to Field Notes, which are displayed on the form, use REDCap Field Units to record the unit of measurement. Although Field Units are not displayed on the forms, they are displayed in the data dictionary, and they are useful for describing each variable for later data analysis (see “Keep a Code Book” above).
Use Standard measures and codes
Use existing standard measures instead of developing measures from scratch. Using standard measures enables your findings to be compared meaningfully with those of others, and this enables you to reuse your data-sets later. Use the REDCap Shared Library as one resource for standard instruments.
Important: Do not modify these instruments. Otherwise, they will no longer be validated or comparable. For demographics and health status, consider using instruments from national agencies. If you must develop new metrics based on questionnaires or scales, consider consulting with a psychometrician to ensure these metrics are reasonably validated.
Investigators also increasingly recognize the benefits of incorporating standard representations of data such as laboratory values (like hemoglobin A1c or glucose), diseases (such as sarcoidosis), and symptoms and findings (such as shortness of breath). Incorporating standards in your data-sets will greatly improve their re-usability in the future, making it easier for you to collaborate with others and enabling you to contribute your data to local and national repositories.
|To Record…|Consider Using|
|Sex and Gender|Guidelines from Statistics Canada, which is changing the census question on this: https://www12.statcan.gc.ca/census-recensement/2021/road2021-chemin2021/fs-fi/sex-and-gender.cfm|
|Race and Ethnicity|Guidelines from: Canadian Institutes of Health Research (CIHR); National Institutes of Health (NIH)|
|Laboratory values|Logical Observation Identifiers Names and Codes (LOINC)|
|Diseases, Symptoms and Findings|Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT); International Classification of Diseases, Ninth Revision (ICD-9)|
Consistent Numerical Codes
Use numerical codes consistently throughout your survey. For example, if in one response, unknown is coded as 99, unknown should be coded as 99 throughout the database.
Note: Generally, Yes = 1 and No = 0. The numerical code does not affect the display order of choices in the REDCap data entry form.
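Consistency of codes can be checked across all categorical fields at once. The following is a minimal sketch, assuming each field's choices are written as "code, label" pairs separated by "|". The field names and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical fields with choices written as "code, label | code, label".
field_choices = {
    "smoker": "1, Yes | 0, No | 99, Unknown",
    "diabetic": "1, Yes | 0, No | 99, Unknown",
    "asthma": "1, Yes | 2, No | 9, Unknown",
}


def parse_choices(choice_text):
    """Turn '1, Yes | 0, No' into {'Yes': '1', 'No': '0'}."""
    codes_by_label = {}
    for choice in choice_text.split("|"):
        code, label = choice.split(",", 1)
        codes_by_label[label.strip()] = code.strip()
    return codes_by_label


def find_inconsistent_labels(fields):
    """Return labels that are coded differently in different fields."""
    codes_seen = defaultdict(set)
    for choice_text in fields.values():
        for label, code in parse_choices(choice_text).items():
            codes_seen[label].add(code)
    return {
        label: sorted(codes)
        for label, codes in codes_seen.items()
        if len(codes) > 1
    }


inconsistent = find_inconsistent_labels(field_choices)
print(inconsistent)
```

In this example the "asthma" field codes No as 2 and Unknown as 9, so both labels are reported as inconsistent with the other fields.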
Avoid missing values
There may be several different reasons that values may be missing, and those reasons may be relevant to your study. Leaving a value blank is ambiguous. A researcher who decides to leave a blank in a data-set may unknowingly throw away a chance to collect useful data. For example, a person may have a missing value for a thyroid scan. This value may be missing for one of the following reasons:
- The subject still needs to take the test.
- The subject forgot to take the test.
- You forgot to record the value.
- The person does not have a thyroid gland.
Consider whether Yes or No are the only choices for a given question. For example, a simple Yes/No answer may not suffice for questions such as, “Have you ever had disease X?”. Other answers to consider may be:
- I don’t know.
- I was tested for that, but I do not remember the results.
- I do not want to answer this question.
If data are missing or unknown, you can either:
- Include reasons in your categorical responses.
- Mark the data as missing, and include a text box to record the reason for the missing value.
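Recording the reason for a missing value makes it possible to summarize those reasons later. The following is a minimal sketch, assuming hypothetical reason codes for a thyroid scan field. The codes and the `summarize_missing` helper are illustrative, and any real project should define its own codes and confirm they cannot collide with valid results.

```python
from collections import Counter

# Hypothetical codes for a thyroid scan result field; choose your own and
# use them consistently across the project.
MISSING_REASONS = {
    "-1": "Test not yet done",
    "-2": "Subject did not take the test",
    "-3": "Value not recorded",
    "-4": "Not applicable (no thyroid gland)",
}


def summarize_missing(values):
    """Count each missing-data reason; blanks are reported separately."""
    counts = Counter()
    for value in values:
        if value == "":
            counts["Blank (reason unknown)"] += 1
        elif value in MISSING_REASONS:
            counts[MISSING_REASONS[value]] += 1
    return dict(counts)


thyroid_scan_values = ["4.2", "-1", "", "-4", "3.9", "-1"]
summary = summarize_missing(thyroid_scan_values)
print(summary)
```

Only the blank entry remains ambiguous. Every coded entry states why the result is absent.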
Group related variables on short forms
Form Names correspond to individual data entry web pages. Forms are groupings of variables within the database. Put variables collected together on the same form to improve data entry workflow. For example, putting demographics together and labs together on separate forms makes data entry more reliable. Keep forms short to minimize risk of data loss (by saving more often when completing a form). If possible, group field types that minimize changing from keyboard to mouse.
Managing REDCap Project Status
- Before changing your REDCap project status from Development to Production:
  - Sufficiently test your project.
  - Obtain an independent review of your project.
  - Test branching logic and calculations.
- Change your REDCap project status from Development to Production before starting data collection.
- After changing your REDCap project status from Development to Production, you are ready to start data collection. From this point on:
  - Do not renumber your response options. Enumeration changes can corrupt data.
  - Do not rename variables, because this can cause data loss.
  - Do not delete records or events, because this can cause data loss.
Important: You cannot test branching logic and calculations after changing your project to Production status.
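Risky changes of this kind can be detected by comparing two versions of a project's variable definitions before a change is applied. The following is a minimal sketch, assuming each version has been loaded into a dictionary that maps a variable name to its coded choices. The `compare_dictionaries` helper and the sample variables are hypothetical.

```python
def compare_dictionaries(old_fields, new_fields):
    """Report changes between two versions that could put collected data at risk.

    Each argument maps a variable name to its {code: label} choices.
    """
    problems = []
    for name, old_choices in old_fields.items():
        if name not in new_fields:
            problems.append(f"{name}: variable removed or renamed")
            continue
        new_choices = new_fields[name]
        for code, old_label in old_choices.items():
            if code not in new_choices:
                problems.append(f"{name}: code {code} ({old_label}) removed")
            elif new_choices[code] != old_label:
                problems.append(
                    f"{name}: code {code} changed from "
                    f"{old_label!r} to {new_choices[code]!r}"
                )
    return problems


old_version = {
    "smoker": {"0": "No", "1": "Yes"},
    "visit_type": {"1": "Baseline", "2": "Follow-up"},
}
new_version = {
    "smoker": {"1": "No", "2": "Yes"},                  # options renumbered
    "visit_kind": {"1": "Baseline", "2": "Follow-up"},  # variable renamed
}
problems = compare_dictionaries(old_version, new_version)
for problem in problems:
    print(problem)
```

A harmless label correction, such as fixing a typo, is also reported as a changed code, so each finding still needs a human review.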
There are multiple levels of auditing in REDCap. The table below describes three recurring auditing scenarios in REDCap.
|REDCap administrators|REDCap user logs; REDCap project logs|REDCap administrators can view the audit logs for all projects on the server. Additionally, they have access to all information about user logins, inputs of project data, changes, and deletions of projects.|
|REDCap project owners or delegates|REDCap project logs|By default, the project owner has the right to view project logs and can grant this right to other users in the project. The project owner has access to information about input of project data, changes, and deletions.|
|Operational and Security IT staff at the University of Toronto|REDCap server-level logs|Operational and Security IT staff have access to information about network traffic, such as login credentials, login origin, and duration. These activities contribute to a responsive, well-performing instance of REDCap. They have no access to information about participants or responses within REDCap.|