Learning Objectives
Following this assignment students should be able to:
- understand the basic rules of tidy data
- implement quality control for data entry in spreadsheets
- create an SQL database by importing data
Reading
Data Organization
Quality Assurance and Control (choose one)
- Webpage: Quality Assurance and Control in Excel
- Video (11 min): Data Validation in Excel
Lecture Notes
- Course Introduction
- Demo Code for Where Students Can Get in the Course
- Accessing Excel Online at UF
- Tidy Data
- Data Entry
Exercises
Improving Messy Data (50 pts)
A lot of real data isn’t very tidy, mostly because most scientists aren’t taught about how to structure their data in a way that is easy to analyze.
Download an untidy version of some of the Portal Project data, which includes information on the site, date, species identification, weight and sampling plot (within the site) for some small mammals.
Think about what could be improved about this data and write down answers to the following questions:
-
Describe five things about this data that are not tidy and how you could fix each of those issues.
-
Could this data easily be imported into a programming language or a database in its current form?
-
Do you think it’s a good idea to enter the data like this and clean it up later, or to have a good data structure for analysis by the time data is being entered? Why?
-
Data entry validation in Excel (50 pts)
Create a spreadsheet in Excel for data entry. It should have five columns: Date, Site, Species, Mass, and Length.
Set the following data validation criteria to prevent invalid data from getting entered:
- The Date column should be set so that it doesn’t convert dates to other formats.
- Use data validation so that Site can only be one of the following
A1
,A2
,B1
,B2
. Set the error message on this validation criteria to provide information on what the valid values are. - Use data validation so that Species can only be one of the following
Dipodomys spectabilis
,Dipodomys ordii
,Dipodomys merriami
. Set the error message on this validation criteria to provide information on what the valid values are. - Use data validation so that Mass can only be a decimal greater than or equal to zero but less than or equal to 500. Set the error message on this validation criteria to provide information on what the valid values are.
- Length should be an integer (i.e., a whole number) between 1 and 10. Set the error message on this validation criteria to provide information on what the valid values are.
Check that the validation rules and data formating are working, but do not include any entered data in the final file.
Save this file as
Expected outputs for Data entry validation in Excel:data_entry_form.xlsx
.Clean Up Untidy Data (optional)
A lot of real data isn’t very tidy, mostly because most scientists aren’t taught about how to structure their data in a way that is easy to analyze.
Download an untidy version of some of the Portal Project data, which includes information on the data, species identification, weight and sampling plot for some small mammals.
Convert the data into a more tidy format.
Expected outputs for Clean Up Untidy Data: