Improving Messy Data (Tidy Data)
A lot of real data isn’t very tidy, largely because most scientists aren’t taught how to structure their data in a way that is easy to analyze.
Download an untidy version of some of the Portal Project data, which includes information on the site, date, species identification, weight and sampling plot (within the site) for some small mammals.
Think about what could be improved about this data and write down answers to the following questions:
- Describe five things about this data that are not tidy and how you could fix each of those issues.
- Could this data easily be imported into a programming language or a database in its current form?
- Do you think it’s a good idea to enter the data like this and clean it up later, or to set up a good data structure for analysis before data entry begins? Why?
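
To make the idea of "tidy" concrete, here is a minimal Python sketch of one kind of cleanup. It does not use the Portal file: the table, its column names (`plot_1`, `plot_2`) and its values are invented for illustration, and the real download may be untidy in quite different ways. The sketch assumes a layout with one column per plot and cells that pack species and weight together, and rewrites it so that each row is one observation and each column is one variable.

```python
import csv
import io

# Hypothetical untidy table (invented values, NOT the actual Portal data):
# one column per plot, and each cell packs species and weight together.
untidy_csv = """date,plot_1,plot_2
2013-05-18,DM 40g,DO 52g
2013-05-19,PP 17g,
"""

def tidy_rows(text):
    """Yield one observation per row, with one variable per column."""
    for row in csv.DictReader(io.StringIO(text)):
        for column, cell in row.items():
            if column == "date" or not cell:
                continue  # skip the date column itself and empty cells
            species, weight = cell.split()
            yield {
                "date": row["date"],
                "plot": int(column.split("_")[1]),   # "plot_2" -> 2
                "species": species,
                "weight_g": int(weight.rstrip("g")),  # "40g" -> 40
            }

tidy = list(tidy_rows(untidy_csv))
for record in tidy:
    print(record)
```

Having to write this kind of parsing code just to read the table is itself a hint for the second and third questions: the more structure is packed into cell contents and column names, the more work is needed before any analysis can start.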