Reproducible Quantitative Methods
Part 1: Data
How to handle your data to make your work more efficient and reproducible, and how to handle common problems with data coming from other sources.
Week 1 - Introduction to reproducibility and open science frameworks
Week 2 - Best practices for spreadsheets / Learning to use data produced by others
Week 3 - Introduction to Metadata / Data and scientific authorship
Week 4 - Cleaning up messy data / Identifying 'grey' data sources
Part 2: Analysis
Applying reproducibility principles to common statistical and visualization approaches.
Week 5 - Intro to scripting in R / Version control in R with GitHub
Week 6 - Programming in R / Licensing data and software for reuse
Week 7 - Programming in R, continued / Authorship and citation practices for non-manuscript research products
Week 8 - Student-directed processing and analysis of project data / Data and code sharing challenges
Part 3: Communication
Using technology to make our work accessible to others and to work better together.
Week 9 - Making better plots / Visualization for outreach and communication
Week 10 - Project workshop time, GitHub for project management / Scientific publication and accessibility
Week 11 - Project workshop time
Week 12 - Project workshop time / Scientific collaboration
Part 4: Opening Your Work
Inviting the world to contribute to the scientific enterprise
Week 13 - Project workshop time / Participatory models for bigger science
Week 14 - Preparing a paper for publication / The future of open science and reproducible research
The RQM Course
Almost every graduate student has a “Now what?” moment during their thesis, often after they have collected data and must now analyse it. Additionally, new (or newly enforced) requirements from federal funders are holding our scientific outputs, including data and code, to more rigorous reproducibility standards, but those funders have offered little guidance on how individual labs and research projects should change their workflows.
Because of poor quality in many data sources, data scientists estimate they spend up to 80% of their time ‘data munging’, that is, cleaning, quality checking, and documenting the data they’re trying to use for their insights. The reality is that most data producers (a group which includes most experimental scientists) do not have specific training in data handling. This leads to decision paralysis, inefficiency, and the potential for incredible losses of information at the interface between observations and analysis, and it takes the joy out of data-driven discovery. Training initiatives that address these issues are in high demand: workshops run by Software Carpentry and Data Carpentry (organizations that train scientists in efficient software and data science skills) are usually at capacity and waitlisted within days of initial advertising.
This course directly builds on the principles laid out in Software Carpentry and Data Carpentry workshops, but provides students with a more immersive, long-term experience in the form of a project-based learning approach. Project-based learning hybridizes a traditional lecture with a student-led working group, which allows the course to be customized so that students apply the principles directly to real data and real problems. We provide the added incentive of including the students on a publication resulting from their work, giving them concrete training in applying these skills in a way that is relevant to their field. The course takes a two-pronged approach: approximately 2/3 of class time is given to applied tools training using a project data set, and the remaining 1/3 is used to discuss the more philosophical aspects of modern, technologically enabled science (e.g. how do we handle authorship on manuscripts supported by data compiled from a variety of sources? Is software a research product?).
How to use this guide
You: a data enthusiast with a love of teaching! This guide is intended for use by scientists interested in leading students through a project-driven data science adventure. The main things you need are some familiarity with open workflows and some form of project data. Please contact me if you're interested in teaching. I'm happy to help you work through the bugs. If you find typos or broken links, or want to suggest changes, please submit an issue or pull request to this repo. A good starting point is to fork the repo and make this course your own. You can also use this guide as a template for a student-facing webpage that will serve as a syllabus and guide for the people on the other side of the classroom.
About this guide
This guide was created by Christie Bahlai, a computational ecologist at Kent State University, while supported by a fellowship from the Mozilla Science Lab.
This work is licensed under a Creative Commons Attribution 4.0 International License.