Conference: rOpenSci #runconf17 (70 invited attendees working on a range of #rstats projects together for two days in downtown LA, US)
I worked on a project called “Research compendium” to help automate checking of best practices in the analysis process.
Why this project? In June 2016 I attended and presented at the “Improving data analytics process” workshop at the Alan Turing Institute, and this project has many synergies with what was discussed there, as well as with my experiences of the analytical process in academia and industry. The aim is to automate as much of the data analysis process as possible, including checking best practices, to drive improvements in the many areas where automation is currently feasible.
Checkers
Description: checkers is a framework for reviewing analysis projects. The package provides both automated checks for best practices and a descriptive guide to best practices.
Concept: the checkers package runs automated tests, built as extensions of the goodpractice package (which checks best practices for building R packages). It also provides a review guide and framework for analysis best practice.
I worked on the framework and review guide for best practice (README and Google Doc).
We looked at the different areas of the analysis process and came up with:
- Data
- Script/Code (organization/structure)
- Analysis Tasks
- Package/Organisational
- Visualisation/Reporting
We then discussed and described the required checks and best practices. We then created a framework of automation levels and tiers of best practice (see below) to help prioritize which checks the checkers package can automate and which are “must haves”.
“Review checklist framework”
Automation Levels
- Fully automatable: Can be checked automatically by checkers
- Semi-automatable: Needs a human to provide commands on specific checks; can be done using custom implementations of checkers/goodpractice
- Human-powered: Analyst uses guidelines to make sure analysis and report fit best practice for specific context
Tiers
- Must have: These elements are required for reliable and trustworthy analyses.
- Nice to have: These elements are recommended for best practice and reproducibility and should be strongly considered.
- Recommended: These elements are ideal best practice.
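As a rough sketch of what a “fully automatable” check looks like in practice, the goodpractice package (which checkers extends) can already be run against a project directory. This is a minimal, hypothetical example; the path is a placeholder, not a real project:

```r
# A minimal sketch, assuming the goodpractice package is installed.
# "path/to/analysis-package" is a placeholder for a real project directory.
library(goodpractice)

g <- gp("path/to/analysis-package")  # run the default best-practice checks
g                                    # print advice for the checks that failed
failed_checks(g)                     # list which checks did not pass
```

Semi-automatable checks would follow the same pattern, but with custom checks supplied by the analyst rather than the defaults.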
Analysis Review Guide: Google Doc in development
.@alice_data (& @nj_tierney) visualizing how we might make #rstats users happy with a research compendium #runconf17 pic.twitter.com/w5caJ3rUEk
— Jennifer Thompson (@jent103) May 25, 2017
Research compendium #runconf17 pic.twitter.com/IcNxfVQF1K
— Scott Chamberlain (@sckottie) May 25, 2017
Creating “Analysis Guide” test/checks to automate research workflow at #runconf17 with @jent103 @nj_tierney @mollyllewis @DeCiccoDonk pic.twitter.com/IEcHzjZFLb
— Alice Data (@alice_data) May 25, 2017
Amazing work progress being made for #researchcompendium what is automatable checks ❤️💙💚🖊at #runconf17 – check out https://t.co/6xoGMLraN2 pic.twitter.com/mRWLI0HG1u
— Alice Data (@alice_data) May 26, 2017
Offering gratitude to the white board for its service before it moves on to other research compendium duties #runconf17 pic.twitter.com/D0D2K6xC6n
— Jennifer Thompson (@jent103) May 26, 2017
Team #researchcompendium review our checker examples & best practise guidelines with @noamross @jent103 @mollyllewis @nj_tierney #runconf17 pic.twitter.com/GGvEtjbRbr
— Alice Data (@alice_data) May 26, 2017
.@alice_data (pausing mid-tweet) and @DeCiccoDonk presenting the checkers package #runconf17 pic.twitter.com/8vSy4Tcp1r
— Jennifer Thompson (@jent103) May 27, 2017
@alice_data presenting the checker package at #runconf17 pic.twitter.com/QUr2Nhw8yL
— Claudia Vitolo (@clavitolo) May 27, 2017
Check out “checker” framework for reviewing analysis of projects https://t.co/tlxlOczzcV #runconf17 pic.twitter.com/pwhyzOXYsj
— Alice Data (@alice_data) May 27, 2017