rOpenSci #runconf17 : checkers project

Conference : rOpenSci #runconf17 (an invite-only event where 70 attendees worked together on a range of #rstats projects for two days in downtown LA, US)

I worked on a project called “Research compendium”, which aims to help automate the checking of best practice in the analysis process.

Why this project? It has many synergies with what was discussed at the “Improving data analytics process” workshop at the Alan Turing Institute in June 2016, where I attended and presented, as well as with my experience of the analytical process in academia and industry. The aim is to automate as much of the data analysis process as is currently possible, including the checking of best practice, in order to drive improvements.

Checkers

Description : checkers is a framework for reviewing analysis projects. The package provides both automated checks for best practice and a descriptive guide to best practice.

Concept : the checkers package runs automated tests using extensions to the goodpractice package (a package used to check best practice when building R packages). It also provides a review guide and framework for analysis best practice.

I worked on the framework and review guide for best practice (the README and a Google doc).

We looked at the different areas of the analysis process and came up with:

  • Data
  • Script/Code (organization/structure)
  • Analysis Tasks
  • Package/Organisational
  • Visualisation/Reporting

We then discussed and described the checks and best practices required for each area. Next, we created a framework of automation levels and best-practice tiers (see below) to help prioritise which checks the checkers package should cover first: those that are automatable and “must have”.

“Review checklist framework”

Automation Levels

  • Fully automatable: Can be checked automatically by checkers
  • Semi-automatable: Needs a human to provide commands on specific checks; can be done using custom implementations of checkers/goodpractice
  • Human-powered: Analyst uses guidelines to make sure analysis and report fit best practice for specific context

Tiers

  • Must have : These elements are required for reliable and trustworthy analyses.
  • Nice to have : These elements are recommended for best practice and reproducibility and should be strongly considered.
  • Recommended : These elements are ideal best practice.
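
To illustrate how the two dimensions above could be combined to prioritise work, here is a minimal sketch in base R. It is not taken from the checkers package: the four checklist entries, their classifications, and the object names (`checks`, `prioritised`, `first_pass`) are all hypothetical and only serve to show the idea of sorting checks by tier and automation level.

```r
# Hypothetical checklist entries; the checks and their classifications
# are invented for illustration and are not from the checkers package.
checks <- data.frame(
  check = c("Raw data is read-only",
            "Scripts run top to bottom without errors",
            "Figures have labelled axes",
            "Analysis choices are justified in the report"),
  area = c("Data", "Script/Code",
           "Visualisation/Reporting", "Analysis Tasks"),
  automation = c("Semi-automatable", "Fully automatable",
                 "Semi-automatable", "Human-powered"),
  tier = c("Must have", "Must have", "Nice to have", "Recommended"),
  stringsAsFactors = FALSE
)

# Ordered factors encode the priority within each dimension.
checks$automation <- factor(
  checks$automation,
  levels = c("Fully automatable", "Semi-automatable", "Human-powered"),
  ordered = TRUE
)
checks$tier <- factor(
  checks$tier,
  levels = c("Must have", "Nice to have", "Recommended"),
  ordered = TRUE
)

# Sort by tier first, then by automation level within each tier.
prioritised <- checks[order(checks$tier, checks$automation), ]

# Candidates to implement first: fully automatable "must have" checks.
first_pass <- subset(
  prioritised,
  automation == "Fully automatable" & tier == "Must have"
)

print(prioritised[, c("check", "automation", "tier")], row.names = FALSE)
```

Sorting by tier before automation level is one possible design choice; a team that cares more about quick wins could reverse the order and start with everything that is fully automatable.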

Analysis Review Guide : Google doc in development
