Project Data Organization

Gabriel A. Devenyi edited this page Apr 26, 2019 · 2 revisions

Documentation Day comes but twice(ish) a year, but documenting never ends!

Below is the ratified CoBrA Lab Documentation Protocol. Happy documenting!

Project Data Organization

This document describes the minimum organization and documentation expected of all CoBrA Lab members for the research projects they undertake.

Top-level directory: <Recognizable name of project>

  • something that others would recognize if at all possible
  • no spaces

Note: Use relative rather than absolute paths in scripts, so that the entire project folder can be moved elsewhere and the scripts still run.
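
One way to follow the relative-path rule in Python is to build every path from the location of the script itself. This is a minimal sketch, assuming the script sits one level below the top-level directory (for example in analysis/); the helper name `project_path` and the file names in the comment are hypothetical.

```python
from pathlib import Path


def project_path(script_file, *parts):
    """Return a path inside the project, located relative to a script.

    Assumes the script is stored one level below the top-level project
    directory (for example in analysis/).
    """
    # parents[0] is the script's own folder; parents[1] is the project root.
    project_root = Path(script_file).resolve().parents[1]
    return project_root.joinpath(*parts)


# Hypothetical usage inside analysis/make_figures.py:
#   csv_path = project_path(__file__, "raw_data", "demographics.csv")
```

Because nothing outside the project folder is hard-coded, the same script keeps working after the folder is moved or copied.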

Contents

README

  • describes purpose of project
  • contact information of any collaborators
  • specific organizational details of directory structure not covered by this document
  • chronological log of major events in the project

raw_data/

  • README per data type
    • Describe the project's naming conventions for raw data and how the data is organized
    • Describe what data is present
    • A log of any unexpected events or deviations from proper data collection procedures for specific measurements or subjects
  • Some kind of quality control document
    • CSVs or similar corresponding to the quality control of the raw data
    • Should contain notes as to why data is excluded
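
The quality control CSV described above could be produced as follows. This is only a sketch: the column names in `QC_FIELDS`, the function name `write_qc_log`, and the rule that a failed row must carry a note are assumptions made for illustration, not part of the protocol's wording.

```python
import csv
from pathlib import Path

# Assumed column layout; adapt it to the project's data types.
QC_FIELDS = ["subject_id", "data_type", "qc_pass", "notes"]


def write_qc_log(csv_path, rows):
    """Write one QC row per raw measurement.

    Each row is a dict with the keys in QC_FIELDS. Rows that fail QC
    must include a note explaining why the data is excluded.
    """
    for row in rows:
        if not row["qc_pass"] and not row["notes"].strip():
            raise ValueError(f"{row['subject_id']}: excluded without a reason")
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=QC_FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical usage:
#   write_qc_log("raw_data/qc.csv", [
#       {"subject_id": "sub-01", "data_type": "T1w",
#        "qc_pass": False, "notes": "motion artifact"},
#   ])
```

Rejecting a failed row that has no note keeps the log from recording an exclusion without its reason.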

preprocessed/

  • Data from raw_data/ preprocessed or otherwise transformed into more usable form
  • README per data type
    • Document what actions were taken
    • E.g. for behavioural data, explain any manual reformatting
    • Document any renaming or restructuring of files
    • Data that failed raw data QC may be excluded here

derivatives/

  • Contains processing pipelines and their results, e.g. from MAGeT or CIVET
  • README per processing type
    • Document commands used
    • Version of pipeline (date downloaded for MAGeT)
    • Inputs used for pipeline
      • Which files (what kind)
      • Which atlases, etc
  • Quality control output per data type
    • CSV or equivalent
    • Notes describing failure mode
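
The README items above (commands, pipeline version, inputs, atlases) can be recorded by appending one dated entry per pipeline run. This is a rough sketch; the function name `log_pipeline_run` and the entry layout are assumptions, and the command string is whatever was actually typed, stored verbatim.

```python
from datetime import date
from pathlib import Path


def log_pipeline_run(readme_path, command, pipeline_version,
                     inputs, atlases, today=None):
    """Append one dated provenance entry to a derivatives README."""
    stamp = (today or date.today()).isoformat()
    lines = [
        f"[{stamp}]",
        f"  command: {command}",
        f"  pipeline version: {pipeline_version}",
        f"  inputs: {', '.join(inputs)}",
        f"  atlases: {', '.join(atlases)}",
        "",  # blank line separates entries
    ]
    # Append mode keeps earlier entries, so the README doubles as a log.
    with Path(readme_path).open("a") as handle:
        handle.write("\n".join(lines) + "\n")
```

Appending rather than overwriting means reruns with different inputs stay visible in the same file.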

analysis/

  • README describing the general methods of analysis
    • Versions of software (modules) used
  • R or Python scripts (or other)
    • Scripts must run from beginning to end to reproduce figures from scratch
  • Keep old scripts and name them with the date
    • Don't delete old analysis attempts, even if they were not fruitful
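
Keeping dated copies of old scripts can be automated with a small helper. The sketch below assumes archived copies go into a subfolder named `old/` next to the script and that the date is appended in ISO format; both choices, and the name `archive_script`, are illustrative rather than prescribed by the protocol.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_script(script_path, archive_dir="old", today=None):
    """Copy a script to <archive_dir>/<stem>_<YYYY-MM-DD><suffix>.

    The original file is left in place; the dated copy is returned.
    """
    script_path = Path(script_path)
    stamp = (today or date.today()).isoformat()
    target_dir = script_path.parent / archive_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{script_path.stem}_{stamp}{script_path.suffix}"
    if target.exists():
        # Never silently replace an earlier archived attempt.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(script_path, target)
    return target
```

ISO dates sort chronologically in a plain directory listing, which makes the history of attempts easy to follow.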

paper/

  • Copy of submitted paper version
  • Submission letter
  • BibTeX export of references used in paper submission
  • Copy of referee reports
  • Replies to referees
  • Copy of resubmission

paper/figures/

  • Figure files submitted with paper
  • README describing any manual work done on figures generated from scripts (changes made, merging done, etc.)

communications/ (for collaborator projects)

  • Print-to-PDF copies of emails between you and the collaborator that discuss project details
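
The directory layout described in this document can be created in one step for a new project. The following is a minimal sketch, not an official lab tool: the function name `scaffold_project`, the `collaborator` flag, and the stub README headings are assumptions for illustration.

```python
from pathlib import Path

# Directory names taken from this document.
PROJECT_DIRS = [
    "raw_data",
    "preprocessed",
    "derivatives",
    "analysis",
    "paper",
    "paper/figures",
]

README_STUB = "Purpose:\n\nContacts:\n\nDirectory notes:\n\nLog:\n"


def scaffold_project(parent, name, collaborator=False):
    """Create the top-level project directory and its standard contents."""
    if any(ch.isspace() for ch in name):
        raise ValueError("project name must not contain spaces")
    root = Path(parent) / name
    dirs = PROJECT_DIRS + (["communications"] if collaborator else [])
    for rel in dirs:
        (root / rel).mkdir(parents=True, exist_ok=True)
    readme = root / "README"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(README_STUB)
    return root
```

The per-directory READMEs still have to be written by hand, since their content depends on the data and pipelines involved.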