Datatracker

Datatracker is a Python package designed for logging and tracking files and code within a project. It logs each script as an entry and records input and output data files in sequence. Datatracker also manages versioning of files and scripts, identifying the most recent version.

Installation

Install Datatracker using pip:

pip install -U datatracker

Or, install directly from Github:

pip install git+ssh://git@github.com/TarjinderSingh/datatracker

Learning from Templates

To understand how to use Datatracker, refer to the template scripts in this repository. These scripts demonstrate the various features of Datatracker. For instance, view the header.py scriptt here.

Syntax and explanation

New entries

Each script you run is an "entry". An entry requires a tag, description, category, and module.

tag is a unique identifier to the script in question and should be clear what the general purpose and output of the script is. (ie Merge is not what we want to see here)
description needs to be one or two sentences equivalent of the Git commit message that thoroughly describes the general purpose and output of the script.
category indicates the general step of analysis the script belongs to.
module is the sub-category for which the script belongs to. Type category_template in interactive Python for an idea of the appropriate categories and modules are.

Within an entry, you define InputFiles and OutputFiles:

tag is a unique identifier to the File in question and should be clear what the general purpose and output of the script is. (ie Merge is not what we want to see here).
description for a file is a one or two sentences equivalent of the Git commit message that thoroughly describe the general purposes of the File at hand.

from datatracker import *
tr = Tracker()

os.environ['VERSION'] = '0.1.0'

entry = Entry(tag='filter-common-variants',
              description='Filtering common variants in new GWAS data set.',
              category='Processing',
              module='Variant QC')

infile = entry.add(
    InputFile(tag='raw-plink-file',
              path='gs://bucket/raw-plink-file.bed',
              description='Raw PLINK file.'))


outfile = entry.add(
    OutputFile(tag='filt-plink-file',
               path='gs://bucket/raw-plink-file.bed',
               description='Filtered PLINK file.'))

tr.save(entry)

Viewing Existing Entries=

from datatracker import *
tr = Tracker()

tr.table

Using Existing Entries for Pipeline

infile = entry.add(InputFile(entry_tag='filter-common-variants', tag='raw-plink-file', database=tr))

Filtering and Removing Entries

# filter to entry
tr.filter(tr.entry.tag_version == 'import-array_0.1.6')

# remove entry
tr.remove(tr.entry.tag_version == 'import-array_0.1.6')

Exporting to Pandas and Excel

df = tr.explode()
df = tr.explode('filt-plink-file')

df = tr.to_pandas()
df = tr.table

df.to_excel('spreadsheet.xlsx')

Data artifacts

infile = entry.add(InputFile(path='gs://checkpoint-cache/tmp/1.bed'))

License

MIT License (see repository)

Maintainer

TJ Singh @ CUIMC

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
datatracker		datatracker
template		template
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE.md		LICENSE.md
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
db.json		db.json
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py
versioneer.py		versioneer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Datatracker

Installation

Learning from Templates

Syntax and explanation

New entries

Viewing Existing Entries=

Using Existing Entries for Pipeline

Filtering and Removing Entries

Exporting to Pandas and Excel

Data artifacts

License

Maintainer

About

Releases

Packages

Languages

License

TarjinderSingh/datatracker

Folders and files

Latest commit

History

Repository files navigation

Datatracker

Installation

Learning from Templates

Syntax and explanation

New entries

Viewing Existing Entries=

Using Existing Entries for Pipeline

Filtering and Removing Entries

Exporting to Pandas and Excel

Data artifacts

License

Maintainer

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages