SingleCell

Single cell bioinformatics pipeline

                       _____ _             _       _____     _ _ 
                      / ____(_)           | |     / ____|   | | |
                     | (___  _ _ __    __ | | ___| |     ___| | |
                      \___ \| | '_ \ / _ \| |/ _ \ |    / _ \ | |
                      ____) | | | | | (_| | |  __/ |___|  __/ | |
                     |_____/|_|_| |_|\__, |_|\___|\_____\___|_|_|
                                      __/ |                      
                                     |___/   for  N E X T F L O W 
                                              
                     Support: jtremblay514@gmail.com
                   Home page: github.com/jtremblay/SingleCell

Background

This is a simple pipeline to process 10x Genomics single-cell sequencing libraries. Briefly, each library has to be demultiplexed so that there is one set of R1 and R2 FASTQ files per library. The 3' reads (i.e. R2) are then mapped with the STAR aligner against the corresponding reference genome. The reference genome has to be indexed before the reads can be mapped. For instance, for the mouse genome, we'd use the following command:

module load STAR/2.7.10b
STAR \
    --runMode genomeGenerate \
    --runThreadN 8 \
    --genomeDir ./ \
    --genomeFastaFiles ../refdata-gex-mm10-2020-A/fasta/genome.fa \
    --sjdbGTFfile ../refdata-gex-mm10-2020-A/genes/genes.gtf
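Before pointing the pipeline at the index, it can help to confirm that genomeGenerate finished. This is a minimal sketch, not part of the pipeline, assuming the directory passed to --genomeDir should contain STAR's core index files (Genome, SA, SAindex) plus geneInfo.tab, which STAR writes when a GTF is supplied through --sjdbGTFfile and which the gene_info parameter in singlecell.config points to. The function name check_star_index is hypothetical.

```shell
# Hypothetical helper: verify that a STAR genome index directory looks complete.
# Assumes Genome, SA, SAindex and geneInfo.tab (written by STAR when
# --sjdbGTFfile is given) must all be present and non-empty.
check_star_index() {
    local dir="$1" missing=0 f
    for f in Genome SA SAindex geneInfo.tab; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (path taken from the ref_genome example in singlecell.config):
# check_star_index "$INSTALL_HOME/databases/single_cell/refdata-gex-mm10-2020-A_alt" && echo "index OK"
```

The check only looks at file presence and size; it does not validate the index contents.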

Next, make sure that the singlecell.config file is set up properly. For instance:

         raw_reads = "$projectDir/raw_reads/*_R{1,2}_001.fastq"
         outdir = "$projectDir/output/"
         ref_genome = "$INSTALL_HOME/databases/single_cell/refdata-gex-mm10-2020-A_alt"
         whitelist = "$INSTALL_HOME/databases/single_cell/3M-february-2018.txt"
         umi_length = 12
         // The next parameters are used to generate the .h5 (HDF5) file produced
         // at the end of the pipeline.
         gene_info = "$INSTALL_HOME/databases/single_cell/refdata-gex-mm10-2020-A_alt/geneInfo.tab"
         chemistry = "Chromium V3"
         genome_info = "gex-mm10-2020-A"
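The raw_reads pattern expects one R1/R2 pair per library, and umi_length has to agree with the structure of R1. A minimal pre-flight sketch to check both before launching, assuming uncompressed FASTQ files named `<library>_R1_001.fastq` / `<library>_R2_001.fastq` (as in the raw_reads pattern) and an R1 that starts with a 16 bp cell barcode followed by the UMI, which is the 10x v3 layout matching the 3M-february-2018.txt whitelist. The function name check_libraries is hypothetical and this is not how singlecell.nf itself validates its inputs.

```shell
# Hypothetical pre-flight check; not part of singlecell.nf.
# Assumptions: uncompressed FASTQs named <library>_R1_001.fastq and
# <library>_R2_001.fastq, and R1 starting with a 16 bp cell barcode
# followed by the UMI (10x v3 layout).
check_libraries() {
    local reads_dir="$1" umi_length="$2"
    local cb_length=16                          # assumed barcode length
    local expected=$((cb_length + umi_length))  # minimum R1 length in bp
    local status=0 r1 r2 lib len

    for r1 in "$reads_dir"/*_R1_001.fastq; do
        [ -e "$r1" ] || { echo "no R1 files found in $reads_dir" >&2; return 1; }
        lib=$(basename "$r1" _R1_001.fastq)
        r2="$reads_dir/${lib}_R2_001.fastq"
        if [ ! -e "$r2" ]; then
            echo "$lib: missing R2 mate" >&2
            status=1
            continue
        fi
        # Length of the first sequence (line 2 of the FASTQ file).
        len=$(awk 'NR == 2 { print length($0); exit }' "$r1")
        if [ "${len:-0}" -lt "$expected" ]; then
            echo "$lib: R1 is ${len:-0} bp, expected at least $expected bp (barcode + UMI)" >&2
            status=1
        fi
    done

    # Report R2 files that have no matching R1.
    for r2 in "$reads_dir"/*_R2_001.fastq; do
        [ -e "$r2" ] || continue
        lib=$(basename "$r2" _R2_001.fastq)
        if [ ! -e "$reads_dir/${lib}_R1_001.fastq" ]; then
            echo "$lib: missing R1 mate" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, with the values from singlecell.config:
# check_libraries "$PWD/raw_reads" 12 && echo "raw reads look consistent"
```

Only the first read of each R1 file is inspected, and R1 reads longer than barcode + UMI are accepted, since how the pipeline treats extra R1 bases is not stated here.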

Execution

The pipeline can then be run with Nextflow:

module load nextflow/22.10.7.5853 java/17.0.2
nextflow run -c ./singlecell.config ./singlecell.nf -resume

Results

The results are located in the ./output/star/ folder. The important files are the .h5 files, which contain the abundance of features (rows) x cells (columns). These files can be read, for example, with the Seurat::Read10X_h5() function from the Seurat R package.
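If Read10X_h5() fails on a file, a quick first check is whether the file starts with the 8-byte HDF5 signature (hex 89 48 44 46 0d 0a 1a 0a). A minimal sketch, assuming the signature sits at offset 0; HDF5 also allows it at offsets 512, 1024, ... when a user block is present, which this check does not handle. The function name is_hdf5 is hypothetical, and a matching signature does not guarantee the file is complete.

```shell
# Hypothetical helper: returns 0 if the file starts with the HDF5 signature
# "\211HDF\r\n\032\n" (hex 89 48 44 46 0d 0a 1a 0a).
# Assumes no user block, so the signature is expected at offset 0.
is_hdf5() {
    local sig
    sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "894844460d0a1a0a" ]
}

# Usage: check every .h5 file under the pipeline's output folder.
# find ./output/star -name '*.h5' -type f | while read -r f; do
#     if is_hdf5 "$f"; then echo "OK   $f"; else echo "BAD  $f"; fi
# done
```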

Downstream analyses

I've included an scRNA-seq Seurat workflow as a quick example of how to get started. More parts will be added progressively in the near future:

https://jtremblay.github.io/scrna_tutorial_part_1.html

https://jtremblay.github.io/scrna_tutorial_part_2.html
