Additional files: How are my short-read and/or FASTA files supposed to be formatted/formulated?

INPUT

Input Additional Files: BAM and/or FASTA

This section explains the additional files: the short-read sequencing data and/or the FASTA reference.

Below are examples of how to format the additional files that matter most in our case: the short-read sequencing files and the FASTA reference.

1. Short-read sequencing:

Optionally, short-read bulk RNA sequencing data can be added and used to confirm the existence of the splice site signals.
If this is desired, the input file requires preprocessing: align the raw sequencing reads, then remove multi-mapped reads, supplementary alignments, and secondary alignments. This preprocessing is similar to that of the long-read sequencing input described under the Fastq and SAM subsections.
Ideally, the input file is a compressed BAM file ([Short-read-Input.bam]) formatted as follows:
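
To illustrate the filtering step described above (not the file layout itself), here is a minimal sketch in Python that works on SAM text records. It relies on the FLAG bits from the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). Treating `NH:i:` values above 1 as multi-alignments is an assumption: only some aligners write that tag. The function names `keep_alignment` and `filter_sam` are hypothetical and not part of this tool.

```python
# FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def keep_alignment(sam_line: str) -> bool:
    """Return True for a mapped, primary, single-hit alignment record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return False
    # Optional tags start at column 12. NH:i:<n> (number of reported hits)
    # is aligner-dependent, so a missing tag is not treated as an error.
    for tag in fields[11:]:
        if tag.startswith("NH:i:") and int(tag[5:]) > 1:
            return False
    return True


def filter_sam(lines):
    """Yield header lines unchanged plus the alignments that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line):
            yield line
```

On a real BAM file, the flag part of this filter is commonly expressed with `samtools view -b -F 2308`, which excludes unmapped, secondary, and supplementary records; handling of multi-mapped reads depends on the aligner used.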

2. Fasta reference file:

Another option is to add the reference genome of your target organism for the detection of canonical GU/AG splice site dinucleotides.
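
As a rough illustration of what a canonical splice site check involves, the sketch below tests whether an intron in a reference sequence starts with GT (GU in the RNA) and ends with AG. It is not the tool's actual implementation: the function name `is_canonical_intron`, the 0-based end-exclusive coordinates, and the restriction to the forward strand are all assumptions made for this example.

```python
def is_canonical_intron(sequence: str, intron_start: int, intron_end: int) -> bool:
    """Check the GT...AG rule for a forward-strand intron.

    intron_start and intron_end are assumed to be 0-based, end-exclusive
    coordinates into `sequence` (a DNA reference, so GU appears as GT).
    """
    intron = sequence[intron_start:intron_end].upper()
    if len(intron) < 4:
        # Too short to hold separate donor and acceptor dinucleotides.
        return False
    donor = intron[:2]       # first two bases of the intron
    acceptor = intron[-2:]   # last two bases of the intron
    return donor == "GT" and acceptor == "AG"
```

An intron on the reverse strand would need the reverse complement of the sequence before applying the same test.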
My recommendation is to download the reference file from NCBI and keep it in standard formatting. Standard formatting means that the headers of the chromosomes/organisms are marked with the > ("greater than") symbol, while the nucleotide sequences themselves are approximately 108 nucleotides long (105-110 nt) per line:
If you want to use a multichromosomal organism or an amalgamated reference FASTA file containing the reference sequences of multiple organisms, the header needs to be formatted as follows:
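
Independently of the exact header layout, the general FASTA structure described above can be sanity-checked with a short Python sketch. It verifies that every record begins with a `>` header, that headers are unique, and it reports the total sequence length and the longest sequence line per record. The function name `summarize_fasta` is hypothetical, and the checks are only a minimal assumption of what "standard formatting" requires.

```python
def summarize_fasta(lines):
    """Return {header: (sequence_length, longest_line_length)} for FASTA lines."""
    summary = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            if header in summary:
                raise ValueError(f"duplicate header: {header}")
            summary[header] = (0, 0)
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            total, longest = summary[header]
            summary[header] = (total + len(line), max(longest, len(line)))
    return summary
```

For a file on disk, this could be called as `summarize_fasta(open("reference.fasta"))`, where `reference.fasta` is a placeholder file name. Comparing the reported longest line length against the expected line width gives a quick hint of whether the file was re-wrapped after download.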

Index

1. Input - Input sequences: How is my input supposed to be formatted/formulated?

2. Input - Input reference: How is my reference supposed to be formatted/formulated?

3. Input - Optional additions: How to format the optional short-read bulk RNA sequencing data and/or the reference genome in fasta format?

4. Output - Output files: How is the output supposed to be interpreted and used?
