JCcirc v1.0.0 (circRNA assembler through integrated junction contigs) is a computational tool that utilizes both back-splice junction (BSJ) and junction contig (JC) features to reconstruct full-length sequences of circular RNAs from RNA-seq datasets. JCcirc integrates junction reads and junction contigs for the assembly of all circRNAs. The BSJ feature is employed to accurately determine the boundaries of circRNAs, while the JC feature acts as an extension of junction reads, exhibiting superior performance in assembling circRNAs with low expression levels.
Workflow of JCcirc
JCcirc is implemented in Perl and runs on Linux. It also requires:
- A de novo transcript assembler (any one of the assemblers listed below)
- An aligner
JCcirc takes six input files: a GTF annotation file, paired-end RNA-seq data (two FASTQ files), a contig file generated by a de novo assembler, a genome sequence file, and a circRNA junction list.
The contig file can be produced by any of the following de novo transcript assemblers:
Trinity (trinity/inchworm.DS.fa)
SPAdes (spades/K31/transcripts.fasta)
SOAPdenovo-Trans (SOAP/soap.contig)
The circRNA list contains, for each circRNA, its location (chromosome, start, end), host gene, strand, and junction read IDs.
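Before running JCcirc, it can help to sanity-check the circRNA list. The sketch below is only illustrative. It assumes one circRNA per line with tab-separated fields in the order given above; neither the delimiter nor the column order is confirmed here, so check them against the output of your prediction tool. The function name `parse_circ_line` is hypothetical and is not part of JCcirc.

```python
def parse_circ_line(line):
    """Split one circRNA record into named fields.

    Assumption (not confirmed by the JCcirc documentation): fields are
    tab-separated and appear in the order chromosome, start, end,
    host gene, strand, junction read IDs.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    chrom, start, end, gene, strand, read_ids = fields
    start, end = int(start), int(end)
    if start >= end:
        raise ValueError("start must be smaller than end")
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    # The junction read IDs are kept as a raw string because their
    # separator is not specified in this document.
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "gene": gene,
        "strand": strand,
        "read_ids": read_ids,
    }
```

Calling `parse_circ_line` on every line of the list and stopping at the first `ValueError` gives a quick check that no record is truncated or has swapped coordinates.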
Command:
perl JCcirc.pl -C circ -G genome -F annotation -O out_dir -P 8 --read1 read_1.fq --read2 read_2.fq --contig contig.fa -D 0
Arguments:
-C, --circ
input circRNA file, which includes chromosome, start site, end site, host gene, strand, and junction reads ID (required).
-O, --output
directory of output (required).
-G, --genome
FASTA file of all reference sequences. Please make sure this file is
the same one provided to the prediction tool (required).
-F, --annotation
gene annotation file in GTF format. Please make sure this file is
the same one provided to the prediction tool.
-P, --thread
set number of threads for parallel running (required).
--read1
RNA-seq data, read 1 of the paired-end reads (FASTQ format).
--read2
RNA-seq data, read 2 of the paired-end reads (FASTQ format).
--contig
contig sequences (required).
-D, --difference
the difference in support numbers between adjacent fragments when generating circRNA isoforms; default is 0 (recommended values are 0, 1, or 2; a larger number is stricter).
-H, --help
show this help information.
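When JCcirc is run from a pipeline, the command line can be assembled programmatically. The following is a minimal sketch that uses only the options documented above; the helper name `build_jccirc_command` and its default values are illustrative assumptions, and the script location `JCcirc.pl` is assumed to be the current directory.

```python
import shlex


def build_jccirc_command(circ, genome, annotation, out_dir,
                         read1, read2, contig, threads=8, difference=0):
    """Return the JCcirc invocation as an argument list.

    Only the options listed in the Arguments section are used.
    """
    return [
        "perl", "JCcirc.pl",
        "-C", circ,
        "-G", genome,
        "-F", annotation,
        "-O", out_dir,
        "-P", str(threads),
        "--read1", read1,
        "--read2", read2,
        "--contig", contig,
        "-D", str(difference),
    ]


# Rebuild the example command shown above.
cmd = build_jccirc_command("circ", "genome", "annotation", "out_dir",
                           "read_1.fq", "read_2.fq", "contig.fa")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the run. Passing a list rather than a single string avoids quoting problems when paths contain spaces.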
- The RNA-seq data must be paired-end and must be the same files used for the de novo assembly.
- The GTF annotation file should be the same one used by JCcirc and by its upstream software.
- The recommended values for -D/--difference are 0, 1, or 2. If introns in the genome are short, use a larger D value. For example, 0 can be used for human data and 2 for plant data.
- fragment_final.txt has two tab-separated columns:
(1) circRNA location
(2) Location of the circRNA fragments on the genome
- circ_full_seq.fa is the assembly result of circRNA full-length sequences.
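The two output files can be loaded for downstream analysis with a few lines of Python. This is a sketch under stated assumptions: fragment_final.txt is read as two tab-separated columns as described above, circ_full_seq.fa is assumed to be standard FASTA (inferred from the file extension), and a circRNA location is allowed to appear on more than one line. The function names `read_fragments` and `read_fasta` are hypothetical.

```python
def read_fragments(path):
    """Map each circRNA location (column 1) to its fragment locations (column 2)."""
    fragments = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            circ, frag = line.split("\t", 1)
            # Keep a list per circRNA in case one location has several entries.
            fragments.setdefault(circ, []).append(frag)
    return fragments


def read_fasta(path):
    """Read a FASTA file into a dict of header -> sequence."""
    sequences = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                sequences[name] = []
            elif line and name is not None:
                sequences[name].append(line)
    # Join multi-line sequences into one string per record.
    return {n: "".join(parts) for n, parts in sequences.items()}
```

For example, `read_fasta("out_dir/circ_full_seq.fa")` returns the assembled full-length sequences keyed by FASTA header, where `out_dir` stands for the directory given to -O.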
Please contact Jingjing Zhang (zhangjj@siat.ac.cn) for questions and comments.