A list of things to do #57

Open
cmatKhan opened this issue Sep 1, 2023 · 0 comments


cmatKhan commented Sep 1, 2023

  1. Create a test interval tree object that can be used to develop downstream processes without waiting for the actual interval tree implementation

  2. Implement an interval tree constructor which takes the n GTF and n FASTA files, and also the reference genome that was used to create these transcriptomes

    1. Maybe the reference genome should be optional -- I don't know what the landscape looks like in terms of reference-guided vs reference-free methods for long-read RNA-seq
  3. Create something like the current IsoformLibrary that takes the interval tree and the FASTA files and can extract "clusters" and sequences (not sure if this will be useful or not, but I think it would be)

  4. Write a method which classifies coordinate mismatches at the transcript level -- this will take some thinking to come up with the classifications and their definitions. A single transcript might carry multiple labels, too

    1. There are a lot of places we can reference for this -- the best I can think of is the gffCompare docs, which define these kinds of categories (their transcript class codes)
  5. An "identical transcript" (suitable for pairwise alignment) should be defined something like this: a transcript where every exon overlaps its counterpart by at least a user-defined amount (e.g., 95%)

  6. Sequence comparison should happen only between these identical transcripts, and only over the regions where two exons overlap. We should never align across splice sites, for instance

  7. Figure out how to report all of this information -- there will likely be multiple outputs. This requires thinking about who the users are and what they want
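
To make items 5 and 6 a bit more concrete, here is a rough sketch of what the "identical transcript" check and the restriction to overlapping exon regions could look like. Everything in it is an assumption for illustration, not an existing API in this repo: the names `exon_overlap`, `is_identical_transcript`, and `shared_exon_regions` are hypothetical, exons are plain half-open `(start, end)` tuples, chromosome and strand are ignored, exons are paired by position after sorting, and "overlaps by 95%" is read as shared bases divided by the length of the longer exon. The real definition still needs to be decided.

```python
from typing import List, Tuple

# Assumption: an exon is a half-open [start, end) interval on a single
# chromosome and strand. The real objects will carry more information.
Exon = Tuple[int, int]


def exon_overlap(a: Exon, b: Exon) -> int:
    """Return the number of bases shared by two exons (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_identical_transcript(
    tx_a: List[Exon], tx_b: List[Exon], min_overlap: float = 0.95
) -> bool:
    """Hypothetical check for item 5.

    Two transcripts count as "identical" here when they have the same
    number of exons and every exon pair (matched by sorted position)
    shares at least `min_overlap` of the longer exon's length.
    """
    if not tx_a or len(tx_a) != len(tx_b):
        return False
    for a, b in zip(sorted(tx_a), sorted(tx_b)):
        longer = max(a[1] - a[0], b[1] - b[0])
        if longer <= 0 or exon_overlap(a, b) / longer < min_overlap:
            return False
    return True


def shared_exon_regions(tx_a: List[Exon], tx_b: List[Exon]) -> List[Exon]:
    """Hypothetical helper for item 6.

    Return only the coordinates where paired exons overlap, so that any
    downstream sequence comparison never spans a splice site.
    """
    regions = []
    for a, b in zip(sorted(tx_a), sorted(tx_b)):
        start, end = max(a[0], b[0]), min(a[1], b[1])
        if start < end:
            regions.append((start, end))
    return regions


# Made-up coordinates, purely for illustration
tx_a = [(100, 200), (300, 400)]
tx_b = [(102, 200), (300, 399)]
tx_c = [(100, 200), (300, 350)]

same_ab = is_identical_transcript(tx_a, tx_b)
same_ac = is_identical_transcript(tx_a, tx_c)
regions_ab = shared_exon_regions(tx_a, tx_b)
```

With these made-up coordinates, `tx_a` and `tx_b` pass the 95% threshold, while `tx_c` fails on its second exon. The sequence comparison would then run per region in `regions_ab` rather than over whole transcripts.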
