Final-Year-Project

Authors

Abhik Banerjee.
Agniswar Roy.

Problem Statement

Predict miRNA targets by recasting the task as Next Sentence Prediction (NSP), and assess the performance of state-of-the-art (SOTA) NLP architectures on the problem.
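To make the NSP framing concrete, here is a minimal sketch of how one miRNA/mRNA pair could be turned into a sentence-pair example. It is an illustration under stated assumptions, not the project's actual preprocessing: the overlapping k-mer tokenization, the `PairExample` class, the `make_pair` helper, and the example sequences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mers.

    Assumption: k-mer tokenization is used here only for illustration;
    the project may tokenize sequences differently.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


@dataclass
class PairExample:
    mirna_tokens: List[str]  # plays the role of "sentence A"
    mrna_tokens: List[str]   # plays the role of "sentence B"
    label: int               # 1 = the mRNA site is a target, 0 = it is not


def make_pair(mirna: str, mrna_site: str, is_target: bool, k: int = 3) -> PairExample:
    """Build one NSP-style example from a miRNA and an mRNA site sequence."""
    return PairExample(kmer_tokens(mirna, k), kmer_tokens(mrna_site, k), int(is_target))


# Made-up sequences, used only to show the shape of an example.
example = make_pair("UGAGGUAG", "CUACCUCA", is_target=True)
print(example.mirna_tokens)
print(example.label)
```

In this framing the binary label answers the same question NSP asks of text: does the second sequence "follow from" the first, that is, is the mRNA site a target of the miRNA.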

Dataset

We use the MBSTAR dataset, which contains human miRNA sequences together with the seed sequence from the mRNA and a score. This dataset is available at this link

The dataset has been divided into two partitions, one for the masked language modeling (MLM) task and one for the NSP task. The MLM partition is the larger of the two and also includes some of the sequences from the NSP partition. It will be used to pre-train separate ALBERT encoders on the miRNA and mRNA sequences. For the downstream NSP task, the outputs of these encoders will be fed into a custom NSP head. The NSP partition also contains some sequences used to test what the encoders have learned after pre-training.
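The partitioning described above could be sketched roughly as follows. This is only one possible reading of the split: the `partition` function, the `nsp_fraction` and `overlap_fraction` parameters and their default values, and the `(miRNA, mRNA site, score)` record layout are all assumptions made for illustration, and the sketch treats the NSP sequences that are not shared with the MLM partition as unseen during pre-training.

```python
import random
from typing import List, Tuple

# Assumed record layout: (miRNA sequence, mRNA site sequence, score).
Record = Tuple[str, str, float]


def partition(
    records: List[Record],
    nsp_fraction: float = 0.3,      # hypothetical share of records kept for NSP
    overlap_fraction: float = 0.5,  # hypothetical share of NSP records also seen in MLM
    seed: int = 0,
) -> Tuple[List[str], List[str], List[Record]]:
    """Split records into two MLM corpora (one per encoder) and an NSP partition."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    n_nsp = int(len(shuffled) * nsp_fraction)
    nsp_part = shuffled[:n_nsp]
    mlm_only = shuffled[n_nsp:]

    # Some NSP records are shared with the MLM partition, making it larger.
    n_shared = int(len(nsp_part) * overlap_fraction)
    mlm_records = mlm_only + nsp_part[:n_shared]

    # MLM pre-training needs raw sequences only, one corpus per encoder.
    mirna_corpus = sorted({mirna for mirna, _, _ in mlm_records})
    mrna_corpus = sorted({mrna for _, mrna, _ in mlm_records})
    return mirna_corpus, mrna_corpus, nsp_part


# Placeholder records, used only to show the shapes involved.
toy = [(f"MIR{i}", f"MRNA{i}", 0.1 * i) for i in range(10)]
mirna_corpus, mrna_corpus, nsp_part = partition(toy)

# NSP records whose miRNA was never seen during MLM pre-training.
held_out = [r for r in nsp_part if r[0] not in mirna_corpus]
print(len(mirna_corpus), len(nsp_part), len(held_out))
```

The two corpora would each feed the pre-training of one encoder, while `nsp_part` would supply the labeled pairs for the NSP head. The held-out records give a way to check how the encoders handle sequences they did not see during pre-training.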
