SubmiRine

SubmiRine is an open-source software package for predicting microRNA target site variants (miR-TSVs) from clinical genomic data sets that measure miRNA expression, gene expression, and genotype. SubmiRine has two main benefits: it allows de novo prediction of miR-TSVs from custom data sets (such as those generated by large-scale clinical genomics projects), and it provides a methodology for prioritizing predicted miR-TSVs by their relative probability of being functional. SubmiRine thus enables researchers to perform miR-TSV prediction efficiently and systematically on genome-scale data sets and to narrow the list of candidates to a manageable set for further validation.

SubmiRine contains two main modules: SubmiRine_Search and SubmiRine_Compare. The User Guide contains details on these modules and how to run them. In the standard workflow, two pre-processed data sets (representing the clinically relevant model and background model, respectively) are each run with SubmiRine_Search to predict all candidate miRNA target sites as well as candidate miR-TSVs, which are predicted based on differences in predicted target sites between alleles of the same 3'UTR. The input files required by SubmiRine_Search are two FASTA files: one with 3'UTR allele sequences of interest (with allele-specific expression included in each defline), and one with mature miRNA sequences of interest (with relative expression included in each defline). The output files produced by each run of SubmiRine_Search can then be fed into the SubmiRine_Compare module in order to compare the predicted miR-TSVs from the clinically relevant model against those predicted in the background model.
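As an illustration of the kind of input described above, the following is a minimal Python sketch that reads a FASTA file whose deflines carry an expression value. The defline layout (`>name|expression`), the `|` separator, the function name `parse_fasta_with_expression`, and the toy records are all assumptions made for this example; the User Guide defines the format SubmiRine_Search actually expects.

```python
def parse_fasta_with_expression(lines, sep="|"):
    """Parse FASTA lines into (name, expression, sequence) tuples.

    Assumes a HYPOTHETICAL defline layout '>name<sep>expression';
    this is not necessarily the layout SubmiRine_Search requires.
    A defline without the separator or with a non-numeric value
    raises ValueError when the value is converted to float.
    """
    records = []
    name, expression, chunks = None, None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new defline closes the previous record, if any.
            if name is not None:
                records.append((name, expression, "".join(chunks)))
            header, _, value = line[1:].rpartition(sep)
            name, expression, chunks = header, float(value), []
        else:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        records.append((name, expression, "".join(chunks)))
    return records


# Toy input with made-up names, sequences, and expression values.
example_lines = [
    ">mirna_example_1|10.5",
    "ACGUACGU",
    "ACGU",
    ">mirna_example_2|3",
    "UUGGCCAA",
]
records = parse_fasta_with_expression(example_lines)
for record in records:
    print(record)
```

The same function could be applied to the 3'UTR allele file, provided its deflines followed the same assumed layout.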

To enable prioritization of candidate miR-TSV predictions, SubmiRine_Compare produces an output file of miR-TSVs predicted in the clinically relevant model, sorted by what we refer to as the SubmiRine SLP (Sum of Log-scaled Probabilities) score. As the name suggests, the SLP score is the sum of the natural logs (equivalently, the natural log of the product) of six empirical probabilities computed for each candidate miR-TSV. Each of these empirical probabilities reflects the probability that a non-functional miR-TSV, as defined by the background model, would have the scoring metric value observed for the candidate. Internally, SubmiRine uses an implementation of TargetScan6 context+ scores (Garcia et al. 2011) from the miRmap framework (Vejnar et al. 2012) to score individual miRNA target sites. Using these context+ scores, SubmiRine generates both a raw ("binary") score and a miRNA abundance-weighted ("empirical") score, and these scores serve as the metrics for computing the six empirical probabilities underlying the SLP score. Qualitatively, the six probabilities represent the predicted strength of each candidate miRNA target site, the magnitude of the variant's effect on the target site, and the availability of and competition for the miRNA predicted to regulate the underlying gene.
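The arithmetic behind the SLP score can be sketched in a few lines of Python. This is only an illustration of summing natural logs of six probabilities: the function name `slp_score`, the candidate names, the probability values, and the choice to list the most negative score first are all assumptions for this example, and it does not reproduce how SubmiRine_Compare derives the probabilities from the background model.

```python
import math


def slp_score(probabilities):
    """Sum of natural-log-scaled probabilities for one candidate.

    Expects six values in the interval (0, 1]. Equivalent to the
    natural log of their product.
    """
    if len(probabilities) != 6:
        raise ValueError("expected exactly six probabilities")
    if any(not (0.0 < p <= 1.0) for p in probabilities):
        raise ValueError("each probability must be in (0, 1]")
    return sum(math.log(p) for p in probabilities)


# Made-up probabilities for two hypothetical candidates.
candidates = {
    "tsv_A": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    "tsv_B": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
}

# Assumption: a more negative score means the candidate looks less like
# a background (non-functional) miR-TSV, so it is listed first here.
ranked = sorted(candidates, key=lambda name: slp_score(candidates[name]))
for name in ranked:
    print(name, round(slp_score(candidates[name]), 3))
```

Summing logs rather than multiplying the raw probabilities avoids numerical underflow when several of the probabilities are very small.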

Please refer to the SubmiRine manuscript for more detail on the methodology:

Last Modified: Monday, 01-May-2017 15:49:01 EDT
