Nucleic Acids Res. 2015 Jul 27;43(13):6191-206.
doi: 10.1093/nar/gkv587. Epub 2015 Jun 3.

Sequence-independent characterization of viruses based on the pattern of viral small RNAs produced by the host

Eric Roberto Guimarães Rocha Aguiar et al. Nucleic Acids Res.

Abstract

Virus surveillance in vector insects is potentially of great benefit to public health. Large-scale sequencing of small and long RNAs has previously been used to detect viruses, but without any formal comparison of different strategies. Furthermore, the identification of viral sequences largely depends on similarity searches against reference databases. Here, we developed a sequence-independent strategy based on virus-derived small RNAs produced by the host response, such as the RNA interference pathway. In insects, we compared sequences of small and long RNAs, demonstrating that viral sequences are enriched in the small RNA fraction. We also noted that the small RNA size profile is a unique signature for each virus and can be used to identify novel viral sequences without known relatives in reference databases. Using this strategy, we characterized six novel viruses in the viromes of laboratory fruit flies and wild populations of two insect vectors: mosquitoes and sandflies. We also show that the small RNA profile could be used to infer viral tropism for ovaries among other aspects of virus biology. Additionally, our results suggest that virus detection utilizing small RNAs can also be applied to vertebrates, although not as efficiently as to plants and insects.

Figures

Figure 1.
Overview of the pipeline for virus detection based on long and small RNAs. Different RNA fractions were utilized for the construction of small and long RNA libraries. Sequenced reads were processed to enrich for potential virus sequences. Processed reads were then utilized for contig assembly and extension. Contigs were characterized using both sequence-based and pattern-based strategies. Viral contigs were further validated by RT-PCR and Sanger sequencing. See text for details.
Figure 2.
Small RNA sequencing identifies viral sequences more efficiently than long RNAs. (A) Comparison of the number of contigs and the size of the largest contig in each small RNA library using different size ranges of small RNAs in the assembly step. (B) Proportion of contigs assembled in each library with significant similarity to reference sequences. The origin of contigs is classified by taxon and includes unknown sequences. (C) Size distribution of viral (red), non-viral (blue) and unknown contigs (grey) for each library. P-values for the difference between viral and non-viral contig sizes are indicated (Student's t-test). (D) Viral RNA sequences were detected by RT-PCR from total RNA extracted from three separate pools of Drosophila, Aedes and Lutzomyia populations. Sanger sequencing of PCR products showed high identity to the sequence determined by our metagenomics approach, as shown in the right column (nd, not done). (E) Comparison of processing time, number of contigs and frequency distribution of contig sizes for small and long RNA libraries, shown in grey and black, respectively. (F) Coverage of PCLV and HTV genome segments by contigs assembled in each of the small and long RNA libraries from mosquitoes. Biological replicate samples are shown in blue, green and red.
Figure 3.
Small RNA size profile can classify uncharacterized viral contigs. (A) Small RNA size profile of previously characterized virus segments identified by sequence similarity searches. Blue and red represent small RNAs in the positive and negative strands, respectively. (B) Hierarchical clustering of viral contig sequences assembled in fruit fly, mosquito and sandfly libraries. Clustering was based on Pearson correlation of small RNA size profile shown as a heatmap. Clusters with more than one contig are indicated on the left vertical bar and numbered according to the order in which they appear from top to bottom. Clusters were defined by Pearson correlation above 0.8. (C) Contig Aae.92 and the segment corresponding to the HTV RdRP that grouped together by similarity of the small RNA size profile in panel (B) show perfect correlation of expression in individual mosquitoes as determined by RT-PCR. Results are representative of 46 individual mosquitoes that were analysed. The endogenous gene Rpl32 was used as control for the RT-PCR.
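The size profile in panel (A) and the Pearson correlation used in panel (B) can be illustrated with a minimal sketch. It is not the authors' code: the 18 to 30 nt window, the normalization to fractions of total reads, the function names and the toy read lists are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Assumed size window (18-30 nt); the window used in the paper may differ.
SIZES = range(18, 31)


def size_profile(reads):
    """Build a strand-aware size profile for one contig.

    reads: iterable of (length, strand) tuples, strand being '+' or '-'.
    Returns fractions of reads per size on the '+' strand followed by
    fractions per size on the '-' strand (assumed normalization).
    """
    counts = Counter((length, strand) for length, strand in reads
                     if length in SIZES)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * (2 * len(SIZES))
    return [counts[(n, s)] / total for s in "+-" for n in SIZES]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Toy data (invented): two contigs with a 21 nt peak on both strands,
# and one contig with 26-28 nt reads on the negative strand only.
contig_a = [(21, "+")] * 50 + [(21, "-")] * 45 + [(22, "+")] * 5
contig_b = [(21, "+")] * 40 + [(21, "-")] * 42 + [(20, "-")] * 6
contig_c = [(27, "-")] * 60 + [(28, "-")] * 30 + [(26, "-")] * 10

profile_a = size_profile(contig_a)
profile_b = size_profile(contig_b)
profile_c = size_profile(contig_c)

r_ab = pearson(profile_a, profile_b)  # similar profiles
r_ac = pearson(profile_a, profile_c)  # dissimilar profiles
```

Under these assumptions, contigs with matching size and strand distributions correlate above the 0.8 cutoff named in the caption, while a contig with a different profile does not.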
Figure 4.
Small RNA pattern-based analysis identifies viral contigs without known relatives in reference databases. (A) Hierarchical clustering of viral and unknown contig sequences assembled in fruit fly, mosquito and sandfly libraries. Clustering was based on Pearson correlation of the small RNA size profile shown as a heatmap. Clusters with more than one contig are indicated on the left vertical bar and numbered according to the order in which they appear from top to bottom. Clusters were defined by Pearson correlation above 0.8. (B) Detection by RT-PCR in two separate pools of sandflies shows that contig sequences in Clusters 2 and 17 mimic the expression of RdRP segments of LPRV1 or LPRV2, respectively. The same pools of Lutzomyia longipalpis (pool1 and pool3) analysed in Figure 2D were used.
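Grouping unknown contigs with viral ones at a correlation cutoff of 0.8 can be sketched as follows. This is a simplification, not the authors' pipeline: it links any two contigs whose profiles correlate above the threshold and reports the connected groups, which is equivalent to cutting a single-linkage tree at that threshold. The linkage method used in the paper is not stated here, and the contig names and profile values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cluster_profiles(profiles, threshold=0.8):
    """Group contigs whose size profiles correlate above `threshold`.

    profiles: dict mapping contig name -> size-profile vector.
    Returns a sorted list of clusters, each a sorted list of names.
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical profiles over five size classes (values invented).
profiles = {
    "viral_RdRP": [0.05, 0.80, 0.10, 0.03, 0.02],
    "unknown_1": [0.04, 0.78, 0.12, 0.04, 0.02],
    "unknown_2": [0.20, 0.20, 0.20, 0.25, 0.15],
}
clusters = cluster_profiles(profiles)
```

In this toy input, `unknown_1` joins the cluster of the known viral segment and becomes a candidate for RT-PCR validation, while `unknown_2` stays on its own.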
Figure 5.
The presence of virus-derived piRNAs with a ping-pong signature is indicative of ovary infection. (A) Small RNAs of 24–29 nt derived from PCLV show a 10 nt overlap between sense and antisense strands, U enrichment at position 1 and A enrichment at position 10, consistent with piRNAs generated by the ping-pong amplification mechanism found in the insect germline. (B) Both PCLV and HTV are detected in individual mosquitoes, but only PCLV is present in ovaries, as determined by RT-PCR. Results are representative of eight ovaries of individual mosquitoes that were analysed. The endogenous gene Rpl32 was used as control for the RT-PCR.
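The two features of the ping-pong signature in panel (A) can be checked with a short sketch. It assumes reads are already mapped and filtered to 24–29 nt, that 5' end coordinates are given on the same inclusive coordinate system for both strands, and that sequences are written as RNA. Function names and toy values are hypothetical.

```python
from collections import Counter


def overlap_counts(sense_5p, antisense_5p, max_overlap=20):
    """Count sense/antisense read pairs by the length of their 5' overlap.

    A sense read with 5' end at p and an antisense read with 5' end at
    p + k - 1 overlap by k nt at their 5' ends, so a ping-pong signature
    appears as a peak at k = 10.
    """
    sense = Counter(sense_5p)
    anti = Counter(antisense_5p)
    return {
        k: sum(n * anti[p + k - 1] for p, n in sense.items())
        for k in range(1, max_overlap + 1)
    }


def base_fraction(seqs, position, base):
    """Fraction of reads carrying `base` at a 1-based `position`."""
    eligible = [s for s in seqs if len(s) >= position]
    if not eligible:
        return 0.0
    return sum(s[position - 1] == base for s in eligible) / len(eligible)


# Toy 5' end positions (invented): most pairs are offset by 9,
# which corresponds to a 10 nt overlap.
sense_5p = [100, 100, 250, 400]
antisense_5p = [109, 109, 259, 415]
counts = overlap_counts(sense_5p, antisense_5p)
peak = max(counts, key=counts.get)

# Toy 10 nt read prefixes (invented) for the nucleotide bias check.
seqs = ["U" + "A" * 9, "U" + "C" * 8 + "A", "G" + "C" * 9]
u1 = base_fraction(seqs, 1, "U")
a10 = base_fraction(seqs, 10, "A")
```

In practice the overlap counts are usually compared against the other offsets (for example as a z-score) before a peak at 10 nt is called significant; that step is omitted here.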
Figure 6.
Virus detection based on large-scale sequencing of small RNAs is applicable to animals and plants. (A) Percentage of contigs assembled from published small RNA libraries from insects, plants and vertebrate animals with significant similarity against reference sequences. The origin of contigs is classified by taxon and includes unknown sequences. (B) Size distribution of contigs corresponding to viral (red), non-viral (blue) or unknown sequences (grey) in each library. P-values for the difference between contig sizes are indicated (Student's t-test). (C) Hypothetical genome organization of MNV based on ORF and small RNA analysis of contigs AaeS.81, AaeS.82 and AaeS.83 identified in this study. (D) Hierarchical clustering of viral and unknown contig sequences assembled in published libraries. Clustering was based on Pearson correlation of the small RNA size profile shown as a heatmap. A single cluster with more than one contig is indicated on the left vertical bar as defined by correlation above 0.8. A sub-cluster highlighted in red contains small RNA profiles of three contigs that show Pearson correlation above 0.998. (E) Coverage of SARS-CoV, EMCV, TuMV and SGIV genomes by contigs assembled in RNA libraries from mouse lungs, ES cells, Arabidopsis and fish GP cells, respectively. (F) Size distribution of contigs and raw sequenced reads derived from SARS-CoV in long (black) or small (grey) RNA libraries from infected mouse lungs. (G) Number of raw reads and contig sequences derived from viruses in long and small RNA libraries prepared from SARS-CoV infected mouse lungs. Numbers above bars indicate the percentage of viral reads and contig sequences relative to the total. Fold enrichment or depletion of virus sequences comparing contigs to raw reads is shown.
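The fold enrichment or depletion described for panel (G) is a ratio of two percentages. A minimal sketch of that arithmetic follows; the counts are invented for illustration and are not values from the paper, and the function names are hypothetical.

```python
def viral_percentage(viral, total):
    """Percentage of viral sequences relative to the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * viral / total


def fold_change(read_pct, contig_pct):
    """Ratio of viral percentage in contigs to that in raw reads.

    Values above 1 indicate enrichment after assembly,
    values below 1 indicate depletion.
    """
    if read_pct <= 0:
        raise ValueError("read percentage must be positive")
    return contig_pct / read_pct


# Invented counts for one library (not data from the study).
read_pct = viral_percentage(viral=2_000, total=1_000_000)  # viral raw reads
contig_pct = viral_percentage(viral=10, total=500)         # viral contigs
fold = fold_change(read_pct, contig_pct)
```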
