Figure 2
Composition of the Assemblage Genome Sequences as Determined by Similarity to Known DNA and Protein Sequences
(A) The percentage of “known” sequences, as determined by comparison to the SEED and environmental databases. A sequence was considered “known” if it had a significant similarity (E < 10⁻⁵) to a sequence in the SEED database, “environmental” if it instead had a similarity to a sequence in any environmental database, and “unknown” otherwise.
(B) Breakdown of the “known” sequences into viral (both eukaryotic viruses and bacteriophages), prophage, Bacteria, Archaea, or Eukarya.
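
The three-way assignment described in (A) can be sketched as a short cascade. This is a minimal illustration and not the authors' pipeline: the function names `classify` and `percent_breakdown` are hypothetical, each sequence is assumed to be summarized by its best E-value against the SEED and against the environmental databases (`None` when there is no hit), and the same E < 10⁻⁵ cutoff is assumed for environmental hits, which the caption does not state.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

E_CUTOFF = 1e-5  # significance threshold given in the caption for SEED hits


def classify(seed_evalue: Optional[float],
             env_evalue: Optional[float],
             cutoff: float = E_CUTOFF) -> str:
    """Assign one sequence to "known", "environmental", or "unknown".

    seed_evalue / env_evalue are the best E-values against the SEED and
    the environmental databases, or None if the search returned no hit.
    """
    # A significant SEED hit takes priority over everything else.
    if seed_evalue is not None and seed_evalue < cutoff:
        return "known"
    # ASSUMPTION: environmental hits use the same cutoff as SEED hits;
    # the caption only says "a similarity to any environmental database".
    if env_evalue is not None and env_evalue < cutoff:
        return "environmental"
    return "unknown"


def percent_breakdown(
    hits: Iterable[Tuple[Optional[float], Optional[float]]]
) -> Dict[str, float]:
    """Return the percentage of sequences falling in each category."""
    labels = ("known", "environmental", "unknown")
    counts = Counter(classify(seed, env) for seed, env in hits)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: 100.0 * counts[label] / total for label in labels}
```

Because the SEED check comes first, a sequence with significant hits in both databases is counted as “known”, which matches the order of the rule in (A).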