New! May 2023 Release of Stand-Alone PGAP

We are happy to announce the release of a new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP) with many exciting new features.

Improved user interface

This version has an improved user interface that takes the genome FASTA file and associated organism name directly on the command line. For example, to annotate a Vibrio cholerae genome sequence in the file Vchol.fasta:

pgap.py -r -g Vchol.fasta -s 'Vibrio cholerae' -o Vchol.annot

For more details visit our Quick Start page.
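
When annotating many genomes, it can help to assemble the same call from a script. The following is a minimal Python sketch, not part of PGAP itself: the helper name build_pgap_command is hypothetical, and the flags are copied unchanged from the command above.

```python
import shlex


def build_pgap_command(fasta, organism, outdir):
    """Return the argument list for the pgap.py call shown in this post.

    Hypothetical helper: only the flags -r, -g, -s and -o from the
    example command are used, exactly as they appear there.
    """
    return ["pgap.py", "-r", "-g", str(fasta), "-s", organism, "-o", str(outdir)]


cmd = build_pgap_command("Vchol.fasta", "Vibrio cholerae", "Vchol.annot")

# Show the shell-quoted command line that would be run.
print(shlex.join(cmd))

# To actually launch the pipeline (requires a working PGAP installation):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems with organism names that contain spaces.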

Additional output files for better interoperability

In addition to the GFF, GenBank, and protein FASTA annotation files that PGAP has always produced, it now provides:

  • annot_cds_from_genomic.fna: nucleotide sequences in FASTA format of all coding sequence (CDS) features annotated on the assembly, based on the genome sequence.
  • annot_translated_cds.faa: protein sequences in FASTA format of CDS features annotated on the genomic records. The sequences are the conceptual translation of the nucleotide sequences provided in the annot_cds_from_genomic.fna file.
  • annot_with_genomic_fasta.gff: annotation in GFF format followed by the ##FASTA directive and the genomic sequence(s) in FASTA format. This makes the file directly usable by Roary.
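
To illustrate the layout of the combined GFF file, here is a minimal Python sketch that separates the annotation lines from the embedded sequences. The function name split_gff_with_fasta and the toy records are invented for illustration. The code assumes the FASTA section starts at a line reading ##FASTA (it also tolerates "## FASTA") and makes no claim about the exact columns PGAP writes.

```python
def split_gff_with_fasta(lines):
    """Split GFF-plus-sequence lines into (annotation_lines, {seq_id: sequence})."""
    annotation = []
    chunks = {}
    current_id = None
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_fasta:
            # Assumption: the FASTA section begins at a "##FASTA" line.
            if line.replace(" ", "").upper() == "##FASTA":
                in_fasta = True
            else:
                annotation.append(line)
        elif line.startswith(">"):
            # The sequence ID is the first word of the header line.
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks[current_id] = []
        elif line.strip() and current_id is not None:
            chunks[current_id].append(line.strip())
    sequences = {seq_id: "".join(parts) for seq_id, parts in chunks.items()}
    return annotation, sequences


# Toy input, invented for this example (not real PGAP output).
sample = [
    "##gff-version 3\n",
    "contig1\texample\tCDS\t1\t9\t.\t+\t0\tID=cds-1\n",
    "##FASTA\n",
    ">contig1 toy sequence\n",
    "ATGAAA\n",
    "TAA\n",
]
annotation, sequences = split_gff_with_fasta(sample)
print(len(annotation), sequences)
```

On a real file, the same function can be called with an open file handle, since it only iterates over lines.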

More Gene Ontology (GO) terms in the annotation

PGAP assigns function to predicted proteins based on hits to Protein Family Models, such as protein profile HMMs, BLAST hits, and domain architectures. New in this release, GO terms and Enzyme Commission (EC) numbers associated with domain architectures are inherited by the annotated proteins. On average, 50% of proteins annotated on a genome are annotated with at least one GO term.
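
To check the GO coverage of your own genome, you could count CDS features that carry a GO term. The Python sketch below rests on an assumption this post does not confirm: that GO terms appear in the GFF under the Ontology_term attribute, which the GFF3 specification reserves for this purpose. The function name go_term_fraction and the toy records are invented for illustration.

```python
def go_term_fraction(gff_lines):
    """Return the fraction of CDS lines that carry at least one GO term.

    Assumption: GO terms are stored in the GFF3 'Ontology_term' attribute.
    A CDS split over several lines is counted once per line.
    """
    total = 0
    with_go = 0
    for raw in gff_lines:
        line = raw.rstrip("\n")
        # Stop at the embedded sequence section, if the file has one.
        if line.replace(" ", "").upper() == "##FASTA":
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        total += 1
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "GO:" in attrs.get("Ontology_term", ""):
            with_go += 1
    return with_go / total if total else 0.0


# Toy input, invented for this example (not real PGAP output).
sample_gff = [
    "##gff-version 3",
    "contig1\texample\tgene\t1\t9\t.\t+\t.\tID=gene-1",
    "contig1\texample\tCDS\t1\t9\t.\t+\t0\tID=cds-1;Ontology_term=GO:0005524",
    "contig1\texample\tCDS\t20\t28\t.\t+\t0\tID=cds-2;product=hypothetical protein",
]
print(go_term_fraction(sample_gff))
```

If your files record GO terms under a different attribute, change the attribute name in the lookup accordingly.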

And, as with every previous release, this release comes with incremental improvements by expert curators to the Protein Family Model collection that drives the precision of PGAP’s structural and functional annotation.

Stay up to date

Follow us on Twitter @NCBI and join our mailing list to keep up to date with PGAP and other NCBI news.  

We want to hear from you!

Please try this new version and share your experience with us.

One thought on “New! May 2023 Release of Stand-Alone PGAP”

  1. That’s great news! The new version of the stand-alone Prokaryotic Genome Annotation Pipeline (PGAP) seems to have many exciting new features that will make it easier for users to annotate their genomes. I’m particularly impressed with the improved user interface that takes the genome FASTA file and associated organism name directly on the command line. This will definitely save users time and effort.

    I also appreciate the additional output files for better interoperability. The nucleotide sequences in FASTA format of all coding sequence (CDS) features annotated on the assembly, based on the genome sequence, and protein sequences in FASTA format of CDS features annotated on the genomic records are very useful.

    It’s great to see that PGAP assigns function to predicted proteins based on hits to Protein Family Models, such as protein profile HMMs, BLAST hits, and domain architectures. The fact that GO terms and Enzyme Commission (EC) numbers associated with domain architectures are inherited by the annotated proteins is also impressive.

    Overall, it seems like this new release comes with incremental improvements by expert curators of the Protein Family Model collection that drives the precision of PGAP’s structural and functional annotation. I’m excited to see what else PGAP has in store for us in the future!

    Regards,

    Michael Wichkoski
