RefSeq Release 216

RefSeq release 216 is now available online, from the FTP site, and through NCBI’s new resource, Datasets.

This full release incorporates genomic, transcript, and protein data available as of January 9, 2023, and contains 342,395,932 records, including 249,868,639 proteins, 49,869,497 RNAs, and sequences from 128,299 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

Increase in size of ASN.1 files

As announced earlier, the size limit for uncompressed ASN.1 files was increased from 500 MB to 2 GB per file. Non-ASN.1 files will also be larger. This change reduces the total number of files in the release.

Plasmid sequences

The set of sequences included in the plasmid bin was revised to also include plasmids from WGS (whole genome shotgun) sequences.

Conserved Domains annotation

The delivery of Conserved Domains Database (CDD) features on RefSeq proteins has been improved, resulting in a large increase in the number of proteins with conserved domain annotation. See our previous post for more about the CDD.

New nomenclature for Eukaryotic Genome Annotations

Version 10.1 of the NCBI Eukaryotic Genome Annotation Pipeline was released on December 14, 2022. Starting with this release, annotations are named after the assembly accession and the date on which the annotation was started. For example, the December 2022 annotation of assembly GCF_016801865.2 is named GCF_016801865.2-RS_2022_12.
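If you track annotations in your own scripts, the naming scheme can be built and parsed mechanically. The sketch below assumes the pattern `<assembly accession>-RS_<YYYY>_<MM>`, generalized from the single example above. The regular expression and the helper names `annotation_name` and `parse_annotation_name` are illustrative only and are not part of any NCBI tool.

```python
import re
from datetime import date

# Assumed pattern, generalized from the one example in the announcement:
#   <RefSeq assembly accession>-RS_<YYYY>_<MM>
ANNOTATION_NAME = re.compile(r"^(GCF_\d{9}\.\d+)-RS_(\d{4})_(\d{2})$")


def annotation_name(accession: str, started: date) -> str:
    """Build an annotation name from an assembly accession and the start date."""
    # Only the year and month of the start date appear in the name.
    return f"{accession}-RS_{started.year:04d}_{started.month:02d}"


def parse_annotation_name(name: str) -> tuple[str, int, int]:
    """Split an annotation name into (accession, year, month)."""
    m = ANNOTATION_NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognized annotation name: {name!r}")
    accession, year, month = m.groups()
    return accession, int(year), int(month)


# Hypothetical start date within December 2022.
print(annotation_name("GCF_016801865.2", date(2022, 12, 1)))
# GCF_016801865.2-RS_2022_12
print(parse_annotation_name("GCF_016801865.2-RS_2022_12"))
# ('GCF_016801865.2', 2022, 12)
```

Because the name carries only the year and month, two annotations of the same assembly started in the same month would collide under this assumed pattern. Treat the name as a label rather than a unique key unless NCBI documents otherwise.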

New eukaryotic genome annotations

This release includes new annotations generated by NCBI’s eukaryotic genome annotation pipeline for 33 species, including:

Update of prokaryote phylum names

As announced in November 2022, NCBI Taxonomy will begin updating phylum names for prokaryotes in January 2023. Informal phylum names in long use (e.g., Firmicutes, Proteobacteria) will change to newly formalized names (e.g., Bacillota, Pseudomonadota, respectively). This update affects over 40 NCBI TaxIDs at phylum rank. The rollout will take several weeks to complete. Note that the flatfiles in the next RefSeq release (March 2023) may contain a partial update of phylum names.
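Because a release may mix old and new phylum names, scripts that compare lineages across releases may need to normalize them first. The following is a minimal sketch that assumes a semicolon-separated lineage string, such as the one under a flatfile ORGANISM line, already joined onto a single line. The table lists only the two renamings named above and is deliberately incomplete; the full change covers over 40 phylum-rank TaxIDs. The names `PHYLUM_RENAMES` and `normalize_lineage` are hypothetical.

```python
# Deliberately incomplete: only the two renamings named in the announcement.
# Extend this table from NCBI Taxonomy for the full set of phylum changes.
PHYLUM_RENAMES = {
    "Firmicutes": "Bacillota",
    "Proteobacteria": "Pseudomonadota",
}


def normalize_lineage(lineage: str) -> str:
    """Replace old phylum names in a semicolon-separated lineage string."""
    had_period = lineage.endswith(".")
    # Split on ';' and strip whitespace so each taxon name is compared exactly.
    names = [name.strip() for name in lineage.rstrip(".").split(";")]
    # Exact whole-name matches only, so names that merely contain an old
    # phylum name (e.g. "Gammaproteobacteria") are left untouched.
    out = "; ".join(PHYLUM_RENAMES.get(name, name) for name in names)
    # Keep the trailing period used in flatfile lineages, if present.
    return out + "." if had_period else out


print(normalize_lineage(
    "Bacteria; Firmicutes; Bacilli; Bacillales; Bacillaceae; Bacillus."))
# Bacteria; Bacillota; Bacilli; Bacillales; Bacillaceae; Bacillus.
```

Matching whole names rather than substrings keeps class-level names such as Gammaproteobacteria unchanged. Whether those lower ranks are also renamed is outside the scope of this announcement.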