Tag: RefSeq

Gene Ontology (GO) Terms for NCBI RefSeq Eukaryotic Genomes

Are you interested in more functional information about protein-coding genes? We’ve expanded NCBI RefSeq’s Eukaryotic Genome Annotation Pipeline (EGAP) to include Gene Ontology (GO) terms computed for most protein-coding genes. We are running the latest version of InterProScan, which now includes analysis based on PANTHER reference trees, on all NCBI RefSeq eukaryotic genomes. That gives you comprehensive GO data, with inferred biological process, molecular function, and cellular component terms matched to high-quality RefSeq annotations across hundreds of taxa, to help drive your research. The data is available on individual records in NCBI’s Gene resource, through NCBI Gene FTP, and in community-standard .gaf files provided with each RefSeq genome release on our FTP site.

Continue reading “Gene Ontology (GO) Terms for NCBI RefSeq Eukaryotic Genomes”
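
For readers who want to work with the .gaf files directly, here is a minimal sketch of grouping GO terms by gene. It assumes the standard tab-separated GAF 2.x layout, in which comment lines begin with `!`, column 2 holds the object ID, column 5 the GO ID, and column 9 the aspect (P, F, or C). The function name `go_terms_by_gene` and the sample records are made up for illustration and are not taken from a RefSeq release.

```python
from collections import defaultdict


def go_terms_by_gene(gaf_lines):
    """Group (GO ID, aspect) pairs by the object ID in GAF column 2."""
    terms = defaultdict(set)
    for line in gaf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines, which start with "!".
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        # Skip malformed rows too short to contain the aspect column.
        if len(cols) < 9:
            continue
        object_id, go_id, aspect = cols[1], cols[4], cols[8]
        terms[object_id].add((go_id, aspect))
    return dict(terms)


# Made-up sample records in GAF layout (placeholders, not real annotations).
SAMPLE_GAF = [
    "!gaf-version: 2.2\n",
    "NCBIGene\t0000001\tGENE1\tenables\tGO:0005515\tGO_REF:0000000\tIEA"
    "\t\tF\t\t\tprotein\ttaxon:9606\t20231106\tRefSeq\t\t\n",
    "NCBIGene\t0000001\tGENE1\tlocated_in\tGO:0005634\tGO_REF:0000000\tIEA"
    "\t\tC\t\t\tprotein\ttaxon:9606\t20231106\tRefSeq\t\t\n",
]

if __name__ == "__main__":
    for gene, gene_terms in go_terms_by_gene(SAMPLE_GAF).items():
        print(gene, sorted(gene_terms))
```

To read a downloaded file instead of the sample list, pass an open text file handle to the function (after decompressing it, if the file is compressed).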

RefSeq Release 221

RefSeq release 221 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of November 6, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 404,657,610 records
  • 300,054,945 proteins
  • 57,882,313 RNAs
  • sequences from 143,819 organisms 

Continue reading “RefSeq Release 221”

Now Available! Compare NCBI RefSeq and UniProt Datasets

Do you need to compare and combine data based on NCBI RefSeq and UniProt datasets, and aren’t sure which proteins are comparable? For many years, NCBI Gene has provided information about the relationships between RefSeq and UniProt accessions courtesy of data imported from UniProt, but the tremendous growth of both datasets has led to large gaps in the data. We have developed a new process to compare the two datasets, first looking for 100% identical proteins and then checking the remaining sequences for similar matches in related taxa. The result is mapping information now covering over 170 million RefSeq proteins across the tree of life. 

You can find links to related UniProt accessions on individual NCBI Gene records. The entire dataset is available on our FTP site.

Continue reading “Now Available! Compare NCBI RefSeq and UniProt Datasets”
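
The two-pass idea described above (exact matches first, then similar matches for whatever is left) can be sketched roughly as follows. This is an illustration only, not NCBI's actual process: the function name `map_refseq_to_uniprot`, the `min_ratio` threshold, and the use of Python's `difflib.SequenceMatcher` as a stand-in similarity measure are all assumptions, and the sketch does not restrict the second pass to related taxa.

```python
from difflib import SequenceMatcher


def map_refseq_to_uniprot(refseq, uniprot, min_ratio=0.8):
    """Return {refseq_acc: (uniprot_acc, match_type)} using two passes.

    Both inputs are dicts of accession -> protein sequence (hypothetical data).
    """
    # Pass 1 setup: index UniProt sequences so identical proteins
    # are found by a dictionary lookup.
    by_sequence = {}
    for acc, seq in uniprot.items():
        by_sequence.setdefault(seq, acc)

    mapping = {}
    for acc, seq in refseq.items():
        if seq in by_sequence:
            mapping[acc] = (by_sequence[seq], "identical")
            continue
        # Pass 2: fall back to the most similar candidate, if it is
        # similar enough. (Stand-in similarity score, quadratic cost.)
        best_acc, best_ratio = None, 0.0
        for cand_acc, cand_seq in uniprot.items():
            ratio = SequenceMatcher(None, seq, cand_seq).ratio()
            if ratio > best_ratio:
                best_acc, best_ratio = cand_acc, ratio
        if best_acc is not None and best_ratio >= min_ratio:
            mapping[acc] = (best_acc, "similar")
    return mapping
```

Proteins with no sufficiently similar candidate are simply left out of the returned mapping. A comparison at the scale of real datasets would need sequence hashing and a dedicated alignment tool rather than this pairwise loop.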

New Annotations in RefSeq!

In July, August, and September, the NCBI Eukaryotic Genome Annotation Pipeline released fifty-six new annotations in RefSeq!

New Annotations
  • Achroia grisella (moth)
  • Acipenser ruthenus (sterlet)
  • Ahaetulla prasina (snake)
  • Alligator mississippiensis (American alligator)
  • Ammospiza caudacuta (bird)
  • Ammospiza nelsoni (bird)
  • Anopheles bellator (mosquito)
  • Anopheles coustani (mosquito)
  • Anopheles ziemanni (mosquito)
  • Arachis stenosperma (eudicot)
  • Carassius carassius (crucian carp)
  • Centropristis striata (black seabass)
  • Cornus florida (flowering dogwood) (pictured)
  • Corylus avellana (European hazelnut)
  • Corythoichthys intestinalis (scribbled pipefish)

Continue reading “New Annotations in RefSeq!”

Upcoming Changes to Virus Data Resources at NCBI

Effective June 2024, NCBI Virus will replace legacy virus web resources 

Coming soon! As part of our ongoing effort to enhance your experience and modernize our services, several of our legacy virus-related web resources will be replaced by NCBI Virus, our community portal for viral sequence data. NCBI Virus is more comprehensive and more modern than our legacy resources, and it offers more powerful features and analysis tools.

What will change?

Below is a list of the legacy virus resources that will be replaced by NCBI Virus. The list includes a description of features that will continue to be supported through NCBI Virus:

Continue reading “Upcoming Changes to Virus Data Resources at NCBI”

Introducing the New NCBI Datasets Genome Annotation Table

As part of our ongoing effort to modernize and improve your experience, we are excited to introduce the new NCBI Datasets genome annotation table. You can now quickly and easily access gene and protein sequences annotated by NCBI RefSeq or GenBank submitters.

Features & Benefits
  • Easier than ever to search and download data for annotated genes  
  • Download gene, transcript and protein sequences, and metadata 
  • Annotation tables are available for ~7500 eukaryotic and ~1.5M prokaryotic annotated genomes   
  • Annotation data is now available for both RefSeq and GenBank-submitted annotations
  • Filter by gene type, gene name, and chromosome or location on the genome 

Continue reading “Introducing the New NCBI Datasets Genome Annotation Table”

Comparing Yeast Species Used in Beer Brewing and Bread Making

Using the NIH Comparative Genomics Resource (CGR) to gain knowledge about less-researched organisms 

The scientific community relies heavily on model organism research to gain knowledge and make discoveries. However, focusing solely on these species misses valuable variation. Comparative genomics allows us to use knowledge from a model species, such as Saccharomyces cerevisiae, to understand traits in other, related organisms, such as Saccharomyces pastorianus or Saccharomyces eubayanus. Applying this information may provide valuable insight for other less-researched organisms. The National Institutes of Health (NIH) Comparative Genomics Resource (CGR) offers a cutting-edge NCBI toolkit of high-quality genomics data and tools to help you do just that.

Continue reading “Comparing Yeast Species Used in Beer Brewing and Bread Making”

RefSeq Release 220

RefSeq release 220 is now available online and from the FTP site. You can access RefSeq data through NCBI Datasets.

What’s included in this release?

As of September 5, 2023, this full release incorporates genomic, transcript, and protein data containing:

  • 391,350,361 records
  • 289,333,423 proteins
  • 56,423,426 RNAs
  • sequences from 141,099 organisms 

Continue reading “RefSeq Release 220”
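
To put these figures next to the release 221 counts reported elsewhere on this page (as of November 6, 2023), the short sketch below computes the absolute and percentage growth per category. The counts are copied from the two release announcements; the dictionary and function names are made up for illustration.

```python
# Counts as stated in the release 220 and release 221 announcements.
RELEASE_220 = {
    "records": 391_350_361,
    "proteins": 289_333_423,
    "RNAs": 56_423_426,
    "organisms": 141_099,
}
RELEASE_221 = {
    "records": 404_657_610,
    "proteins": 300_054_945,
    "RNAs": 57_882_313,
    "organisms": 143_819,
}


def growth(old, new):
    """Return {category: (absolute increase, percent increase)}."""
    return {
        key: (new[key] - old[key], 100 * (new[key] - old[key]) / old[key])
        for key in old
    }


if __name__ == "__main__":
    for category, (delta, pct) in growth(RELEASE_220, RELEASE_221).items():
        print(f"{category}: +{delta:,} ({pct:.1f}%)")
```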

Now Available! Updated Bacterial and Archaeal Reference Genomes Collection

An updated bacterial and archaeal reference genome collection is available! This collection of 18,343 genomes was built by selecting exactly one genome assembly for each species among the 312,000+ prokaryotic genomes in RefSeq, except for E. coli, for which two assemblies were selected as references.

The criteria for selecting the reference assembly for a given species include assembly contiguity and completeness, and the quality of the RefSeq annotation.

What’s new?
  • 790 species were added to the collection
  • 199 species are represented by a better assembly (compared to the April 2023 release)
  • 70 species were removed because of changes in NCBI Taxonomy or uncertainty in their species assignment 

Continue reading “Now Available! Updated Bacterial and Archaeal Reference Genomes Collection”

New Annotations in RefSeq!

In April, May, and June, the NCBI Eukaryotic Genome Annotation Pipeline released eighty-two new annotations in RefSeq!

Highlights:

  • Homo sapiens (human) T2T-CHM13v2.0 now includes many more alternative splice variants
  • Homo sapiens (human) GRCh38.p14 includes all transcripts from MANE v1.2 and over 78,000 new RefSeq Functional Element (RefSeqFE) features added since our last annotation in 2022
  • Mus musculus (house mouse) GRCm39 integrates curation for over 3,000 genes and 14,000 transcripts since September 2020
  • Rattus norvegicus (Norway rat) mRatBN7.2 includes curation of over 5,000 genes since our last annotation in 2021

New annotations:

Continue reading “New Annotations in RefSeq!”