Tag: Assembly

NCBI Datasets: Easily Access and Download Sequence Data and Metadata

Effective June 2024, NCBI Datasets will replace legacy Genome and Assembly web resources 

As part of our ongoing effort to enhance your experience and modernize our services, NCBI will gradually replace the legacy Genome and Assembly resources with the newly introduced NCBI Datasets resource. NCBI Datasets is a continually evolving platform designed to provide easy and intuitive access to NCBI’s sequence data and metadata. 

  • The legacy Genome and Assembly web resources will no longer be available after June 2024
  • There will be no changes to how you access the databases using E-Utilities or EDirect 
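Because E-utilities access is unchanged, existing scripts that query the Assembly database should keep working. As a minimal sketch, an ESearch request URL for the assembly database could be built as shown here. The helper name `build_esearch_url` and the example search term are illustrative assumptions rather than part of the announcement, and the snippet only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="assembly", retmax=20):
    """Return an ESearch URL for a search term (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes characters such as the brackets in field tags.
    return f"{ESEARCH_BASE}?{urlencode(params)}"


# Example term chosen for illustration only.
url = build_esearch_url("Colletotrichum[Organism]")
print(url)
```

The resulting URL can be fetched with any HTTP client; the same parameters apply whether the request comes from a script or from EDirect.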

New & Improved NCBI Datasets Genome and Assembly Pages 

Legacy pages now redirect 

Effective July 10, 2023, NCBI’s Assembly and Genome record pages now redirect to new NCBI Datasets pages. As previously announced, these updates are part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.   

The following pages have been updated:
  • The NCBI Assembly record pages now redirect to the new NCBI Datasets Genome record pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST.
  • The NCBI Genome record pages now redirect to the NCBI Datasets Taxonomy record pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.

During this transition, you will have the option to return to the legacy Genome and Assembly record pages. We will remove the legacy pages in early 2024.

Important Update! Changes to ASSEMBLY_REPORTS and GENOME_REPORTS on FTP

Do you currently access genome assembly data through the FTP site? We are consolidating the information provided in the ASSEMBLY_REPORTS and GENOME_REPORTS directories on the genomes FTP site to simplify access and ensure that you have the most accurate, up-to-date, and consistently reported data.

The assembly_summary files in the ASSEMBLY_REPORTS directory are gaining information in newly added columns 24-38, including statistics about the assembly (size, GC content, genome size, and number of sequences) as well as details about the provided annotation (number of genes, annotation name and date). See example below (Table 1). Check out the README for more details about the contents of the summary files.
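For scripts that read the summary files, looking up columns by header name rather than by position may make the added columns easier to absorb. The following is a minimal sketch, assuming the file is tab-delimited and that the last line beginning with `#` before the data holds the column names. The function name, the sample row, and the placeholder accession are made up for illustration; the actual column names should be taken from the README.

```python
def parse_assembly_summary(text):
    """Parse tab-delimited summary text into a list of dicts keyed by column name.

    Assumption: comment/header lines start with '#', and the last such line
    seen before a data row is the header.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the first data row")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Tiny in-memory sample; the values and accession are placeholders, not real data.
SAMPLE = (
    "## illustrative comment line\n"
    "#assembly_accession\torganism_name\tgenome_size\tgc_percent\n"
    "GCA_000000000.0\tExample organism\t1000\t50.5\n"
)

rows = parse_assembly_summary(SAMPLE)
print(rows[0]["genome_size"])
```

Keying rows by name means a script written this way does not depend on a column's numeric position in the file.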

Download Assembled Genome Data Programmatically with NCBI Datasets

As previously announced, NCBI’s Assembly and Genome record pages will be redirected to new NCBI Datasets pages in June 2023. The NCBI Datasets Command Line Interface (CLI) tools provide easy, straightforward programmatic downloads of assembled genome sequence data. We invite you to check them out and let us know what you think! 

Features & Benefits of NCBI Datasets
  • Get assembled genome sequence, annotation, and metadata, including transcripts and proteins, in one easy step. 
  • Querying is easy and flexible! Retrieve data using organism name, assembly accession, or BioProject accession. 
  • Request data for multiple assemblies in one request – it is now simpler and faster to download large amounts of data. 
  • Metadata is derived from multiple databases and metadata schemas are documented. 
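As a rough sketch of what such a request can look like from a script, the helper below assembles an argument list for the `datasets` CLI without running it. The subcommand and flag spellings (`download genome accession`, `download genome taxon`, `--include`, `--filename`) reflect the CLI usage as we understand it and should be checked against `datasets download genome --help` for your installed version; the helper name and defaults are assumptions made for illustration.

```python
import shlex


def build_datasets_command(query_type, identifiers,
                           include=("genome",), filename="ncbi_dataset.zip"):
    """Build (but do not run) a `datasets download genome` command line.

    query_type: "accession" (assembly or BioProject accessions) or "taxon".
    Hypothetical helper; verify flags against the installed CLI's help output.
    """
    if query_type not in ("accession", "taxon"):
        raise ValueError(f"unsupported query type: {query_type!r}")
    if not identifiers:
        raise ValueError("at least one identifier is required")
    cmd = ["datasets", "download", "genome", query_type, *identifiers]
    cmd += ["--include", ",".join(include), "--filename", filename]
    return cmd


cmd = build_datasets_command(
    "accession", ["GCA_002901105.1"], include=("genome", "gff3", "protein")
)
# shlex.join quotes arguments safely for display or for pasting into a shell.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the CLI is installed.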

New & Improved NCBI Datasets Genome and Assembly Pages

Legacy pages will be redirected effective July 2023

In July 2023, NCBI’s Assembly and Genome record pages will be redirected to new Datasets pages as part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.

We will update the following pages:
  • The NCBI Assembly pages will be redirected to the new Datasets Genome pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST.
  • The NCBI Genome pages will be redirected to the Datasets Taxonomy pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.
  • During this transition, you will have the option to return to the legacy Genome and Assembly pages. 

Fungal species identification using DNA: an NCBI and USDA-APHIS collaboration with a focus on Colletotrichum

As reported in the journal Plant Disease, a recent collaboration between the National Library of Medicine’s NCBI and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS) analyzed public sequence records for the fungal genus Colletotrichum, an important group of fungal plant pathogens that are a significant threat to food production. Colletotrichum species are challenging to identify accurately, and public sequences may contain out-of-date taxonomic information. The study improved the accuracy of species names assigned to Colletotrichum database sequences, verified a comprehensive set of reliable reference markers for the genus, and produced a multi-marker tree as well as the genome-based interactive tree shown in Figure 1.

Figure 1. Views from the genome-assembly-derived multi-protein distance tree that shows the analysis of publicly available Colletotrichum genomes. The interactive tree is available online. You can browse, search, download, and export the tree. As an example search, you can demonstrate that assembly GCA_002901105.1 was incorrectly labeled as Colletotrichum gloeosporioides. Searching the tree for the name “Colletotrichum gloeosporioides” highlights two clades. Clicking the node for the Truncatum species complex and clicking “Show descendants” expands the clade and shows that assembly GCA_002901105.1, which was labeled as gloeosporioides, clusters with the Truncatum species complex. You can find more details on the tree-building process in the supplementary material for the publication and on GitHub.

Come see NCBI at the ASM Microbe Conference 2022

The American Society for Microbiology (ASM) Microbe conference is back and is scheduled to take place in person, June 9-13, in Washington, D.C.

NCBI staff member Dr. Michael Feldgarden will be recognized by ASM with an award for his research. Other NCBI staff will present posters on NCBI resources and will also be available at our booth (#1128) to address your questions. Drop by to see what’s new and provide your feedback. We hope to see you there! Check out NCBI’s schedule of activities:

NCBI’s Genome Data Viewer now displays both NCBI RefSeq and submitted assemblies

NCBI’s Genome Data Viewer (GDV) now supports visualization and analysis of nearly 400 submitter-annotated chromosome-level assemblies from the INSDC (GenBank/ENA/DDBJ). These submitter-annotated assemblies join more than 1,200 NCBI RefSeq-annotated assemblies available in GDV for hundreds of eukaryotes, spanning fungi, plants, fish, insects, and all major model organisms.

Figure 1 shows a GenBank apple assembly (GCA_004115385) displayed in GDV.

Figure 1. Submitter-annotated Malus domestica (apple) assembly displayed in GDV. GDV provides submitter-provided gene annotation, as well as some additional tracks including interspersed repeats identified by RepeatMasker and six-frame translations (not shown). Red boxes indicate useful tools and panels including a search box, an exon navigator, and interfaces to add user data and conduct NCBI BLAST searches. 

A new service to evaluate the quality of your assembled genome!

Are you wondering about the quality of a human, mouse or rat genome that you have assembled?

We offer a new service for evaluating the completeness, correctness, and base accuracy of your human, mouse or rat genome assembly compared to a reference assembly. You simply provide NCBI with one or more assemblies in FASTA format and we will do an annotation-based evaluation of the genome(s) using the expert-curated, high-confidence RefSeq transcripts for the species.

Fungal Disease Awareness Week: fungal pathogen data and literature at NCBI

This post is in support of the CDC’s Fungal Disease Awareness Week (September 20-24, 2021).

The impact of fungal diseases on human health has often been neglected, but increased association of fungal infections with severe illness and death during the COVID-19 pandemic has brought fungal diseases into the spotlight.

According to the CDC, the most common fungal co-infections in patients with COVID-19 include aspergillosis or invasive candidiasis, including healthcare-associated infection from Candida auris. Other reported diseases are mucormycosis, coccidioidomycosis, and cryptococcosis. Aspergillosis is commonly caused by Aspergillus fumigatus, mucormycosis by Rhizopus species, coccidioidomycosis by Coccidioides immitis and C. posadasii, and cryptococcosis by Cryptococcus neoformans.

This post explores several NCBI resources that have relevant information about the fungal pathogens implicated in these COVID-19 related illnesses.

Assembled genomes

Correctly identified and annotated genome assemblies available for the fungal taxa implicated as co-infections in COVID-19 patients are summarized in the table below. These and many other fungi are also available as curated RefSeq genome assemblies.