NCBI Datasets: Easily Access and Download Sequence Data and Metadata

Effective June 2024, NCBI Datasets will replace legacy Genome and Assembly web resources 

As part of our ongoing effort to enhance your experience and modernize our services, NCBI will gradually replace the legacy Genome and Assembly resources with the newly introduced NCBI Datasets resource. NCBI Datasets is a continually evolving platform designed to provide easy and intuitive access to NCBI’s sequence data and metadata. 

  • The legacy Genome and Assembly web resources will no longer be available after June 2024
  • There will be no changes to how you access the databases using E-Utilities or EDirect 
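
To illustrate the kind of programmatic access that stays the same, here is a minimal sketch that builds an E-Utilities ESearch URL against the Assembly database. The helper name `assembly_esearch_url` and the example search term are illustrative assumptions, not part of this announcement; check the E-Utilities documentation for the parameters your workflow needs.

```python
from urllib.parse import urlencode

# Base address of the E-Utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def assembly_esearch_url(term, retmax=20):
    """Build an ESearch URL that queries the Assembly database.

    `assembly_esearch_url` is a hypothetical helper written for this example.
    It only constructs the URL; it does not send a request.
    """
    params = {
        "db": "assembly",    # database to search
        "term": term,        # Entrez query string
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(assembly_esearch_url("Homo sapiens[Organism]"))
```

The resulting URL can be fetched with any HTTP client, or the same query can be run with the EDirect `esearch` command.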

Why are we making this change? 
  • To provide a streamlined experience that integrates genome, organism, and gene information 
  • To help you retrieve large datasets that enable big data analyses 
  • To deliver data and metadata together and support better reuse and attribution 
  • To provide you with a single entry point to genome datasets

Features & Benefits of NCBI Datasets 
  1. Comprehensive Data: Access assembled genome sequences, annotations, and metadata, including transcripts and proteins, from a single webpage 
  2. Flexible Search Options: Easily retrieve data using organism names, assembly, WGS, or BioProject accessions 
  3. Scalable Data Retrieval: Request data for multiple genomes and file types in a single request, simplifying and expediting the download of large datasets 
  4. Well-Documented Metadata: Metadata is sourced from multiple databases, and metadata schemas are thoroughly documented
  5. Interoperable Metadata Formats: Metadata formats are machine-readable and easily converted to human-readable forms
  6. Taxonomy-Focused Portal: Access genes, genomes, and other NCBI resources through a taxonomy-focused portal
  7. Consistency: Enjoy consistent data access across web and programmatic interfaces 
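
As a rough illustration of requesting several genomes and file types in a single request, the sketch below assembles one download URL for the NCBI Datasets REST API. The endpoint path, the `include_annotation_type` parameter, and the file-type values reflect our understanding of the v2 API and should be treated as assumptions to verify against the current API documentation. The helper name `genome_download_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed base address of the NCBI Datasets v2 REST API.
DATASETS_BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def genome_download_url(accessions, file_types=("GENOME_FASTA", "GENOME_GFF")):
    """Build one download URL covering several assemblies and file types.

    Hypothetical helper: it only constructs the URL and sends no request.
    The path layout and parameter name are assumptions about the v2 API.
    """
    if not accessions:
        raise ValueError("at least one assembly accession is required")
    # Accessions are joined with commas in the path segment.
    joined = ",".join(quote(acc, safe="") for acc in accessions)
    # Each requested file type becomes its own repeated query parameter.
    query = urlencode([("include_annotation_type", ft) for ft in file_types])
    return f"{DATASETS_BASE}/genome/accession/{joined}/download?{query}"


if __name__ == "__main__":
    # Human GRCh38.p14 and mouse GRCm39 RefSeq assemblies.
    print(genome_download_url(["GCF_000001405.40", "GCF_000001635.27"]))
```

A request to a URL of this form is expected to return a single zip package containing the requested files together with their metadata.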

Stay up to date 

NCBI Datasets is part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration.     

Join our mailing list to keep up to date with NCBI Datasets and other CGR news. 


If you have questions or would like to provide feedback, please reach out to us at   


4 thoughts on “NCBI Datasets: Easily Access and Download Sequence Data and Metadata”

  1. You can continue to provide both the old assembly resources as well as the new format. Completely deprecating the assembly web resources (and especially command line resources) will break a lot of people’s workflows.

    1. Thanks for the comment. Access to Assembly data through FTP and the E-Utilities API / EDirect will remain exactly the same as before, so this change should not affect bioinformatic workflows, only the web pages.

  2. PLEASE PLEASE PLEASE make it simpler to get a URL to directly command line download one specific file (e.g. assembly .fasta or annotation .gff) that the legacy page provides, rather than the cumbersome URL arguments that generate a .zip file with additional metadata files in multiple subfolders.