Main

Despite rapid progress in cryo-EM technology in the past decade1, many biological macromolecules of interest are still too small to allow reliable structure determination. To limit the damage that electrons cause to biological structures of interest, cryo-EM images are taken using low doses of electron radiation, leading to high levels of experimental noise. The noise in the images impedes their alignment, resulting in an ill-posed optimization problem in which many reconstructions (which might be noisy or artifactual) are equally probable, given the data. The ill-posedness of the reconstruction imposes a minimum size barrier for cryo-EM structure determination, because smaller complexes yield images with lower signal-to-noise ratios. Although this barrier has been overcome in experiments involving the formation of complexes between small targets and other proteins2, the formation of sufficiently rigid complexes is often difficult. Here we explore a computational method that lowers the size barrier for existing cryo-EM datasets.

Even for ill-posed reconstruction problems, the correct solution can still be identified through the incorporation of prior knowledge. Most cryo-EM structures are calculated using explicit regularization of a likelihood function in Fourier space, which assumes cryo-EM reconstructions are smooth in real space3,4,5. Although we know much more about the structures of biological macromolecules beyond just the fact that their density varies smoothly, it has been difficult to incorporate richer sources of prior knowledge into the optimization process. Denoising convolutional neural networks can incorporate complex prior knowledge into an iterative optimization process6. By training a denoising network on simulated pairs of noisy and ground-truth images, we have previously provided proof of principle that prior knowledge about protein structures can be exploited to improve cryo-EM structure determination7. However, we also observed problems with overfitting and the hallucination of protein-like features in the resulting reconstructions. Moreover, because experimental cryo-EM structures often comprise regions of well-ordered proteins and nucleic acid domains alongside less structured regions, including, for example, membrane patches or flexible domains, it was not clear how ground-truth pairs for experimental cryo-EM data could be generated.

Here, we demonstrate how a pre-trained denoising convolutional neural network, trained and deployed in an application-specific manner inspired by the noise2noise approach8 (Fig. 1 and Methods), can improve cryo-EM structure determination using experimental data. Through this approach, which we call Blush regularization, we improve reconstructions across a variety of existing cryo-EM datasets, including one for a protein–nucleic acid complex that was too small for analysis using existing methods.

Fig. 1: Schematic illustration of Blush regularization and slices of example volumes.

a, Training procedure, showing two passes for both half-maps and recycling of the denoiser output (in pink), with calculation of a mean squared error (L2) loss. b, Iterative reconstruction with spectral trailing. Each half-map is reconstructed separately. At each iteration, the FSC is used to estimate a cut-off frequency (ρ), which is subsequently used to low-pass filter the denoiser output. The final output does not pass through the denoiser but is subjected to a Wiener filter, similar to baseline reconstruction. c, Denoiser U-net architecture, consisting of five consecutive encoder blocks and a convolution block, followed by five consecutive decoder blocks. SiLU stands for sigmoid linear unit; Norm for instance normalization (Methods). d,e, Slices through maps before (left) and after (right) a single application of the denoiser to the final iteration of the reconstruction for PfCRT (d) and the spliceosome (e). f,g, Slices through maps of baseline reconstruction (left) and after Blush regularization (right) of the FIA (f) and the Aca2–RNA complex (g). Scale bars, 30 Å.

Results

Blush regularization improves reconstruction without overfitting

We first tested Blush regularization on a cryo-EM dataset (EMPIAR-10330)9 for the Plasmodium falciparum chloroquine resistance transporter (PfCRT)10. This dataset has been used as a standard to demonstrate the performance of several approaches in reducing overfitting during cryo-EM refinement11,12. Standard refinement using regularized likelihood optimization in RELION, which we refer to as the baseline, yielded an overall resolution of 3.8 Å for this dataset.

Application of Blush regularization (Fig. 2) yielded an overall resolution estimate of 3.4 Å. In the last iteration, spectral trailing, a heuristic method that prevents overfitting by limiting the spatial frequency at which information from the denoiser is used (Methods), was applied with a cut-off at 3.5 Å. Compared with the baseline reconstruction, local resolution improved for most regions of the map, with a corresponding increase in visible side-chain densities. The improvement in resolution, as measured by half-map Fourier shell correlation (FSC), was confirmed by FSCs between both maps and the atomic model that was deposited for this dataset (Protein Data Bank (PDB): 6UKJ). Throughout this paper, FSCs between the map and atomic model were calculated using Servalcat13. We also assessed the relative quality of both maps by application of our automated model-building software ModelAngelo14, which generated a model with 84% completeness in the baseline map and 97% completeness in the Blush map. Model completeness is defined as the percentage of residues that match the reference model with a Cα distance of 3 Å or less.
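To make the completeness metric concrete, the sketch below computes the percentage of reference residues whose Cα atom lies within 3 Å of a Cα atom in the built model. It is a simplified illustration only: the function name and array-based inputs are ours, and it ignores the residue-identity matching that ModelAngelo also performs when comparing models.

```python
import numpy as np

def model_completeness(built_ca: np.ndarray, reference_ca: np.ndarray,
                       cutoff: float = 3.0) -> float:
    """Percentage of reference residues with a built Ca within `cutoff` angstrom.

    built_ca: (N, 3) array of Ca coordinates of the built model, in angstrom.
    reference_ca: (M, 3) array of Ca coordinates of the reference model.
    """
    if len(built_ca) == 0 or len(reference_ca) == 0:
        return 0.0
    # Distance from every reference Ca to its nearest built Ca.
    d = np.linalg.norm(reference_ca[:, None, :] - built_ca[None, :, :], axis=-1)
    nearest = d.min(axis=1)
    return 100.0 * float(np.mean(nearest <= cutoff))
```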

Fig. 2: Single-particle reconstruction of the PfCRT dataset.

a, Maps colored by local resolution, comparing the baseline reconstruction (left) and the reconstruction after Blush regularization (right). b, Automated atomic modeling by ModelAngelo for the baseline (left) and Blush (right) maps. Colored by chain. c, FSCs between the masked maps and deposited model (PDB: 6UKJ). d, Solvent-corrected half-map FSCs. Both plots show FSCs for Blush (purple), Blush without spectral trailing (pink) and baseline (black). The dashed pink line shows the solvent-corrected half-map FSC for Blush without spectral trailing when applied to data with phase randomization beyond 4-Å resolution.

To assess the potential for overfitting by the denoiser, we also performed a phase-randomization test15. We applied Blush regularization without spectral trailing for refinement of the PfCRT dataset with phase randomization beyond 4-Å resolution. Although spectral trailing was not used, no overfitting was observed. Switching off spectral trailing led to a marginal improvement in the quality of reconstruction, as quantified by the FSC between the map and the atomic model (Fig. 2d). These results indicate that the denoiser can prevent overfitting for this dataset, even without spectral trailing. In general, we still recommend running Blush regularization with spectral trailing, because the benefits of switching it off are small and overfitting could be more prominent for other datasets. Consequently, in the following sections, we present results obtained only using spectral trailing.

Blush expands the applicability of cryo-EM reconstruction

We subsequently assessed the broader applicability of Blush regularization by applying it to four types of structures and refinement methods.

First, we tested Blush regularization on a small membrane protein, Ste2, which is a dimeric G-protein-coupled receptor (GPCR)16 (Fig. 3 and Extended Data Table 1). Full-length monomeric Ste2 has a molecular weight of 47.85 kDa, which includes a long disordered carboxy-terminal tail that comprises 125 amino acids. The total mass of the ordered dimeric Ste2 that contributes to alignment is roughly 67 kDa, most of which lies embedded in a detergent micelle.

Fig. 3: Single-particle reconstruction of the Ste2 dataset.

a,b, Reconstructions of the Ste2 dataset, colored by local resolution, comparing the baseline reconstruction (a) and the reconstruction after Blush regularization (b). c,d, Automated atomic modeling by ModelAngelo, using the baseline (c) and Blush (d) maps. e, Solvent-corrected half-map FSCs.

The dataset used was acquired from a similar complex to that in PDB entry 7QB9, reported in ref. 16, but with different biochemical conditions affecting the stability of the structure. Alignment of images of Ste2 is difficult because few protein features extend from the smooth detergent micelle. Baseline reconstruction yielded a map with an overall resolution of 3.8 Å, with limited densities for side chains. Application of Blush regularization led to a structure with an overall resolution of 3.4 Å. Spectral trailing ensured that no information from the denoiser was inserted beyond 3.7-Å resolution. Compared with the baseline reconstruction, the density of the transmembrane helices is improved. Loops at the top and bottom of the structure are still relatively poorly resolved, probably owing to molecular flexibility. In agreement with the visibility of improved side-chain densities and local resolution estimates, the completeness of models built by ModelAngelo in these maps improved from 19% to 43%.

Second, we evaluated the performance of Blush regularization in multi-body refinement17, in which partial signal subtraction is used to align independently moving domains within a larger complex. Reconstructions from subtracted images were not included in the training set for the denoiser. Moreover, signal subtraction reduces the amount of signal in each image, placing stringent limitations on the minimal size of domains that can be aligned. We applied Blush regularization in multi-body refinement of a publicly available dataset (EMPIAR-10180) for the Saccharomyces cerevisiae pre-catalytic spliceosomal B complex18 (Fig. 4). Using four bodies, one each for the core, the foot, the helicase and the SF3b regions, Blush regularization improved the quality of reconstructions of all domains compared with baseline multi-body refinement, as measured by local resolution, half-map FSCs and FSCs with the reference atomic model (PDB: 5NRL). The improvements in resolution were largest in the helicase and SF3b regions, which are the most flexible and thus the hardest to reconstruct. The improvements in resolution were reflected by automated model building in ModelAngelo, which increased model completeness of the entire complex from 32% to 48%. In particular, the model completeness for the SF3b region was improved from 3% to 29%.

Fig. 4: Multi-body reconstruction of the spliceosome dataset.

a,b, Combined maps of the individual bodies, colored by local resolution, comparing the baseline reconstruction (a) and the reconstruction after Blush regularization (b). c, FSCs between the masked maps of each body and the corresponding region in the deposited model (PDB: 5NRL). d, Solvent-corrected half-map FSCs for the individual bodies. In c and d, dashed and solid lines correspond to baseline and Blush maps, respectively. FSCs are shown for each body: core (light gray), foot (dark gray), helicase (purple) and SF3b (pink). e, Completeness of atomic models built by ModelAngelo for each body, using baseline (gray) and Blush (pink) maps. f, Gold-standard half-map resolutions of each body for baseline (gray) and Blush (pink) maps.

Third, we assessed the performance of Blush regularization for a biological assembly that differed from the types of structures on which the denoiser was trained: the first intermediate amyloid (FIA) that forms during the in vitro assembly of recombinant tau (residues 297–391)19. This dataset is also publicly available (EMPIAR-11720). Unlike any of the structures in the training set, the FIA has helical symmetry (Fig. 5). It is an amyloid filament, with parallel β-strands repeating every 4.7 Å in the direction of the helical axis. Besides deviating from the types of structures in the training set, the FIA is also one of the smallest amyloid structures solved to date, with only 15 ordered residues in each of two opposing β-sheets. Baseline helical refinement yielded a 5.0-Å-resolution map, in which the density for β-strands along the helical axis was not separated, and no atomic model could be built. Blush regularization improved the resolution to 2.8 Å, and ModelAngelo built all 15 ordered residues in the resulting map.

Fig. 5: Helical reconstruction of the FIA dataset.

a,b, Maps, colored by local resolution, of the baseline reconstruction (a) and after application of Blush regularization (b). c, Automated atomic modeling by ModelAngelo, comprising tau residues 302–316. d, Solvent-corrected half-map FSCs of the reconstructed maps.

Fourth, we applied Blush to the small anti-CRISPR associated protein 2 (Aca2) bound to RNA, which has a total molecular weight of 40 kDa (Fig. 6 and Extended Data Table 1). Using different classification and refinement strategies in baseline RELION and CryoSPARC, we could not obtain a reliable reconstruction. Although an initial model generated using the standard VDAM algorithm in RELION20 suffered from anisotropy, the first three-dimensional (3D) classification using Blush regularization resulted in one class with recognizable protein features. Similar 3D classifications without Blush regularization did not yield recognizable protein features. Refinement of the corresponding class yielded a better initial model for a second 3D classification, from which a single class was selected for subsequent CTF refinement21 and particle polishing21. A 3D classification was performed without alignment, followed by a final 3D refinement. Blush regularization was used for all 3D classifications with alignment and 3D refinements. The resolution of the final map was 2.5 Å, with ModelAngelo successfully building 97% of the protein sequence and 33 out of 42 nucleotides.

Fig. 6: Single-particle reconstruction of the Aca2–RNA complex with a molecular weight of 40 kDa.

a, Local resolution of the reconstruction with Blush regularization. b, Automated atomic model assignment with ModelAngelo. c, Detailed view of an α-helical segment in the reconstructed map and refined atomic model. d, The solid line shows the FSC to a reference atomic model, and the dashed line shows the half-map FSC. The solvent-corrected resolution is 2.5 Å, using a spectral trailing cut-off at 3.0 Å. e, Processing pipeline from initial model to final reconstruction. Numbers indicate the number of particles assigned to each map. Purple squares indicate reconstructions using Blush regularization.

Discussion

In a previous approach using noise2noise, implemented in the M software22, a new neural network is trained for each dataset that it is applied to, using only half-maps from the same dataset. As such, the neural network in the M software can learn only features that are specific to the dataset at hand. By contrast, we pre-train a single neural network on a diverse set of high-resolution half-maps from the EMDB. Our pre-trained network improves cryo-EM reconstructions for a wide variety of macromolecular complexes, suggesting that it has learned useful features about cryo-EM structures in general. In addition, although our approach was inspired by noise2noise, it blends the unsupervised elements from noise2noise training with new application-specific elements, such as recycling and supervised masks in Fourier space and in real space. An interesting avenue for future research could be a combination of the two approaches, in which the pre-trained Blush network is fine-tuned using the half-maps of the dataset at hand, using techniques similar to those implemented in M.

We previously attempted to incorporate prior knowledge about protein structures by training a denoiser on pairs of noisy and ground-truth maps that were calculated from atomic models, and observed problems with overfitting and hallucinations7. Similar problems could explain why the application of the DeepEMhancer neural network23 inside the iterative reconstruction algorithm of RELION had to be restricted to only a few iterations at the end of refinement24. The approach in this paper reduces the risk of hallucinations of protein-like features in reconstructions by using a neural network that is trained only on experimental cryo-EM half-maps, that is, without the atomic models or the geometrical restraints that are used to describe them.

Instead of forcing the map to resemble densities derived from atomic models, our denoiser is trained to introduce more subtle modifications to cryo-EM maps, such as smoothing out density in solvent regions or in detergent micelles. The network also removes artifacts that are commonly encountered in difficult cryo-EM refinements, for example anisotropic densities that result from uneven angular distributions, or radially extending, streaky features that are often observed in overfitted maps (Fig. 1f,g). Our findings illustrate that, although the effect of a single application of the denoiser is relatively small, its cumulative impact over several iterations enhances the performance of cryo-EM structure determination across a diverse range of test cases. As the ability of machine-learning methods to extract knowledge from large datasets improves, it could be tempting to leverage more structural information about biological macromolecules in the reconstruction process. However, doing so could ultimately diminish one of the most powerful ways of assessing whether a reconstruction is correct: the presence of expected features in the map. We thus anticipate that the cryo-EM community will continue to explore the question of how much prior knowledge should inform the reconstruction process, and how much should be kept aside for validation.

In the framework of Blush regularization, the denoiser replaces the filter operation that constrains the power of Fourier-space components in the baseline algorithm. As a result, the FSC between independently refined subsets is no longer used to define a 3D Wiener filter that is applied to the intermediate reconstructions. Instead, this FSC is used to determine a resolution cut-off (ρ), beyond which the Fourier components of the two denoised half-maps are set to zero. Because Fourier components near the resolution estimate of the final map will not have been affected by the denoiser, overestimation of resolution owing to the denoiser cannot happen directly.

Although spectral trailing represents the first attempt to prevent overestimation of resolution when using information-rich priors in cryo-EM reconstruction, it might not be the optimal solution. In fact, as exemplified by the PfCRT dataset (Fig. 2), spectral trailing can lead to underestimation of resolution. Future exploration of the damping effect of the network in Fourier space could lead to better approaches to safeguard against overestimation of resolution. Other research topics that might be worth exploring include the adaptation of the VDAM algorithm20 in RELION to also use Blush regularization, which may improve initial model generation. In fact, provided that they allow modification of real-space maps, a wide range of cryo-EM methods could be improved by Blush regularization, ranging from standard refinement approaches in alternative software packages to approaches for dealing specifically with structural heterogeneity, for example25,26,27.

In all our tests, the performance of Blush regularization surpassed or matched that of the baseline implementation in RELION. We observed the largest differences for cases in which the baseline approach tended to overfit the data. Consequently, Blush regularization will be most useful for refinements of datasets with low signal-to-noise ratios, such as those of small complexes or complexes embedded in thick ice layers, multi-body refinements involving relatively small bodies and refinements of maps exhibiting pronounced variations in local resolution. For example, Blush regularization allowed reconstruction of an amyloid with only 30 residues in its ordered core, and of the Aca2–RNA complex with a molecular weight of 40 kDa. Although nucleic acids result in higher signal-to-noise ratios than do proteins, 40 kDa approaches predicted minimal sizes for a protein that is amenable to cryo-EM structure determination28,29. These results demonstrate that denoising convolutional neural networks expand the applicability of cryo-EM structure determination.

Methods

Rationale

The noise2noise framework8 facilitates the training of a denoising convolutional neural network in the absence of explicit access to ground-truth images. Instead, it relies on pairs of noisy images to extract information about their shared signal. Here, we present an application-specific approach that incorporates this aspect from the noise2noise framework. We trained a denoiser on a set of 422 pairs of noisy half-maps that we downloaded from the EMDB30. We selected only entries with reported resolutions higher than 4 Å for which both unfiltered half-maps were deposited. Maps with obvious artifacts, for example those associated with overfitting, and maps of a structure that was already present in the training set were eliminated during manual curation.

We tailored data augmentation and training of the denoiser to integrate with the iterative expectation-maximization algorithm for cryo-EM reconstruction. All pairs of half-maps, \({x}_{i}^{(k)}\in {\mathbb{R}}^{N}\), with \(k\in \{0,1\}\), were re-scaled to a uniform voxel size of 1.5 Å, and augmented by generating new pairs \({y}_{i}^{(k)},{\bar{y}}_{i}^{(k)}\in {\mathbb{R}}^{N}\):

$${y}_{i}^{(k)}={\mathrm{H}}_{C,A}\left[{x}_{i}^{(1-k)}+{e}^{(1-k)}\right],$$
(1)
$${\bar{y}}_{i}^{(k)}={\mathrm{H}}_{\bar{C},A}\left[{x}_{i}^{(k)}\odot {M}_{i}+{\mathrm{h}}\left({x}_{i}^{(k)}\right)\odot (1-{M}_{i})\right],$$
(2)

where \(e\in {\mathbb{R}}^{N}\) is random colored noise, \({M}_{i}\in {[0,1]}^{N}\) is a smooth mask encapsulating the molecules of interest, \(\odot\) represents voxel-wise multiplication and \({\mathrm{h}}(\cdot)\) is a low-pass filter to 15 Å. \({\mathrm{H}}_{C,A}[\cdot]\) applies an anisotropic Gaussian filter with covariance matrix C, an affine transform A that includes rotation and translation, a crop to a patch of 64³ voxels and a voxel-value standardization. Data augmentation was achieved through random assignments of \(C,\bar{C},A,e\) and r.

By using a range of resolution cut-offs for C and \(\bar{C}\), the denoiser explicitly learns to handle maps with varying resolutions. This is necessary for its application inside the iterative expectation-maximization algorithm, which typically starts at relatively low resolutions and gradually progresses to higher resolutions. Although using a lower resolution cut-off for C than for \(\bar{C}\) could have produced a network that enhances the resolution of the half-maps, similar to deblurring networks31, we opted not to do so to minimize the risk of hallucinations in high-resolution features.

By using different degrees of anisotropy in C and \(\bar{C}\), the denoiser learns to deal with the artifacts that arise from non-uniform orientational distributions, and random orientations and affine transformations in A lead to invariance with respect to rotations, translations and intensity scale. Although initial versions of our training protocol did not include masks, we observed that the resulting networks would learn to smooth densities in disordered regions, such as the solvent or detergent micelles, which would improve image alignments. To amplify these effects, we then implemented the supervised masking approach with \({M}_{i}\) and \({\mathrm{h}}(\cdot)\). By filling disordered regions with a 15-Å-resolution low-pass filtered version of the map, as opposed to a straightforward voxel-wise multiplication with the mask \({M}_{i}\), higher density values in regions with disordered molecules, such as detergent micelles, are maintained.
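As an illustration of the augmentation in equations (1) and (2), the following sketch builds one training pair from two half-maps and a solvent mask. It is a simplified stand-in for the actual pipeline: white noise and a hard spherical low-pass filter replace the colored noise and the anisotropic Gaussian filter H, the affine transform, 64³ crop and voxel standardization are omitted, and all function and parameter names are illustrative.

```python
import numpy as np
from numpy.fft import fftn, ifftn, fftfreq

def lowpass(vol: np.ndarray, voxel_size: float, cutoff: float) -> np.ndarray:
    """Hard spherical low-pass filter of a 3D map to `cutoff` (angstrom)."""
    freqs = np.meshgrid(*[fftfreq(n, d=voxel_size) for n in vol.shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    return np.real(ifftn(fftn(vol) * (radius <= 1.0 / cutoff)))

def augment_pair(x0, x1, mask, voxel_size=1.5, blur_cutoff=8.0, noise_std=0.1, rng=None):
    """Build one (input, target) pair from half-maps x0, x1 and a smooth mask in [0, 1]."""
    if rng is None:
        rng = np.random.default_rng()
    # Equation (1), simplified: the input is the *other* half-map plus noise,
    # low-pass filtered (standing in for the anisotropic Gaussian filter H).
    y = lowpass(x1 + noise_std * rng.standard_normal(x1.shape), voxel_size, blur_cutoff)
    # Equation (2), simplified: the target keeps x0 inside the mask and a 15 A
    # low-pass-filtered copy outside it, so micelle-like density is smoothed
    # rather than removed.
    y_bar = x0 * mask + lowpass(x0, voxel_size, 15.0) * (1.0 - mask)
    return y, y_bar
```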

By re-scaling all maps to a common voxel size of 1.5 Å, and then cropping maps to patches of 64³ voxels, the network can be trained on and applied to maps of any size. To apply the denoiser to maps that are larger than one patch, overlapping patches can be denoised independently.
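A possible way to apply a 64³-patch denoiser to an arbitrarily sized map is sketched below: overlapping patches are denoised independently and averaged where they overlap. The `denoiser` callable, the stride and the uniform blending weights are assumptions for illustration, not a description of the RELION-5 implementation.

```python
import numpy as np

def denoise_by_patches(vol: np.ndarray, denoiser, patch: int = 64, stride: int = 32):
    """Denoise overlapping cubic patches and average them.

    Assumes every map dimension is at least `patch` voxels.
    """
    out = np.zeros_like(vol)
    weight = np.zeros_like(vol)
    # Patch start positions along each axis, making sure the last patch
    # touches the edge of the map.
    starts = []
    for size in vol.shape:
        axis = list(range(0, size - patch + 1, stride))
        if axis[-1] != size - patch:
            axis.append(size - patch)
        starts.append(axis)
    for i in starts[0]:
        for j in starts[1]:
            for k in starts[2]:
                sl = (slice(i, i + patch), slice(j, j + patch), slice(k, k + patch))
                out[sl] += denoiser(vol[sl])
                weight[sl] += 1.0
    return out / weight
```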

Training the denoiser

Our denoiser (fθ) consists of a U-net with approximately 13 million trainable parameters (θ) (Fig. 1). It is trained using residual learning32 and with a dropout rate of 50% (ref. 33). Instance normalization34 is used to handle small mini-batches (\({{{\mathcal{B}}}}\)), with b = 8 samples from the training dataset, during training. We minimize the following loss:

$${\mathcal{L}}=\frac{1}{2b}\mathop{\sum}\limits_{i\in {\mathcal{B}}}\mathop{\sum}\limits_{k\in \{0,1\}}{\left\Vert {\bar{y}}_{i}^{(k)}-{f}_{\theta }\left({\mathrm{R}}_{r}\left[{f}_{\theta },{y}_{i}^{(k)}\right]\right)\right\Vert }^{2},$$
(3)

where \({\mathrm{R}}_{r}[{f}_{\theta },y]\) returns the output of the denoiser \({f}_{\theta }\) after recursively calling it \(r\in \{0,\ldots,5\}\) times with \({y}_{i}^{(k)}\) as the initial input. This enables the denoiser to recognize and suppress artifacts brought about by its repeated usage, thereby limiting the amplification of artifacts in the reconstruction that are introduced by the denoiser during subsequent iterations of the expectation-maximization algorithm7.
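A minimal sketch of the recycled loss in equation (3) is shown below. We assume here that the recycling passes are run without gradient tracking and that only the final pass contributes gradients, and we use a per-voxel mean squared error rather than the exact 1/(2b) normalization; `model` and the tensor layout are placeholders.

```python
import torch

def recycled_loss(model: torch.nn.Module, y: torch.Tensor, y_bar: torch.Tensor,
                  max_recycles: int = 5) -> torch.Tensor:
    """Loss for one mini-batch of (input, target) patches of shape (batch, 1, D, D, D)."""
    r = int(torch.randint(0, max_recycles + 1, (1,)))   # r in {0, ..., 5}
    x = y
    with torch.no_grad():               # R_r[f_theta, y]: r recycling passes
        for _ in range(r):
            x = model(x)
    pred = model(x)                     # only the final pass carries gradients
    return torch.mean((pred - y_bar) ** 2)
```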

Training for 950,000 steps took six days using a single Nvidia A100 GPU.
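For readers who want a concrete picture of the architecture summarized above and in Fig. 1c, the following PyTorch sketch assembles a 3D U-net with five encoder blocks, a middle convolution block and five decoder blocks, using SiLU activations, instance normalization, dropout and a global residual connection (residual learning). The channel counts, kernel sizes and dropout placement are illustrative guesses and do not reproduce the published 13-million-parameter network.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    """3D convolution followed by instance normalization, SiLU and dropout."""
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
        nn.InstanceNorm3d(c_out),
        nn.SiLU(),
        nn.Dropout3d(0.5),
    )

class DenoiserUNet(nn.Module):
    def __init__(self, base: int = 16, depth: int = 5):
        super().__init__()
        chans = [base * 2 ** i for i in range(depth + 1)]
        self.encoders = nn.ModuleList(
            [conv_block(1 if i == 0 else chans[i - 1], chans[i]) for i in range(depth)])
        self.pool = nn.MaxPool3d(2)
        self.middle = conv_block(chans[depth - 1], chans[depth])
        self.up = nn.ModuleList(
            [nn.ConvTranspose3d(chans[i + 1], chans[i], kernel_size=2, stride=2)
             for i in reversed(range(depth))])
        self.decoders = nn.ModuleList(
            [conv_block(2 * chans[i], chans[i]) for i in reversed(range(depth))])
        self.head = nn.Conv3d(chans[0], 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skips, h = [], x
        for enc in self.encoders:          # five encoder blocks
            h = enc(h)
            skips.append(h)
            h = self.pool(h)
        h = self.middle(h)                 # middle convolution block
        for up, dec, skip in zip(self.up, self.decoders, reversed(skips)):
            h = dec(torch.cat([up(h), skip], dim=1))   # five decoder blocks
        return x + self.head(h)            # residual learning: predict a correction
```

Calling `DenoiserUNet()(torch.randn(1, 1, 64, 64, 64))` passes a single-channel 64³ patch through the network and returns a denoised patch of the same shape.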

Iterative denoising with spectral trailing

We refer to the application of our pre-trained denoiser within the iterative expectation-maximization algorithm as Blush regularization. In our original work, with simulated data, we incorporated the denoiser into the L2 regularization in the M-step, on the basis of the approximation that the prior function is ‘close’ to a Gaussian7. In this work, we do not make formal claims about the role of the denoiser within a Bayesian framework. Instead, our approach is motivated by empirical observations.

Although one effect of the denoiser is that it tends to dampen Fourier components at higher spatial frequencies, the amount by which it does so is not well defined. Therefore, we use a heuristic method, here referred to as spectral trailing, to prevent overfitting in 3D autorefinement and multi-body refinement. First, we calculate the FSC between two independently refined half-maps before the denoiser is applied, and determine the ρ value at which the solvent-corrected FSC drops below 0.143. We then apply the denoiser to both half-maps and subsequently apply a low-pass filter at a spatial frequency that is two Fourier shells (each shell is one Fourier voxel wide) lower than ρ. If ρ exceeds the Nyquist frequency of the denoiser, here set to 3 Å, the remaining Fourier shells at higher frequencies are populated with the reconstruction from the standard regularization in Fourier space. The resulting denoised, low-pass-filtered maps are then used as references for alignment in the next iteration. The denoiser is not applied to the output of the final refinement step.
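The sketch below illustrates the spectral-trailing heuristic on a pair of half-maps: estimate the shell at which the half-map FSC drops below 0.143, then low-pass filter the denoised half-maps two Fourier shells below that cut-off. The FSC here is not solvent-corrected, the shell assignment assumes a cubic box, and the blending with the conventionally regularized map beyond the denoiser Nyquist limit is omitted; names and parameters other than the 0.143 threshold and the two-shell trail are illustrative.

```python
import numpy as np
from numpy.fft import fftn, ifftn, fftfreq

def radial_shells(shape, voxel_size):
    """Assign every Fourier voxel of a cubic box to a resolution-shell index."""
    freqs = np.meshgrid(*[fftfreq(n, d=voxel_size) for n in shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    nyquist = 0.5 / voxel_size
    n_shells = shape[0] // 2
    return np.minimum((radius / nyquist * n_shells).astype(int), n_shells - 1)

def fsc_cutoff_shell(half1, half2, shells, threshold=0.143):
    """First shell at which the half-map FSC drops below `threshold` (rho)."""
    F1, F2 = fftn(half1), fftn(half2)
    n_shells = int(shells.max()) + 1
    num = np.bincount(shells.ravel(), np.real(F1 * np.conj(F2)).ravel(), n_shells)
    d1 = np.bincount(shells.ravel(), (np.abs(F1) ** 2).ravel(), n_shells)
    d2 = np.bincount(shells.ravel(), (np.abs(F2) ** 2).ravel(), n_shells)
    fsc = num / np.sqrt(d1 * d2 + 1e-12)
    below = np.nonzero(fsc < threshold)[0]
    return int(below[0]) if len(below) else n_shells - 1

def spectral_trail(denoised, shells, rho_shell, trail=2):
    """Zero all Fourier components beyond two shells below the FSC cut-off."""
    keep = shells <= max(rho_shell - trail, 0)
    return np.real(ifftn(fftn(denoised) * keep))
```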

Blush regularization has been implemented in the open-source software RELION-5, using a combination of C++ and PyTorch. It can be used for 3D classification, multi-body refinement and 3D autorefinement jobs, including those for particles with point-group or helical symmetry. In 3D classification, for which the data are not separated into independent half-sets, the filtered map from the regularized likelihood approach is used as input for the denoiser. No additional low-pass filtering is applied. In this job type, the denoiser is also applied in the last iteration.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.