Learn more: PMC Disclaimer | PMC Copyright Notice
cryo_fit: Democratization of Flexible Fitting for Cryo-EM
Associated Data
Abstract
Cryo-electron microscopy (cryo-EM) is becoming a method of choice for describing native conformations of biomolecular complexes at high resolution. The rapid growth of cryo-EM in recent years has created a high demand for automated solutions, both in hardware and software. Flexible fitting of atomic models to three-dimensional (3D) cryo-EM reconstructions by molecular dynamics (MD) simulation is a popular technique but often requires technical expertise in computer simulation. This work introduces cryo_fit, a package for the automatic flexible fitting of atomic models in cryo-EM maps using MD simulation. The package is integrated with the Phenix software suite. The module was designed to automate the multiple steps of MD simulation in a reproducible manner, as well as facilitate refinement and validation through Phenix. Through the use of cryo_fit, scientists with little experience in MD simulation can produce high quality atomic models automatically and better exploit the potential of cryo-EM.
Introduction
Cryo-electron microscopy (cryo-EM) has been changing the landscape of structural biology. Cryo-EM was selected as the method of the year 20151 and its major contributors received the Nobel prize in 20172. Indeed, it has been used to determine biomolecular structures of a wide range of molecular weights (ranging from 64 kDa3,4 to 150 MDa5), achieving up to 1.8 Å resolution6,7. Recent discoveries of improved focusing waves8 and the ability to record super-resolution (beyond the Nyquist frequency)9 are expected to further improve resolution. Among the four methods of cryo-EM (single particle analysis, cryo-tomography, electron crystallography, and micro-electron diffraction), single particle analysis (SPA) is most popular (78% among all cryo-EM methods)10.
For SPA, flexible fitting has been an essential tool for better elucidating dynamical aspects of biological complexes5,11. These dynamical aspects produce conformational heterogeneity, one of the important causes of lower resolution in cryo-EM reconstructions. If elucidated, this can help reveal crucial structure-function relationships involving dynamics12. Flexible fitting has been used to capture dynamics11 with a larger radius of convergence than standard refinement,13 as large as 34 Å14. Additionally, flexible fitting can describe multiple conformational changes (supporting Fig. S1). One popular technique for flexible fitting employs molecular dynamics simulations based on classical molecular mechanics force fields, which, among many other examples, have been used to simulate tRNA movement into the ribosome15. We initially developed a hybrid molecular dynamics method that uses a physics-based all-atom force field together with secondary structure restraints from an initial model to fit to cryo-EM maps. This method16 is implemented in the gromacs17 suite of programs that provides access to all gromacs functionalities such as enhanced sampling. In principle, our method allows one to choose the complexity of the physical model, from a reduced representation to an all-atom representation, while fitting a cryo-EM map.
However, molecular dynamics simulation is a highly technical field in its own right and, like other molecular dynamics based cryo-EM map fitting programs18,19,20,21,22, the technique has been challenging to use by those who are not experts in molecular dynamics. Therefore, we automated all procedures of our hybrid molecular dynamics method and integrated them into Phenix23, a widely used structural biology software suite with a graphical user interface (GUI). Our new method for streamlined automatic flexible fitting of atomic models into a cryo-EM map is named cryo_fit. As a part of Phenix, cryo_fit makes use of all pre- and post- processing functionalities of Phenix providing the user a unified platform for cryo-EM structure construction and interpretation. Cryo_fit is particularly useful when attempting to fit novel atomic structures into cryo-EM maps, e.g. cases where the starting atomic models are significantly different from the native conformation (Fig. 1, ,4).4). When the initial model does not require large changes to fit the map, refinement can be used. However, if larger changes are required (that are beyond convergence radius of gradient-driven minimization) then cryo_fit is recommended (Fig. S2). The cryo_fit module is expected to be most useful when an atomic model is approximately placed into the map, and large conformational changes are required for fitting (i.e., model-to-map fit is not sufficient for gradient-based refinement to be successful). Importantly, the current implementation of cryo_fit can only handle standard amino- and nucleic-acid based molecules (proteins, RNA, DNA), while non-standard entities or ligands are excluded from fitting. Excluded entities can be modeled at later stages after the macromolecule has been fitted into the map by cryo_fit.
Procedures
The algorithms used in cryo_fit are explained in detail in Kirmizialtin et al.16 Briefly, while phenix.real_space_refine sums data-based and restraints-based components in a target function13, cryo_fit uses a molecular dynamics potential function including a term proportional to the correlation between the experimentally determined cryo-EM map and a map calculated from the current model, an ab initio molecular dynamics potential, and an all-atom structure-based potential to preserve the stereochemistry of initial models while exploring alternative conformations16. Several different measures of the correlation between map and model can be used, as described recently24. The correlation coefficient in cryo_fit is based on those originally used for cryo-EM fitting by Tama and co-workers and is similar to previous implementations14,16,19,25:
where ρsim(i,j,k) is a simulated map computed from atomic coordinates using the same grid spacing as the cryo-EM map assuming a Gaussian distribution for each atom defined by:
Here, ρsim(i,j,k, rl) is proportional to the electron density (which is directly related to the electrostatic potential measured in cryo-EM) at grid site (i,j,k) due to atom l at rl and 2σ is the resolution of the cryo-EM map. This technique of determining ρsim(i,j,k, rl) is used in other flexible fitting methods and was chosen for ease of calculation since it is used in the MD force field potential26; however, we note that more accurate FFT (fast fourier transform)-based methods may be used, such as those implemented elsewhere in Phenix23.
This article is focused on the implementation of cryo_fit in Phenix and its utility in biomolecular complexes comprised of RNA, DNA and proteins. Similar to other Phenix applications, in developing cryo_fit we prioritized ease of installation and use. Cryo_fit requires an atomic model of a structure (PDB or mmCIF formatted file) approximately placed into a cryo-EM map (ccp4 formatted binary map or a map in Situs27 format).
There are several approaches to build a starting atomic model before using cryo_fit. For example, an initial atomic model can be built from a map and sequence information using phenix.map_to_model28. Alternative approaches such as a homology modeling or a pathwalker29 (among other programs) derived model can be used to provide cryo_fit a starting atomic model.
If an initial atomic model is available but is not placed into the map yet, then tools such as phenix.dock_in_map30 can be used to automatically dock a model. Alternatively, UCSF Chimera31 provides manual translation/rotation of the model and a ‘Fit in Map’ function. To allow for easier interpretation, we recommend automatically sharpening or blurring the cryo-EM map with phenix.auto_sharpen32.
Once the initial atomic model and map files are prepared, cryo_fit can be executed on the command line,
or in the Phenix GUI (supporting Fig. S3).
Cryo_fit automatically performs a number of steps, from processing input files and generating various geometrical restraints to performing MD simulations in order to optimize the model to map fit. The model to map fit is quantified by the correlation coefficient between the cryo-EM map and the simulated map calculated from the atomic model (see Discussion section). Cryo_fit runs molecular dynamics in the presence of the cryo-EM term (proportional to the correlation between the experimental electric potential map and a density map calculated from the atomic model) with neutralized charges as a default option to improve speed.
Domain decomposition33 and molecular dynamics time steps are automatically assigned34 to ensure numerical stability over a large numbers of time steps. Depending on the computing hardware used, cryo_fit automatically utilizes multiple cores for optimal performance. Automatic choice of the relative weighting between the map and geometry restraints is used while maintaining acceptable stereochemistry. By default, cryo_fit uses simulated annealing in combination with molecular dynamics to escape local energy minima effectively16. The benefit of this was verified by our benchmark with five pairs of deposited models and maps (unpublished). Cryo_fit outputs the top three fitted models that have the highest model-to-map correlation, as well as molecular dynamics trajectory files, which can be visualized using PyMOL35, UCSF Chimera31, UCSF ChimeraX36, and VMD37 (an example trajectory is given in supporting information movie S1). Analyzing MD trajectories may be particularly useful to reveal possible conformational changes. The resulting fitted model may require some further refinement using phenix.real_space_refine13 to improve model geometry (e.g., eliminate residue rotamer or Ramachandran plot outliers). Continuous model and model-to-map fit validation24 is also desirable.
Results
Cryo_fit has been applied to a variety of systems, as small as the GTPase associated center (GAC) of the ribosome (~6,000 atoms) and as large as the intact ribosome (~200,000 atoms). Cryo_fit can fit both protein and nucleic acid atomic models into cryo-EM maps (Fig. 2–3). Running cryo_fit with default parameters was sufficient to improve cross-correlation values relative to a UCSF Chimera (‘Fit in Map’) fitted model of a protein and DNA complex (H1.0b-bound 197 bp nucleosome; PDB code 5nl0) from the Protein Data Bank (PDB)38 (Fig. 3).
For fittings that require large conformational changes (RMSD by PyMOL ≥ 7.0 Å), we calculated CCcryo_fit and CCmask with 4 benchmark pairs of starting models and maps (Fig. 4). The value of 7.0 Å was chosen by clustering the data according to RMSD. For phenix.real_space_refine, we turned on secondary structure restraints that are determined by default. The workflow that resulted in the highest correlation, on average, was the application of phenix.cryo_fit followed by the application of phenix.real_space_refine. This was true when measuring correlation with CCcryo_fit and when measuring the correlation with CCmask.
For fittings that require smaller conformational changes (RMSD by PyMOL < 7.0 Å), we calculated CCcryo_fit and CCmask with 8 benchmark pairs of starting models and maps (Fig. 5). Other than one pair (1gru/1046) whose cryo-EM map has low resolution (24.0 Å), cryo_fit produced either similar or improved fits. Thus, independent of the method that the original authors used (manual/semi-automatic/automatic), cryo_fit produces either comparable or better results in terms of fitting the cryo-EM map. Note that most of these fittings are achieved with default settings that can be changed to improve the cross-correlation scores based on the cryo-EM map properties. When measuring correlation with CCcryo_fit, the workflow of phenix.cryo_fit followed by phenix.real_space_refine resulted in the highest correlation values on average (0.60). In these same cases, when measuring by CCmask, phenix.real_space_refine alone resulted in the highest fit (0.70). This means that as we expected, gradient-based refinement only is sufficient for local fittings that require smaller conformational changes (Supporting Fig. S2). It should be noted that closer comparisons can be made by using phenix.real_space_refine with simulated annealing enabled.
Discussion
In our test cases, cryo_fit successfully fit most protein, DNA and RNA systems into cryo-EM maps. Here, we note that cryo_fit can be extended to lipids and carbohydrates when the parameters are added to the force field, which will be implemented in the future. Currently, unusual/unknown residues (such as UNK) or ligands are excluded from fitting39.
As previously shown (Fig. 5), cryo_fit does not always find a better cross-correlation (CC) value than other methods. During the process of development and implementation of cryo-EM map fitting software, we often found that large improvements of the correlation (both CCcryo_fit and CCmask) can degrade the model geometry, while severe restrictions in model geometry may also degrade the correlation. This necessitates a balance between the weight between the CCcryo_fit term and model geometry in the MD potential. However, as we assume that model geometry is adequate (as we have seen in supporting Table S1–2), these particular cases (i.e., when the CC value is not significantly increased even by cryo_fit) can be understood as follows: 1) the initial model is already well fit to the given cryo-EM map; 2) the cryo-EM map has low resolution; 3) the cryo-EM map needs to be filtered to remove unresolved regions; 4) the system is trapped in local energy/configuration minima and requires more time steps than those used in the default scheme for fitting, or 5) cryo_fit reduces overfitting, very much like crystallography modeling increases Rwork to reduce Rfree-Rwork gap.
One can forcefully increase cross-correlation by using a very high map weight. However, this may result in overfitting, distorting the stereochemistry of the model. The user can specify (e.g., in the case of an intrinsically disordered domain) high values of emweight_multiply_by. Then, cryo_fit will start with very high bias toward the cryo-EM map. In the case of low-resolution maps, it is reasonable to rely on the physically realistic force field more than forcing the model to fit into a less informative map. In certain cases, preceding the final fitting of the actual cryo-EM map by a preliminary step, fitting to a filtered map can be used to further improve cross-correlation values16,19. By filtering out less-resolved/dynamic region/map, the fitting procedure helps to produce large conformational changes (when necessary) by decreasing the number of local minima; the output of this step can then be used to fit the unfiltered map. When the system is trapped in local minima, longer simulation times can also be helpful (e.g., a larger value for number_of_steps_for_cryo_fit). In addition, one can perform local refinement on problematic regions using software packages such as ISOLDE40, a ChimeraX36 bundled package that allows a user to directly tug atoms with a mouse interface or applying/removing restraints for manual inspection and fixing local errors. Rosetta based41,42 or rosetta+MD simulation based43 approaches may also be useful in this case.
Interestingly, the various measures of correlation between the modeled atomic structure and the cryo-EM map report on different aspects of agreement and disagreement between model and map. A comprehensive benchmark study examined correlations related to (i) highest value points the model-calculated map (CCvolume and CCpeaks), (ii) masked regions of the map with the capability of more localized fits (CCmask), and (iii) a measure covering the entire map (CCbox), in an analogous fashion to CCcryo_fit24. CCmask provides an excellent method for more localized fits since CCmask can use map values inside a mask calculated around the macromolecule only. CCbox and CCcryo_fit can be useful in the cases of large-scale conformational changes, where the trajectory of the molecule is not known a priori (e.g., > 50Å motions of tRNAs required to fit intermediate states of tRNAs from the classical states of tRNAs within the ribosome). Generally speaking, CCbox is useful for determining how well a given atomic model matches the entire volume of the map. For example, if only one chain is modeled, while the full system consists of many chains, then CCbox will depict this. Just as cryo_fit uses the entire map for the CC calculation (e.g. CCcryo_fit, Kirmizialtin et al.16 eq. 4), most MD simulation based cryo-EM map fitting programs use correlation measures similar to Eq. 119,21,25 or related equations (e.g., similar to Eq. 1, but subtracting the average from ρsim and ρEM)44,45. It should be emphasized the CCmask is the standard measure of model-to-map fit that is a more focused, localized and is a more accurate method of measuring the correlation.
To accelerate scientific discoveries of biological molecular structures and dynamics, all aspects of cryo-EM need to be readily accessible to structural biologists. This will require consistent and reproducible outcomes, and procedural automations, such as automated blotting46, direct ice thickness measurement in the single particle collection workflow47, prediction of image-quality of particles48, automated data acquisition49,50,51, beam induced motion correction52, contrast transfer function correction53,54, map reconstruction and refinement12,55,56, determination of map local resolution57, map segmentation58, map sharpening32, atomic model building28,29, comparison of time-averaged density maps from molecular dynamics simulations44, atomic model refinement13,59, and atomic model validation60,61. It is hoped that cryo_fit helps democratize global and local flexible map fitting procedures in these automation endeavors.
Supplementary Material
1
Movie S1. Structural rearrangement of adenylate kinase upon ligand binding (from “open” to “close” conformation)64,65. Rendered by cryo_fit and UCSF Chimera (a movie file is available).
2
Acknowledgments
Authors appreciate beta-testers of cryo_fit and their valuable feedbacks. This work was supported by NIH grants: NIGMS R01-GM072686 (to KS), and NIGMS P01-GM063210 (to PDA). This work was supported in part by US Department of Energy under Contract No. DE-AC02-05CH11231 (Lawrence Berkeley National Laboratory). KS and DK were supported in part by LANL LDRD for portions of cryo_fit. SK was supported by NYU Abu Dhabi Faculty Research Fund (ST181).
Footnotes
Program Development and Availability
Other than the Phenix GUI interface and simple command line execution with cctbx libraries, all cryo_fit code is either embedded in C language based gromacs 4.5.5 extensions or as python 2.7 based scripts to automate the input, output and execution of cryo_fit. cryo_fit can be installed into macOS and linux (CentOS and Ubuntu), and all codes are available at https://github.com/cryoFIT/cryo_fit. The gromacs_cryo_fit installation file is located at https://github.com/cryoFIT/cryo_fit_install. Documentation is available at www.phenix-online.org62.