Front Mol Biosci. 2022 Jun 13:9:877000.
doi: 10.3389/fmolb.2022.877000. eCollection 2022.

AlphaFold Models of Small Proteins Rival the Accuracy of Solution NMR Structures

Roberto Tejero et al. Front Mol Biosci. 2022.

Abstract

Recent advances in molecular modeling using deep learning have the potential to revolutionize the field of structural biology. In particular, AlphaFold has been observed to provide models of protein structures with accuracies rivaling medium-resolution X-ray crystal structures, and with excellent atomic coordinate matches to experimental protein NMR and cryo-electron microscopy structures. Here we assess the hypothesis that AlphaFold models of small, relatively rigid proteins have accuracies (based on comparison against experimental data) similar to experimental solution NMR structures. We selected six representative small proteins with structures determined by both NMR and X-ray crystallography, and modeled each of them using AlphaFold. Using several structure validation tools integrated under the Protein Structure Validation Software suite (PSVS), we then assessed how well these models fit experimental NMR data, including NOESY peak lists (RPF-DP scores), comparisons between predicted rigidity and chemical shift data (ANSURR scores), and 15N-1H residual dipolar coupling data (RDC Q factors). Remarkably, the fits to NMR data for the protein structure models predicted with AlphaFold are generally similar to, or better than, those for the corresponding experimental NMR or X-ray crystal structures. Similar conclusions were reached in comparing AlphaFold2 predictions and NMR structures for three targets from the Critical Assessment of Protein Structure Prediction (CASP). These results contradict the widely held misperception that AlphaFold cannot accurately model solution NMR structures. They also document the value of PSVS for model vs. data assessment of protein NMR structures, and the potential for using AlphaFold models to guide analysis of experimental NMR data and, more generally, in structural biology.

Keywords: AlphaFold; X-ray crystal structure analysis; artificial intelligence; automated structure determination; protein NMR; protein structure prediction; structure validation.

Conflict of interest statement

GTM is a founder of Nexomics Biosciences, Inc. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Schematic description of RPF-DP scores. In this analysis, the graph G, with nodes corresponding to all 1H's and edges representing all short (e.g., <5 Å) 1H-1H distances in a structure model (left), is compared with a graph G_ANOE (right), in which nodes again correspond to all 1H's and edges describe all possible assignments for each NOESY cross peak. True positives (TPs) are edges common to both G and G_ANOE, false positives (FPs) are edges present in G but not in G_ANOE, and false negatives (FNs) are the sets of edges in G_ANOE representing the multiple possible assignments of a NOESY cross peak, none of which are present in G. These metrics are used to compute recall (R), precision (P), and F-measure as shown in the figure and outlined in the Methods section. The F-measure is the harmonic mean of the recall and precision. The Discriminating Power (DP) is a normalized F-measure, corrected to account for the F-measure expected for a random-coil chain (DP = 0) and the best F-measure possible considering the completeness of the NMR data (DP = 1.0). Since NOESY data are restricted to short distances (e.g., <5 Å), true negatives (TNs, peaks not expected from the model and not observed in the NOESY data) can dominate these statistics and are not included in the recall, precision, and F-measure metrics. Figure and legend are adapted from Huang et al. (2021).
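The counting scheme in this legend can be illustrated with a short Python sketch. It is a simplified illustration, not the RPF implementation in PSVS: the function names, the toy proton labels, and the choice to count one false negative per unexplained NOESY peak are assumptions made here, and the linear rescaling used for DP is only one reading of "normalized F-measure" with the two anchor points given in the legend.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = FrozenSet[str]  # an unordered pair of proton labels


def edge(a: str, b: str) -> Edge:
    """Build an undirected edge between two protons."""
    return frozenset((a, b))


def rpf_scores(model_edges: Set[Edge],
               peak_assignments: List[Set[Edge]]) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) for one structure model.

    model_edges      -- graph G: short 1H-1H distances found in the model
    peak_assignments -- graph G_ANOE: one set of candidate edges per NOESY peak
    """
    anoe_edges: Set[Edge] = set().union(*peak_assignments) if peak_assignments else set()
    tp = len(model_edges & anoe_edges)   # edges common to G and G_ANOE
    fp = len(model_edges - anoe_edges)   # edges in G with no supporting peak
    # Assumption: a peak counts as one FN when none of its candidates is in G.
    fn = sum(1 for candidates in peak_assignments if not (candidates & model_edges))
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    total = recall + precision
    f_measure = 2.0 * recall * precision / total if total else 0.0
    return recall, precision, f_measure


def discriminating_power(f_model: float, f_random: float, f_best: float) -> float:
    """Assumed linear rescaling: f_random maps to DP = 0, f_best to DP = 1."""
    if f_best <= f_random:
        raise ValueError("f_best must exceed f_random")
    return (f_model - f_random) / (f_best - f_random)


# Toy example with hypothetical proton labels.
G = {edge("HA", "HB"), edge("HA", "HG"), edge("HB", "HD")}
peaks = [
    {edge("HA", "HB"), edge("HA", "HC")},  # ambiguous peak, explained by HA-HB
    {edge("HC", "HD")},                    # peak not explained by the model
    {edge("HA", "HG")},                    # unambiguous peak, explained
]
recall, precision, f_measure = rpf_scores(G, peaks)
```

In this toy case the model explains two of the three peaks and contains one short distance with no supporting peak, so recall, precision, and F-measure all come out at 2/3. A model whose F-measure is below the random-coil value gets a negative DP under this rescaling.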
FIGURE 2
Plots of DP score vs. GDT for NMR and AlphaFold models. For each model, the DP score compares the model with the NMR NOESY peak list data, and the GDT score is a measure of similarity to the NMR conformer with the best DP score (Huang et al., 2021). Plots are provided for (A) target T1055 (511 CASP14 models; linear correlation coefficient r² = 0.66), (B) target T1027 (520 CASP14 models; r² = 0.51), (C) target T1029_original (529 CASP14 models; r² = 0.05), and (D) target T1029_revised (529 CASP14 models; r² = 0.87). Open circles are values for CASP14 prediction models (excluding AF models), red squares are the NMR structure models deposited in the PDB, and blue triangles are AF prediction models. In panels (C) and (D), the original NMR structures of target T1029, before revised analysis of NOESY data, are indicated by yellow squares. A few models fit the NMR data more poorly than expected for a random-coil conformation and return negative DP scores; these models (DP < 0) are not shown and were not included in the calculations of the linear correlation coefficients. These data are replotted from Huang et al. (2021).
FIGURE 3
ANSURR correlation vs. RMSD scores for NMR and AlphaFold prediction models. (A) CASP14 target T1055, (B) T1027, (C) T1029 (data shown for both T1029_original and T1029_revised NMR structures), and (D) target T1027_trimmed (residues 36–75 and 96–145), in which coordinates are trimmed to remove the polypeptide segments that are not structurally well-defined (i.e., unreliable). As in Figure 2, in each panel the open circles are CASP14 prediction models (excluding AF models), red squares are the final NMR structure models, including T1029_revised, blue triangles are AF prediction models, and yellow squares [in panel (C)] are for the original NMR structure of target T1029, i.e., T1029_original.
FIGURE 4
Plots of ANSURR composite score vs. GDT for NMR and AlphaFold models. The data of Figure 3 were replotted to compare the sum of ANSURR correlation and RMSD scores vs. GDT (Huang et al., 2021). Plots are provided for CASP14 targets (A) T1055 (linear correlation coefficient r² = 0.35), (B) T1027 (r² = 0.47), (C) T1029 (r² = 0.57; data shown for both T1029_original and T1029_revised NMR structures), and (D) T1027_trimmed (residues 36–75 and 96–145; r² = 0.11), in which coordinates are trimmed to remove the polypeptide segments that are not structurally well-defined or are unreliably predicted. In each panel, the open circles are the CASP14 prediction models (excluding AF models), red squares are the final NMR structure models deposited in the PDB, blue triangles are AF prediction models, and yellow squares are for the original NMR structures of target T1029, before revised analysis of NOESY data.
FIGURE 5
Comparison of AlphaFold, NMR, and X-ray crystallography models. Superimposed backbone structures of solution NMR structures (NMR, blue), X-ray crystal structures (X-ray, grey), and AlphaFold prediction models (AF, red) for six proteins selected from the NESG NMR/X-ray pairs database. Below each superimposition is a matrix of backbone structural-similarity statistics. The upper triangle provides GDT-TS scores, and the lower triangle Cα backbone RMSDs. The diagonal entries (with values in red) are Cα RMSDs within the corresponding superimposed conformer ensemble relative to the medoid conformer. These models are compared only for residues that are both "well-defined" in the NMR ensemble and "reliably predicted" in the AlphaFold models, as indicated in Table 1. For NMR and AF model ensembles, the coordinates of the medoid conformers are compared. For NMR structures refined with RDC data (i.e., targets RpR324, SgR209C, and SrR115C), the image provided is for the medoid conformer of the structure determined with these RDC data.
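As a rough illustration of the two statistics reported in these matrices, the Python sketch below computes a Cα RMSD and a GDT-TS-style score from per-residue Cα-Cα distances. It assumes the two structures have already been superimposed and that the distances (in Å) are given as a plain list; the function names are hypothetical. A full GDT-TS calculation searches over many superpositions to maximize the fraction of residues under each cutoff, so a single-superposition value like this one is only a lower-bound approximation.

```python
import math
from typing import Sequence


def ca_rmsd(distances: Sequence[float]) -> float:
    """Root-mean-square deviation from per-residue C-alpha distances (in Å)."""
    if not distances:
        raise ValueError("need at least one residue")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average percentage of residues within each distance cutoff.

    Uses the standard GDT-TS cutoffs of 1, 2, 4, and 8 Å, but on one fixed
    superposition only (an assumption of this sketch).
    """
    if not distances:
        raise ValueError("need at least one residue")
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical per-residue distances for a four-residue comparison.
example_distances = [0.5, 1.5, 3.0, 9.0]
example_gdt = gdt_ts(example_distances)
example_rmsd = ca_rmsd(example_distances)
```

The example shows why the two measures are reported together: the single 9.0 Å outlier dominates the RMSD, while GDT-TS still credits the three residues that superimpose well.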
FIGURE 6
AF structures have excellent fit to RDC data. Comparison of experimentally measured 15N-1H RDC data (plotted on the x-axis) and values computed from experimental or prediction models using PDBStat (Tejero et al., 2013). The data points are for (blue) NMR models determined without RDC data, (green) NMR models refined with 15N-1H RDC data, (red) AlphaFold prediction models, and (gold) X-ray crystal structures. For NMR and AlphaFold model ensembles, the medoid conformer of the well-defined regions (as indicated in Table 1) was compared. The linear correlation coefficient (R²) for each data set is shown in the inset.
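The agreement between measured and back-calculated RDCs can be summarized with two numbers, sketched below in Python. This is an illustrative calculation, not the PDBStat implementation: it assumes the computed RDC values are already available (fitting the alignment tensor is not shown), the function names are hypothetical, and the Q factor uses one common normalization, the RMS of the residuals divided by the RMS of the observed couplings. Other normalizations exist, and the source does not state which one was used.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], computed: Sequence[float]) -> None:
    if len(observed) != len(computed) or len(observed) < 2:
        raise ValueError("need two equally long lists with at least two values")


def rdc_q_factor(observed: Sequence[float], computed: Sequence[float]) -> float:
    """Q = rms(observed - computed) / rms(observed); lower means a better fit."""
    _check(observed, computed)
    residual = sum((o - c) ** 2 for o, c in zip(observed, computed))
    scale = sum(o * o for o in observed)
    if scale == 0.0:
        raise ValueError("observed RDCs are all zero")
    return math.sqrt(residual / scale)


def r_squared(observed: Sequence[float], computed: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and computed RDCs."""
    _check(observed, computed)
    n = len(observed)
    mean_o = sum(observed) / n
    mean_c = sum(computed) / n
    s_oc = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, computed))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_cc = sum((c - mean_c) ** 2 for c in computed)
    if s_oo == 0.0 or s_cc == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return (s_oc * s_oc) / (s_oo * s_cc)


# Hypothetical 15N-1H RDC values in Hz.
rdc_observed = [10.0, -5.0, 3.0, -8.0]
rdc_scaled = [2.0 * v for v in rdc_observed]  # perfectly correlated, wrong scale
q_perfect = rdc_q_factor(rdc_observed, rdc_observed)
q_scaled = rdc_q_factor(rdc_observed, rdc_scaled)
r2_scaled = r_squared(rdc_observed, rdc_scaled)
```

The scaled example illustrates why both numbers are useful: computed values that are perfectly correlated with the measurements but off by a constant factor keep R² at 1 while the Q factor rises to 1.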
FIGURE 7
Detailed comparison of solution NMR, X-ray crystal, and AF models of target RpR324. (A) Backbone ribbon representation of the solution NMR structure of RpR324 (medoid conformer from PDB ID 2LPK, apo AcpXL) (blue), overlaid with the AlphaFold structure calculated as a monomer (red) and the X-ray crystal structure (PDB ID 3LMO) (grey). (B) Overlay of the AlphaFold structure calculated as a monomer (red) and calculated as a dimer (orange) using AlphaFold-multimer software. (C) Comparison of (left) the AlphaFold dimer (orange), with the α3 helix highlighted in cyan, with (right) the X-ray crystal structure (grey), with the α3 helix highlighted in magenta. (D) Overlay of protomers from the dimeric AlphaFold model (orange) with the X-ray crystal structure (grey), illustrating the significant difference in orientation of the α3 helix.
