Learn more: PMC Disclaimer | PMC Copyright Notice
Molecular Docking: A powerful approach for structure-based drug discovery
Abstract
Molecular docking has become an increasingly important tool for drug discovery. In this review, we present a brief introduction of the available molecular docking methods, and their development and applications in drug discovery. The relevant basic theories, including sampling algorithms and scoring functions, are summarized. The differences in and performance of available docking software are also discussed. Flexible receptor molecular docking approaches, especially those including backbone flexibility in receptors, are a challenge for available docking methods. A recently developed Local Move Monte Carlo (LMMC) based approach is introduced as a potential solution to flexible receptor docking problems. Three application examples of molecular docking approaches for drug discovery are provided.
Introduction
The completion of the human genome project has resulted in an increasing number of new therapeutic targets for drug discovery. At the same time, high-throughput protein purification, crystallography and nuclear magnetic resonance spectroscopy techniques have been developed and contributed to many structural details of proteins and protein–ligand complexes. These advances allow the computational strategies to permeate all aspects of drug discovery today [1-5], such as the virtual screening (VS) techniques [6] for hit identification and methods for lead optimization. Compared with traditional experimental high-throughput screening (HTS), VS is a more direct and rational drug discovery approach and has the advantage of low cost and effective screening [7-9]. VS can be classified into ligand-based and structure-based methods. When a set of active ligand molecules is known and little or no structural information is available for targets, the ligand-based methods, such as pharmacophore modeling and quantitative structure activity relationship (QSAR) methods can be employed. As to structure-based drug design, molecular docking is the most common method which has been widely used ever since the early 1980s [10]. Programs based on different algorithms were developed to perform molecular docking studies, which have made docking an increasingly important tool in pharmaceutical research. Various excellent reviews on docking have been published in the past [5, 11-14], and many comparison studies were conducted to evaluate the relative performance of the programs [15-18].
The molecular docking approach can be used to model the interaction between a small molecule and a protein at the atomic level, which allow us to characterize the behavior of small molecules in the binding site of target proteins as well as to elucidate fundamental biochemical processes [19]. The docking process involves two basic steps: prediction of the ligand conformation as well as its position and orientation within these sites (usually referred to as pose) and assessment of the binding affinity. These two steps are related to sampling methods and scoring schemes, respectively, which will be discussed in the theory section.
Knowing the location of the binding site before docking processes significantly increases the docking efficiency. In many cases, the binding site is indeed known before docking ligands into it. Also, one can obtain information about the sites by comparison of the target protein with a family of proteins sharing a similar function or with proteins co-crystallized with other ligands. In the absence of knowledge about the binding sites, cavity detection programs or online servers, e.g. GRID[20, 21], POCKET [22], SurfNet [23, 24], PASS [25] and MMC [26] can be utilized to identify putative active sites within proteins. Docking without any assumption about the binding site is called blind docking.
The early elucidation for the ligand-receptor binding mechanism is the lock-and-key theory proposed by Fischer [27], in which the ligand fits into the receptor like lock and key. The earliest reported docking methods [10] were based on this theory and both the ligand and receptor were treated as rigid bodies accordingly. Then the “induced-fit” theory [28, 29] created by Koshland takes the lock-and-key theory a step further, stating that the active site of the protein is continually reshaped by interactions with the ligands as the ligands interact with the protein. This theory suggests that the ligand and receptor should be treated as flexible during docking. Consequently, it could describe the binding events more accurately than the rigid treatment.
Considering the limitation of computer resources, docking has been performed with a flexible ligand and a rigid receptor for a long time, and remains the most popular method in use [7, 30-35]. Recently many efforts have been made to deal with the flexibility of the receptor [36-42], however, flexible receptor docking, especially backbone flexibility in receptors, still presents a major challenge for available docking methods. In our study, we propose a Local Move Monte Carlo (LMMC) approach as a potential solution to flexible receptor docking problems.
Theory of docking
Essentially, the aim of molecular docking is to give a prediction of the ligand-receptor complex structure using computation methods. Docking can be achieved through two interrelated steps: first by sampling conformations of the ligand in the active site of the protein; then ranking these conformations via a scoring function. Ideally, sampling algorithms should be able to reproduce the experimental binding mode and the scoring function should also rank it highest among all generated conformations. From these two perspectives, we give a brief overview of basic docking theory.
Sampling algorithms
With six degrees of translational and rotational freedom as well as the conformational degrees of freedom of both the ligand and protein, there are a huge number of possible binding modes between two molecules. Unfortunately, it would be too expensive to computationally generate all the possible conformations. Various sampling algorithms have been developed and widely used in molecular docking software (Table 1).
Table 1
Some sampling algorithms discussed in this paper.
Algorithms | Characteristic | Reference |
---|---|---|
Matching algorithms | Geometry-based, suitable to VS and database enrichment for its high speed | [43-45] |
Incremental construction | Fragment-based and docking incrementally | [30, 49, 50] |
MCSS | fragment-based methods for the de novo design | [55, 56] |
LUDI | fragment-based methods for the de novo design | [57] |
Monte Carlo | Stochastic search | [58, 59] |
Genetic algorithms | Stochastic search | [31, 32, 64] |
Molecular dynamics | For further refinement after docking | [68-70] |
Matching algorithms (MA) [43-45] based on molecular shape map a ligand into an active site of a protein in terms of shape features and chemical information. The protein and the ligand are represented as pharmacophores. Each distance of the pharmacophore within the protein and ligand is calculated for a match; new ligand conformations are governed by the distance matrix between the pharmacophore and the corresponding ligand atoms. Chemical properties, like hydrogen-bond donors and acceptors, can be taken into account during the match. Matching algorithms have the advantage of speed; thus they may be used for the enrichment of active compounds from large libraries [7]. Matching algorithms for ligand docking are available in DOCK [10], FLOG [46], LibDock [47] and SANDOCK [48] programs.
Incremental construction (IC) [30, 49, 50] methods put the ligand into an active site in a fragmental and incremental fashion. The ligand is divided into several fragments by breaking its rotatable bonds and then one of these fragments is selected to dock into the active site first. This anchor is usually the largest fragment or the piece which may have significant functional role or interaction with protein. The remaining fragments can be added incrementally. Different orientations are generated to fit in the active site, which realizes the flexibility of the ligand. The incremental construction method has been used in DOCK 4.0 [51], FlexX [30], Hammerhead [52], SLIDE [53] and eHiTS [54].
In addition to IC, Multiple Copy Simultaneous Search (MCSS) [55, 56] and LUDI [57] are fragment-based methods for the de novo design of ligands and modifications of known ligands that may enhance their binding to the target protein. MCSS makes 1,000 to 5,000 copies of a functional group, which are randomly placed in the binding site of interest and subjected to simultaneous energy minimization and/or quenched molecular dynamics in the forcefield of the protein. Copies only interact with the proteins and any interactions among the copies are omitted. Consequently a set of energetically favorable binding sites and orientations for the functional group is identified based on the interaction energies. The binding site is mapped by using different functional groups. New molecules which perfectly match the binding site can be designed through the linkage of those different functional groups.
LUDI focuses on the hydrogen bonds and hydrophobic contacts which could be formed between the ligand and protein. Its central concept are interaction sites, which are discrete positions in space suitable for forming hydrogen bonds or for filling a hydrophobic pocket [57]. A set of interaction sites is generated either by searching the database or using the rules. The fragment is then fitted onto the interaction sites and evaluated by distance criteria. The final step is the connection of some or all of the fitted fragments to a single molecule.
Stochastic methods search the conformational space by randomly modifying a ligand conformation or a population of ligands. Monte Carlo (MC) and genetic algorithms are two typical algorithms that belong to the class of stochastic methods.
Monte Carlo (MC) [58, 59] methods generate poses of the ligand through bond rotation, rigid-body translation or rotation. The conformation obtained by this transformation is tested with an energy- based selection criterion. If it passes the criterion, it will be saved and further modified to generate next conformation. The iterations will proceed until the pre-defined quantity of conformations is collected. The main advantage of MC is that the change can be quite large allowing the ligand to cross the energy barriers on the potential energy surface, a point that isn’t achieved easily by molecular dynamics based simulation methods. Examples of applying the Monte Carlo methods include an earlier version of AutoDock [60], ICM [61], QXP [62] and Affinity [63].
Genetic algorithms (GA) [31, 32, 64] form another class of well-known stochastic methods. The idea of the GA stems from Darwin’s theory of evolution. Degrees of freedom of the ligand are encoded as binary strings called genes. These genes make up the ‘chromosome’ which actually represents the pose of the ligand. Mutation and crossover are two kinds of genetic operators in GA. Mutation makes random changes to the genes; crossover exchanges genes between two chromosomes. When the genetic operators affect the genes, the result is a new ligand structure. New structures will be assessed by scoring function, and the ones that survived (i.e., exceeded a threshold) can be used for the next generation. Genetic algorithms have been used in AutoDock [31], GOLD [65], DIVALI [66] and DARWIN [67].
Molecular dynamics (MD) [68-70] is widely used as a powerful simulation method in many fields of molecular modeling. In the context of docking, by moving each atom separately in the field of the rest atoms, MD simulation represents the flexibility of both the ligand and protein more effectively than other algorithms. However, the disadvantage of MD simulations is that they progress in very small steps and thus have difficulties in stepping over high energy conformational barriers, which may lead to inadequate sampling. On the other hand, MD simulations are often efficient at local optimization. Thus a current strategy is to use random search in order to identify the conformation of the ligand, followed by the further subtle MD simulations.
Scoring functions
The purpose of the scoring function is to delineate the correct poses from incorrect poses, or binders from inactive compounds in a reasonable computation time. However, scoring functions involve estimating, rather than calculating the binding affinity between the protein and ligand and through these functions, adopting various assumptions and simplifications. Scoring functions can be divided in force-field-based, empirical and knowledge-based scoring functions [5]. Table 2 shows some examples of scoring function formulae belonging to those three classes of scoring functions respectively.
Table 2
Examples of scoring function formulae
Scoring function formulae | Ref. |
---|---|
[31] | |
Extended force-field-based scoring function from AutoDock. | |
For two atoms i, j, the pair-wise atomic energy is evaluated by the sum of van der Waals, hydrogen bond, coulomb energy and desolvation. W are weighted factors for calibrate the empirical free energy. | |
| |
[30] | |
Empirical scoring function from FlexX. | |
ΔG is the estimated free energy of binding; ΔG0 is the regression constant; ΔGrot , ΔGhb , ΔGio , ΔGaro and ΔGlipo are regression coefficients for each corresponding free energy term; f (ΔR,Δα) is scaling function penalizing deviations from the ideal geometry; Nrot is the number of free rotate bonds that are immobilized in the complex. | |
| |
[84] | |
Knowledge-based scoring functions PMF. | |
kB is the Boltzmann constant; T is the absolute temperature; r is the atom pair distance. is the ligand volume correction factor; designates the radial distribution function of a protein atom of type i and a ligand atom of type j. |
Classical force-field-based scoring functions [71-73] assess the binding energy by calculating the sum of the non-bonded (electrostatics and van der Waals) interactions. The electrostatic terms are calculated by a Coulombic formulation. Since such point charge calculations have problems in modeling the protein’s real environment a distance-dependent dielectric function is generally used to modulate the contribution of charge–charge interactions. The van der Waals terms are described by a Lennard-Jones potential function. Adopting different parameter sets for the Lennard-Jones potential can vary the “hardness” of the potential which controls how close a contact between protein and ligand atoms can be acceptable. Force-field-based scoring functions also have the problem of slow computational speed. Thus cut-off distance is used to handle the non-bonded interactions. This also results in decreasing the accuracy of long-range effects involved in binding.
Extensions of force-field-based scoring functions consider the hydrogen bonds, solvations and entropy contributions. Software programs, such as DOCK [10, 50, 51, 74], GOLD [65] and AutoDock [31], offer users such functions. They have some differences in the treatment of hydrogen bonds, the form of the energy function etc.. Furthermore, the results of docking with force-field-based functions can be further refined with other techniques, such as linear interaction energy [75] and free-energy perturbation methods (FEP) [71, 76] to improve the accuracy in predicting binding energies.
In empirical scoring functions [77-81], binding energy decomposes into several energy components, such as hydrogen bond, ionic interaction, hydrophobic effect and binding entropy. Each component is multiplied by a coefficient and then summed up to give a final score. Coefficients are obtained from regression analysis fitted to a test set of ligand-protein complexes with known binding affinities.
Empirical scoring functions have relatively simple energy terms to evaluate. However, it is unclear as to how well they are suited for ligand-protein complexes beyond the training set. Additionally, each term in empirical scoring functions may be treated in a different manner by different software, and the numbers of the terms included are also different. LUDI [57], PLP [78, 79, 82], ChemScore [83] are examples derived from empirical scoring functions
Knowledge-based scoring functions [84-89] use statistical analysis of ligand-protein complexes crystal structures to obtain the interatomic contact frequencies and/or distances between the ligand and protein. They are based on the assumption that the more favorable an interaction is, the greater the frequency of occurrence will be. These frequency distributions are further converted into pairwise atom-type potentials. The score is calculated by favoring preferred contacts and penalizing repulsive interactions between each atom in the ligand and protein within a given cutoff.
The appeal of knowledge-based functions is computational simplicity, which can be exploited to screen large compound databases. They can also model some uncommon interactions like sulphur-aromatic or cation-π, which are often poorly handled in empirical approaches. However, they are still faced with the problem that some interactions are underrepresented in the limited training sets of crystal structures as well as by the bias inherent in the selection of proteins for successful structure determination thus the obtained parameters may not be suitable for widespread use, especially with interactions involving metals or halogens. PMF [84], DrugScore [90], SMoG [91] and Bleep [85] are examples of knowledge-based functions which differ mainly in the size of training sets, the form of the energy function, the definition of atom types, distance cutoff or other parameters.
Consensus scoring [92] is a recent strategy that combines several different scores to assess the docking conformation. A pose of ligand or a potential binder could be accepted when it scores well under a number of different scoring schemes. Consensus scoring usually substantially improves enrichments (i.e., the percentage of strong binder among the high-scoring ligands) in virtual screening, and improves the prediction of bound conformations and poses [93]. However, the prediction of binding energies might still be inaccurate. Also, the usefulness of consensus scoring diminishes when terms in different scoring functions are significantly correlated [5, 93]. CScore [94] is an example of which combines DOCK, ChemScore, PMF, GOLD, and FlexX scoring functions.
Typical scoring functions face the problem of affinity prediction partly because of the limited treatment of solvation effect. One of the ways to solve this problem is physics-based scoring, e.g. MM-PB/SA and MM-GB/SA (MM stands for molecular mechanics, PB and GB for Poisson-Boltzmann and Generalized Born, respectively, SA for solvent-accessible surface area), which is involved in rescoring or lead optimization to improve the accuracy of binding affinity prediction. Promising results were obtained using MM-PB/SA [95, 96] or MM-GB/SA [97] in some studies. However, recently Guimarães and Mathiowetz reported that the GB/SA model poorly estimated protein desolvation on certain systems, while incorporating WaterMap into the MM-GB/SA method instead of GB/SA protein desolvation gave the best ranking result [98]. Singh and Warshel compared several methods for evaluating the affinity of protein-ligand complexes and suggested that PDLD/S-LRA/β (protein dipoles Langevin dipoles linear response approximation) appears to offer an appealing option for the final stages of massive VS and in contrast, PB/SA appears to provide erroneous estimates of the absolute binding energies because of its incorrect estimation of entropies and the problematic treatment of electrostatic energies [99].
Docking methodologies
Rigid ligand and rigid receptor docking
When the ligand and receptor are both treated as rigid bodies, the search space is very limited, considering only three translational and three rotational degrees of freedom. In this case, ligand flexibility could be addressed by using a pre-computed a set of ligand conformations, or by allowing for a degree of atom–atom overlap between the protein and ligand. The early versions of DOCK [10, 50, 51, 74], FLOG [46] and some protein-protein docking programs, such as FTDOCK [100], adopted such a method that kept the ligand and receptor rigid during the process of the docking.
DOCK is the first automated procedure for docking a molecule into a receptor site and is being continuously developed. It characterizes the ligand and receptor as sets of spheres which could be overlaid by means of a clique detection procedure [101]. Geometrical and chemical matching algorithms are used, and the ligand-receptor complexes can be scored by accounting for steric fit, chemical complementation or pharmacophore similarity. Within its improved versions, incremental construction method and exhaustive search are added to consider the ligand flexibility. The exhaustive search randomly generates a user-defined number of conformers as a multiple of the number of rotatable bonds in the ligand. With respect to scoring, the latest version DOCK 6.4 has included both an AMBER-derived force-field scoring with implicit solvent [102] and GB/SA, PB/SA solvation scoring [97, 103].
FLOG generates ligand conformations on the basis of distance geometry and uses a clique-finding algorithm to calculate the sets of distances. Up to 25 explicit conformations of the ligand could be used to dock for some flexibility. FLOG allows users to define essential points which must be paired with a ligand atom. This approach is useful if an important interaction is already known before docking. Conformations are scored with a function considering van der Waals, electrostatics, hydrogen bonding and hydrophobic interactions.
Flexible ligand and rigid receptor docking
For systems whose behavior follows the induced fit paradigm [28, 29], it is of vital importance to consider the flexibilities of both the ligand and receptor since in that case both the ligand and receptor change their conformations to form a minimum energy perfect-fit complex. However, the cost is very high when the receptor is also flexible. Thus the common approach, also a trade-off between accuracy and computational time, is treating the ligand as flexible while the receptor is kept rigid during docking. Almost all the docking programs have adopted this methodology, such as AutoDock [31], FlexX [30].
AutoDock 3.0 incorporates Monte Carlo simulated annealing, evolutionary, genetic and Lamarckian genetic algorithm methods to model the ligand flexibility while keeping the receptor rigid. The scoring function is based on the AMBER force field, including van der Waals, hydrogen bonding, electrostatic interactions, conformational entropy and desolvation terms. Each term is weighted using an empirical scaling factor obtained from experimental data. AutoDock 4.0 is able to model receptor flexibility by allowing side-chains to move. Additionally, interaction of protein-protein docking could be evaluated in this version of AutoDock. AutoDock Vina was recently released as the latest version for molecular docking and virtual screening [104]. By redocking the 190 receptor-ligand complexes that had been used as a training set for the AutoDock 4, AutoDock Vina simultaneously showed approximately a two orders exponential improvement of magnitude in speed and a significantly better accuracy of the binding mode prediction.
FlexX uses an incremental construction algorithm to sample ligand conformations. The base fragment is first docked into the active site by matching hydrogen bond pairs and metal and aromatic ring interactions between the ligand and protein. Then the remaining components are incrementally built-up in accordance with a set of predefined rotatable torsion angles to account for ligand flexibility. The FlexX scoring function is based on Böhm’s work [105]. Its current version includes terms of electrostatic interactions, directional hydrogen bonds, rotational entropy, and aromatic and lipophilic interactions. The interactions between functional groups are also taken into account through assigning the type and geometry for groups.
Flexible ligand and flexible receptor docking
The intrinsic mobility of proteins has been proved to be closely related to ligand binding behavior and it has been reviewed by Teague [106]. Incorporating the receptor flexibility is significant challenge in the field of docking. Ideally, using MD simulations could model all the degrees of freedom in the ligand-receptor complex. But MD has the problem of inadequate sampling that we mentioned earlier. Another hurdle is its high computational expense, which prevents this method from being used in the screening of large chemical database.
In addition to the historic induced fit several theoretical models, conformer selection and conformational induction, have been proposed to illustrate the flexible ligand-protein binding process. According to the definition given by Teague [106], conformer selection refers to a process when a ligand selectively binds to a favorable conformation from a number of protein conformations; conformational induction describes a process in which the ligand converts the protein into a conformation that it would not spontaneously adopt in its unbound state. In some cases, this conformational conversion can be likened to a partial refolding of the protein.
Various methods are currently available to implement the receptor flexibility (Table 3). The simplest one is so-called “soft-docking” [37, 107, 108], decreases the van der Waals repulsion energy term in the scoring function to allow for a degree of atom-atom overlap between the receptor and ligand. For example, the LJ 8-4 potential in GOLD and smooth potential in AutoDock 3.0 belong to this class. This method may not include adequate flexibility. Nevertheless, it has the advantage of computational efficiency as the receptor coordinates are fixed, simply by adjusting van der Waals parameters.
Table 3
Some basic methods for including receptor flexibility.
Method | Description | Advantage | Disadvantage | Program |
---|---|---|---|---|
Soft potential | Change vdW to allow for overlap between receptor and ligand atoms | Computational efficiency. Easy to implement and use combined with other methods. | Inadequate flexibility. Describe flexibility in an implicit, rude and non- quantitative way. | GOLD [65] AutoDock [31] |
| ||||
Rotamer library | Search side chain library to obtain possible conformations | Relative computational efficiency. Avoid minimization barriers. | Strong dependence on the database used. No backbone flexibility. | ICM [61] |
Receptor side chain flexibility | Sample both side chain and ligand conformations simultaneously using GA | Relative computational efficiency. Model the effect that ligand make on binding site residues. | Only selected side chains are involved. No backbone flexibility. | AutoDock 4 [112] |
Ensemble of protein conformations | Docking ligand to a series of receptor structures which represent different conformational states. | Include full and explicit flexibility. | Expensive computational cost. Limited by protein conformations used in sampling. | DOCK [113] FlexE [38] |
Utilizing rotamer libraries [109, 110] is another approach to modeling receptor flexibility. Rotamer libraries include a set of side-chain conformations which are usually determined from statistical analysis of structural experimental data. The advantage of using rotamers is the relative speed in sampling, and the avoiding of minimization barriers. ICM (Internal Coordinates Mechanics) [61] is a program using rotamer libraries with the biased probability methodology [111], coupled with Monte Carlo search of the ligand conformation.
AutoDock 4 [112] adopts a simultaneous sample method to deal with side chain flexibility. Several side chains of the receptor can be selected by users and simultaneously sampled with a ligand using the same methods. Other portions of the receptor are treated rigidly with a grid energy map during sampling. Grid energy map introduced by Goodford [20] is used to store energy information of the receptor and simplify interaction energy calculation between ligand and receptor.
Still another way to deal with the protein flexibility is to use an ensemble of protein conformations, which corresponds to the theory of conformer selection [113, 114]. A ligand is separately docked into a set of rigid protein conformations rather than a single one, and the results are merged depending on the method of choice [115]. This method was originally implemented in DOCK, which generates an average potential energy grid of the ensemble [113] and is extended in many programs in different ways. For example, FlexE [38] collects multiple crystal structures of a certain protein, merging the similar parts while marking the dissimilar areas as different alternatives. During the incremental construction of a ligand discrete protein conformations are sampled in a combinatorial fashion. The highest scoring protein structure is selected based on a comparison between the ligand and each alternative.
Hybrid method is another practical strategy to model receptor flexibility. One example is Glide [33], a very popular program in the field of docking. Glide designs a series of hierarchical filters to search the possible poses and orientations of the ligand within the binding site of the receptor. Ligand flexibility is handled by an exhaustive search of the ligand torsion angle space. Initial ligand conformations are selected based on torsion energies and docked into receptor binding sites with soft potentials. Then a rotamer exploration is used to further model receptor flexibility [36]. IFREDA [115] utilizes a hybrid method that combines soft potential and multiple receptor conformations, accounting for receptor flexibility. Other programs, like QXP [62] and Affinity [63], perform a Monte Carlo search of ligand conformations followed by a minimization step. During minimization, the user-defined parts of the protein are allowed to move in order to avoid atom clashes between the ligand and receptor. SLIDE [53] is designed to incorporate flexibility with the ability to remove clashes by directed, single bond rotation of either the ligand or the side chains of the protein. An optimization approach based on the mean-field theory is applied to model induced-fit complementarities between the ligand and protein.
Methods mentioned above either include only side chain flexibility or full flexibility of the receptor. We have known that loops forming active sites play an important role in ligand binding. In some cases the loop may undergo dramatic conformational change whereas in other portions of the receptor there is little change upon ligand binding. For this situation, side chain flexibility methods fail to sample the correct protein conformation and full flexibility seems to be a computational waste. Figure 1 shows superimposed crystal structures of triosephosphate isomerase as an example. The active site of triosephosphate isomerase has an 11-residue loop which moves 7Å upon ligand binding [116]. However, the rest of the enzyme has no movement in comparison to their apo and holo structures. Several enzyme families also involve loop rearrangement within the active site responsible for ligand binding, such as Bromodomain, an extensive family related to acetyl-lysine binding, or Dihydrofolate reductase, responsible for the maintenance of the cellular pools of tetrahydrofolate, as well as other kinds of kinases [117, 118]. In the next section, we present the Local Move Monte Carlo (LMMC) loop sampling method, a new approach which focuses on sampling ligand conformation within loop-containing active sites.
Local Move Monte Carlo sampling for flexible receptor docking
Local move (also referred to as ‘window move’) starts with changing one torsion angle (called the driver torsion) followed by the adjustment of the six subsequent torsions to allow the rest of the chain to remain in its original position while preserving all bond lengths and bond angles (Figure 2). The pioneering work on local move was done by Go and Scheraga [119], who developed a solution for the system of equations defining the values of the six torsion angles that preserve the backbone bond lengths and angles. Hoffmann and Knapp first applied the local move method in a MC simulation of polyalanine folding that included a suitable Jacobian [120], required for maintaining detailed balance. They demonstrated that this method samples the conformational space more efficiently than single move [121]. The method has been further tested on proline-containing peptides [122], proteins and nucleic acids [123]. Mezei introduced the ‘reverse proximity criterion’ for filtering all possible loop closure solutions to select the most structurally conservative one and tested it on a solvated lipid bilayer [124].
![An external file that holds a picture, illustration, etc.
Object name is nihms-308746-f0002.jpg An external file that holds a picture, illustration, etc.
Object name is nihms-308746-f0002.jpg](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3151162/bin/nihms-308746-f0002.jpg)
Local move of a lipid tail. Six subsequent torsions change while keeping the rest of the chain to remain in its original position.
We have developed an improved local move Monte Carlo (LMMC) loop sampling approach for loop predictions. The method generates loop conformations based on simple moves of the torsion angles of side chains and local moves of the backbones of loops. To reduce the computational costs for energy evaluations, we developed a grid-based force field to represent the protein environment and solvation effect. Simulated annealing has been used to enhance the efficiency of the LMMC loop sampling and identify low-energy loop conformations. The prediction quality was evaluated on a set of protein loops with a known crystal structure that has been previously used by others to test different loop prediction methods. The results show that this approach can reproduce the experimental results with root mean square deviation (RMSD) within 1.8 Å for the all the test cases [125]. Figure 3 shows the loop structures of 2act (198-205) sampled by the LMMC method. This LMMC loop prediction approach could be useful for flexible receptor docking. In our future studies, we will develop our LMMC based molecular docking approach, which samples not only the side chains but also the backbone loops in the binding site of proteins and flexible ligands as well. A flowchart of the LMMC based molecular docking approach is given in Figure 4.
![An external file that holds a picture, illustration, etc.
Object name is nihms-308746-f0003.jpg An external file that holds a picture, illustration, etc.
Object name is nihms-308746-f0003.jpg](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3151162/bin/nihms-308746-f0003.jpg)
Loop Structure of 2act (198-205) produced by the local move MC method at 5000K and followed by clustering to generate 100 representative conformations. Black stick represents the crystal loop structure, and gray wires represent the 100 representative loop conformations.
Application examples of molecular docking for drug discovery
Molecular docking has been the most widely employed technique. Though the main application lies in structure-based virtual screening for identification of new active compounds towards a particular target protein, in which it has produced a number of success stories [126], it is actually not a stand-alone technique but is normally embedded in a workflow of different in silico as well as experimental techniques [127]. Several research groups focus on evaluating of the performance of various docking programs or on making improvements to the scoring functions when experimental testing has already been done. Such efforts could give meaningful guidance to choose the methodology for a particular target system. Docking, combined with other computational techniques and experimental data, also could be involved in analyzing drug metabolism to obtain some useful information from the cytochrome P450 system [128-130], for example. In the following, three examples of successful applications of docking are presented.
DNA gyrase is a bacterial enzyme that introduces negative supercoils into bacterial DNA and unwinds of DNA, thus being studied as antibacterial target. HTS failed to find novel inhibitors of DNA gyrase. Boehm et. al. used de novo design for this enzyme and successfully obtained several new inhibitors [131]. Firstly, 3D complex structures of DNA gyrase with known inhibitors, ciprofloxacin and novobiocin, were carefully analyzed to get a common binding pattern, in which both inhibitors donate one hydrogen bond to Asp73 and accept one hydrogen bond from a conserved water molecule. In addition, some lipophilic fragments should be included in the molecule to have lipophilic interaction with the receptor. Based on this information, LUDI and CATALYST were employed to search the Available Chemicals Directory (ACD) and a part of the Roche compound inventory (RIC), respectively, and collected about 600 compounds. Close analogs of these compounds were also considered, thus in total 3000 compounds were further tested using biased screening. Consequently 150 hits were selected and clustered into 14 classes of which 7 classes were proven to be the true and novel inhibitors. Subsequent hit optimization relied strongly on the knowledge of 3D structures of the binding site and eventually generated a series of highly potent DNA gyrase inhibitors.
Another example is focused on the validation of docking and scoring applied in cytochromes P450 and other heme-containing proteins [132]. Docking against heme-containing complexes appears to be difficult because certain ligands coordinate directly to the heme iron atom and the precise energetics of this contact for different chelating groups needs to be properly balanced with other energetic terms, and in the case of the P450s, the environment above the heme group is very hydrophobic compared to other enzymes and some scoring functions and docking methods perform poorly on interactions driven entirely by lipophilic contacts. In this study, 45 complexes from the PDB database comprising heme-containing proteins and ligands were selected. The native ligands were removed and then docked into the defined active cavities using the GOLD [65] software which employs genetic algorithms to generate ligand conformations. The scoring functions used to rank the docking poses were Goldscore [32] and Chemscore [65]. The results show that the success rates are 64% and 57% for Chemscore and Goldscore respectively, which is significantly lower than the value of 79% observed with both scoring functions for the full GOLD validation set. Additionally, it is apparent from the data that the search algorithm was very unlikely to be responsible for the failure in docking. Further research indicated that re-parameterization of metal-acceptor interactions and lipophilicity of planar nitrogen atoms in the scoring functions resulted in a significant increase in the percentage of successful docking poses against the heme binding proteins (Chemscore 73%, Goldscore 65%), which might be useful in docking applications on P450 enzymes and other heme-binding proteins.
Concerning VS and HTS, comparative research has been done by Doman et al. [133]. Both VS and HTS were applied to screen the inhibitors of the protein tyrosine phosphatase-1B (PTP-1B). For the HTS a library of approximately 400,000 compounds from a corporate collection were screened. Some 85 compounds were found with IC50 values less than 100 μM, corresponding to a hit rate of 0.021%. And the most active had an IC50 value of 4.2 μM. For VS, 235,000 commercially available molecules were docked into the crystal structure of PTP-1B (PDB code 1pty) using the Northwestern University version [134-137] of DOCK3.5 [102, 138]. After docking, the top-scoring 1000 molecules (500 for the ACD and 500 for the combined BioSpecs and Maybridge databases) were considered for further evaluation. A total of 889 molecules were actually available, and after visual inspection 365 compounds were chosen for testing. Of these, 127 molecules were found to be active with IC50 <100 μM, corresponding to a hit rate of 34.8%. Structure-based docking therefore enriched the hit rate by 1700-fold over random screening. Another point that should be noted is that the hits from VS and HTS are very different from each other, which implies combination of VS and HTS may be more helpful for lead discovery.
Concluding Remarks
Receptor flexibility, especially backbone flexibility and movement of several key secondary elements of the receptor involving ligand binding and the catalyst, is still a major hurdle in docking studies. Some methods to deal with side chain flexibility have been proven effective and adequate in certain cases. With respect to global flexibility, an ensemble of proteins is a popular solution which accords with the viewpoint of conformer selection. It requires an efficient way to obtain and select reliable protein structures used for docking, which means structures that the ligand can fit in should be included in the ensembles. Besides, computational cost is another limitation for this method. LMMC could be an appropriate method for sampling a ligand within loop-containing active sites since loop tends to be more flexible and hard to model using existing approaches especially due to their possibly dramatic movements. Another advantage is the adjustment of the extent of flexibility. Either the side chain or full movement of the loop can be directly controlled by users.
Scoring function is a fundamental component worth being further improved upon in docking. Successful application examples show that computational approaches have the power to screen hits from a huge database and design novel small molecules. However, the realistic interactions between small molecules and receptors are still relied on experimental technology. Accurate as well as low computational cost scoring functions may bring docking application to a new stage.
Acknowledgement
We gratefully acknowledge financial support from National Institute Health (DC008996 and S10RR027411 to M.C.). The computations were partially supported by grants of the National Center for Supercomputing Applications (MCB070095T and MCB090019 to M.C.).