See DANGLE Flow Chart
The technical details of the prediction algorithm implemented in DANGLE are described in the publication. Here we give only a brief outline of how it works.
Using the raw chemical shift data and amino acid sequence of the query protein as input, DANGLE computes the sequence-corrected secondary chemical shifts of all measured ¹Hα, ¹⁵N, ¹³C′, ¹³Cα and ¹³Cβ nuclei and then makes predictions of the backbone dihedral angles for each residue in sequential order.
The fragment database is searched to find close matches to the secondary chemical shifts and sequence of each five-residue window along the polypeptide chain of the query protein. When no chemical shift information is available, DANGLE looks for similar fragments based on sequence identity only.
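The windowing step can be sketched as follows. This is only an illustration: the function names are hypothetical, the treatment of chain termini (windows that would run off the end are simply skipped) is an assumption, and the identity count stands in for the sequence-only fallback without reproducing DANGLE's actual scoring function.

```python
def windows(sequence, size=5):
    """Yield (centre_index, fragment) for every full window along the chain.

    Windows that would extend past either terminus are skipped
    (an assumption made for this sketch).
    """
    half = size // 2
    for start in range(len(sequence) - size + 1):
        yield start + half, sequence[start:start + size]


def sequence_identity(fragment_a, fragment_b):
    """Count positions at which two equal-length fragments share a residue."""
    return sum(a == b for a, b in zip(fragment_a, fragment_b))


for centre, fragment in windows("MKTAYIA"):
    print(centre, fragment, sequence_identity(fragment, "KTAYI"))
```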
For each window in the query protein, the 10 lowest-scoring matches from the fragment database define a "scatter pattern" of (φ,ψ) coordinates on a Ramachandran plot.
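As a minimal sketch of this selection step, assume each database match is available as a (score, φ, ψ) tuple in which a lower score means a closer match. The tuple layout and the function name are assumptions made for illustration, not DANGLE's own data model.

```python
import heapq


def scatter_pattern(matches, n_best=10):
    """Return the (phi, psi) pairs of the n_best lowest-scoring matches.

    `matches` is an iterable of (score, phi, psi) tuples (assumed layout).
    """
    best = heapq.nsmallest(n_best, matches, key=lambda m: m[0])
    return [(phi, psi) for _, phi, psi in best]
```

Each returned pair is one point of the scatter pattern on the Ramachandran plot.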
DANGLE then analyses this scatter pattern to identify the most probable dihedral angles for the central residue in the query window. According to Bayes' theorem, the posterior probability of the hypothesis that a (φ,ψ) combination A can be deduced from a given scatter pattern GS, P(A|GS), is:
P(A|GS) ∝ P(GS|A) × P(A)
P(A), the probability that the central residue adopts a particular set of φ and ψ angles, is determined by our prior knowledge of a normalised experimental population distribution derived from a database of high resolution X-ray structures. This probability term is independent of the data, i.e. the query scatter pattern.
P(GS|A), the likelihood, i.e. the probability that a conformation A will give rise to the given scatter pattern, is approximated by the similarity between the query scatter and the database scattergram corresponding to the 10° square bin in which A is located. (The whole 360° × 360° φ,ψ conformational space is divided into 36 × 36 10° square bins. The construction of the database scattergram for each bin is described below.)
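The division of (φ,ψ) space into 36 × 36 bins can be illustrated with a small indexing helper. The sketch assumes angles are expressed in degrees on the interval [-180°, 180°) and that bin 0 starts at -180°; both conventions, and the function names, are assumptions made here.

```python
import math

BIN_WIDTH = 10                 # degrees per bin
N_BINS = 360 // BIN_WIDTH      # 36 bins along each axis


def bin_index(angle_deg):
    """Map an angle in degrees to a bin index 0..35; +180 wraps round to bin 0."""
    return int(math.floor((angle_deg + 180.0) / BIN_WIDTH)) % N_BINS


def bin_of(phi, psi):
    """Return the (phi_bin, psi_bin) pair for one conformation."""
    return bin_index(phi), bin_index(psi)
```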
As a result, the query scatter pattern is associated with each of the 36 × 36 10° square bins via a Bayesian inferential score, which combines the effect of P(GS|A) and P(A).
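Combining the two terms bin by bin might look like the sketch below. It assumes the likelihood and the prior are already available as equally sized grids (nested lists). How DANGLE measures the similarity that approximates P(GS|A) is not reproduced here, and the final normalisation is a choice made for this example only.

```python
def bayesian_scores(likelihood, prior):
    """Multiply likelihood and prior bin by bin, then normalise to sum to 1.

    Both arguments are grids (lists of rows) of the same shape,
    for example 36 x 36.
    """
    raw = [[l * p for l, p in zip(l_row, p_row)]
           for l_row, p_row in zip(likelihood, prior)]
    total = sum(sum(row) for row in raw)
    if total == 0:
        return raw  # nothing to normalise; every bin scored zero
    return [[value / total for value in row] for row in raw]
```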
After comparing the query scatter with all of the scattergrams in the database, a global likelihood estimate (GLE) diagram is assembled from the Bayesian inferential scores for each 10° square bin in (φ,ψ) space. A cluster of neighbouring bins with scores greater than a threshold value is considered an "island", which signifies a range of similar possible conformations.
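One way to picture island detection is as a connected-component search over the score grid. The sketch below makes several assumptions that the text does not specify: bins count as neighbours when they share an edge, the grid wraps around at ±180° because the angles are periodic, and the threshold is supplied by the caller. The function names are hypothetical.

```python
from collections import deque


def find_islands(scores, threshold, periodic=True):
    """Group edge-connected bins whose score exceeds `threshold` into islands."""
    n_rows, n_cols = len(scores), len(scores[0])
    seen, islands = set(), []
    for i in range(n_rows):
        for j in range(n_cols):
            if (i, j) in seen or scores[i][j] <= threshold:
                continue
            island, queue = [], deque([(i, j)])
            seen.add((i, j))
            while queue:  # breadth-first flood fill from the seed bin
                r, c = queue.popleft()
                island.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if periodic:
                        nr, nc = nr % n_rows, nc % n_cols
                    elif not (0 <= nr < n_rows and 0 <= nc < n_cols):
                        continue
                    if (nr, nc) not in seen and scores[nr][nc] > threshold:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            islands.append(island)
    return islands


def principal_island(scores, islands):
    """Return the island that contains the single highest-scoring bin."""
    return max(islands, key=lambda isl: max(scores[r][c] for r, c in isl))
```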
DANGLE identifies the island that contains the highest Bayesian inferential score, determines the weighted angular means of φ and ψ within this principal island and reports these as the conformational predictions. The locus of 10° bins adjacent to the boundary of the principal island is used to define the upper and lower limits for the predicted angle in both the φ and ψ dimensions.
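Because dihedral angles are periodic, a weighted mean is usually taken on the unit circle rather than arithmetically. The sketch below shows that approach under stated assumptions: bin centres represent each bin, the first grid index is φ and the second is ψ, and the Bayesian scores act as weights. Whether DANGLE computes its weighted angular mean in exactly this way is not stated in the text.

```python
import math


def weighted_circular_mean(angles_deg, weights):
    """Weighted mean of angles in degrees, computed on the unit circle."""
    s = sum(w * math.sin(math.radians(a)) for a, w in zip(angles_deg, weights))
    c = sum(w * math.cos(math.radians(a)) for a, w in zip(angles_deg, weights))
    return math.degrees(math.atan2(s, c))


def bin_centre(index, width=10.0):
    """Centre of a bin in degrees, with bin 0 starting at -180 (assumed)."""
    return -180.0 + (index + 0.5) * width


def island_prediction(scores, island):
    """Return (phi, psi) as score-weighted circular means over an island.

    `island` is a list of (phi_bin, psi_bin) index pairs.
    """
    weights = [scores[i][j] for i, j in island]
    phi = weighted_circular_mean([bin_centre(i) for i, _ in island], weights)
    psi = weighted_circular_mean([bin_centre(j) for _, j in island], weights)
    return phi, psi
```

The circular form avoids the artefact of an arithmetic mean, which would place the average of 170° and -170° at 0° instead of at 180°.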
In order to obtain estimates of P(GS|A), the probability that a conformation within a particular 10° square bin will give rise to a given scatter pattern, the scatter pattern for the central residue in the current query window must be mapped back to experimentally determined (φ,ψ) combinations.
We assembled a database of "scattergrams", using structure and shift information from the fragment database as a reference set. For each occurrence of structure-derived φ and ψ values within a particular 10° square bin, the local sequence and secondary shifts of the relevant residue were used to generate scatter patterns by collecting angles of the top 10 matches in a similarity search. These scatter patterns were summed together and normalised to produce a scattergram, which corresponds to the distribution of the best matches from each residue with known (φ,ψ) values within a 10° square bin. Scattergrams of this sort were generated for each of the 36 × 36 bins in (φ,ψ) space.
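The summing and normalising step can be sketched as a two-dimensional histogram. The example assumes that each scatter pattern is a list of (φ,ψ) pairs in degrees and that normalisation means dividing by the total number of points; the bin convention and function names are likewise assumptions made for illustration.

```python
import math

N_BINS = 36
BIN_WIDTH = 10.0


def bin_index(angle_deg):
    """Map an angle in degrees to a bin index 0..35 (bin 0 starts at -180)."""
    return int(math.floor((angle_deg + 180.0) / BIN_WIDTH)) % N_BINS


def build_scattergram(scatter_patterns):
    """Sum several scatter patterns into one normalised 36 x 36 grid.

    `scatter_patterns` is a list of patterns, each a list of (phi, psi) pairs
    collected for one database residue whose own angles fall in the bin
    being characterised.
    """
    grid = [[0.0] * N_BINS for _ in range(N_BINS)]
    total = 0
    for pattern in scatter_patterns:
        for phi, psi in pattern:
            grid[bin_index(phi)][bin_index(psi)] += 1.0
            total += 1
    if total:
        grid = [[count / total for count in row] for row in grid]
    return grid
```

Repeating this for the residues belonging to each of the 36 × 36 bins would give one reference scattergram per bin.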
Copyright (C) 2009 Nicole Cheung, Tim Stevens, Bill Broadhurst (University of Cambridge)