bjects, k-means produces a partition for every k in time that is linear with respect to any aspect of the problem size. The complexity of the k-means algorithm is O(nkt), where the number of clusters (k) and the number of iterations (t) are usually less than the number of objects (n). Several works explore the relative accuracy of various clustering algorithms in extracting the right number of clusters from generated data. According to Hartigan et al., no single clustering method can be singled out as the best, since different approaches are right for different purposes. Chen and Lonardi state that the most popular methods for clustering MD conformations are agglomerative hierarchical ones, since their linkage method is able to use the attributes that describe the chemical structures. More specifically, linkage is the only method able to calculate the dissimilarities between two clusters of chemical structures using Euclidean distance. Alternatively, Shao et al. found that UPGMA, k-means, and SOM outperformed COBWEB, Bayesian, and other hierarchical clustering methods when using the pairwise RMSD distance as the measure of similarity. Our analyses also show hierarchical agglomerative methods as the best choices for all data sets, whereas the k-means and k-medoids algorithms appear as the worst choice for all data sets. Each study has its own way to generate data and to identify the best clustering algorithm and, therefore, comes with its own advantages and drawbacks. Indeed, an appropriate solution depends on the given analysis or application scenario, so data collection, data representation, and the interpretation of the clusters found are crucial for selecting a clustering strategy.
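The linear cost of k-means mentioned above can be made concrete with a small sketch. The code below is a plain Lloyd-style k-means written only for illustration; it is not the implementation used in the study. It counts point-to-centroid distance evaluations, which come to exactly n × k × t for n objects, k clusters, and t iterations (the symbols n, k, and t follow the usual textbook notation and are an assumption here). The 2-D toy points, the function names, and the seed are all hypothetical and stand in for real MD conformation features.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd-style k-means (illustrative sketch only).

    Returns (labels, centroids, iterations, distance_evals), where
    distance_evals counts every point-to-centroid distance computed.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # k distinct points as initial centroids
    labels = [None] * len(points)
    distance_evals = 0
    iterations = 0

    for _ in range(max_iter):
        iterations += 1

        # Assignment step: n points, k distances each -> n * k evaluations.
        new_labels = []
        for p in points:
            dists = [squared_distance(p, c) for c in centroids]
            distance_evals += k
            new_labels.append(dists.index(min(dists)))

        if new_labels == labels:        # assignments unchanged: converged
            break
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                 # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))

    return labels, centroids, iterations, distance_evals


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]

labels, centroids, iterations, distance_evals = kmeans(points, k=2)
print("labels:", labels)
print("iterations (t):", iterations)
print("distance evaluations:", distance_evals, "= n * k * t")
```

Because the count grows with the product of n, k, and t rather than with n squared, doubling the number of conformations roughly doubles the work per iteration. Hierarchical methods that need every pairwise distance between conformations do not share this property.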
Conclusions

The work we present here analyzes and combines clustering partitions using three different data sets in order to reduce the structural redundancy in a 20 ns MD trajectory of a target protein receptor. Previous studies tackled this computational issue using only the RMSD measure of similarity. The present study, in addition to investigating RMSD-based clustering, also provides a novel measure of similarity, which is based on features from the substrate-binding cavity. It addresses the high computational cost involved in using MD ensembles for performing virtual screening of large libraries. We learned that the use of binding cavity properties for clustering an MD trajectory is an efficient method to distill significant conformational flexibility within the receptor binding cavity. The chosen properties also outperformed the RMSD-based measures of similarity. This methodology can be extended to other proteins/receptors, as long as the binding pocket from the FFR model is known in advance. Further applications may include the investigation of ensembles of MD conformations from other target receptor enzymes, as well as longer MD simulation trajectories. Future directions involve the extension of this approach to the exploration of virtual libraries of compounds, where the ensemble of representative MD conformations, shaped by properties of the substrate-binding cavity, can be investigated more effectively.

Type 2 diabetes is characterized by a loss of beta cell mass and function, along with the deposition of amyloid in pancreatic islets. The main constituent of islet amyloid is islet amyloid polypeptide. Human IAPP, but not rodent IAPP, forms oligomers and fibrils leading to the amyloid deposits assoc

By mPEGS 1