Clustering Protein Sequences - Structure Prediction by transitive homology
Bolten, Eva and Schliep, Alexander and Schneckener, Sebastian and Schomburg, Dietmar and Schrader, Rainer
Published in: Bioinformatics Vol. 17 (10). pp. 935-941.
It is widely believed that, for two proteins A and B, sequence identity above some threshold implies structural similarity. It is not fully understood whether transitivity holds: if the sequence similarity between A and B is below this threshold, does the existence of a third protein that is sufficiently similar in sequence to both A and B suffice for inferring structural similarity of A and B? We examined the protein sequences in the SwissProt database and determined their similarity using the Smith-Waterman algorithm. These data were transformed into a directed graph in which protein sequences constitute the vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity above a fixed threshold. A length-dependent scaling of the alignment scores gives us a criterion for avoiding clustering errors due to multi-domain proteins. To deal with the resulting large graphs, we have developed a very efficient library. Methods include both a novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms. The parameters of these algorithms were fine-tuned using SCOP as a test set. We will present our algorithmic advances, which yield a 24 percent improvement over pair-wise comparisons, statistics of the clusterings obtained, and general methodology relevant for testing our hypothesis.
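The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' library or their clustering algorithm, neither of which the abstract specifies. The function names `build_graph` and `transitive_clusters`, the scaling rule (raw score divided by the length of the source sequence), and the toy scores are all assumptions made for illustration. Clusters are formed here by plain connected components over the thresholded graph, which is the simplest way to express transitive grouping.

```python
from collections import defaultdict


def build_graph(scores, lengths, threshold):
    """Build a directed similarity graph.

    scores:  {(a, b): raw alignment score of a against b}
    lengths: {protein id: sequence length}
    An edge a -> b is drawn when the scaled score reaches the threshold.
    The scaling used here (score / length of a) is an illustrative
    assumption, not the formula from the paper.
    """
    graph = {name: set() for name in lengths}
    for (a, b), score in scores.items():
        if a != b and score / lengths[a] >= threshold:
            graph[a].add(b)
    return graph


def transitive_clusters(graph):
    """Group vertices into clusters by following edges in either direction.

    This is ordinary connected-component search, used as a stand-in
    for the graph-based clustering mentioned in the abstract.
    """
    undirected = defaultdict(set)
    for a, targets in graph.items():
        undirected[a]  # make sure isolated vertices are kept
        for b in targets:
            undirected[a].add(b)
            undirected[b].add(a)
    undirected = dict(undirected)

    seen, clusters = set(), []
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            component.add(v)
            for w in undirected[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        clusters.append(component)
    return clusters


# Toy data: A and C are not directly similar, but both are similar to B.
lengths = {"A": 100, "B": 100, "C": 100, "D": 100}
scores = {
    ("A", "B"): 80, ("B", "A"): 80,
    ("B", "C"): 75, ("C", "B"): 75,
    ("A", "C"): 20, ("D", "A"): 10,
}
graph = build_graph(scores, lengths, threshold=0.5)
clusters = transitive_clusters(graph)
```

In this toy input, A and C end up in the same cluster only because both are linked to B, which is the transitivity question the abstract poses. Dividing by the length of the source sequence makes the relation asymmetric, which is one possible reason to keep the graph directed; how the paper actually scales scores and treats multi-domain proteins is not stated in the abstract.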
|Citations:||33 (Web of Science)|
|Uncontrolled Keywords:||bioinformatics clustering proteins structure prediction|
|Divisions:||Institute of Computer Science > Computer Science Department - Prof. Dr. Schrader|
|Depositing User:||Alexander Schliep|
|Date Deposited:||06 Apr 2002 00:00|
|Last Modified:||12 Jan 2012 12:26|