Clustering Protein Sequences - Structure Prediction by transitive homology

Bolten, Eva and Schliep, Alexander and Schneckener, Sebastian and Schomburg, Dietmar and Schrader, Rainer (2001) Clustering Protein Sequences - Structure Prediction by transitive homology.
Published in: Bioinformatics Vol. 17 (10). pp. 935-941.


It is widely believed that for two proteins A and B, sequence identity above some threshold implies structural similarity. It is not fully understood whether transitivity holds: when the sequence similarity between A and B is below this threshold, does the existence of a third protein that is sufficiently similar to both A and B suffice to infer structural similarity of A and B? We examined the protein sequences in the SwissProt database and determined their similarity using the Smith-Waterman algorithm. These data were transformed into a directed graph in which protein sequences constitute the vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity above a fixed threshold. A length-dependent scaling of the alignment scores gives us a criterion for avoiding clustering errors due to multi-domain proteins. To deal with the resulting large graphs, we have developed a very efficient library. Methods include a novel graph-based clustering algorithm capable of handling multi-domain proteins, as well as cluster comparison algorithms. The parameters of these algorithms were fine-tuned using SCOP as a test set. We will present our algorithmic advances, which yield a 24 percent improvement over pairwise comparisons, statistics of the clusterings obtained, and general methodology relevant for testing our hypothesis.
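The graph construction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' library or clustering algorithm. The scaling rule used here (raw alignment score divided by the length of the source sequence) is an assumption chosen only to show why a length-dependent score makes the edges directed; the abstract does not state the actual formula. The function names `build_graph` and `reachable`, the threshold value, and the toy scores are likewise hypothetical.

```python
from collections import defaultdict, deque


def build_graph(lengths, scores, threshold):
    """Build a directed similarity graph.

    lengths: {name: sequence length}
    scores:  {(a, b): raw alignment score}, each unordered pair listed once
    An edge a -> b is drawn when the score scaled by len(a) reaches the
    threshold. ASSUMPTION: score / length is a stand-in for the paper's
    length-dependent scaling, which the abstract does not specify.
    """
    graph = defaultdict(set)
    for (a, b), raw in scores.items():
        if raw / lengths[a] >= threshold:
            graph[a].add(b)
        if raw / lengths[b] >= threshold:
            graph[b].add(a)
    return dict(graph)


def reachable(graph, start):
    """Return all vertices reachable from start by following directed edges
    (breadth-first search). Vertices reached only through intermediates are
    the candidates for transitive homology."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


# Toy data (invented for illustration): A and C are only weakly similar,
# but both are similar to B.
lengths = {"A": 100, "B": 100, "C": 100, "M": 400, "D": 100}
scores = {("A", "B"): 80, ("B", "C"): 80, ("A", "C"): 20, ("M", "D"): 90}
graph = build_graph(lengths, scores, threshold=0.5)

direct = graph.get("A", set())
transitive = reachable(graph, "A") - direct
```

With this toy data, C is not a direct neighbour of A but is reached through B. The pair M and D shows the asymmetry: the short sequence D points to the long sequence M, but not the other way round, because the same raw score is small relative to the length of M. Under the stated assumption, this is one way a directed edge could keep a multi-domain protein from merging unrelated clusters.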

Deposit Information:
ZAIK Number: zaik2000-383
Depositing User: Alexander Schliep
Date Deposited: 06 Apr 2002 00:00
Last Modified: 12 Jan 2012 12:26