Announcing SCANPS
-----------------

Geoffrey J. Barton
Lab of Molecular Biophysics
Rex Richards Building
South Parks Road
Oxford OX1 3QU
UK.

gjb@bioch.ox.ac.uk

UPDATE: 13/8/1997

Version 1.01 - recompiled for Silicon Graphics IRIX 6.2 and R10K
processor.  Otherwise similar to Version 1.0.

Note: Comparison of speed with SSEARCH was done with an earlier version
of SSEARCH than that distributed today (13/8/1997).

----------------

SCANPS (pronounced Scan-P-S) stands for SCAN Protein Sequence.  The main
function of SCANPS is to use a rigorous local alignment method to search
protein sequence databases with a query sequence or multiple alignment.
SCANPS also allows all pairwise comparisons to be made between a set of
sequences and can estimate the statistical significance of the
alignments.

SCANPS has been used in the analysis of many protein families.  For
example, it was used to discover the similarity between PD-ECGF
(Platelet derived endothelial cell growth factor) and TP (Thymidine
Phosphorylase) (Barton et al., 1992).  The program was also used to find
the similarity between E. coli diadenosine tetra-phosphatase and the
protein Ser/Thr phosphatases (Barton et al., 1994).

Principal features of SCANPS
----------------------------

Efficient finding of Nearly-ALL local alignments (the NALL method)
(Barton, 1993) that score above a cutoff or probability threshold,
between a sequence and a database.  This means that if two proteins have
more than one common region, most regions are reported.  Effectively,
this is like BLAST (Altschul et al., 1990) but with gapped alignments.

Efficient implementation of the Smith-Waterman algorithm - this returns
the highest scoring local alignment between two sequences, including
gaps where necessary.  The program is approximately a factor of three
faster than SSEARCH.

Estimation of the significance of the local alignments.  An empirical
method is used which takes into account the alignment score and the
alignment length.
This has the effect of pushing unusually high-scoring but short
alignments higher up the hit list.

Comparison of all pairs of sequences in a set using either the
Smith-Waterman or NALL methods.

Availability
------------

The SCANPS program has been used as a test bed for a lot of studies,
many of which are not yet published.  When the work is published, I will
try to clean up the source code and distribute it.  Currently, I cannot
be sure that the code will compile on all ANSI-C compilers, so for the
time being I am making precompiled binaries available for Sun (SunOS
4.1.3) and Silicon Graphics (IRIX 5.2).  I have access to a Silicon
Graphics running IRIX 4.X, so if you want the programs for the older
operating system, let me know.

The programs are available by anonymous ftp from geoff.biop.ox.ac.uk in
the subdirectory programs/scanps.  You can also reach this directory
using a WWW browser such as Mosaic (URL=http://geoff.biop.ox.ac.uk).
You can also read preprints of related papers on line at this site, or
download PostScript copies.

If you download the programs, please send me a short email with your
name, affiliation and address.  I will add you to my user database and
send you an email when the programs are updated and/or sources are made
available.

References
----------

S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman.
J. Mol. Biol., 215:403-410, 1990.

G. J. Barton.
Comput. Appl. Biosci., 9:729-734, 1993.

G. J. Barton, P. T. C. Cohen, and D. Barford.
Eur. J. Biochem., 220:225-237, 1994.

G. J. Barton, C. P. Ponting, G. Spraggon, C. Finnis, and D. Sleep.
Protein Science, 1:688-690, 1992.

-------------------------------------------------------------------

This directory contains the SCANPS distribution in the file
scanps_1.0.tar.gz.  It also contains the PIR sequence database complete
with indexes for the scanps program "sortsco".  This database expands to
about 30 megabytes.
You need the GNU gunzip program to uncompress these distributions -
please see an ftp site near you!

-------------------------------------------------------------------
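
Illustration: Smith-Waterman local alignment

For readers unfamiliar with the Smith-Waterman method named under
"Principal features", the following is a minimal sketch of the basic
algorithm in Python.  It is NOT the SCANPS code and says nothing about
how SCANPS is implemented.  The function name smith_waterman and the
scoring values (match, mismatch, and a single linear gap penalty) are
assumptions chosen for brevity; a real protein search would normally
use an amino acid substitution matrix and a more careful gap model.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one best local alignment.

    Illustrative only: simple match/mismatch scores and a linear gap
    penalty are assumed here, not taken from SCANPS.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # start a new alignment here
                h[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("ACDEFG", "XXCDEFYY"))
```

The sketch reports only the single highest-scoring region.  Reporting
further regions shared by the same pair of sequences, as the NALL method
is described as doing, would need additional logic that is not shown
here.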