Icahn School of Medicine at Mount Sinai

New York, NY 10029

March 27, 2006.

The program is run interactively. At startup it opens a log file; after that, it repeatedly offers the user a choice of one of the following functions:

- Select and read a scoring matrix and set the gap-penalty values for initiating and extending a gap. The 66 matrices provided by the AAindex database, Version 3.0 (Kawashima, S., Ogata, H., and Kanehisa, M.; AAindex: amino acid index database. *Nucleic Acids Res.* **27**, 368-369 (1999)), are included in the distribution.
- Read a set of sequences (in FASTA format) as either the first or the second set (the latter for a possible ortholog search). A PostScript plot of the distribution of pairwise % identities and alignment scores is also generated.
- Check a set for redundancy: cluster the sequences in a set by % homology and select the one in the 'middle' of each cluster as its representative. The 'middle' is defined as the sequence whose lowest homology with the rest of the cluster members is the highest.
- Initialize the weight calculation, assuming that no structure has been determined for the proteins represented by the sequences in the set
- Add sequences representing proteins with known structure to a set
- Add sequences representing proteins with unknown structure to a set
- Change the status of selected proteins from unknown to known
- Define residues of special emphasis in sequences already in the database and specify a different percentage identity threshold for this subset of residues
- Find a subset of sequences that covers the whole set using one of the four algorithms:
  - Greedy and coordinated: Determine structure of the protein with the highest weight in the set U
  - Stochastic and coordinated: Determine structures of proteins from the set U with a probability proportional to a weight associated with each protein
  - Random and coordinated: Determine structures of proteins from the set U with uniform probability considering only proteins whose weight is positive
  - Random and uncoordinated: Determine structures of proteins from the set U with uniform probability considering all proteins in the set U

- Match the sequences in the two sets with one of the following algorithms (the list of matches is written to a file with the extension **.mat**):
  - For each sequence *S1* in one set, list all sequences in the other set that are within a user-defined percentage (default: 5%) of the best match to *S1* (sequences in the second set may appear on more than one list)
  - For each sequence *S1* in one set, list all sequences in the other set that are within a user-defined percentage (default: 5%) of the best match to *S1* and are better matched to *S1* than to any other sequence in the first set (sequences in the second set may appear on only one list)
  - Match sequences in an optimal way (maximize the minimum match score) using the optimization procedure known as the Hungarian method

- Report the weights assigned to the sequences in a set to the log file
- Report the content of the whole database (sequences, pairwise scores, weights) in the log file
- Save the database
- Restore the database
- Exit
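
The 'middle' rule used in the redundancy check above can be illustrated with a short sketch. It is written in Python for brevity and is not taken from the Fortran source; the function name `cluster_representative` and the homology table `example_homology` are hypothetical, and the clustering step itself is assumed to have been done already.

```python
def cluster_representative(members, homology):
    """Return the member whose lowest homology to the other members is highest.

    members  -- list of sequence indices forming one cluster (assumed given)
    homology -- square table, homology[i][j] = % homology of sequences i and j
    """
    if len(members) == 1:
        return members[0]  # a singleton cluster represents itself

    def lowest_homology(i):
        # Worst-case similarity of sequence i to the rest of its cluster.
        return min(homology[i][j] for j in members if j != i)

    # Maximin choice: the best of the worst-case similarities.
    return max(members, key=lowest_homology)


# Hypothetical three-sequence cluster (values are illustrative only).
example_homology = [
    [100, 80, 40],
    [80, 100, 70],
    [40, 70, 100],
]
representative = cluster_representative([0, 1, 2], example_homology)
```

Here sequence 1 is selected: its lowest homology to the other members is 70, while sequences 0 and 2 each have a lowest homology of 40.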

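The four selection rules listed for the covering search differ only in how the next protein is drawn from the set U. The sketch below shows that one step in Python, under stated assumptions: the weights are supplied by the caller (how the program computes and updates them is not described here), negative weights are treated as zero in the stochastic rule, and the names `pick_next` and the strategy strings are hypothetical, not taken from the Fortran source.

```python
import random


def pick_next(weights, strategy, rng=random):
    """Pick the next protein from U according to one of the four rules.

    weights  -- dict mapping protein id -> weight, for proteins in U
    strategy -- 'greedy', 'stochastic', 'random_coordinated'
                or 'random_uncoordinated' (hypothetical labels)
    rng      -- the random module or a random.Random instance
    """
    ids = list(weights)
    if strategy == "greedy":
        # Highest weight in U.
        return max(ids, key=lambda p: weights[p])
    if strategy == "stochastic":
        # Probability proportional to weight (assumption: negatives -> 0).
        # random.choices raises ValueError if all weights are zero.
        clipped = [max(weights[p], 0.0) for p in ids]
        return rng.choices(ids, weights=clipped, k=1)[0]
    if strategy == "random_coordinated":
        # Uniform over proteins with positive weight only.
        positive = [p for p in ids if weights[p] > 0]
        return rng.choice(positive)
    if strategy == "random_uncoordinated":
        # Uniform over all of U, ignoring the weights.
        return rng.choice(ids)
    raise ValueError("unknown strategy: " + strategy)


# Illustrative weights only.
example_weights = {"a": 0.0, "b": 3.0, "c": 1.0}
greedy_choice = pick_next(example_weights, "greedy")
```

Passing a seeded `random.Random` instance as `rng` makes the three randomized rules reproducible.
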
__Compilation of the program__

The program is written in Fortran 77. Its size limits are governed by the following parameters, set in the first line of the program (the number in braces is the value set in the source code):

- **MAXAA** {30} - maximum number of different amino acids
- **MAXRES** {2000} - maximum number of residues in a sequence
- **MAXSEQ** {1000} - maximum number of sequences in one set
- **MAXNG** {100} - maximum number of sequences in the 'vicinity' of a sequence
- **MAXDB** {1000} - maximum number of sequences in the database
- **MAXSPAV** {20} - average number of special residues per sequence in the database

g77 -O4 -o pspace.exe pspace.f