Shifty - Chemical Shift Prediction (Version 1.1)

Shifty will likely undergo changes in July or August of 1997. Please feel free to download and test the version we currently provide.

Purpose: Shifty calculates proton, carbon, or nitrogen chemical shifts for a given input sequence by aligning the input sequence to homologous proteins in an NMR database.

The algorithm is straightforward. Shifty takes the user's protein sequence and aligns it against each sequence in the shifts database. It then ranks the alignments, keeps the best ones, and transfers the shift information from each known sequence to the user's sequence, making adjustments for secondary structure and amino acid substitutions.
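
The transfer step can be illustrated with a minimal Python sketch. It is not Shifty's own code: the function name transfer_shifts, the "-" gap character, and the offsets table of substitution corrections are assumptions made for illustration, and the secondary structure adjustment is left out because its details are not described here.

```python
from typing import Dict, List, Optional, Tuple


def transfer_shifts(
    aligned_query: str,
    aligned_db: str,
    db_shifts: List[float],
    offsets: Optional[Dict[Tuple[str, str], float]] = None,
    gap: str = "-",
) -> List[Optional[float]]:
    """Copy one shift per residue from a database sequence to the query.

    aligned_query and aligned_db are equal-length aligned strings.
    db_shifts holds one value per ungapped database residue.
    offsets is a hypothetical table of (db_residue, query_residue)
    corrections in ppm; the real values Shifty uses are not known here.
    """
    if len(aligned_query) != len(aligned_db):
        raise ValueError("aligned sequences must have equal length")
    offsets = offsets or {}
    predicted: List[Optional[float]] = []
    db_index = 0
    for q, d in zip(aligned_query, aligned_db):
        shift: Optional[float] = None
        if d != gap:
            # Consume the next database shift for this database residue.
            shift = db_shifts[db_index]
            db_index += 1
        if q == gap:
            continue  # database residue with no partner in the query
        if shift is not None and q != d:
            # Substituted residue: apply the assumed correction, if any.
            shift += offsets.get((d, q), 0.0)
        predicted.append(shift)  # None marks a query residue with no data
    return predicted
```

Query residues aligned to a gap in the database sequence receive None, since there is no shift to transfer for them.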

Note that Shifty makes its chemical shift predictions independently for each pairwise sequence alignment. A related program called orb attempts to make a single chemical shift prediction from the whole ensemble of homologous sequences.


Installation

Executable versions of this program for Sun and SGI workstations are freely available at our ftp site. First you will need to download the software from our website.

Once you have downloaded the software, uncompress and untar the files:

	uncompress myfile.tar.Z
	tar xvf myfile.tar

Next, read the README file to see which files will be installed and which installation options are available. After this, type "Install" to put the files in the appropriate places.


Preparing Data Files

Create an input sequence file for your protein similar to the example below:

	# This is an example sequence
	>CaM Calmodulin - Drosophila melanogaster (1-148)
	ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
	MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGFI
	SAAELRHVMTNLGEKLTDEEVDEMIREANIDGDGQVNYEEFVTMMTSK
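
For readers who want to check a sequence file before running Shifty, here is a minimal Python sketch of a reader for the layout shown above. It assumes that lines starting with "#" are comments and that the line starting with ">" is a title, which is inferred from the example rather than from a format specification. The function name parse_sequence_file is hypothetical.

```python
from typing import Tuple


def parse_sequence_file(text: str) -> Tuple[str, str]:
    """Return (title, sequence) from text laid out like the example above."""
    title = ""
    pieces = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (assumed comment marker)
        if line.startswith(">"):
            title = line[1:].strip()  # title line, without the ">"
        else:
            pieces.append(line)  # one chunk of the one-letter sequence
    return title, "".join(pieces)
```

Joining the chunks gives a single string of one-letter residue codes, so its length can be compared with the residue range in the title line.
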
Notes:

Shifty then aligns the input sequence against the sequences in the NMR chemical shift database using a Needleman-Wunsch type algorithm. The top "x" alignments are then printed to an output file, together with the calculated chemical shifts.
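
As an illustration of a Needleman-Wunsch type alignment, the following Python sketch performs a global alignment of two sequences. The scoring scheme (match +1, mismatch -1, linear gap penalty -2) is an assumption chosen for brevity; the scoring Shifty actually uses is not stated here, and a real protein alignment would normally use a substitution matrix.

```python
from typing import Tuple


def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> Tuple[int, str, str]:
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table with the best score for each pair of prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Running this against every database sequence and sorting by score would give a ranked list from which the top alignments could be taken.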


Output

The output format for each type of database is shown below. The following abbreviations are used:
	Num = Sequence Number
	I   = Input Sequence
	D   = Database Sequence
	S   = Secondary Structure

1. 1H Database

	Num I D S  NH  AH  BH1  BH2  GH1  GH2  DH1  DH2  EH1  EH2

Exceptions are as follows:

	For Val: NH  AH  BH  GH  GH
	For Ile: NH  AH  BH  GH  GH
	For Thr: NH  AH  BH  GH
	For Gly: NH  AH  AH
	For Met: NH  AH  BH  BH  GH  GH  EH
	For Leu: NH  AH  BH  BH  GH  DH  DH
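
When post-processing 1H output, the column rules above can be captured in a small lookup. The Python sketch below copies the column labels from the lists above; the names DEFAULT_1H_COLUMNS, EXCEPTIONS_1H, and proton_columns are hypothetical, and the use of three-letter residue names as keys is an assumption (the output file itself may use one-letter codes).

```python
from typing import Dict, List

# General 1H column layout, after the Num, I, D and S columns.
DEFAULT_1H_COLUMNS: List[str] = [
    "NH", "AH", "BH1", "BH2", "GH1", "GH2", "DH1", "DH2", "EH1", "EH2",
]

# Residue types whose 1H columns differ from the general layout.
EXCEPTIONS_1H: Dict[str, List[str]] = {
    "Val": ["NH", "AH", "BH", "GH", "GH"],
    "Ile": ["NH", "AH", "BH", "GH", "GH"],
    "Thr": ["NH", "AH", "BH", "GH"],
    "Gly": ["NH", "AH", "AH"],
    "Met": ["NH", "AH", "BH", "BH", "GH", "GH", "EH"],
    "Leu": ["NH", "AH", "BH", "BH", "GH", "DH", "DH"],
}


def proton_columns(residue: str) -> List[str]:
    """Return the 1H shift column labels for a three-letter residue name."""
    key = residue.capitalize()  # accept "VAL", "val" or "Val"
    return list(EXCEPTIONS_1H.get(key, DEFAULT_1H_COLUMNS))
```

For example, proton_columns("Gly") gives the three labels listed for glycine, while a residue not in the exceptions list falls back to the general ten-column layout.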

2. 13C Database
	Num I D S      NH      CA      CB      CO

3. 15N Database
	Num I D S       N      NH


Authors

Authors: David Wishart, M. Scott Watson, Robert Boyko, Brian Sykes

Funding for this project has been provided by the Medical Research Council of Canada, Bristol-Myers Squibb (Canada), the Alberta Heritage Fund for Medical Research, and the Protein Engineering Networks of Centres of Excellence (Canada).


Last modified: July 7, 1997

Robert Boyko - robert.boyko@ualberta.ca