RICHARD LAVERY and HEINZ SKLENAR

LABORATOIRE DE BIOCHIMIE THEORIQUE, CNRS URA 77 INSTITUT DE BIOLOGIE PHYSICO-CHIMIQUE 13 RUE PIERRE ET MARIE CURIE PARIS 75005, FRANCE

JUNE 1996

Curves [1-4] is an algorithm for calculating a helical parameter description for any irregular nucleic acid segment with respect to an optimal, global helical axis. The solution is obtained by minimising a function which represents the variations in helical parameters between successive nucleotides as well as quantifying the kinks and dislocations which exist between successive helical axis segments. This function has four variables for each base (or optionally for each base pair, base triplet, etc.) within the structure (X displacement, Y displacement, Inclination and Tip) which determine the position of the local helical axis segment in space.

For multi-stranded segments it is possible to calculate a single optimal helical axis or to generate axes for each strand independently. It is also possible to impose fixed "virtual" ends on the segment studied to investigate the effect on the helical description of imbedding the segment in a nucleic acid environment of known structure (see reference [1]). Curves also includes calculation of local inter-base and inter-base pair parameters which do not depend on the definition of a global helical axis and may be compared with parameters from other programs which only calculate local parameters.

Curves [2,3] respects the choice of names and the sign conventions decided upon during the September 1988 EMBO meeting on DNA Curvature in Churchill College, Cambridge [5].

Our procedure differs from those proposed by other authors in that it yields a global and optimal curvilinear helical axis which can be used to directly determine the curvature of a given oligonucleotide fragment. In this way, we can also obtain a full set of global helical parameters relating each base to its neighbours, but also to the optimal axis. A careful definition of the helical parameters employed ensures their independence and commutability. The method yields parameters which are also independent of the sense of reading a nucleic acid fragment. Finally, all rotational parameters refer to rotations around defined axes in space and thus are easy to understand. These features and the completeness of the parameter set generated make Curves ideal not only for analysing DNA or RNA structure, but also for interfacing with molecular modeling programs and, notably, with internal coordinate techniques such as the Jumna program [6,7] developed in our laboratory (note that the .AXE file output from Curves can be directly input to Jumna in order to build a nucleic fragment).

Input geome try can come from a protein data bank file (.PDB or .BRK), files of the format used in our laboratory and for the Jumna program (.MAC, described in a separate guide) or a general data file (.DAT). In the latter case atoms should be specified, after an initial title line, by giving the atom name, the XYZ coordinates, the subunit name and the subunit number in the format (A4,3F10.5,1X,A4,I4).

Curves can output a file (in .PDB Brookhaven format or in our .MAC format) showing the helical axis system, the base positions (represented by rectangles) and backbone ribbons, which can be used for graphic representation of the results ( this is strongly encouraged ). An option also exists to represent the local helical axis segments individually or to generate a spline smoothed curve for the entire axis. Lastly, Curves can output the helicoidal parameters in a concise form in a .DAF file which may be useful as input to spreadsheet or other computer programs.

A utility program, GEX, is also provided with Curves that ena bles groove dimensions to be plotted. Output from GEX can be input directly to the MATHEMATICA package.

(Note that the current version of Curves cannot make a groove dimension analysis for nucleic acids containing bulges unless the bulge bases are not included in the analysis).

(1) The Definition of Generalised Helicoidal Parameters and of Axis Curvature for Irregular Nucleic Acids. R. Lavery and H. Sklenar J. Biomol. Struct. Dynam. 6 (1988) 63-91.

(2) Defining the Structure of Irregular Nucleic Acids: Conventions and Principles. R. Lavery and H. Sklenar J. Biomol. Struct. Dynam. 6 (1989) 655-667.

(3) A Quantitative Description of the Conformation of Biological Macromolecules. R. Lavery and H. Sklenar Proceedings of the 6th Conversation in Biomolecular Stereodynamics ed. R.H. Sarma Adenine Press, New York (1990) 215-235.

(4) Conformational and Helicoidal Analysis of 30ps of Molecular Dynamics on the d(CGCGAATTCGCG) Double Helix: "Curves", Dials and Windows. G. Ravishankar, S. Swaminathan, D.L. Beveridge, R. Lavery and H. Sklenar J. Biomol. Struct. Dynam. 6 (1989) 669-699.

(5) Definitions and Nomenclature of Nucleic Acid Structure Parameters. R.E. Dickerson, M. Bansal, C.R. Calladine, S. Diekmann, W.N. Hunter, O. Kennard, R. Lavery, H.C.M. Nelson, W.K. Olson, W. Saenger, Z. Shakked, H. Sklenar, D.M. Soumpasis, C.-S. Tung, E. von Kitzing, A.H.- J. Wang and V.B. Zhurkin (a) J. Biomol. Struct. Dynam. 6 (1989) 627-634. (b) J. Mol. Biol. 205 (1989) 787-791. (c) EMBO J. 8 (1989) 1-4.

(6) Junctions and Bends in Nucleic Acids: A New Theoretical Modelling Approach. R. Lavery in Structure and Expression Vol.3 DNA Bending and Curvature eds. W.K. Olson, R.H. Sarma, M.H. Sarma and M. Sundaralingam. 1988 Adenine Press, New York 191-211.

(7) JUMNA (Junction Minimisation of Nucleic Acids). R. Lavery, K. Zakrzewska and H. Sklenar Computer Physics Communications 91 (1995) 135-158.

(8) Describing Protein Structure: A General Algorithm Yielding Complete Helicoidal Parameters and a Unique Helicoidal Axis. H. Sklenar, C. Etchebest and R. Lavery Proteins: Structure, Function and Genetics 6 (1989) 46-60.

(9) Looking into the Grooves of DNA. N. Boutonnet, X. Hui and K. Zakrzewska Biopolymers (1993) 33, 479-490.

(10) Measuring the Geometry of DNA Grooves. E. Stofer and R. Lavery Biopolymers 34 (1993) 337-346.

(11) Modeling Nucleic Acids: fine Structure, flexibility and Conformational Transitions. R. Lavery Advances in Computational Biology 1 (1994) 69-145.

(1) Namelist input

(2) Number and length of strands

(3) Nucleotide subunit numbers

(4) Starting values for Xdi, Ydi, Inc, Tip

... the detailed description of the input is given in the following section.

Some remarks on dimensions ...

The major dimensions of the Curves arrays as well as a certain number of basic parameters and the implicit variable types are contained in a file called 'curves_data.inc' which is included in each subroutine at compile time. A listing of this file is given below. This structure allows users to easily modify the size of the structures which can be treated.

implicit real*8 (a-h,o-z) implicit integer*4 (i-n)

c...Maximum number of atoms parameter(n1=3000)

c...Maximum number of subunits parameter(n2=400)

c...Maximum number of nucleotides parameter(n3=80)

c...Delta change of variables for numerical gradient test parameter(deltv=1.d-8)

c...Conversion factor from degrees to radians parameter(cdr=0.017453293d0)

c...Conversion factor from radians to degrees parameter(crd=57.29577951d0)

c...Constant: Pi parameter(pi=3.141592654d0)

c...Constant: Precision for eigenvalue calculation parameter(range=1.d-8)

For ease of reading, the namelist variables have been grouped according to their function within the program...you only need to input variables which must be changed from their default values (order of namelist input is unimportant)

KEY to namelist description... Name of namelist variable, Type: R- Real, I- Integer, L- Logical, A Character, Current default value

FILE (A,' ') : Name of file containing geometry (with extension .PDB, .MAC or .DAT)

LIS (A,' ') : Name of output .LIS file

DNA (A,' ') : Name of .DNA graphic output file (MAC format)

PDB (A,' ') : Name of .PDB graphic output file (Brookhaven format)

DAF (A,' ') : Name of .DAF output data file of results

AXIN (A,' ') : Name of Jumna .AXE file for use as a starting point (only for Jumna users)

AXOUT (A,' ') : Name of output .AXE file for Jumna (only for Jumna users, illegal if there are zeros in the first strand input)

(Note that if DNA, PDB, DAF, AXIN or AXOUT are left blank, these files will not be generated. If LIS is left blank the program can be run interactively and the output will appear on the screen)

FIT (L,FALSE) : If TRUE then fit standard base geometries to the actual base coordinates before calculating the base fixed axis systems (uses the analytic LS procedure of McLachlan J. Mol. Biol. 81 (1979) 49). Useful when base coordinates are distorted e.g. after molecular dynamic simulations

IBOND (I,0) : If IBOND>0 then give IBOND pairs of atom numbers after the name list input. Chemical bonds will then be forced between these atoms. Useful again for distorted structures where the normal chemical bond length limits used within curves would fail to find the correct backbone connectivity. When such problems occur they will be indicated in the output listing.

MINI (L,TRUE) : Minimisation carried out if TRUE

ACC (R,1.D-6) : Accuracy of convergence

MAXN (I,500) : Cycle limit for minimisation

SUPP (L,TRUE) : Suppression of test output during cycles

TEST (L,FALSE) : Optional numeric test of gradients

(It is unlikely that you will ever have to change ACC, SUPP or MAXN. TEST only served during the development of Curves)

COMB (L,FALSE) : If TRUE then calculate a single helical axis for a multi-strand segment

ENDS (L,FALSE) : If TRUE input Xdi, Ydi, Rise, Inc, Tip, Twist values for fixed "virtual" ends (two lines of data at the end of the input, corresponding to the lower and upper virtual bases or base pairs). This tests the effect of imbedding the segment in a nucleic acid of known helical structure [1]. Note ENDS cannot be used is strands are of different lengths

REST (L,FALSE) : If TRUE different starting values can be input for each nucleotide. Mainly used in conjunction with MINI=FALSE to force chosen base-axis parameters and look at the resulting axis

DINU (L,FALSE) : If TRUE allows DNA with ideal or approximate dinucleotide symmetry to be treated using modified sums for the A terms of F(h)

BREAK (I,0) : If BREAK=I a break point is inserted between nucleotide levels I-1 and I. This means the axis obtained is optimal for the fragment up to I-1 and from I to the end of the structure, but takes no account of irregularity at the break point junction

LINE (L,FALSE) : Calculation of helical parameters for the best linear axis

ZAXE (L,FALSE) : Calculation of helical parameters taking the Cartesian Z axis to be the helical axis.

Curves can function in 3 modes as far as the definition of the helical axis is concerned :

MINI=.true. (LINE and ZAXE = .false.) implies that the optimisation procedure will search for the best curvilinear axis to represent the nucleic acid fragment

LINE=.true. (MINI=.true., ZAXE=.false.) implies that the optimisation procedure will search for the best straight helical axis to represent the nucleic acid fragment

ZAXE=.true. (MINI and LINE = .false.) implies that the Cartesian Z axis will be assumed to be the helical axis (this option is only useful for symmetric conformations generated by modelling programs such as JUMNA)

- the program contains protection to ensure that mini=.true. if line=.true., mini=.false. if zaxe=.true. and that line and zaxe cannot both be true.

IOR (I,0) : If IOR>0 then the molecule will be reoriented so that the helical axis segment of the IOR nucleotide is parallel to the Z axis and its local dyad parallel to the X axis. This affects the .DNA or .PDB graphic file and prompts the output of a new .MAC molecular file. This can be useful for graphic viewing. The name of the output molecular file is distinguished by an underline added to the name of the molecular input file.

WID (R,0.75) : Half-width of ribbon in the graphic output

SPLINE (I,0): If >0 a spline smoothed curve is generated to describe the global helical axis with the number of intermediate points specified between each nucleotide level

GRV (L,FALSE) : If TRUE groove geometry is measured

NBAC (I,7) : Choice of atom defining backbone splines between which groove width is measured (1: C1', 2: C2', 3: C3', 4: C4', 5: O1', 6: O3', 7: P, 8: O5', 9: C5')

NLEVEL (I,3) : Number of intermediate points at which groove measurements will be made between base pair levels. Note if rise is bigger than normal (e.g. at an intercalation site) NLEVEL will automatically be increased locally.

---- This is the end of the namelist input ----

NST, NR1, NR2, NR3, NR4

Where NST is the number of strands and NR1-NR4 are the number of nucleotide levels per strand (Note: for a single strand NR2-4=0. For a strand given in the 3'-5' sense NR should be negative. For all strands present in the structure, the absolute value of each NR should be equal to the number of nucleotides in the longest strand. The fact that some strands may be shorter is indicated by putting zeros in the input lines described below not by modifying the corresponding NR.

Give NST lines of data containing the numbers of the subunits corresponding to the bases or nucleotides to be analysed in each strand. Base pairing is indicated by vertically aligning the nucleotide numbers in successive input lines. If strands have different lengths the missing nucleotides in the shorted strands are indicated by zeros. Zeros within a line imply the existence of bulges. "Silent" bases excluded for axis calculation are indicated by negative numbers (see notes to Version 5.1).

The numbers specifying the bases refer to the order of occurrence of the subunits within the input data file. Note that this is not the same thing as the 'subunit number' unless the subunits in the file are numbered contiguously starting from 1.

Xdi, Ydi, Inc, Tip values as starting guess

0,0,0,0 works in all A- and B-DNA cases tested so far. For Z-DNA put Tip=180 or an inverted axis is obtained . (Note: the number of lines of starting values is determined by the type of calculation. For REST=.false. one line is required for a single strand or a multi-strand structure with COMB=.true. and N lines for a N-strand structure with COMB=.false. If REST=.true. then the number of lines required equals Abs(NR(1)) if COMB=TRUE and otherwise equals the total number of nucleotides in the structure.

(1) B-DNA, the Dickerson dodecamer (with groove measurement).

home/Cur5_s <<! &inp comb=.t., file=dodec.pdb, grv=.t., lis=dick, pdb=dick, &end 2 12 -12 0 0 1 2 3 4 5 6 7 8 9 10 11 12 24 23 22 21 20 19 18 17 16 15 14 13 0. 0. 0. 0. !

(2) A DNA duplex/triplex structure with a break point at the junction.

/home/Cur5_s <<! &inp comb=.t., file=triple.mac, break=6, pdb=trip, lis=trip, &end 3 10 -10 10 0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 0 0 0 0 0 63 66 69 72 75 0. 0. 0. 0. !

(3) Finding the best linear axis for a Z-DNA duplex (note DINU=.T. and initial Tip=180o).

/home/Cur5_s <<! &inp comb=.t., file=z1.mac, spline=3, pdb=z1out, lis=z1out, dinu=.t., line=.t., &end 2 8 -8 0 0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 0. 0. 0. 180. !

Curves only requires base or nucleotide subunit numbers in order to do its work (that is the number describing the position of each subunit in the input file: 1st, 2nd, 3rd,...,10th, etc.). Because of this the names of the atoms in each nucleotide must be the standard names (N9, C8, P, etc.). Both O1'and O4' are acceptable in the sugar and both primed and starred names (C1',C1*) are recognised. All atom names should start with their chemical symbol, but M is acceptable for the thymine methyl carbon. Connectivity is generated automatically within the program from a table of standard minimum and maximum bond lengths (see Subroutine Bonder). If the data is of poor quality all the bonds necessary to treat a given nucleotide may not be found. In this case the namelist option IBOND can be used to force the missing bonds to form by giving the global numbers on the atoms concerned. Missing bonds will however only effect the ability of curves to list the sugar pucker and the backbone torsions for the nucleotides directly involved. (See also notes on Version 5.1 concerning mispairs, abasic sites and bulges)

Curves output firstly summarises the file names involved in the calculation, various data on the nucleic acid fragment which is read and the state of the namelist variables. The sequence read from the input file is then given, including the direction of each strand. If FIT=TRUE then the RMS fit between the actual base coordinates and the standard bases is also output.

This is followed by a summary of the minimisation procedure (if MINI=TRUE) which ends with the final value of the function, its components and the gradients (see [1]). For fragments of the same size the final function value is a measure of the overall degree of irregularity in the conformation (helically regular nucleic acids give the value zero).

The detailed output describing the global helical axis and the helicoidal parameters then follows in the sections indicated below:

(A) Global axis parameters (the vectorial direction of each local helical axis segment U and its reference point P and DIF value which characterises global irregularity of the given dinucleotide step)

(B) Global base-axis parameters

(C) Global base-pair axis parameters

(D) Global base-base parameters

(E) Global inter-base parameters

(F) Global inter-base pair parameters

(G) Local inter-base parameters (using successive base axes JKL)

(H) Local inter-base pair parameters (using mean b.p. JKL axes)

(I) Global axis curvature (see below)

(J) Backbone parameters (sugar pucker and backbone torsions)

(K) Groove measurement of minor and major grooves, showing width, depth and the angle of the plane defining the minimal width with respect to the local helical axis. Note that a '*' after the width indicates an interpolated value (see publications 8,9 on page 4)

Notation : Parameters between successive bases or bases pairs A and B are indicated in the output by A/B. Parameters between the two bases of a base pair formed by A and B are indicated by A-B.

Axis curvature : The junctions between successive axis segments are described by two translations (Ax and Ay) and two rotations (Ainc and Atip) [1,2]. The total angle formed between the axes (Angle), the total lateral displacement (Adis=Sqrt[Ax 2 +Ay 2 ]) and the distance between successive reference points (Path) are also given. The total angle of curvature between the first and last helical axis segments is listed along with its angular direction with respect to the dyad axis of the first nucleotide. Curvature is also described as the perpendicular distances of successive reference points from a line drawn between the first and last reference points. Finally, the apparent shortening is obtained from the difference between the sum of the path lengths and the distance along the line between the first and last reference points. ss versus ds output: If single strand DNA or RNA is analysed all output referring to base pairs naturally disappears. The sections which remain are consequently: A,B,E,G,I and J.

Base sequence codes : Each base, base dimer (in one strand) and base trimer (in one strand) and indicated by numerical codes termed Bc, Dc and Tc respectively. These codes are described below. If any code appears with a negative sign it indicates that the second form shown in the table is actually present in the strand being analysed. These codes are useful for programs which read the .LIS file in order to make an automatic analysis of sequence dependent structural properties.

Bases

1: Guanine or Inosine 2: Adenine or Y base(tRNA) 3: Cytosine 4: Thymine, Uracil or Pseudouracil

Dimers (5'-3')

1 : GG / CC .... PuPu / PyPy 2 : GA / TC 3 : AG / CT 4 : AA / TT 5 : GC .... PuPy 6 : GT / AC 7 : AT 8 : CG .... PyPu 9 : CA / TG 10: TA

Trimers (5'-3')

1 : GGG / CCC 17: GAG / CTC .... PuPuPu / PyPyPy 2 : GGA / TCC 18: GAA / TTC 3 : AGG / CCT 19: AAG / CTT 4 : AGA / TCT 20: AAA / TTT 5 : GGC / GCC 21: GAC / GTC .... PuPuPy / PuPyPy 6 : GGT / ACC 22: GAT / ATC 7 : AGC / GCT 23: AAC / GTT 8 : AGT / ACT 24: AAT / ATT 9 : CGG / CCG 25: CAG / CTG .... PyPuPu / PyPyPu 10: CGA / TCG 26: CAA / TTG 11: TGG / CCA 27: TAG / CTA 12: TGA / TCA 28: TAA / TTA 13: CGC / GCG 29: CAC / GTG .... PyPuPy / PuPyPu 14: CGT / ACG 30: CAT / ATG 15: TGC / GCA 31: TAC / GTA 16: TGT / ACA 32: TAT / ATA

/AXE/ Coordinates for axis and base plate file /CHA/ Character variables in namelist /BAK/ Backbone conformation parameters /DAT/ Base reference axis system and strand indices /DER/ Derivative components /DRL/ Real, integer and logical namelist variables /GRO/ Groove measurement data /END/ Data for fixed virtual ends /HOL/ Save matrices for helical parameter solution /LSF/ Data for IBOND and standard bases used in LSfit /MAC/ Geometry of fragments studied, atom names, etc. /MIN/ Variables and gradients for minimiser /VEC/ Helical system vectors and sub-component

AACUR Main program, input and flow control ANALY Control of minimisation AVER Average of 2 torsion angles (function) AXEINT Spline interpolation of helical axis BACINT Spline interpolation of backbones BACKBO Calculation of backbone conformation BONDER Atom bonding for .PDB and .DAT input data CALC Calculation of functional CUBSPL Spline smoothing of helical axis for graphics DERIV Calculation of function derivatives DIFF Difference between 2 torsion angles (function) EIGEN Solves eigenvalue problem for L.S. fitting GRADS Assembly of functional derivatives GRADT Numeric test of functional derivatives GROOVE Measurement of groove geometry INPUT Input of geometry from .MAC, .DAT or .PDB LOCATE Construction of base reference axis systems LOCPAR Local inter-base helical parameter calculation LSFIT Performs McLachlan LS fitting MC11A/E Minimiser vector subroutines MINFOR Minimiser (Harwell VA13F) MOVE Cycle subroutine for minimiser NML FORTRAN namelist interpreter OUTAXE Output of helicoidal and backbone parameters

PARAMS Calculation of helicoidal parameters PDBOUT Output of graphic file, Brookhaven format PLATE Creation of plate graphic representation SETD Generation of classical bonding matrix SETEND Setup of U and P for fixed virtual ends SETUP Block data of standard base geometries TITLE Titles for printing in analysis/outaxe TORP Calculation of torsion angles (function) UP Calculation of U and P from Xdi, Ydi, Inc, Tip

Revised Locally: Thursday, 13-Mar-97 12:07:59 EST