Scripts for crystallographic data manipulation
Queen's University Protein Function Discovery
and Department of Biomedical and Molecular Sciences
Molecular Modelling and Crystallographic Computing Facility

My scripts for crystallographic data manipulation

Except for the setup script, these scripts are found locally in /software/misc/scripts, and some run FORTRAN programs that are found in /software/misc/. The FORTRAN programs are also freely available. As far as I know, all of these scripts work as intended, though there are no guarantees. If you find bugs or problems, want a new program, or have suggestions for a better way, please let me know!

These programs are "free." You may do with them as you please, but please let me know if you find bugs or have questions about their use.

Setup script for all crystallographic software

These scripts set environment variables (e.g. $PATH) and aliases for various software packages. They are designed so that your PATH does not grow longer if you set up the environment repeatedly.
  • setup Run with source /software/setup <progname>. <progname> may contain version information, e.g. source /software/setup ccp4_4.1.1
    • This calls one of the following two shell scripts, depending on which shell you are using:
    • setup.csh Works with the tcsh shell.
    • The bash shell variant of the above (works with sh and zsh as well).
  • add_path used in the setup scripts above like:
    export PATH=`add_path -sh /software/progname/v4.3.2.1/bin_Linux/`
    to add that directory to the PATH environment variable only if it isn't already there.
  • remove_path used in the setup scripts above like:
    export PATH=`remove_path -re -sh /software/progname/`
    to remove all instances of the "progname" directory from the PATH environment variable.
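
The idea behind add_path and remove_path can be sketched in a few lines of Python. This is only an illustration, not the scripts themselves: the function names simply mirror the script names, and appending at the end of the list, ignoring trailing slashes when comparing, and treating the remove argument as a regular expression are all assumptions.

```python
import re


def add_path(path_value, new_dir, sep=":"):
    """Return path_value with new_dir appended unless it is already listed."""
    entries = [p for p in path_value.split(sep) if p]
    wanted = new_dir.rstrip("/")
    # Compare without trailing slashes so /a/b and /a/b/ count as the same entry.
    if any(p.rstrip("/") == wanted for p in entries):
        return sep.join(entries)
    return sep.join(entries + [new_dir])


def remove_path(path_value, pattern, sep=":"):
    """Return path_value without any entry matching the regular expression."""
    regex = re.compile(pattern)
    return sep.join(p for p in path_value.split(sep)
                    if p and not regex.search(p))


path = "/usr/bin:/bin"
path = add_path(path, "/software/progname/v4.3.2.1/bin_Linux/")
path = add_path(path, "/software/progname/v4.3.2.1/bin_Linux/")  # second call changes nothing
```

Because the second call returns the same string, sourcing a setup script many times would leave PATH at the same length.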

Miscellaneous useful scripts

  • Python script that lists the symmetry operators for any (or all) space groups in both (x,y,z) format and in matrix form (rotation first, then translation). This requires that you have the cctbx Computational Crystallography Toolbox installed. E.g.:
    ./ p212121
    19 4 4 P 21 21 21
      1 x,y,z             [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]    [0.0, 0.0, 0.0]     
      2 x+1/2,-y+1/2,-z   [1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, -1.0]  [0.25, 0.25, 0.0]   
      3 -x,y+1/2,-z+1/2   [-1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0]  [0.0, 0.25, 0.25]   
      4 -x+1/2,-y,z+1/2   [-1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0]  [0.25, 0.0, 0.25]   
  • Python script for the calculation of the mean and standard deviation of the data in an input file (ignoring lines that begin with "#").
  • A Python script to download PDB files. Options include:
    • -c to specify mmCIF format and
    • -s to include downloading of a structure factor file, if present.
    Multiple codes can be listed on the command line to download multiple files at once. For example -s 1f83 1f82
    Compressed files are uncompressed using gunzip.
  • Python library of useful statistical calculations. Can be used to calculate basic statistics on any file read in via stdin or given as a filename on the command line. It splits columns of data into separate data sets and calculates mean, stdev, median, max, and min. Other functions include: avg_dev (average deviation), var (variance), skew, kurtosis, mode, histogram, and lsq (least-squares fit).
  • Python script for the calculation of a moving-window average of data from an input file (expects the file to have two columns of data, i.e. X and Y values). Typically used to provide a smoothed plot of some feature versus residue number for a protein sequence. Options include:
    • -w # or --window=# to specify a window of size # (default 7)
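
A moving-window average of the kind described in the last item can be sketched as follows. This is a minimal illustration rather than the script itself: the function name moving_average, the centred window, and the skipping of incomplete windows at both ends are assumptions. Only the default window of 7 comes from the description above.

```python
def moving_average(xs, ys, window=7):
    """Return (x, mean_y) pairs for a centred window of odd size `window`.

    Points closer than window // 2 to either end are skipped because
    their window would be incomplete (an assumption of this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(half, len(ys) - half):
        chunk = ys[i - half:i + half + 1]
        smoothed.append((xs[i], sum(chunk) / window))
    return smoothed
```

For a plot against residue number, xs would hold the residue numbers and ys the per-residue values; the result has window - 1 fewer points than the input.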

Sequence manipulations and analysis

  • - Python script to convert amino acid sequence from 1-letter to 3-letter code and vice versa. Can also read SEQRES records, or determine the sequence from the coordinates, in PDB files. Try --help for instructions.
  • - Python script to calculate molecular mass and count individual atom types from a sequence fed via standard input. Requires the script above.
  • - Python script to search for repetitive patterns in sequences. Input sequence is expected to be in single letter code. Lines beginning with '>', such as in PIR/FASTA format files, are ignored. End-of-line numbers and spaces are also ignored. Patterns are entered as regular expressions, e.g.: -p '[GP].{9,12}[TV]' < file.seq
    This will find repetitions of a pattern that begins with glycine or proline, followed by between 9 and 12 other amino acids, followed by threonine or valine. For more information on regular expressions, see the Regex HOWTO, the Python 2.3 Quick Reference for the re module, or Regular Expression for Protein Motif Search.
  • - Python script for highlighting patterns in a text file. Can be used with the pattern search script above. E.g. -p '[GP].{9,12}[TV]' < file.seq | -p 'A.*S' --colour=bold,blue,yellow_back
  • - Python script to calculate sequence variability from pre-aligned sequences. Requires the above
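
The pattern search described in this section can be sketched with Python's re module. This is an illustration only, not the script itself: clean_sequence and find_pattern are hypothetical names, the sketch drops every digit (not just end-of-line numbers), and it reports non-overlapping matches.

```python
import re


def clean_sequence(lines):
    """Join sequence lines, skipping '>' header lines and dropping digits and whitespace."""
    parts = []
    for line in lines:
        if line.startswith(">"):
            continue  # PIR/FASTA header line
        parts.append(re.sub(r"[\d\s]", "", line))
    return "".join(parts).upper()


def find_pattern(sequence, pattern):
    """Return (start, end, match) tuples, 1-based and inclusive, for each match."""
    return [(m.start() + 1, m.end(), m.group())
            for m in re.finditer(pattern, sequence)]
```

With the example pattern '[GP].{9,12}[TV]', each tuple gives the residue range of one stretch that starts with G or P and ends with T or V.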

Dealing with multiple conformations for O and PROTIN/REFMAC

  • Python script built on Biopython to strip multiple conformations out of one file and write out a file with either the "A" or "B" conformations. Useful when using a structure for molecular dynamics with, e.g. GROMACS, or other analyses.
  • strip_mult Awk script to strip multiple conformations out of one file and write to two separate files for rebuilding with O. The original "chain_id" is maintained to enable working with multi-subunit (or multiple molecules in the asymmetric unit) structures.
  • re_mult - Restore multiple conformations to one PDB file for refinement with PROTIN/REFMAC from your two PDB files that were used for rebuilding in O.
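
Selecting a single conformation comes down to filtering on the alternate location indicator, which sits in column 17 of ATOM, HETATM and ANISOU records in the PDB format. The sketch below shows that step in plain Python; it does not use Biopython and is not any of the scripts above, and the function name keep_conformation is hypothetical. Occupancies are left untouched, which is an assumption of the sketch.

```python
def keep_conformation(pdb_lines, keep="A"):
    """Return PDB lines with only one alternate conformation retained.

    Coordinate records whose altLoc field (column 17) is blank or equal
    to `keep` are kept, and the altLoc character is blanked. All other
    record types pass through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM", "ANISOU")) and len(line) > 16:
            altloc = line[16]  # column 17, zero-based index 16
            if altloc not in (" ", keep):
                continue  # atom belongs to a different conformation
            line = line[:16] + " " + line[17:]
        kept.append(line)
    return kept
```

Calling it once with keep="A" and once with keep="B" gives two single-conformation coordinate sets from one input file.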

Diffraction Data analysis and format conversion

  • axial_refl - strip axial reflections out of scalepack output to look for systematic absences
  • denzocell - strip unit cell and crystal orientation values from the denzo integration log file.
  • denzohist - strip histograms from denzo integration log file
  • denzostats - strip chi**2 values from denzo integration log file
  • denzolog_strip - runs denzocell, denzohist and denzostats at one time
  • scale2xplor - convert scalepack merged I and sig(I) to F and sig(F) for X-plor usage. Option to convert negative I's to -sqrt(|I|).
  • scalepack_cell - strip out last refined unit cell value from scalepack log file for use in refinement scripts.
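
The intensity-to-amplitude conversion mentioned for scale2xplor can be sketched as below. Only F = sqrt(I) and the optional -sqrt(|I|) for negative intensities come from the description above. The sigma formula, sig(F) = sig(I) / (2|F|), is ordinary first-order error propagation and is an assumption here, as are the function name and the choice to return None for rejected reflections.

```python
import math


def intensity_to_amplitude(i_obs, sig_i, keep_negative=False):
    """Convert I, sig(I) to F, sig(F).

    Returns None for I == 0, and for I < 0 unless keep_negative is true,
    in which case a negative intensity maps to F = -sqrt(|I|).
    """
    if i_obs == 0 or (i_obs < 0 and not keep_negative):
        return None
    f_obs = math.copysign(math.sqrt(abs(i_obs)), i_obs)
    # First-order propagation of the intensity error (assumed, not documented).
    sig_f = sig_i / (2.0 * abs(f_obs))
    return f_obs, sig_f
```

The first-order formula becomes unreliable for weak reflections, where sig(I) is comparable to I, so treat the sigmas of such reflections with caution.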

Structure analysis and format conversion

  • ddm_strip - strip out difference distance matrix output from my FORTRAN program ddm to do a scatter plot with gnuplot (faster than a true contour plot). An alternative to allow better control of the plot (ranges etc.) and to provide a cleaner-looking output is to use the gnuplot contouring option (with the splot command), write the contours out to a file ("set out table") and replot that file (editing if necessary to split the different contour levels into separate files).
  • - calculate the angle and distance between the vectors defining two helices.
  • newchain - for use with the output of XPAND with the -E option: increments the chain id for each symmetry operator in the output, thus removing the redundant naming of atoms. This allows for easier identification of atoms using O and makes measurement of distances possible. It won't work properly if you have multiple chains to start with.
  • strip_hyd - remove hydrogens from PDB file (MOLEMAN2 may be more reliable, though slightly slower).
  • vector_angles - calculate the angle and distance between two vectors.
  • - Python script that defines a Protein class containing coordinates, sequence, etc. Includes functions for calculating interatomic distances, creating distance matrices and difference distance matrices (between two class instances), finding neighbours, calculating radius of gyration and other coordinate statistics, renumbering atoms and residues, and reading and writing PDB and Gromacs .GRO files.
  • - Python script that calculates surface complementarity as in the Sc program from the CCP4 (Collaborative Computational Project, Number 4 -- Protein Crystallography) program package.
  • - Python script that calculates surface similarity in the manner of the Sc program from CCP4.
    For both of the above, simply run the programs with two input file names on the command line:
    ./ mol1 mol2
    where mol1 and mol2 are file name roots for the coordinate and vertices files. In other words, after running MSMS with the -of flag, you would have .xyzrn, .vert and .face files for each root.
    The .xyzrn and .vert files are the input to either program. The .face files are not used here.

    The output .vert file from either program can be used in a separate script to draw the surface in PyMOL coloured according to the Sc value at each vertex.

    You first need to convert PDB-format files to the xyzrn format using my conversion script (next item).

  • Conversion script to convert PDB-format files to xyzrn-format files for use by MSMS.
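
Several of the coordinate calculations listed in this section (distance matrices, difference distance matrices, radius of gyration) reduce to a few lines of Python. The sketch below is not the Protein class mentioned above. The function names are hypothetical, coordinates are plain (x, y, z) tuples, and the radius of gyration is unweighted, i.e. every atom is given equal mass. It needs Python 3.8 or later for math.dist.

```python
import math


def distance_matrix(coords):
    """Return the full matrix of pairwise distances for a list of (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance_matrix(coords1, coords2):
    """Return element-wise differences between two distance matrices.

    Both coordinate lists must describe the same atoms in the same order.
    """
    if len(coords1) != len(coords2):
        raise ValueError("coordinate sets differ in length")
    d1, d2 = distance_matrix(coords1), distance_matrix(coords2)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(d1, d2)]


def radius_of_gyration(coords):
    """Return the unweighted radius of gyration (every atom given equal mass)."""
    n = len(coords)
    centre = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return math.sqrt(sum(math.dist(c, centre) ** 2 for c in coords) / n)
```

A difference distance matrix needs no superposition of the two structures, since it compares only internal distances.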

Refinement and Molecular Replacement statistics

  • refmacR - strip out R, Rfree, CC, CCfree, FOM, FOMfree from refmac log files for plotting with gnuplot
  • ref.R.plt - sample gnuplot file for printing 'ref.R', where 'ref.R' is the output from refmacR
  • oversig - convert AMORE rotation and translation values to peak divided by sigma
  • xplorR - strip out R and Rfree from X-plor log files for plotting with gnuplot.
  • rf_oversig - convert X-plor rotation function values to peak divided by sigma from X-plor RF log files
  • tf_oversig - convert X-plor translation function values to peak divided by sigma
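
The "peak divided by sigma" conversion used by the oversig scripts can be sketched as below. How the original scripts define sigma is not stated here, so the sketch assumes it is the population standard deviation of all listed function values and that no mean is subtracted. The function name is hypothetical.

```python
import statistics


def peaks_over_sigma(values):
    """Return each value divided by the standard deviation of all values.

    Sigma is taken as the population standard deviation of the whole
    list, with no mean subtraction (both assumptions of this sketch).
    """
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; sigma is zero")
    return [v / sigma for v in values]
```

Expressing peaks in units of sigma makes solutions from different rotation or translation searches easier to compare than raw function values.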

Last revised: Monday, 25-Feb-2013 15:34:05 EST