- Latest development changes
Robert Hubley committed May 26, 2017
1 parent 86089b3 commit f4961e0
Showing 7 changed files with 1,153 additions and 202 deletions.
81 changes: 81 additions & 0 deletions CosegConfig.pm
@@ -0,0 +1,81 @@
#!/usr/bin/perl
##---------------------------------------------------------------------------##
## File:
## @(#) CosegConfig.pm
## Author:
## Robert Hubley <[email protected]>
## Description:
## This is the main configuration file for the Coseg
## perl programs. Before you can run the programs included
## in this package you will need to edit this file and
## configure for your site.
##
#******************************************************************************
#* Copyright (C) Institute for Systems Biology 2016 Developed by
#* Arian Smit and Robert Hubley.
#*
#* This work is licensed under the Open Source License v2.1. To view a copy
#* of this license, visit http://www.opensource.org/licenses/osl-2.1.php or
#* see the license.txt file contained in this distribution.
#*
###############################################################################
package CosegConfig;
use FindBin;
require Exporter;
@EXPORT_OK = qw( $REPEATMASKER_DIR );

%EXPORT_TAGS = ( all => [ @EXPORT_OK ] );
@ISA = qw(Exporter);

BEGIN {
##----------------------------------------------------------------------##
## CONFIGURE THE FOLLOWING PARAMETERS FOR YOUR INSTALLATION ##
## ##
##
## RepeatMasker Location
## ======================
## The path to the RepeatMasker programs and support files.
## This is the directory in which the ProcessRepeats program
## and the Library/ and Matrices/ subdirectories
## reside.
##
## e.g. Typical UNIX installation
## $REPEATMASKER_DIR = "/usr/local/RepeatMasker";
##
$REPEATMASKER_DIR = "/usr/local/RepeatMasker";

## ##
## END CONFIGURATION AREA ##
##----------------------------------------------------------------------##
}

sub standalone_entry_point
{
  print "Enter location of the RepeatMasker program: ";
  my $answer = <STDIN>;
  $answer =~ s/[\n\r]+//g;
  # TODO Validate

  open my $in,  "<", "CosegConfig.pm"  or die "Cannot read CosegConfig.pm: $!\n";
  open my $out, ">", "CosegConfig.new" or die "Cannot write CosegConfig.new: $!\n";
  while ( <$in> )
  {
    if ( /^\s*\$REPEATMASKER_DIR\s*\=/ )
    {
      print $out "  \$REPEATMASKER_DIR = \"$answer\";\n";
    }else
    {
      print $out $_;
    }
  }
  close $in;
  close $out or die "Cannot finish writing CosegConfig.new: $!\n";

  rename( "CosegConfig.new", "CosegConfig.pm" ) or die "Cannot replace CosegConfig.pm: $!\n";
  exit;
}

## Allow this module to be called as a standalone script
__PACKAGE__->standalone_entry_point() unless caller;

1;
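
The `# TODO Validate` note in `standalone_entry_point` leaves the entered path unchecked. A minimal sketch of one possible check follows, assuming that a usable RepeatMasker directory contains the entries named in the configuration comments (`ProcessRepeats`, `Library/`, `Matrices/`). The helper name `validate_repeatmasker_dir` is hypothetical and is not part of this commit.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of CosegConfig.pm): returns a list of
# problems found with a candidate RepeatMasker directory. An empty list
# means the directory passed every check.
sub validate_repeatmasker_dir {
  my ($dir) = @_;
  return ("no directory given") unless defined $dir && length $dir;
  return ("'$dir' is not a directory") unless -d $dir;

  my @problems;
  # Assumption: these are the entries listed in the configuration comments.
  push @problems, "missing ProcessRepeats program"
    unless -f File::Spec->catfile( $dir, "ProcessRepeats" );
  foreach my $subdir (qw( Library Matrices )) {
    push @problems, "missing $subdir/ subdirectory"
      unless -d File::Spec->catdir( $dir, $subdir );
  }
  return @problems;
}

# Example: check the default location from the configuration block.
my @found = validate_repeatmasker_dir("/usr/local/RepeatMasker");
print @found
  ? "Problems: " . join( "; ", @found ) . "\n"
  : "Directory passed all checks\n";
```

Returning a list of problems rather than dying lets an interactive caller print every issue at once and prompt again.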
4 changes: 2 additions & 2 deletions Makefile
@@ -1,11 +1,11 @@
##
## Makefile for coseg project
##
VERSION=0.2.2
VERSION=0.2.3
INSTALLDIR=/usr/local/coseg-${VERSION}

## Basic
CC = cc -O4 -lm
CC = cc -g -O4 -lm
## A nice memory leak checker:
#CC = bgcc -O4 -fbounds-checking -lm

71 changes: 52 additions & 19 deletions README.md
@@ -21,10 +21,10 @@ Background

There are a few caveats:

- The input sequences must be full length alignments to a *single*
* The input sequences must be full length alignments to a *single*
reference sequence.

- The longer the sequence the harder this problem is to solve. Shorter
* The longer the sequence the harder this problem is to solve. Shorter
families or short subregions are recommended for this process.

Current Authors: Robert Hubley <[email protected]>
@@ -216,13 +216,17 @@ Running using your own data
with "-l div", "-l pv", "-l c" accordingly.


Experimental
============
Utility Programs
================

refineConsSeqs.pl

Using the refiner contained in the RepeatModeler package one may
submit subfamily members for consensus refinement. There is an
experimental script called refineConsSeqs.pl that will run this
analysis:
Coseg uses a rather crude method of building consensus sequences
for each subfamily it finds. This script makes use of the refiner
script in the RepeatModeler package to build and refine consensus
sequences based on subfamily members assigned by coseg.

Usage:

1. From the directory where your coseg results can be
found run:
@@ -232,43 +236,72 @@ analysis:
Where "mycosegrun" is the prefix of the coseg run.


extractSubSeqs.pl

A utility script to extract the sequences for a given coseg subfamily.
This uses the coseg input sequence file along with the *.assign file
to determine which sequences belong to the queried subfamily number.
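
The lookup described above can be sketched as follows. This is an illustration only, not the code of extractSubSeqs.pl. It ASSUMES that each line of the *.assign file holds a zero-based sequence index and a subfamily number separated by whitespace, and that the index refers to the order of records in the input FASTA file. Check the real file layout before relying on it. The function names are hypothetical.

```perl
use strict;
use warnings;

# Build an index => subfamily map from assignment lines.
# ASSUMED line format: "<zero-based sequence index> <subfamily number>".
sub read_assignments {
  my (@lines) = @_;
  my %subfamily_of;
  foreach my $line (@lines) {
    next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
    my ( $seq_index, $subfamily ) = split ' ', $line;
    $subfamily_of{$seq_index} = $subfamily;
  }
  return %subfamily_of;
}

# Return the FASTA records, in input order, assigned to subfamily $wanted.
sub extract_subfamily {
  my ( $fasta_text, $assign_ref, $wanted ) = @_;
  # Split before every line that starts with ">" so each record stays whole.
  my @records = grep { length } split /^(?=>)/m, $fasta_text;
  my @selected;
  for my $i ( 0 .. $#records ) {
    my $subfamily = $assign_ref->{$i};
    push @selected, $records[$i]
      if defined $subfamily && $subfamily == $wanted;
  }
  return @selected;
}

# Example with made-up data: sequences 0 and 2 belong to subfamily 1.
my %assignments = read_assignments( "0 1", "1 2", "2 1" );
print extract_subfamily( ">s0\nACGT\n>s1\nGGGG\n>s2\nACGA\n", \%assignments, 1 );
```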

Usage:




Version History
---------------
0.2.2: - Create a *.svg graph file without the need
-.-.-:
TODO: Print out a multiple alignment of the
subfamilies showing only the significant
differences ( i.e. the "." and "i" method )
and grouping them by tree/div order.

* Fixed the *.svg "<svg>" tag so that the files
will directly load in HTML5 web browsers.
* Calculation of divergence has been improved(?).
We now use Kimura substitution distance with CpG
site accounting modifications instead of the
mixed substitution and indel calculation.
* Require that the diagnostic mutations which
broke up the parent family are maintained
after allowing all elements to be re-assigned.
Currently this is only coded for bi-mutations.
This is a major change and deserves to have
a flag allowing one not to use it.
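
For readers unfamiliar with the distance mentioned in this entry, here is a minimal sketch of the standard Kimura two-parameter formula, K = -1/2 ln( (1 - 2P - Q) sqrt(1 - 2Q) ), where P and Q are the proportions of transitions and transversions. It does NOT include the CpG site accounting modifications that coseg applies, since those are not described here. The function name is hypothetical.

```perl
use strict;
use warnings;

# Plain Kimura two-parameter distance from raw counts.
# Assumption: no CpG adjustment, unlike the calculation used by coseg.
sub kimura_distance {
  my ( $transitions, $transversions, $aligned_length ) = @_;
  return unless $aligned_length > 0;

  my $p = $transitions / $aligned_length;      # transition proportion
  my $q = $transversions / $aligned_length;    # transversion proportion

  my $term1 = 1 - 2 * $p - $q;
  my $term2 = 1 - 2 * $q;
  # The logarithm is undefined once the sequences are saturated.
  return if $term1 <= 0 || $term2 <= 0;

  return -0.5 * log( $term1 * sqrt($term2) );
}

# Example: 10 transitions and 5 transversions over 100 aligned positions.
my $k = kimura_distance( 10, 5, 100 );
printf "K = %.4f\n", $k if defined $k;
```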

0.2.2: * Create a *.svg graph file without the need
to download/use GraphViz. The layout is
handled by an adaptation of an algorithm
developed by Atze van der Ploeg. The SVG
file produced supports various labeling
options and subfamily details displayed
when a node is hovered over.
- Changed the default colormap for the graph
* Changed the default colormap for the graph
output. Now warm colors denote more diverged
subfamilies in the tree while cooler colors
represent younger subfamilies. To restore
the original color scheme use the new "-o"
flag to coseg.
- Added parameter to control the minimum distance
* Added parameter to control the minimum distance
between diagnostic sites. Now the user can override
the historic value of 10 using the -u flag.
- Improved error reporting when there is a mismatch
* Improved error reporting when there is a mismatch
between an individual sequence length and the
consensus length in the input files.
- Fixed a bug that caused coseg to segfault.
- Added experimental script refineConsSeqs.pl. This
* Fixed a bug that caused coseg to segfault.
* Added experimental script refineConsSeqs.pl. This
script uses the RepeatModeler application to build
and refine the consensus sequences for each
subfamily.
0.2.1: - Improved code documentation
- Single mutation significance cutoff ( SIGMATHRESH ) was
0.2.1: * Improved code documentation
* Single mutation significance cutoff ( SIGMATHRESH ) was
pre-calculated for Alkes Alu analysis and hardcoded. This
version calculates the correct sigma cutoff using the length
of the input sequence.
- Fixed bug with implementation of Siegel's pValue
* Fixed bug with implementation of Siegel's pValue
calculation which caused a segfault -- found by Neal Platt.
- Switched default pvalue method to Andy Siegel's method and
* Switched default pvalue method to Andy Siegel's method and
provided a new "-k" switch to use Alkes Price's method.
- Fixed bug where the program was exiting when calculations
* Fixed bug where the program was exiting when calculations
fell below the precision of the machine ( epsilon ). Message
given was "Below epsilon..." and the runcoseg.pl script
moved on even though coseg failed.