Robert Hubley committed May 26, 2017
1 parent 86089b3, commit f4961e0
Showing 7 changed files with 1,153 additions and 202 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
#!/usr/bin/perl | ||
##---------------------------------------------------------------------------## | ||
## File: | ||
## @(#) CosegConfig.pm | ||
## Author: | ||
## Robert Hubley <[email protected]> | ||
## Description: | ||
## This is the main configuration file for the Coseg | ||
## perl programs. Before you can run the programs included | ||
## in this package you will need to edit this file and | ||
## configure for your site. | ||
## | ||
#****************************************************************************** | ||
#* Copyright (C) Institute for Systems Biology 2016 Developed by | ||
#* Arian Smit and Robert Hubley. | ||
#* | ||
#* This work is licensed under the Open Source License v2.1. To view a copy | ||
#* of this license, visit http://www.opensource.org/licenses/osl-2.1.php or | ||
#* see the license.txt file contained in this distribution. | ||
#* | ||
############################################################################### | ||
package CosegConfig; | ||
use FindBin; | ||
require Exporter; | ||
@EXPORT_OK = qw( $REPEATMASKER_DIR ); | ||
|
||
%EXPORT_TAGS = ( all => [ @EXPORT_OK ] ); | ||
@ISA = qw(Exporter); | ||
|
||
BEGIN { | ||
##----------------------------------------------------------------------## | ||
## CONFIGURE THE FOLLOWING PARAMETERS FOR YOUR INSTALLATION ## | ||
## ## | ||
## | ||
## RepeatMasker Location | ||
## ====================== | ||
## The path to the RepeatMasker programs and support files. | ||
## This is the directory where the ProcessRepeats program and | ||
## the Library/ and Matrices/ subdirectories reside. | ||
## | ||
## e.g. Typical UNIX installation | ||
## $REPEATMASKER_DIR = "/usr/local/RepeatMasker"; | ||
## | ||
$REPEATMASKER_DIR = "/usr/local/RepeatMasker"; | ||
|
||
## ## | ||
## END CONFIGURATION AREA ## | ||
##----------------------------------------------------------------------## | ||
} | ||
|
||
sub standalone_entry_point | ||
{ | ||
print "Enter location of the RepeatMasker program: "; | ||
my $answer = <STDIN>; | ||
$answer =~ s/[\n\r]+//g; | ||
# TODO Validate | ||
|
||
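open IN,"<CosegConfig.pm" or die "Could not open CosegConfig.pm for reading: $!\n"; | ||
open OUT,">CosegConfig.new" or die "Could not open CosegConfig.new for writing: $!\n"; | ||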
while ( <IN> ) | ||
{ | ||
if ( /^\s*\$REPEATMASKER_DIR\s*\=/ ) | ||
{ | ||
print OUT " \$REPEATMASKER_DIR = \"$answer\";\n"; | ||
}else | ||
{ | ||
print OUT; | ||
} | ||
} | ||
close IN; | ||
close OUT; | ||
|
||
system("mv CosegConfig.new CosegConfig.pm"); | ||
exit; | ||
} | ||
|
||
## Allow this module to be called as a standalone script | ||
__PACKAGE__->standalone_entry_point() unless caller; | ||
|
||
1; |
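The `# TODO Validate` step in `standalone_entry_point` could be filled in along the following lines. This is only a sketch under stated assumptions, not the author's implementation: the helper names `looks_like_repeatmasker_dir` and `rewrite_config_text` are hypothetical, and the check for a `ProcessRepeats` entry is inferred from the configuration comment rather than from any documented requirement.

```perl
use strict;
use warnings;

# Hypothetical check: the directory must exist and contain ProcessRepeats,
# the program named in the configuration comment. This is an assumption
# about what a valid installation looks like.
sub looks_like_repeatmasker_dir {
    my ($dir) = @_;
    return 0 unless defined $dir && length $dir;
    return 0 unless -d $dir;
    return -e "$dir/ProcessRepeats" ? 1 : 0;
}

# Hypothetical helper: return the config text with the active
# $REPEATMASKER_DIR assignment replaced. Commented-out example lines
# (those starting with "##") are left untouched.
sub rewrite_config_text {
    my ( $text, $dir ) = @_;

    # Escape characters that are special inside a double-quoted Perl
    # string so the rewritten file remains valid Perl.
    ( my $escaped = $dir ) =~ s/([\\"\$\@])/\\$1/g;

    # Preserve the original indentation of the assignment line.
    $text =~ s/^([ \t]*)\$REPEATMASKER_DIR[ \t]*=.*$/$1\$REPEATMASKER_DIR = "$escaped";/mg;
    return $text;
}
```

Working on the whole file text rather than line by line keeps the substitution easy to test without touching the filesystem. A caller would reject the answer when `looks_like_repeatmasker_dir` returns false, before writing `CosegConfig.new`.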
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,10 +21,10 @@ Background | |
|
||
There are a few caveats: | ||
|
||
- The input sequences must be full length alignments to a *single* | ||
* The input sequences must be full length alignments to a *single* | ||
reference sequence. | ||
|
||
- The longer the sequence the harder this problem is to solve. Shorter | ||
* The longer the sequence the harder this problem is to solve. Shorter | ||
families or short subregions are recommended for this process. | ||
|
||
Current Authors: Robert Hubley <[email protected]> | ||
|
@@ -216,13 +216,17 @@ Running using your own data | |
with "-l div", "-l pv", "-l c" accordingly. | ||
|
||
|
||
Experimental | ||
============ | ||
Utility Programs | ||
================ | ||
|
||
refineConsSeqs.pl | ||
|
||
Using the refiner contained in the RepeatModeler package one may | ||
submit subfamily members for consensus refinement. There is an | ||
experimental script called refineConsSeqs.pl that will run this | ||
analysis: | ||
Coseg uses a rather crude method of building consensus sequences | ||
for each subfamily it finds. This script makes use of the refiner | ||
script in the RepeatModeler package to build and refine consensus | ||
sequences based on subfamily members assigned by coseg. | ||
|
||
Usage: | ||
|
||
1. From the directory where your coseg results can be | ||
found run: | ||
|
@@ -232,43 +236,72 @@ analysis: | |
Where "mycosegrun" is the prefix of the coseg run. | ||
|
||
|
||
extractSubSeqs.pl | ||
|
||
A utility script to extract the sequences for a given coseg subfamily. | ||
This uses the coseg input sequence file along with the *.assign file | ||
to determine which sequences belong to the queried subfamily number. | ||
|
||
Usage: | ||
|
||
|
||
|
||
|
||
Version History | ||
--------------- | ||
0.2.2: - Create a *.svg graph file without the need | ||
-.-.-: | ||
TODO: Print out a multiple alignment of the | ||
subfamilies showing only the significant
differences ( i.e. the "." and "i" method ) | ||
and grouping them by tree/div order. | ||
|
||
* Fixed the *.svg "<svg>" tag so that the files | ||
will directly load in HTML5 web browsers. | ||
* Calculation of divergence has been improved(?). | ||
We now use Kimura substitution distance with CpG
site accounting modifications instead of the
mixed substitution and indel calculation.
* Require that the diagnostic mutations which | ||
broke up the parent family are maintained | ||
after allowing all elements to be re-assigned. | ||
Currently this is only coded for bi-mutations. | ||
This is a major change and deserves to have | ||
a flag allowing one not to use it. | ||
|
||
0.2.2: * Create a *.svg graph file without the need | ||
to download/use GraphViz. The layout is | ||
handled by an adaptation of an algorithm
developed by Atze van der Ploeg. The SVG | ||
file produced supports various labeling | ||
options and subfamily details displayed | ||
when a node is hovered over. | ||
- Changed the default colormap for the graph | ||
* Changed the default colormap for the graph | ||
output. Now warm colors denote more diverged | ||
subfamilies in the tree while cooler colors | ||
represent younger subfamilies. To restore | ||
the original color scheme use the new "-o" | ||
flag to coseg. | ||
- Added parameter to control the minimum distance | ||
* Added parameter to control the minimum distance | ||
between diagnostic sites. Now the user can override | ||
the historic value of 10 using the -u flag. | ||
- Improved error reporting when there is a mismatch | ||
* Improved error reporting when there is a mismatch | ||
between an individual sequence length and the | ||
consensus length in the input files. | ||
- Fixed a bug that caused coseg to segfault. | ||
- Added experimental script refineConsSeqs.pl. This | ||
* Fixed a bug that caused coseg to segfault. | ||
* Added experimental script refineConsSeqs.pl. This | ||
script uses the RepeatModeler application to build | ||
and refine the consensus sequences for each | ||
subfamily. | ||
0.2.1: - Improved code documentation | ||
- Single mutation significance cutoff ( SIGMATHRESH ) was | ||
0.2.1: * Improved code documentation | ||
* Single mutation significance cutoff ( SIGMATHRESH ) was | ||
pre-calculated for Alkes Alu analysis and hardcoded. This | ||
version calculates the correct sigma cutoff using the length | ||
of the input sequence. | ||
- Fixed bug with implementation of Siegel's pValue | ||
* Fixed bug with implementation of Siegel's pValue | ||
calculation which caused a segfault -- found by Neal Platt. | ||
- Switched default pvalue method to Andy Siegel's method and | ||
* Switched default pvalue method to Andy Siegel's method and | ||
provided a new "-k" switch to use Alkes Price's method. | ||
- Fixed bug where the program was exiting when calculations | ||
* Fixed bug where the program was exiting when calculations | ||
fell below the precision of the machine ( epsilon ). Message | ||
given was "Below epsilon..." and the runcoseg.pl script | ||
moved on even though coseg failed. | ||
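
The version history mentions a switch to Kimura substitution distance for the divergence calculation. For orientation, here is a minimal sketch of the standard Kimura two-parameter distance, K = -1/2 ln( (1 - 2P - Q) sqrt(1 - 2Q) ), where P is the fraction of transitions and Q the fraction of transversions. It does not reproduce the CpG site accounting modifications, which the notes do not describe, and the function name `kimura_distance` is hypothetical rather than taken from the coseg source.

```perl
use strict;
use warnings;

# Kimura two-parameter distance between two aligned, equal-length sequences.
# Columns containing a gap or any non-ACGT character are skipped.
# Returns undef when no comparable sites exist or the distance is undefined
# (saturated). CpG handling is NOT implemented here.
sub kimura_distance {
    my ( $seq_x, $seq_y ) = @_;
    die "sequences must have equal length\n"
        unless length($seq_x) == length($seq_y);

    my %is_purine = ( A => 1, G => 1 );
    my ( $sites, $transitions, $transversions ) = ( 0, 0, 0 );

    for my $i ( 0 .. length($seq_x) - 1 ) {
        my $x = uc substr( $seq_x, $i, 1 );
        my $y = uc substr( $seq_y, $i, 1 );
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;

        # Same chemical class (purine/purine or pyrimidine/pyrimidine)
        # is a transition; a class change is a transversion.
        my $x_purine = $is_purine{$x} ? 1 : 0;
        my $y_purine = $is_purine{$y} ? 1 : 0;
        if ( $x_purine == $y_purine ) { $transitions++ }
        else                          { $transversions++ }
    }

    return undef unless $sites;
    my $p     = $transitions / $sites;
    my $q     = $transversions / $sites;
    my $term1 = 1 - 2 * $p - $q;
    my $term2 = 1 - 2 * $q;
    return undef if $term1 <= 0 || $term2 <= 0;
    return -0.5 * log( $term1 * sqrt($term2) );
}
```

Unlike a mixed substitution and indel measure, this distance ignores gapped columns entirely and corrects for multiple substitutions at the same site.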
|