commit/galaxy-central: 11 new changesets
11 new commits in galaxy-central: https://bitbucket.org/galaxy/galaxy-central/commits/26e7555cd499/ Changeset: 26e7555cd499 User: fubar Date: 2013-10-08 01:24:59 Summary: hg backout -r 1092b0d backout unintended commits to tophat and bowtie2 wrappers Affected #: 3 files diff -r f76c280cdee539b65315ac583f93bfa52feb790d -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 tools/ngs_rna/tophat2_wrapper.xml --- a/tools/ngs_rna/tophat2_wrapper.xml +++ b/tools/ngs_rna/tophat2_wrapper.xml @@ -126,14 +126,6 @@ </command><inputs> - <param name="jobname" type="text" value="Tophat2" size="80" label="Job title for outputs" - help="Output name to remind you what this was for"> - <sanitizer invalid_char=""> - <valid initial="string.letters,string.digits"> - <add value="_" /> - </valid> - </sanitizer> - </param><conditional name="singlePaired"><param name="sPaired" type="select" label="Is this library mate-paired?"><option value="single">Single-end</option> @@ -281,19 +273,19 @@ </stdio><outputs> - <data format="tabular" name="fusions" label="${on_string}_${jobname}_fusions.xls" from_work_dir="tophat_out/fusions.out"> + <data format="tabular" name="fusions" label="${tool.name} on ${on_string}: fusions" from_work_dir="tophat_out/fusions.out"><filter>(params['settingsType'] == 'full' and params['fusion_search']['do_search'] == 'Yes')</filter></data> - <data format="bed" name="insertions" label="${on_string}_${jobname}_ins.bed" from_work_dir="tophat_out/insertions.bed"> + <data format="bed" name="insertions" label="${tool.name} on ${on_string}: insertions" from_work_dir="tophat_out/insertions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="deletions" label="${on_string}_${jobname}_del.bed" from_work_dir="tophat_out/deletions.bed"> + <data format="bed" name="deletions" label="${tool.name} on ${on_string}: deletions" from_work_dir="tophat_out/deletions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="junctions" label="${on_string}_${jobname}_splicejunc.bed" from_work_dir="tophat_out/junctions.bed"> + <data format="bed" name="junctions" label="${tool.name} on ${on_string}: splice junctions" from_work_dir="tophat_out/junctions.bed"><expand macro="dbKeyActions" /></data> - <data format="bam" name="accepted_hits" label="${on_string}_${jobname}_hits.bam" from_work_dir="tophat_out/accepted_hits.bam"> + <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits" from_work_dir="tophat_out/accepted_hits.bam"><expand macro="dbKeyActions" /></data></outputs> diff -r f76c280cdee539b65315ac583f93bfa52feb790d -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 tools/rgenetics/rgFastQC.py --- a/tools/rgenetics/rgFastQC.py +++ b/tools/rgenetics/rgFastQC.py @@ -27,7 +27,6 @@ from rgutils import getFileString import zipfile import gzip -import bz2 class FastQC(): """wrapper @@ -63,43 +62,27 @@ infname = self.opts.inputfilename linf = infname.lower() trimext = False - isgz = False - isbz2 = False - iszip = False # decompression at upload currently does NOT remove this now bogus ending - fastqc will barf # patched may 29 2013 until this is fixed properly - f = gzip.open(self.opts.input) - try: + if ( linf.endswith('.gz') or linf.endswith('.gzip') ): + f = gzip.open(self.opts.input) + try: testrow = f.readline() - if not (linf.endswith('.gz') or linf.endswith('.gzip')): - isgz = True - except: - if linf.endswith('.gz') or linf.endswith('.gzip'): - trimext = True - f.close() - f = bz2.BZ2File(self.opts.input) - try: - x = read(f,2) # will work if a real bz2 - if not 
linf.endswith('bz2'): - isbz2 = True - except: - if linf.endswith('bz2'): - trimext = True - f.close() - if zipfile.is_zipfile(self.opts.input): - if not linf.endswith('.zip'): - iszip = True - else: - if linf.endswith('.zip'): - trimext = True + except: + trimext = True + f.close() + elif linf.endswith('bz2'): + f = bz2.open(self.opts.input,'rb') + try: + f.readline() + except: + trimext = True + f.close() + elif linf.endswith('.zip'): + if not zipfile.is_zipfile(self.opts.input): + trimext = True if trimext: infname = os.path.splitext(infname)[0] - elif isgz: - infname = '%s.gz' % infname - elif isbz2: - infname = '%s.bzip2' % infname - elif iszip: - infname = '%s.zip' % infname fastqinfilename = re.sub(ur'[^a-zA-Z0-9_\-\.]', '_', os.path.basename(infname)) link_name = os.path.join(self.opts.outputdir, fastqinfilename) os.symlink(self.opts.input, link_name) diff -r f76c280cdee539b65315ac583f93bfa52feb790d -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -87,22 +87,23 @@ ## read group information #if str($read_group.selection) == "yes": #if $read_group.rgid and $read_group.rglb and $read_group.rgpl and $read_group.rgsm: - --rg-id "$read_group.rgid" - --rg "LB:$read_group.rglb" - --rg "PL:$read_group.rgpl" - --rg "SM:$read_group.rgsm" + --rg-id $read_group.rgid + --rg LB:$read_group.rglb + --rg PL:$read_group.rgpl + --rg SM:$read_group.rgsm #end if #end if ## view/sort and output file - | samtools view -Su - | samtools sort -o - - > $output + | samtools view -Su - | samtools sort -o - - > $output; ## rename unaligned sequence files #if $library.type == "paired" and $output_unaligned_reads_l and $output_unaligned_reads_r: #set left = str($output_unaligned_reads_l).replace( '.dat', '.1.dat' ) #set right = str($output_unaligned_reads_l).replace( '.dat', '.2.dat' ) - ;mv $left $output_unaligned_reads_l; - mv $right $output_unaligned_reads_r + + mv $left $output_unaligned_reads_l; + mv $right $output_unaligned_reads_r; #end if </command> @@ -112,8 +113,6 @@ </stdio><inputs> - <param name="jobtitle" type="text" value="bowtie2" label="Memorable short reminder of this job's importance for outputs" /> - <!-- single/paired --><conditional name="library"><param name="type" type="select" label="Is this library mate-paired?"> @@ -217,7 +216,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_${jobtitle}_unaligL.fastq" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${tool.name} on ${on_string}: unaligned reads (L)" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -225,7 +224,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_${jobtitle}_unaligR.fastq)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${tool.name} on ${on_string}: unaligned reads (R)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -233,7 +232,7 @@ </action></actions></data> - <data format="bam" name="output" label="${on_string}_${jobtitle}_aligned.bam"> + <data format="bam" name="output" label="${tool.name} on ${on_string}: aligned reads"><actions><conditional name="reference_genome.source"><when value="indexed"> @@ -255,17 +254,6 @@ </outputs><tests> - <test> - <!-- basic test on single paired default run --> - <param name="type" value="single"/> - <param 
name="selection" value="no"/> - <param name="full" value="no"/> - <param name="unaligned_file" value="false"/> - <param name="source" value="history" /> - <param name="input_1" value="bowtie2/phix_reads.fastq" ftype="fastqsanger"/> - <param name="own_file" value="bowtie2/phix_genome.fasta" /> - <output name="output" file="bowtie2/phix_mapped.bam" /> - </test></tests><help> https://bitbucket.org/galaxy/galaxy-central/commits/962d981799cf/ Changeset: 962d981799cf User: fubar Date: 2013-10-08 01:50:10 Summary: backout -r 10799 Affected #: 191 files diff -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 tools/rgedgeR/rgToolFactory.py --- a/tools/rgedgeR/rgToolFactory.py +++ /dev/null @@ -1,605 +0,0 @@ -# rgToolFactory.py -# see https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home -# -# copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012 -# -# all rights reserved -# Licensed under the LGPL -# suggestions for improvement and bug fixes welcome at https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home -# -# july 2013 -# added ability to combine images and individual log files into html output -# just make sure there's a log file foo.log and it will be output -# together with all images named like "foo_*.pdf -# otherwise old format for html -# -# January 2013 -# problem pointed out by Carlos Borroto -# added escaping for <>$ - thought I did that ages ago... -# -# August 11 2012 -# changed to use shell=False and cl as a sequence - -# This is a Galaxy tool factory for simple scripts in python, R or whatever ails ye. -# It also serves as the wrapper for the new tool. -# -# you paste and run your script -# Only works for simple scripts that read one input from the history. -# Optionally can write one new history dataset, -# and optionally collect any number of outputs into links on an autogenerated HTML page. - -# DO NOT install on a public or important site - please. - -# installed generated tools are fine if the script is safe. -# They just run normally and their user cannot do anything unusually insecure -# but please, practice safe toolshed. -# Read the fucking code before you install any tool -# especially this one - -# After you get the script working on some test data, you can -# optionally generate a toolshed compatible gzip file -# containing your script safely wrapped as an ordinary Galaxy script in your local toolshed for -# safe and largely automated installation in a production Galaxy. - -# If you opt for an HTML output, you get all the script outputs arranged -# as a single Html history item - all output files are linked, thumbnails for all the pdfs. -# Ugly but really inexpensive. -# -# Patches appreciated please. -# -# -# long route to June 2012 product -# Behold the awesome power of Galaxy and the toolshed with the tool factory to bind them -# derived from an integrated script model -# called rgBaseScriptWrapper.py -# Note to the unwary: -# This tool allows arbitrary scripting on your Galaxy as the Galaxy user -# There is nothing stopping a malicious user doing whatever they choose -# Extremely dangerous!! -# Totally insecure. So, trusted users only -# -# preferred model is a developer using their throw away workstation instance - ie a private site. -# no real risk. The universe_wsgi.ini admin_users string is checked - only admin users are permitted to run this tool. 
- # - -import sys -import shutil -import subprocess -import os -import time -import tempfile -import optparse -import tarfile -import re -import shutil -import math - -progname = os.path.split(sys.argv[0])[1] -myversion = 'V000.2 June 2012' -verbose = False -debug = False -toolFactoryURL = 'https://bitbucket.org/fubar/galaxytoolfactory' - -def timenow(): - """return current time as a string - """ - return time.strftime('%d/%m/%Y %H:%M:%S', time.localtime(time.time())) - -html_escape_table = { - "&": "&amp;", - ">": "&gt;", - "<": "&lt;", - "$": "\$" - } - -def html_escape(text): - """Produce entities within text.""" - return "".join(html_escape_table.get(c,c) for c in text) - -def cmd_exists(cmd): - return subprocess.call("type " + cmd, shell=True, - stdout=subprocess.PIPE, stderr=subprocess.PIPE) == 0 - - -class ScriptRunner: - """class is a wrapper for an arbitrary script - """ - - def __init__(self,opts=None,treatbashSpecial=True): - """ - cleanup inputs, setup some outputs - - """ - self.useGM = cmd_exists('gm') - self.useIM = cmd_exists('convert') - self.useGS = cmd_exists('gs') - self.treatbashSpecial = treatbashSpecial - if opts.output_dir: # simplify for the tool tarball - os.chdir(opts.output_dir) - self.thumbformat = 'png' - self.opts = opts - self.toolname = re.sub('[^a-zA-Z0-9_]+', '', opts.tool_name) # a sanitizer now does this but.. - self.toolid = self.toolname - self.myname = sys.argv[0] # get our name because we write ourselves out as a tool later - self.pyfile = self.myname # crude but efficient - the cruft won't hurt much - self.xmlfile = '%s.xml' % self.toolname - s = open(self.opts.script_path,'r').readlines() - s = [x.rstrip() for x in s] # remove pesky dos line endings if needed - self.script = '\n'.join(s) - fhandle,self.sfile = tempfile.mkstemp(prefix=self.toolname,suffix=".%s" % (opts.interpreter)) - tscript = open(self.sfile,'w') # use self.sfile as script source for Popen - tscript.write(self.script) - tscript.close() - self.indentedScript = '\n'.join([' %s' % x for x in s]) # for restructured text in help - self.escapedScript = '\n'.join([html_escape(x) for x in s]) - self.elog = os.path.join(self.opts.output_dir,"%s_error.log" % self.toolname) - if opts.output_dir: # may not want these complexities - self.tlog = os.path.join(self.opts.output_dir,"%s_runner.log" % self.toolname) - art = '%s.%s' % (self.toolname,opts.interpreter) - artpath = os.path.join(self.opts.output_dir,art) # need full path - artifact = open(artpath,'w') # use self.sfile as script source for Popen - artifact.write(self.script) - artifact.close() - self.cl = [] - self.html = [] - a = self.cl.append - a(opts.interpreter) - if self.treatbashSpecial and opts.interpreter in ['bash','sh']: - a(self.sfile) - else: - a('-') # stdin - a(opts.input_tab) - a(opts.output_tab) - self.outFormats = 'tabular' # TODO make this an option at tool generation time - self.inputFormats = 'tabular' # TODO make this an option at tool generation time - self.test1Input = '%s_test1_input.xls' % self.toolname - self.test1Output = '%s_test1_output.xls' % self.toolname - self.test1HTML = '%s_test1_output.html' % self.toolname - - def makeXML(self): - """ - Create a Galaxy xml tool wrapper for the new script as a string to write out - fixme - use templating or something less fugly than this example of what we produce - - <tool id="reverse" name="reverse" version="0.01"> - <description>a tabular file</description> - <command interpreter="python"> - reverse.py --script_path "$runMe" --interpreter "python" - --tool_name "reverse" 
--input_tab "$input1" --output_tab "$tab_file" - </command> - <inputs> - <param name="input1" type="data" format="tabular" label="Select a suitable input file from your history"/><param name="job_name" type="text" label="Supply a name for the outputs to remind you what they contain" value="reverse"/> - - </inputs> - <outputs> - <data format="tabular" name="tab_file" label="${job_name}"/> - - </outputs> - <help> - -**What it Does** - -Reverse the columns in a tabular file - - </help> - <configfiles> - <configfile name="runMe"> - -# reverse order of columns in a tabular file -import sys -inp = sys.argv[1] -outp = sys.argv[2] -i = open(inp,'r') -o = open(outp,'w') -for row in i: - rs = row.rstrip().split('\t') - rs.reverse() - o.write('\t'.join(rs)) - o.write('\n') -i.close() -o.close() - - - </configfile> - </configfiles> - </tool> - - """ - newXML="""<tool id="%(toolid)s" name="%(toolname)s" version="%(tool_version)s"> - %(tooldesc)s - %(command)s - <inputs> - %(inputs)s - </inputs> - <outputs> - %(outputs)s - </outputs> - <configfiles> - <configfile name="runMe"> - %(script)s - </configfile> - </configfiles> - %(tooltests)s - <help> - %(help)s - </help> - </tool>""" # needs a dict with toolname, toolid, interpreter, scriptname, command, inputs as a multi line string ready to write, outputs ditto, help ditto - - newCommand="""<command interpreter="python"> - %(toolname)s.py --script_path "$runMe" --interpreter "%(interpreter)s" - --tool_name "%(toolname)s" %(command_inputs)s %(command_outputs)s - </command>""" # may NOT be an input or htmlout - tooltestsTabOnly = """<tests><test> - <param name="input1" value="%(test1Input)s" ftype="tabular"/> - <param name="job_name" value="test1"/> - <param name="runMe" value="$runMe"/> - <output name="tab_file" file="%(test1Output)s" ftype="tabular"/> - </test></tests>""" - tooltestsHTMLOnly = """<tests><test> - <param name="input1" value="%(test1Input)s" ftype="tabular"/> - <param name="job_name" value="test1"/> - <param name="runMe" value="$runMe"/> - <output name="html_file" file="%(test1HTML)s" ftype="html" lines_diff="5"/> - </test></tests>""" - tooltestsBoth = """<tests><test> - <param name="input1" value="%(test1Input)s" ftype="tabular"/> - <param name="job_name" value="test1"/> - <param name="runMe" value="$runMe"/> - <output name="tab_file" file="%(test1Output)s" ftype="tabular" /> - <output name="html_file" file="%(test1HTML)s" ftype="html" lines_diff="10"/> - </test></tests>""" - xdict = {} - xdict['tool_version'] = self.opts.tool_version - xdict['test1Input'] = self.test1Input - xdict['test1HTML'] = self.test1HTML - xdict['test1Output'] = self.test1Output - if self.opts.make_HTML and self.opts.output_tab <> 'None': - xdict['tooltests'] = tooltestsBoth % xdict - elif self.opts.make_HTML: - xdict['tooltests'] = tooltestsHTMLOnly % xdict - else: - xdict['tooltests'] = tooltestsTabOnly % xdict - xdict['script'] = self.escapedScript - # configfile is least painful way to embed script to avoid external dependencies - # but requires escaping of <, > and $ to avoid Mako parsing - if self.opts.help_text: - xdict['help'] = open(self.opts.help_text,'r').read() - else: - xdict['help'] = 'Please ask the tool author for help as none was supplied at tool generation' - coda = ['**Script**','Pressing execute will run the following code over your input file and generate some outputs in your history::'] - coda.append(self.indentedScript) - coda.append('**Attribution** This Galaxy tool was created by %s at %s\nusing the Galaxy Tool Factory.' 
% (self.opts.user_email,timenow())) - coda.append('See %s for details of that project' % (toolFactoryURL)) - coda.append('Please cite: Creating re-usable tools from scripts: The Galaxy Tool Factory. Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team. ') - coda.append('Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573') - xdict['help'] = '%s\n%s' % (xdict['help'],'\n'.join(coda)) - if self.opts.tool_desc: - xdict['tooldesc'] = '<description>%s</description>' % self.opts.tool_desc - else: - xdict['tooldesc'] = '' - xdict['command_outputs'] = '' - xdict['outputs'] = '' - if self.opts.input_tab <> 'None': - xdict['command_inputs'] = '--input_tab "$input1" ' # the space may matter a lot if we append something - xdict['inputs'] = '<param name="input1" type="data" format="%s" label="Select a suitable input file from your history"/> \n' % self.inputFormats - else: - xdict['command_inputs'] = '' # assume no input - eg a random data generator - xdict['inputs'] = '' - xdict['inputs'] += '<param name="job_name" type="text" label="Supply a name for the outputs to remind you what they contain" value="%s"/> \n' % self.toolname - xdict['toolname'] = self.toolname - xdict['toolid'] = self.toolid - xdict['interpreter'] = self.opts.interpreter - xdict['scriptname'] = self.sfile - if self.opts.make_HTML: - xdict['command_outputs'] += ' --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes" ' - xdict['outputs'] += ' <data format="html" name="html_file" label="${job_name}.html"/>\n' - if self.opts.output_tab <> 'None': - xdict['command_outputs'] += ' --output_tab "$tab_file"' - xdict['outputs'] += ' <data format="%s" name="tab_file" label="${job_name}"/>\n' % self.outFormats - xdict['command'] = newCommand % xdict - xmls = newXML % xdict - xf = open(self.xmlfile,'w') - xf.write(xmls) - xf.write('\n') - xf.close() - # ready for the tarball - - - def makeTooltar(self): - """ - a tool is a gz tarball with eg - /toolname/tool.xml /toolname/tool.py /toolname/test-data/test1_in.foo ... - """ - retval = self.run() - if retval: - print >> sys.stderr,'## Run failed. Cannot build yet. Please fix and retry' - sys.exit(1) - self.makeXML() - tdir = self.toolname - os.mkdir(tdir) - if self.opts.input_tab <> 'None': # no reproducible test otherwise? TODO: maybe.. - testdir = os.path.join(tdir,'test-data') - os.mkdir(testdir) # make tests directory - shutil.copyfile(self.opts.input_tab,os.path.join(testdir,self.test1Input)) - if self.opts.output_tab <> 'None': - shutil.copyfile(self.opts.output_tab,os.path.join(testdir,self.test1Output)) - if self.opts.make_HTML: - shutil.copyfile(self.opts.output_html,os.path.join(testdir,self.test1HTML)) - if self.opts.output_dir: - shutil.copyfile(self.tlog,os.path.join(testdir,'test1_out.log')) - op = '%s.py' % self.toolname # new name - outpiname = os.path.join(tdir,op) # path for the tool tarball - pyin = os.path.basename(self.pyfile) # our name - we rewrite ourselves (TM) - notes = ['# %s - a self annotated version of %s generated by running %s\n' % (op,pyin,pyin),] - notes.append('# to make a new Galaxy tool called %s\n' % self.toolname) - notes.append('# User %s at %s\n' % (self.opts.user_email,timenow())) - pi = open(self.pyfile,'r').readlines() # our code becomes new tool wrapper (!) 
- first Galaxy worm - notes += pi - outpi = open(outpiname,'w') - outpi.write(''.join(notes)) - outpi.write('\n') - outpi.close() - stname = os.path.join(tdir,self.sfile) - if not os.path.exists(stname): - shutil.copyfile(self.sfile, stname) - xtname = os.path.join(tdir,self.xmlfile) - if not os.path.exists(xtname): - shutil.copyfile(self.xmlfile,xtname) - tarpath = "%s.gz" % self.toolname - tar = tarfile.open(tarpath, "w:gz") - tar.add(tdir,arcname=self.toolname) - tar.close() - shutil.copyfile(tarpath,self.opts.new_tool) - shutil.rmtree(tdir) - ## TODO: replace with optional direct upload to local toolshed? - return retval - - - def compressPDF(self,inpdf=None,thumbformat='png'): - """need absolute path to pdf - """ - assert os.path.isfile(inpdf), "## Input %s supplied to %s compressPDF not found" % (inpdf,self.myName) - hf,hlog = tempfile.mkstemp(suffix="%s.log" % self.toolname) - sto = open(hlog,'w') - outpdf = '%s_compressed' % inpdf - cl = ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH","-dPDFSETTINGS=/printer", "-sOutputFile=%s" % outpdf,inpdf] - x = subprocess.Popen(cl,stdout=sto,stderr=sto,cwd=self.opts.output_dir) - retval1 = x.wait() - if retval1 == 0: - os.unlink(inpdf) - shutil.move(outpdf,inpdf) - outpng = '%s.%s' % (os.path.splitext(inpdf)[0],thumbformat) - if self.useGM: - cl2 = ['gm convert', inpdf, outpng] - else: # assume imagemagick - cl2 = ['convert', inpdf, outpng] - x = subprocess.Popen(cl2,stdout=sto,stderr=sto,cwd=self.opts.output_dir) - retval2 = x.wait() - sto.close() - retval = retval1 or retval2 - return retval - - - def getfSize(self,fpath,outpath): - """ - format a nice file size string - """ - size = '' - fp = os.path.join(outpath,fpath) - if os.path.isfile(fp): - size = '0 B' - n = float(os.path.getsize(fp)) - if n > 2**20: - size = '%1.1f MB' % (n/2**20) - elif n > 2**10: - size = '%1.1f KB' % (n/2**10) - elif n > 0: - size = '%d B' % (int(n)) - return size - - def makeHtml(self): - """ Create an HTML file content to list all the artifacts found in the output_dir - """ - - galhtmlprefix = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> - <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> - <head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> - <meta name="generator" content="Galaxy %s tool output - see http://g2.trac.bx.psu.edu/" /> - <title></title> - <link rel="stylesheet" href="/static/style/base.css" type="text/css" /> - </head> - <body> - <div class="toolFormBody"> - """ - galhtmlattr = """<hr/><div class="infomessage">This tool (%s) was generated by the <a href="https://bitbucket.org/fubar/galaxytoolfactory/overview">Galaxy Tool Factory</a></div><br/>""" - galhtmlpostfix = """</div></body></html>\n""" - - flist = os.listdir(self.opts.output_dir) - flist = [x for x in flist if x <> 'Rplots.pdf'] - flist.sort() - html = [] - html.append(galhtmlprefix % progname) - html.append('<div class="infomessage">Galaxy Tool "%s" run at %s</div><br/>' % (self.toolname,timenow())) - fhtml = [] - if len(flist) > 0: - logfiles = [x for x in flist if x.lower().endswith('.log')] # log file names determine sections - logfiles.sort() - logfiles = [x for x in logfiles if os.path.abspath(x) <> os.path.abspath(self.tlog)] - logfiles.append(os.path.abspath(self.tlog)) # make it the last one - pdflist = [] - npdf = len([x for x in flist if os.path.splitext(x)[-1].lower() == '.pdf']) - for rownum,fname in enumerate(flist): - dname,e = os.path.splitext(fname) 
- sfsize = self.getfSize(fname,self.opts.output_dir) - if e.lower() == '.pdf' : # compress and make a thumbnail - thumb = '%s.%s' % (dname,self.thumbformat) - pdff = os.path.join(self.opts.output_dir,fname) - retval = self.compressPDF(inpdf=pdff,thumbformat=self.thumbformat) - if retval == 0: - pdflist.append((fname,thumb)) - if (rownum+1) % 2 == 0: - fhtml.append('<tr class="odd_row"><td><a href="%s">%s</a></td><td>%s</td></tr>' % (fname,fname,sfsize)) - else: - fhtml.append('<tr><td><a href="%s">%s</a></td><td>%s</td></tr>' % (fname,fname,sfsize)) - for logfname in logfiles: # expect at least tlog - if more - if os.path.abspath(logfname) == os.path.abspath(self.tlog): # handled later - sectionname = 'All tool run' - if (len(logfiles) > 1): - sectionname = 'Other' - ourpdfs = pdflist - else: - realname = os.path.basename(logfname) - sectionname = os.path.splitext(realname)[0].split('_')[0] # break in case _ added to log - ourpdfs = [x for x in pdflist if os.path.basename(x[0]).split('_')[0] == sectionname] - pdflist = [x for x in pdflist if os.path.basename(x[0]).split('_')[0] <> sectionname] # remove - nacross = 1 - npdf = len(ourpdfs) - - if npdf > 0: - nacross = math.sqrt(npdf) ## int(round(math.log(npdf,2))) - if int(nacross)**2 != npdf: - nacross += 1 - nacross = int(nacross) - width = min(400,int(1200/nacross)) - html.append('<div class="toolFormTitle">%s images and outputs</div>' % sectionname) - html.append('(Click on a thumbnail image to download the corresponding original PDF image)<br/>') - ntogo = nacross # counter for table row padding with empty cells - html.append('<div><table class="simple" cellpadding="2" cellspacing="2">\n<tr>') - for i,paths in enumerate(ourpdfs): - fname,thumb = paths - s= """<td><a href="%s"><img src="%s" title="Click to download a PDF of %s" hspace="5" width="%d" - alt="Image called %s"/></a></td>\n""" % (fname,thumb,fname,width,fname) - if ((i+1) % nacross == 0): - s += '</tr>\n' - ntogo = 0 - if i < (npdf - 1): # more to come - s += '<tr>' - ntogo = nacross - else: - ntogo -= 1 - html.append(s) - if html[-1].strip().endswith('</tr>'): - html.append('</table></div>\n') - else: - if ntogo > 0: # pad - html.append('<td>&nbsp;</td>'*ntogo) - html.append('</tr></table></div>\n') - logt = open(logfname,'r').readlines() - logtext = [x for x in logt if x.strip() > ''] - html.append('<div class="toolFormTitle">%s log output</div>' % sectionname) - if len(logtext) > 1: - html.append('\n<pre>\n') - html += logtext - html.append('\n</pre>\n') - else: - html.append('%s is empty<br/>' % logfname) - if len(fhtml) > 0: - fhtml.insert(0,'<div><table class="colored" cellpadding="3" cellspacing="3"><tr><th>Output File Name (click to view)</th><th>Size</th></tr>\n') - fhtml.append('</table></div><br/>') - html.append('<div class="toolFormTitle">All output files available for downloading</div>\n') - html += fhtml # add all non-pdf files to the end of the display - else: - html.append('<div class="warningmessagelarge">### Error - %s returned no files - please confirm that parameters are sane</div>' % self.opts.interpreter) - html.append(galhtmlpostfix) - htmlf = file(self.opts.output_html,'w') - htmlf.write('\n'.join(html)) - htmlf.write('\n') - htmlf.close() - self.html = html - - - def run(self): - """ - scripts must be small enough not to fill the pipe! 
- """ - if self.treatbashSpecial and self.opts.interpreter in ['bash','sh']: - retval = self.runBash() - else: - if self.opts.output_dir: - ste = open(self.elog,'w') - sto = open(self.tlog,'w') - sto.write('## Toolfactory generated command line = %s\n' % ' '.join(self.cl)) - sto.flush() - p = subprocess.Popen(self.cl,shell=False,stdout=sto,stderr=ste,stdin=subprocess.PIPE,cwd=self.opts.output_dir) - else: - p = subprocess.Popen(self.cl,shell=False,stdin=subprocess.PIPE) - p.stdin.write(self.script) - p.stdin.close() - retval = p.wait() - if self.opts.output_dir: - sto.close() - ste.close() - err = open(self.elog,'r').readlines() - if retval <> 0 and err: # problem - print >> sys.stderr,err - if self.opts.make_HTML: - self.makeHtml() - return retval - - def runBash(self): - """ - cannot use - for bash so use self.sfile - """ - if self.opts.output_dir: - s = '## Toolfactory generated command line = %s\n' % ' '.join(self.cl) - sto = open(self.tlog,'w') - sto.write(s) - sto.flush() - p = subprocess.Popen(self.cl,shell=False,stdout=sto,stderr=sto,cwd=self.opts.output_dir) - else: - p = subprocess.Popen(self.cl,shell=False) - retval = p.wait() - if self.opts.output_dir: - sto.close() - if self.opts.make_HTML: - self.makeHtml() - return retval - - -def main(): - u = """ - This is a Galaxy wrapper. It expects to be called by a special purpose tool.xml as: - <command interpreter="python">rgBaseScriptWrapper.py --script_path "$scriptPath" --tool_name "foo" --interpreter "Rscript" - </command> - """ - op = optparse.OptionParser() - a = op.add_option - a('--script_path',default=None) - a('--tool_name',default=None) - a('--interpreter',default=None) - a('--output_dir',default=None) - a('--output_html',default=None) - a('--input_tab',default="None") - a('--output_tab',default="None") - a('--user_email',default='Unknown') - a('--bad_user',default=None) - a('--make_Tool',default=None) - a('--make_HTML',default=None) - a('--help_text',default=None) - a('--tool_desc',default=None) - a('--new_tool',default=None) - a('--tool_version',default=None) - opts, args = op.parse_args() - assert not opts.bad_user,'UNAUTHORISED: %s is NOT authorized to use this tool until Galaxy admin adds %s to admin_users in universe_wsgi.ini' % (opts.bad_user,opts.bad_user) - assert opts.tool_name,'## Tool Factory expects a tool name - eg --tool_name=DESeq' - assert opts.interpreter,'## Tool Factory wrapper expects an interpreter - eg --interpreter=Rscript' - assert os.path.isfile(opts.script_path),'## Tool Factory wrapper expects a script path - eg --script_path=foo.R' - if opts.output_dir: - try: - os.makedirs(opts.output_dir) - except: - pass - r = ScriptRunner(opts) - if opts.make_Tool: - retcode = r.makeTooltar() - else: - retcode = r.run() - os.unlink(r.sfile) - if retcode: - sys.exit(retcode) # indicate failure to job runner - - -if __name__ == "__main__": - main() - - diff -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 tools/rgedgeR/rgedgeRpaired.xml --- a/tools/rgedgeR/rgedgeRpaired.xml +++ /dev/null @@ -1,1080 +0,0 @@ -<tool id="rgDifferentialCount" name="Differential_Count" version="0.20"> - <description>models using BioConductor packages</description> - <requirements> - <requirement type="package" version="2.12">biocbasics</requirement> - <requirement type="package" version="3.0.1">r3</requirement> - <requirement type="package" version="1.3.18">graphicsmagick</requirement> - <requirement type="package" version="9.07">ghostscript</requirement> - </requirements> - - <command 
interpreter="python"> - rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "DifferentialCounts" - --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes" - </command> - <inputs> - <param name="input1" type="data" format="tabular" label="Select an input matrix - rows are contigs, columns are counts for each sample" - help="Use the HTSeq based count matrix preparation tool to create these matrices from BAM/SAM files and a GTF file of genomic features"/> - <param name="title" type="text" value="Differential Counts" size="80" label="Title for job outputs" - help="Supply a meaningful name here to remind you what the outputs contain"> - <sanitizer invalid_char=""> - <valid initial="string.letters,string.digits"><add value="_" /></valid> - </sanitizer> - </param> - <param name="treatment_name" type="text" value="Treatment" size="50" label="Treatment Name"/> - <param name="Treat_cols" label="Select columns containing treatment." type="data_column" data_ref="input1" numerical="True" - multiple="true" use_header_names="true" size="120" display="checkboxes"> - <validator type="no_options" message="Please select at least one column."/> - </param> - <param name="control_name" type="text" value="Control" size="50" label="Control Name"/> - <param name="Control_cols" label="Select columns containing control." type="data_column" data_ref="input1" numerical="True" - multiple="true" use_header_names="true" size="120" display="checkboxes" optional="true"> - </param> - <param name="subjectids" type="text" optional="true" size="120" value = "" - label="IF SUBJECTS NOT ALL INDEPENDENT! Enter comma separated strings to indicate sample labels for (eg) pairing - must be one for every column in input" - help="Leave blank if no pairing, but eg if data from sample id A99 is in columns 2,4 and id C21 is in 3,5 then enter 'A99,C21,A99,C21'"> - <sanitizer> - <valid initial="string.letters,string.digits"><add value="," /></valid> - </sanitizer> - </param> - <param name="fQ" type="float" value="0.3" size="5" label="Non-differential contig count quantile threshold - zero to analyze all non-zero read count contigs" - help="May be a good or a bad idea depending on the biology and the question. EG 0.3 = sparsest 30% of contigs with at least one read are removed before analysis"/> - <param name="useNDF" type="boolean" truevalue="T" falsevalue="F" checked="false" size="1" - label="Non differential filter - remove contigs below a threshold (1 per million) for half or more samples" - help="May be a good or a bad idea depending on the biology and the question. This was the old default. Quantile based is available as an alternative"/> - - <conditional name="edgeR"> - <param name="doedgeR" type="select" - label="Run this model using edgeR" - help="edgeR uses a negative binomial model and seems to be powerful, even with few replicates"> - <option value="F">Do not run edgeR</option> - <option value="T" selected="true">Run edgeR</option> - </param> - <when value="T"> - <param name="edgeR_priordf" type="integer" value="20" size="3" - label="prior.df for tagwise dispersion - lower value = more emphasis on each tag's variance. Replaces prior.n and prior.df = prior.n * residual.df" - help="0 = Use edgeR default. Use a small value to 'smooth' small samples. 
See edgeR docs and note below"/> - </when> - <when value="F"></when> - </conditional> - <conditional name="DESeq2"> - <param name="doDESeq2" type="select" - label="Run the same model with DESeq2 and compare findings" - help="DESeq2 is an update to the DESeq package. It uses different assumptions and methods to edgeR"> - <option value="F" selected="true">Do not run DESeq2</option> - <option value="T">Run DESeq2</option> - </param> - <when value="T"> - <param name="DESeq_fitType" type="select"> - <option value="parametric" selected="true">Parametric (default) fit for dispersions</option> - <option value="local">Local fit - this will automagically be used if parametric fit fails</option> - <option value="mean">Mean dispersion fit- use this if you really understand what you're doing - read the fine manual linked below in the documentation</option> - </param> - </when> - <when value="F"></when> - </conditional> - <param name="doVoom" type="select" - label="Run the same model with Voom/limma and compare findings" - help="Voom uses counts per million and a precise transformation of variance so count data can be analysed using limma"> - <option value="F" selected="true">Do not run VOOM</option> - <option value="T">Run VOOM</option> - </param> - <conditional name="camera"> - <param name="doCamera" type="select" label="Run the edgeR implementation of Camera GSEA for up/down gene sets" - help="If yes, you can choose a set of genesets to test and/or supply a gmt format geneset collection from your history"> - <option value="F" selected="true">Do not run GSEA tests with the Camera algorithm</option> - <option value="T">Run GSEA tests with the Camera algorithm</option> - </param> - <when value="T"> - <conditional name="gmtSource"> - <param name="refgmtSource" type="select" - label="Use a gene set (.gmt) from your history and/or use a built-in (MSigDB etc) gene set"> - <option value="indexed" selected="true">Use a built-in gene set</option> - <option value="history">Use a gene set from my history</option> - <option value="both">Add a gene set from my history to a built in gene set</option> - </param> - <when value="indexed"> - <param name="builtinGMT" type="select" label="Select a gene set matrix (.gmt) file to use for the analysis"> - <options from_data_table="gseaGMT_3.1"> - <filter type="sort_by" column="2" /> - <validator type="no_options" message="No GMT v3.1 files are available - please install them"/> - </options> - </param> - </when> - <when value="history"> - <param name="ownGMT" type="data" format="gmt" label="Select a Gene Set from your history" /> - </when> - <when value="both"> - <param name="ownGMT" type="data" format="gseagmt" label="Select a Gene Set from your history" /> - <param name="builtinGMT" type="select" label="Select a gene set matrix (.gmt) file to use for the analysis"> - <options from_data_table="gseaGMT_4"> - <filter type="sort_by" column="2" /> - <validator type="no_options" message="No GMT v4 files are available - please fix tool_data_table and loc files"/> - </options> - </param> - </when> - </conditional> - </when> - <when value="F"> - </when> - </conditional> - <param name="fdrthresh" type="float" value="0.05" size="5" label="P value threshold for FDR filtering for amily wise error rate control" - help="Conventional default value of 0.05 recommended"/> - <param name="fdrtype" type="select" label="FDR (Type II error) control method" - help="Use fdr or bh typically to control for the number of tests in a reliable way"> - <option value="fdr" selected="true">fdr</option> - 
<option value="BH">Benjamini Hochberg</option> - <option value="BY">Benjamini Yukateli</option> - <option value="bonferroni">Bonferroni</option> - <option value="hochberg">Hochberg</option> - <option value="holm">Holm</option> - <option value="hommel">Hommel</option> - <option value="none">no control for multiple tests</option> - </param> - </inputs> - <outputs> - <data format="tabular" name="out_edgeR" label="${title}_topTable_edgeR.xls"> - <filter>edgeR['doedgeR'] == "T"</filter> - </data> - <data format="tabular" name="out_DESeq2" label="${title}_topTable_DESeq2.xls"> - <filter>DESeq2['doDESeq2'] == "T"</filter> - </data> - <data format="tabular" name="out_VOOM" label="${title}_topTable_VOOM.xls"> - <filter>doVoom == "T"</filter> - </data> - <data format="html" name="html_file" label="${title}.html"/> - </outputs> - <stdio> - <exit_code range="4" level="fatal" description="Number of subject ids must match total number of samples in the input matrix" /> - </stdio> - <tests> -<test> -<param name='input1' value='test_bams2mx.xls' ftype='tabular' /> - <param name='treatment_name' value='case' /> - <param name='title' value='edgeRtest' /> - <param name='useNDF' value='' /> - <param name='doedgeR' value='T' /> - <param name='doVoom' value='T' /> - <param name='doDESeq2' value='T' /> - <param name='fdrtype' value='fdr' /> - <param name='edgeR_priordf' value="8" /> - <param name='fdrthresh' value="0.05" /> - <param name='control_name' value='control' /> - <param name='subjectids' value='' /> - <param name='Treat_cols' value='3,4,5,9' /> - <param name='Control_cols' value='2,6,7,8' /> - <output name='out_edgeR' file='edgeRtest1out.xls' compare='diff' /> - <output name='html_file' file='edgeRtest1out.html' compare='diff' lines_diff='20' /> -</test> -</tests> - -<configfiles> -<configfile name="runme"> -<![CDATA[ -# -# edgeR.Rscript -# updated npv 2011 for R 2.14.0 and edgeR 2.4.0 by ross -# Performs DGE on a count table containing n replicates of two conditions -# -# Parameters -# -# 1 - Output Dir - -# Original edgeR code by: S.Lunke and A.Kaspi -reallybig = log10(.Machine\$double.xmax) -reallysmall = log10(.Machine\$double.xmin) -library('stringr') -library('gplots') -library('edgeR') -hmap2 = function(cmat,nsamp=100,outpdfname='heatmap2.pdf', TName='Treatment',group=NA,myTitle='title goes here') -{ -# Perform clustering for significant pvalues after controlling FWER - samples = colnames(cmat) - gu = unique(group) - gn = rownames(cmat) - if (length(gu) == 2) { - col.map = function(g) {if (g==gu[1]) "#FF0000" else "#0000FF"} - pcols = unlist(lapply(group,col.map)) - } else { - colours = rainbow(length(gu),start=0,end=4/6) - pcols = colours[match(group,gu)] } - dm = cmat[(! 
is.na(gn)),] - # remove unlabelled hm rows - nprobes = nrow(dm) - # sub = paste('Showing',nprobes,'contigs ranked for evidence of differential abundance') - if (nprobes > nsamp) { - dm =dm[1:nsamp,] - #sub = paste('Showing',nsamp,'contigs ranked for evidence for differential abundance out of',nprobes,'total') - } - newcolnames = substr(colnames(dm),1,20) - colnames(dm) = newcolnames - pdf(outpdfname) - heatmap.2(dm,main=myTitle,ColSideColors=pcols,col=topo.colors(100),dendrogram="col",key=T,density.info='none', - Rowv=F,scale='row',trace='none',margins=c(8,8),cexRow=0.4,cexCol=0.5) - dev.off() -} - -hmap = function(cmat,nmeans=4,outpdfname="heatMap.pdf",nsamp=250,TName='Treatment',group=NA,myTitle="Title goes here") -{ - # for 2 groups only was - #col.map = function(g) {if (g==TName) "#FF0000" else "#0000FF"} - #pcols = unlist(lapply(group,col.map)) - gu = unique(group) - colours = rainbow(length(gu),start=0.3,end=0.6) - pcols = colours[match(group,gu)] - nrows = nrow(cmat) - mtitle = paste(myTitle,'Heatmap: n contigs =',nrows) - if (nrows > nsamp) { - cmat = cmat[c(1:nsamp),] - mtitle = paste('Heatmap: Top ',nsamp,' DE contigs (of ',nrows,')',sep='') - } - newcolnames = substr(colnames(cmat),1,20) - colnames(cmat) = newcolnames - pdf(outpdfname) - heatmap(cmat,scale='row',main=mtitle,cexRow=0.3,cexCol=0.4,Rowv=NA,ColSideColors=pcols) - dev.off() -} - -qqPlot = function(descr='qqplot',pvector, outpdf='qqplot.pdf',...) -# stolen from https://gist.github.com/703512 -{ - o = -log10(sort(pvector,decreasing=F)) - e = -log10( 1:length(o)/length(o) ) - o[o==-Inf] = reallysmall - o[o==Inf] = reallybig - maint = descr - pdf(outpdf) - plot(e,o,pch=19,cex=1, main=maint, ..., - xlab=expression(Expected~~-log[10](italic(p))), - ylab=expression(Observed~~-log[10](italic(p))), - xlim=c(0,max(e)), ylim=c(0,max(o))) - lines(e,e,col="red") - grid(col = "lightgray", lty = "dotted") - dev.off() -} - -smearPlot = function(DGEList,deTags, outSmear, outMain) - { - pdf(outSmear) - plotSmear(DGEList,de.tags=deTags,main=outMain) - grid(col="lightgray", lty="dotted") - dev.off() - } - -boxPlot = function(rawrs,cleanrs,maint,myTitle,pdfname) -{ # - nc = ncol(rawrs) - for (i in c(1:nc)) {rawrs[(rawrs[,i] < 0),i] = NA} - fullnames = colnames(rawrs) - newcolnames = substr(colnames(rawrs),1,20) - colnames(rawrs) = newcolnames - newcolnames = substr(colnames(cleanrs),1,20) - colnames(cleanrs) = newcolnames - defpar = par(no.readonly=T) - print.noquote('raw contig counts by sample:') - print.noquote(summary(rawrs)) - print.noquote('normalised contig counts by sample:') - print.noquote(summary(cleanrs)) - pdf(pdfname) - par(mfrow=c(1,2)) - boxplot(rawrs,varwidth=T,notch=T,ylab='log contig count',col="maroon",las=3,cex.axis=0.35,main=paste('Raw:',maint)) - grid(col="lightgray",lty="dotted") - boxplot(cleanrs,varwidth=T,notch=T,ylab='log contig count',col="maroon",las=3,cex.axis=0.35,main=paste('After ',maint)) - grid(col="lightgray",lty="dotted") - dev.off() - pdfname = "sample_counts_histogram.pdf" - nc = ncol(rawrs) - print.noquote(paste('Using ncol rawrs=',nc)) - ncroot = round(sqrt(nc)) - if (ncroot*ncroot < nc) { ncroot = ncroot + 1 } - m = c() - for (i in c(1:nc)) { - rhist = hist(rawrs[,i],breaks=100,plot=F) - m = append(m,max(rhist\$counts)) - } - ymax = max(m) - ncols = length(fullnames) - if (ncols > 20) - { - scale = 7*ncols/20 - pdf(pdfname,width=scale,height=scale) - } else { - pdf(pdfname) - } - par(mfrow=c(ncroot,ncroot)) - for (i in c(1:nc)) { - hist(rawrs[,i], main=paste("Contig logcount",i), xlab='log raw 
count', col="maroon", - breaks=100,sub=fullnames[i],cex=0.8,ylim=c(0,ymax)) - } - dev.off() - par(defpar) - -} - -cumPlot = function(rawrs,cleanrs,maint,myTitle) -{ # updated to use ecdf - pdfname = "Filtering_rowsum_bar_charts.pdf" - defpar = par(no.readonly=T) - lrs = log(rawrs,10) - lim = max(lrs) - pdf(pdfname) - par(mfrow=c(2,1)) - hist(lrs,breaks=100,main=paste('Before:',maint),xlab="# Reads (log)", - ylab="Count",col="maroon",sub=myTitle, xlim=c(0,lim),las=1) - grid(col="lightgray", lty="dotted") - lrs = log(cleanrs,10) - hist(lrs,breaks=100,main=paste('After:',maint),xlab="# Reads (log)", - ylab="Count",col="maroon",sub=myTitle,xlim=c(0,lim),las=1) - grid(col="lightgray", lty="dotted") - dev.off() - par(defpar) -} - -cumPlot1 = function(rawrs,cleanrs,maint,myTitle) -{ # updated to use ecdf - pdfname = paste(gsub(" ","", myTitle , fixed=TRUE),"RowsumCum.pdf",sep='_') - pdf(pdfname) - par(mfrow=c(2,1)) - lastx = max(rawrs) - rawe = knots(ecdf(rawrs)) - cleane = knots(ecdf(cleanrs)) - cy = 1:length(cleane)/length(cleane) - ry = 1:length(rawe)/length(rawe) - plot(rawe,ry,type='l',main=paste('Before',maint),xlab="Log Contig Total Reads", - ylab="Cumulative proportion",col="maroon",log='x',xlim=c(1,lastx),sub=myTitle) - grid(col="blue") - plot(cleane,cy,type='l',main=paste('After',maint),xlab="Log Contig Total Reads", - ylab="Cumulative proportion",col="maroon",log='x',xlim=c(1,lastx),sub=myTitle) - grid(col="blue") - dev.off() -} - - - -doGSEAold = function(y=NULL,design=NULL,histgmt="", - bigmt="/data/genomes/gsea/3.1/Abetterchoice_nocgp_c2_c3_c5_symbols_all.gmt", - ntest=0, myTitle="myTitle", outfname="GSEA.xls", minnin=5, maxnin=2000,fdrthresh=0.05,fdrtype="BH") -{ - sink('Camera.log') - genesets = c() - if (bigmt > "") - { - bigenesets = readLines(bigmt) - genesets = bigenesets - } - if (histgmt > "") - { - hgenesets = readLines(histgmt) - if (bigmt > "") { - genesets = rbind(genesets,hgenesets) - } else { - genesets = hgenesets - } # use only history if no bi - } - print.noquote(paste("@@@read",length(genesets), 'genesets from',histgmt,bigmt)) - genesets = strsplit(genesets,'\t') # tabular. genesetid\tURLorwhatever\tgene_1\t..\tgene_n - outf = outfname - head=paste(myTitle,'edgeR GSEA') - write(head,file=outfname,append=F) - ntest=length(genesets) - urownames = toupper(rownames(y)) - upcam = c() - downcam = c() - for (i in 1:ntest) { - gs = unlist(genesets[i]) - g = gs[1] # geneset_id - u = gs[2] - if (u > "") { u = paste("<a href=\'",u,"\'>",u,"</a>",sep="") } - glist = gs[3:length(gs)] # member gene symbols - glist = toupper(glist) - inglist = urownames %in% glist - nin = sum(inglist) - if ((nin > minnin) && (nin < maxnin)) { - ### print(paste('@@found',sum(inglist),'genes in glist')) - camres = camera(y=y,index=inglist,design=design) - if (! 
is.null(camres)) { - rownames(camres) = g # gene set name - camres = cbind(GeneSet=g,URL=u,camres) - if (camres\$Direction == "Up") - { - upcam = rbind(upcam,camres) } else { - downcam = rbind(downcam,camres) - } - } - } - } - uscam = upcam[order(upcam\$PValue),] - unadjp = uscam\$PValue - uscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - nup = max(10,sum((uscam\$adjPValue < fdrthresh))) - dscam = downcam[order(downcam\$PValue),] - unadjp = dscam\$PValue - dscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - ndown = max(10,sum((dscam\$adjPValue < fdrthresh))) - write.table(uscam,file=paste('camera_up',outfname,sep='_'),quote=F,sep='\t',row.names=F) - write.table(dscam,file=paste('camera_down',outfname,sep='_'),quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ Camera up top',nup,'gene sets:')) - write.table(head(uscam,nup),file="",quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ Camera down top',ndown,'gene sets:')) - write.table(head(dscam,ndown),file="",quote=F,sep='\t',row.names=F) - sink() -} - - - - -doGSEA = function(y=NULL,design=NULL,histgmt="", - bigmt="/data/genomes/gsea/3.1/Abetterchoice_nocgp_c2_c3_c5_symbols_all.gmt", - ntest=0, myTitle="myTitle", outfname="GSEA.xls", minnin=5, maxnin=2000,fdrthresh=0.05,fdrtype="BH") -{ - sink('Camera.log') - genesets = c() - if (bigmt > "") - { - bigenesets = readLines(bigmt) - genesets = bigenesets - } - if (histgmt > "") - { - hgenesets = readLines(histgmt) - if (bigmt > "") { - genesets = rbind(genesets,hgenesets) - } else { - genesets = hgenesets - } # use only history if no bi - } - print.noquote(paste("@@@read",length(genesets), 'genesets from',histgmt,bigmt)) - genesets = strsplit(genesets,'\t') # tabular. genesetid\tURLorwhatever\tgene_1\t..\tgene_n - outf = outfname - head=paste(myTitle,'edgeR GSEA') - write(head,file=outfname,append=F) - ntest=length(genesets) - urownames = toupper(rownames(y)) - upcam = c() - downcam = c() - incam = c() - urls = c() - gsids = c() - for (i in 1:ntest) { - gs = unlist(genesets[i]) - gsid = gs[1] # geneset_id - url = gs[2] - if (url > "") { url = paste("<a href=\'",url,"\'>",url,"</a>",sep="") } - glist = gs[3:length(gs)] # member gene symbols - glist = toupper(glist) - inglist = urownames %in% glist - nin = sum(inglist) - if ((nin > minnin) && (nin < maxnin)) { - incam = c(incam,inglist) - gsids = c(gsids,gsid) - urls = c(urls,url) - } - } - incam = as.list(incam) - names(incam) = gsids - allcam = camera(y=y,index=incam,design=design) - allcamres = cbind(geneset=gsids,allcam,URL=urls) - for (i in 1:ntest) { - camres = allcamres[i] - res = try(test = (camres\$Direction == "Up")) - if ("try-error" %in% class(res)) { - cat("test failed, camres = :") - print.noquote(camres) - } else { if (camres\$Direction == "Up") - { upcam = rbind(upcam,camres) - } else { downcam = rbind(downcam,camres) - } - - } - } - uscam = upcam[order(upcam\$PValue),] - unadjp = uscam\$PValue - uscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - nup = max(10,sum((uscam\$adjPValue < fdrthresh))) - dscam = downcam[order(downcam\$PValue),] - unadjp = dscam\$PValue - dscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - ndown = max(10,sum((dscam\$adjPValue < fdrthresh))) - write.table(uscam,file=paste('camera_up',outfname,sep='_'),quote=F,sep='\t',row.names=F) - write.table(dscam,file=paste('camera_down',outfname,sep='_'),quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ Camera up top',nup,'gene sets:')) - write.table(head(uscam,nup),file="",quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ 
Camera down top',ndown,'gene sets:')) - write.table(head(dscam,ndown),file="",quote=F,sep='\t',row.names=F) - sink() - } - - -edgeIt = function (Count_Matrix=c(),group=c(),out_edgeR=F,out_VOOM=F,out_DESeq2=F,fdrtype='fdr',priordf=5, - fdrthresh=0.05,outputdir='.', myTitle='Differential Counts',libSize=c(),useNDF=F, - filterquantile=0.2, subjects=c(),mydesign=NULL, - doDESeq2=T,doVoom=T,doCamera=T,doedgeR=T,org='hg19', - histgmt="", bigmt="/data/genomes/gsea/3.1/Abetterchoice_nocgp_c2_c3_c5_symbols_all.gmt", - doCook=F,DESeq_fitType="parameteric") -{ - # Error handling - if (length(unique(group))!=2){ - print("Number of conditions identified in experiment does not equal 2") - q() - } - require(edgeR) - options(width = 512) - mt = paste(unlist(strsplit(myTitle,'_')),collapse=" ") - allN = nrow(Count_Matrix) - nscut = round(ncol(Count_Matrix)/2) - colTotmillionreads = colSums(Count_Matrix)/1e6 - counts.dataframe = as.data.frame(c()) - rawrs = rowSums(Count_Matrix) - nonzerod = Count_Matrix[(rawrs > 0),] # remove all zero count genes - nzN = nrow(nonzerod) - nzrs = rowSums(nonzerod) - zN = allN - nzN - print('# Quantiles for non-zero row counts:',quote=F) - print(quantile(nzrs,probs=seq(0,1,0.1)),quote=F) - if (useNDF == T) - { - gt1rpin3 = rowSums(Count_Matrix/expandAsMatrix(colTotmillionreads,dim(Count_Matrix)) >= 1) >= nscut - lo = colSums(Count_Matrix[!gt1rpin3,]) - workCM = Count_Matrix[gt1rpin3,] - cleanrs = rowSums(workCM) - cleanN = length(cleanrs) - meth = paste( "After removing",length(lo),"contigs with fewer than ",nscut," sample read counts >= 1 per million, there are",sep="") - print(paste("Read",allN,"contigs. Removed",zN,"contigs with no reads.",meth,cleanN,"contigs"),quote=F) - maint = paste('Filter >=1/million reads in >=',nscut,'samples') - } else { - useme = (nzrs > quantile(nzrs,filterquantile)) - workCM = nonzerod[useme,] - lo = colSums(nonzerod[!useme,]) - cleanrs = rowSums(workCM) - cleanN = length(cleanrs) - meth = paste("After filtering at count quantile =",filterquantile,", there are",sep="") - print(paste('Read',allN,"contigs. 
Removed",zN,"with no reads.",meth,cleanN,"contigs"),quote=F) - maint = paste('Filter below',filterquantile,'quantile') - } - cumPlot(rawrs=rawrs,cleanrs=cleanrs,maint=maint,myTitle=myTitle) - allgenes = rownames(workCM) - reg = "^chr([0-9]+):([0-9]+)-([0-9]+)" - genecards="<a href=\'http://www.genecards.org/index.php?path=/Search/keyword/" - ucsc = paste("<a href=\'http://genome.ucsc.edu/cgi-bin/hgTracks?db=",org,sep='') - testreg = str_match(allgenes,reg) - if (sum(!is.na(testreg[,1]))/length(testreg[,1]) > 0.8) # is ucsc style string - { - print("@@ using ucsc substitution for urls") - contigurls = paste0(ucsc,"&position=chr",testreg[,2],":",testreg[,3],"-",testreg[,4],"\'>",allgenes,"</a>") - } else { - print("@@ using genecards substitution for urls") - contigurls = paste0(genecards,allgenes,"\'>",allgenes,"</a>") - } - print.noquote("# urls") - print.noquote(head(contigurls)) - print(paste("# Total low count contigs per sample = ",paste(lo,collapse=',')),quote=F) - cmrowsums = rowSums(workCM) - TName=unique(group)[1] - CName=unique(group)[2] - if (is.null(mydesign)) { - if (length(subjects) == 0) - { - mydesign = model.matrix(~group) - } - else { - subjf = factor(subjects) - mydesign = model.matrix(~subjf+group) # we block on subject so make group last to simplify finding it - } - } - print.noquote(paste('Using samples:',paste(colnames(workCM),collapse=','))) - print.noquote('Using design matrix:') - print.noquote(mydesign) - if (doedgeR) { - sink('edgeR.log') - #### Setup DGEList object - DGEList = DGEList(counts=workCM, group = group) - DGEList = calcNormFactors(DGEList) - - DGEList = estimateGLMCommonDisp(DGEList,mydesign) - comdisp = DGEList\$common.dispersion - DGEList = estimateGLMTrendedDisp(DGEList,mydesign) - if (edgeR_priordf > 0) { - print.noquote(paste("prior.df =",edgeR_priordf)) - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign,prior.df = edgeR_priordf) - } else { - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign) - } - DGLM = glmFit(DGEList,design=mydesign) - DE = glmLRT(DGLM,coef=ncol(DGLM\$design)) # always last one - subject is first if needed - efflib = DGEList\$samples\$lib.size*DGEList\$samples\$norm.factors - normData = (1e+06*DGEList\$counts/efflib) - uoutput = cbind( - Name=as.character(rownames(DGEList\$counts)), - DE\$table, - adj.p.value=p.adjust(DE\$table\$PValue, method=fdrtype), - Dispersion=DGEList\$tagwise.dispersion,totreads=cmrowsums,normData, - DGEList\$counts - ) - soutput = uoutput[order(DE\$table\$PValue),] # sorted into p value order - for quick toptable - goodness = gof(DGLM, pcutoff=fdrthresh) - if (sum(goodness\$outlier) > 0) { - print.noquote('GLM outliers:') - print(paste(rownames(DGLM)[(goodness\$outlier)],collapse=','),quote=F) - } else { - print('No GLM fit outlier genes found\n') - } - z = limma::zscoreGamma(goodness\$gof.statistic, shape=goodness\$df/2, scale=2) - pdf("edgeR_GoodnessofFit.pdf") - qq = qqnorm(z, panel.first=grid(), main="tagwise dispersion") - abline(0,1,lwd=3) - points(qq\$x[goodness\$outlier],qq\$y[goodness\$outlier], pch=16, col="maroon") - dev.off() - estpriorn = getPriorN(DGEList) - print(paste("Common Dispersion =",comdisp,"CV = ",sqrt(comdisp),"getPriorN = ",estpriorn),quote=F) - efflib = DGEList\$samples\$lib.size*DGEList\$samples\$norm.factors - normData = (1e+06*DGEList\$counts/efflib) - uniqueg = unique(group) - #### Plot MDS - sample_colors = match(group,levels(group)) - sampleTypes = levels(factor(group)) - print.noquote(sampleTypes) - pdf("edgeR_MDSplot.pdf") - plotMDS.DGEList(DGEList,main=paste("edgeR 
MDS for",myTitle),cex=0.5,col=sample_colors,pch=sample_colors) - legend(x="topleft", legend = sampleTypes,col=c(1:length(sampleTypes)), pch=19) - grid(col="blue") - dev.off() - colnames(normData) = paste( colnames(normData),'N',sep="_") - print(paste('Raw sample read totals',paste(colSums(nonzerod,na.rm=T),collapse=','))) - nzd = data.frame(log(nonzerod + 1e-2,10)) - try( boxPlot(rawrs=nzd,cleanrs=log(normData,10),maint='TMM Normalisation',myTitle=myTitle,pdfname="edgeR_raw_norm_counts_box.pdf") ) - write.table(soutput,file=out_edgeR, quote=FALSE, sep="\t",row.names=F) - tt = cbind( - Name=as.character(rownames(DGEList\$counts)), - DE\$table, - adj.p.value=p.adjust(DE\$table\$PValue, method=fdrtype), - Dispersion=DGEList\$tagwise.dispersion,totreads=cmrowsums - ) - print.noquote("# edgeR Top tags\n") - tt = cbind(tt,URL=contigurls) # add to end so table isn't laid out strangely - tt = tt[order(DE\$table\$PValue),] - print.noquote(tt[1:50,]) - deTags = rownames(uoutput[uoutput\$adj.p.value < fdrthresh,]) - nsig = length(deTags) - print(paste('#',nsig,'tags significant at adj p=',fdrthresh),quote=F) - deColours = ifelse(deTags,'red','black') - pdf("edgeR_BCV_vs_abundance.pdf") - plotBCV(DGEList, cex=0.3, main="Biological CV vs abundance") - dev.off() - dg = DGEList[order(DE\$table\$PValue),] - #normData = (1e+06 * dg\$counts/expandAsMatrix(dg\$samples\$lib.size, dim(dg))) - efflib = dg\$samples\$lib.size*dg\$samples\$norm.factors - normData = (1e+06*dg\$counts/efflib) - outpdfname="edgeR_top_100_heatmap.pdf" - hmap2(normData,nsamp=100,TName=TName,group=group,outpdfname=outpdfname,myTitle=paste('edgeR Heatmap',myTitle)) - outSmear = "edgeR_smearplot.pdf" - outMain = paste("Smear Plot for ",TName,' Vs ',CName,' (FDR@',fdrthresh,' N = ',nsig,')',sep='') - smearPlot(DGEList=DGEList,deTags=deTags, outSmear=outSmear, outMain = outMain) - qqPlot(descr=paste(myTitle,'edgeR adj p QQ plot'),pvector=tt\$adj.p.value,outpdf='edgeR_qqplot.pdf') - norm.factor = DGEList\$samples\$norm.factors - topresults.edgeR = soutput[which(soutput\$adj.p.value < fdrthresh), ] - edgeRcountsindex = which(allgenes %in% rownames(topresults.edgeR)) - edgeRcounts = rep(0, length(allgenes)) - edgeRcounts[edgeRcountsindex] = 1 # Create venn diagram of hits - sink() - } ### doedgeR - if (doDESeq2 == T) - { - sink("DESeq2.log") - # DESeq2 - require('DESeq2') - library('RColorBrewer') - if (length(subjects) == 0) - { - pdata = data.frame(Name=colnames(workCM),Rx=group,row.names=colnames(workCM)) - deSEQds = DESeqDataSetFromMatrix(countData = workCM, colData = pdata, design = formula(~ Rx)) - } else { - pdata = data.frame(Name=colnames(workCM),Rx=group,subjects=subjects,row.names=colnames(workCM)) - deSEQds = DESeqDataSetFromMatrix(countData = workCM, colData = pdata, design = formula(~ subjects + Rx)) - } - #DESeq2 = DESeq(deSEQds,fitType='local',pAdjustMethod=fdrtype) - #rDESeq = results(DESeq2) - #newCountDataSet(workCM, group) - deSeqDatsizefac = estimateSizeFactors(deSEQds) - deSeqDatdisp = estimateDispersions(deSeqDatsizefac,fitType=DESeq_fitType) - resDESeq = nbinomWaldTest(deSeqDatdisp, pAdjustMethod=fdrtype) - rDESeq = as.data.frame(results(resDESeq)) - rDESeq = cbind(Contig=rownames(workCM),rDESeq,NReads=cmrowsums,URL=contigurls) - srDESeq = rDESeq[order(rDESeq\$pvalue),] - qqPlot(descr=paste(myTitle,'DESeq2 adj p qq plot'),pvector=rDESeq\$padj,outpdf='DESeq2_qqplot.pdf') - cat("# DESeq top 50\n") - print.noquote(srDESeq[1:50,]) - write.table(srDESeq,file=out_DESeq2, quote=FALSE, sep="\t",row.names=F) - topresults.DESeq = 
rDESeq[which(rDESeq\$padj < fdrthresh), ] - DESeqcountsindex = which(allgenes %in% rownames(topresults.DESeq)) - DESeqcounts = rep(0, length(allgenes)) - DESeqcounts[DESeqcountsindex] = 1 - pdf("DESeq2_dispersion_estimates.pdf") - plotDispEsts(resDESeq) - dev.off() - ysmall = abs(min(rDESeq\$log2FoldChange)) - ybig = abs(max(rDESeq\$log2FoldChange)) - ylimit = min(4,ysmall,ybig) - pdf("DESeq2_MA_plot.pdf") - plotMA(resDESeq,main=paste(myTitle,"DESeq2 MA plot"),ylim=c(-ylimit,ylimit)) - dev.off() - rlogres = rlogTransformation(resDESeq) - sampledists = dist( t( assay(rlogres) ) ) - sdmat = as.matrix(sampledists) - pdf("DESeq2_sample_distance_plot.pdf") - heatmap.2(sdmat,trace="none",main=paste(myTitle,"DESeq2 sample distances"), - col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255)) - dev.off() - ###outpdfname="DESeq2_top50_heatmap.pdf" - ###hmap2(sresDESeq,nsamp=50,TName=TName,group=group,outpdfname=outpdfname,myTitle=paste('DESeq2 vst rlog Heatmap',myTitle)) - sink() - result = try( (ppca = plotPCA( varianceStabilizingTransformation(deSeqDatdisp,blind=T), intgroup=c("Rx","Name")) ) ) - if ("try-error" %in% class(result)) { - print.noquote('DESeq2 plotPCA failed.') - } else { - pdf("DESeq2_PCA_plot.pdf") - #### wtf - print? Seems needed to get this to work - print(ppca) - dev.off() - } - } - - if (doVoom == T) { - sink('VOOM.log') - if (doedgeR == F) { - #### Setup DGEList object - DGEList = DGEList(counts=workCM, group = group) - DGEList = calcNormFactors(DGEList) - DGEList = estimateGLMCommonDisp(DGEList,mydesign) - DGEList = estimateGLMTrendedDisp(DGEList,mydesign) - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign) - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign) - norm.factor = DGEList\$samples\$norm.factors - } - pdf("VOOM_mean_variance_plot.pdf") - dat.voomed = voom(DGEList, mydesign, plot = TRUE, lib.size = colSums(workCM) * norm.factor) - dev.off() - # Use limma to fit data - fit = lmFit(dat.voomed, mydesign) - fit = eBayes(fit) - rvoom = topTable(fit, coef = length(colnames(mydesign)), adj = fdrtype, n = Inf, sort="none") - qqPlot(descr=paste(myTitle,'VOOM-limma adj p QQ plot'),pvector=rvoom\$adj.P.Val,outpdf='VOOM_qqplot.pdf') - rownames(rvoom) = rownames(workCM) - rvoom = cbind(rvoom,NReads=cmrowsums,URL=contigurls) - srvoom = rvoom[order(rvoom\$P.Value),] - cat("# VOOM top 50\n") - print(srvoom[1:50,]) - write.table(srvoom,file=out_VOOM, quote=FALSE, sep="\t",row.names=F) - # Use an FDR cutoff to find interesting samples for edgeR, DESeq and voom/limma - topresults.voom = rvoom[which(rvoom\$adj.P.Val < fdrthresh), ] - voomcountsindex = which(allgenes %in% topresults.voom\$ID) - voomcounts = rep(0, length(allgenes)) - voomcounts[voomcountsindex] = 1 - sink() - } - - if (doCamera) { - doGSEA(y=DGEList,design=mydesign,histgmt=histgmt,bigmt=bigmt,ntest=20,myTitle=myTitle, - outfname=paste(mt,"GSEA.xls",sep="_"),fdrthresh=fdrthresh,fdrtype=fdrtype) - } - - if ((doDESeq2==T) || (doVoom==T) || (doedgeR==T)) { - if ((doVoom==T) && (doDESeq2==T) && (doedgeR==T)) { - vennmain = paste(mt,'Voom,edgeR and DESeq2 overlap at FDR=',fdrthresh) - counts.dataframe = data.frame(edgeR = edgeRcounts, DESeq2 = DESeqcounts, - VOOM_limma = voomcounts, row.names = allgenes) - } else if ((doDESeq2==T) && (doedgeR==T)) { - vennmain = paste(mt,'DESeq2 and edgeR overlap at FDR=',fdrthresh) - counts.dataframe = data.frame(edgeR = edgeRcounts, DESeq2 = DESeqcounts, row.names = allgenes) - } else if ((doVoom==T) && (doedgeR==T)) { - vennmain = paste(mt,'Voom and edgeR overlap at FDR=',fdrthresh) - 
counts.dataframe = data.frame(edgeR = edgeRcounts, VOOM_limma = voomcounts, row.names = allgenes) - } - - if (nrow(counts.dataframe > 1)) { - counts.venn = vennCounts(counts.dataframe) - vennf = "Venn_significant_genes_overlap.pdf" - pdf(vennf) - vennDiagram(counts.venn,main=vennmain,col="maroon") - dev.off() - } - } #### doDESeq2 or doVoom - -} -#### Done - -###sink(stdout(),append=T,type="message") -builtin_gmt = "" -history_gmt = "" -history_gmt_name = "" -out_edgeR = F -out_DESeq2 = F -out_VOOM = "$out_VOOM" -doDESeq2 = $DESeq2.doDESeq2 # make these T or F -doVoom = $doVoom -doCamera = $camera.doCamera -doedgeR = $edgeR.doedgeR -edgeR_priordf = 0 - - -#if $doVoom == "T": - out_VOOM = "$out_VOOM" -#end if - -#if $DESeq2.doDESeq2 == "T": - out_DESeq2 = "$out_DESeq2" - DESeq_fitType = "$DESeq2.DESeq_fitType" -#end if - -#if $edgeR.doedgeR == "T": - out_edgeR = "$out_edgeR" - edgeR_priordf = $edgeR.edgeR_priordf -#end if - -#if $camera.doCamera == 'T' - #if $camera.gmtSource.refgmtSource == "indexed" or $camera.gmtSource.refgmtSource == "both": - builtin_gmt = "${camera.gmtSource.builtinGMT.fields.path}" - #end if - #if $camera.gmtSource.refgmtSource == "history" or $camera.gmtSource.refgmtSource == "both": - history_gmt = "${camera.gmtSource.ownGMT}" - history_gmt_name = "${camera.gmtSource.ownGMT.name}" - #end if -#end if - - -if (sum(c(doedgeR,doVoom,doDESeq2)) == 0) -{ -write("No methods chosen - nothing to do! Please try again after choosing one or more methods", stderr()) -quit(save="no",status=2) -} - -Out_Dir = "$html_file.files_path" -Input = "$input1" -TreatmentName = "$treatment_name" -TreatmentCols = "$Treat_cols" -ControlName = "$control_name" -ControlCols= "$Control_cols" -org = "$input1.dbkey" -if (org == "") { org = "hg19"} -fdrtype = "$fdrtype" -fdrthresh = $fdrthresh -useNDF = $useNDF -fQ = $fQ # non-differential centile cutoff -myTitle = "$title" -sids = strsplit("$subjectids",',') -subjects = unlist(sids) -nsubj = length(subjects) -TCols = as.numeric(strsplit(TreatmentCols,",")[[1]])-1 -CCols = as.numeric(strsplit(ControlCols,",")[[1]])-1 -cat('Got TCols=') -cat(TCols) -cat('; CCols=') -cat(CCols) -cat('\n') -useCols = c(TCols,CCols) -if (file.exists(Out_Dir) == F) dir.create(Out_Dir) -Count_Matrix = read.table(Input,header=T,row.names=1,sep='\t') #Load tab file assume header -snames = colnames(Count_Matrix) -nsamples = length(snames) -if (nsubj > 0 & nsubj != nsamples) { -options("show.error.messages"=T) -mess = paste('Fatal error: Supplied subject id list',paste(subjects,collapse=','), - 'has length',nsubj,'but there are',nsamples,'samples',paste(snames,collapse=',')) -write(mess, stderr()) -quit(save="no",status=4) -} -if (length(subjects) != 0) {subjects = subjects[useCols]} -Count_Matrix = Count_Matrix[,useCols] ### reorder columns -rn = rownames(Count_Matrix) -islib = rn %in% c('librarySize','NotInBedRegions') -LibSizes = Count_Matrix[subset(rn,islib),][1] # take first -Count_Matrix = Count_Matrix[subset(rn,! 
islib),] -group = c(rep(TreatmentName,length(TCols)), rep(ControlName,length(CCols)) ) #Build a group descriptor -group = factor(group, levels=c(ControlName,TreatmentName)) -colnames(Count_Matrix) = paste(group,colnames(Count_Matrix),sep="_") #Relabel columns -results = edgeIt(Count_Matrix=Count_Matrix,group=group, out_edgeR=out_edgeR, out_VOOM=out_VOOM, out_DESeq2=out_DESeq2, - fdrtype='BH',mydesign=NULL,priordf=edgeR_priordf,fdrthresh=fdrthresh,outputdir='.', - myTitle=myTitle,useNDF=F,libSize=c(),filterquantile=fQ,subjects=subjects, - doDESeq2=doDESeq2,doVoom=doVoom,doCamera=doCamera,doedgeR=doedgeR,org=org, - histgmt=history_gmt,bigmt=builtin_gmt,DESeq_fitType=DESeq_fitType) -sessionInfo() -]]> -</configfile> -</configfiles> -<help> - -**What it does** - -Allows short read sequence counts from controlled experiments to be analysed for differentially expressed genes. -Optionally adds a term for subject if not all samples are independent or if some other factor needs to be blocked in the design. - -**Input** - -Requires a count matrix as a tabular file. These are best made using the companion HTSeq_ based counter Galaxy wrapper -and your favourite gene model to generate inputs. Each row is a genomic feature (e.g. a gene or exon) and each column the -non-negative integer count of reads from one sample overlapping the feature. -The matrix must have a header row uniquely identifying the source samples, and unique row names in -the first column. Typically the row names are gene symbols or probe ids for downstream use in GSEA and other methods. - -**Specifying comparisons** - -This is basically dumbed down for two factors - case vs control. - -More complex interfaces are possible but painful at present. -Probably need to specify a phenotype file to do this better. -Work in progress. Send code. - -If you have (e.g.) paired samples and wish to include a term in the GLM to account for some other factor (subject in the case of paired samples), -supply a comma separated list of integer indicators, one for every sample (whether modelled or not!), indicating (e.g.) the subject number, -or an empty string if samples are all independent. -If not empty, there must be exactly as many integers in the supplied list as there are columns (samples) in the count matrix. -Integers for samples that are not in the analysis *must* be present in the string as filler even if not used. - -So if you have 2 pairs out of 6 samples, you need to put in unique integers for the unpaired ones. -E.g. if you had 6 samples with the first two independent and the second and third pairs each from an independent subject, you might use -8,9,1,1,2,2 -as subject IDs to indicate two paired samples from the same subject in columns 3/4 and 5/6 (see the design sketch after the Outputs section below). - -**Methods available** - -You can run 3 popular Bioconductor packages available for count data. - -edgeR - see edgeR_ for details - -VOOM/limma - see limma_VOOM_ for details - -DESeq2 - see DESeq2_ for details - -and optionally camera in edgeR, which works better if MSigDB is installed. - -**Outputs** - -Some helpful plots and analysis results. Note that most of these are produced using R code -suggested by the excellent documentation and vignettes for the Bioconductor -packages invoked. The Tool Factory is used to automatically lay these out for you to enjoy.
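
[Editor's note] A minimal sketch of how a subject id string like the 8,9,1,1,2,2 example above can enter the GLM design as an additive blocking term. This is an illustration only, not part of the tool's code: the six case/control labels are invented, and the tool builds its design internally.

    # parse the comma separated subject ids supplied in the tool form
    subjects = factor(unlist(strsplit("8,9,1,1,2,2", ",")))
    # hypothetical treatment/control assignment for the same six columns
    group = factor(c("Control","Control","Treat","Control","Treat","Control"),
                   levels=c("Control","Treat"))
    # subject enters the design as an additive blocking factor, as described above
    mydesign = model.matrix(~ subjects + group)
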
- -**Note on Voom** - -The voom from limma version 3.16.6 help in R includes this from the authors - but you should read the paper to interpret this method. - -This function is intended to process RNA-Seq or ChIP-Seq data prior to linear modelling in limma. - -voom is an acronym for mean-variance modelling at the observational level. -The key concern is to estimate the mean-variance relationship in the data, then use this to compute appropriate weights for each observation. -Count data almost always show non-trivial mean-variance relationships. Raw counts show increasing variance with increasing count size, while log-counts typically show a decreasing mean-variance trend. -This function estimates the mean-variance trend for log-counts, then assigns a weight to each observation based on its predicted variance. -The weights are then used in the linear modelling process to adjust for heteroscedasticity. - -In an experiment, a count value is observed for each tag in each sample. A tag-wise mean-variance trend is computed using lowess. -The tag-wise mean is the mean log2 count with an offset of 0.5, across samples for a given tag. -The tag-wise variance is the quarter-root-variance of normalized log2 counts per million values with an offset of 0.5, across samples for a given tag. -Tags with zero counts across all samples are not included in the lowess fit. Optional normalization is performed using normalizeBetweenArrays. -Using fitted values of log2 counts from a linear model fit by lmFit, variances from the mean-variance trend are interpolated for each observation. -This is carried out by approxfun. Inverse variance weights can be used to correct for mean-variance trend in the count data. - - -Author(s) - -Charity Law and Gordon Smyth - -References - -Law, CW (2013). Precision weights for gene expression analysis. PhD Thesis. University of Melbourne, Australia. - -Law, CW, Chen, Y, Shi, W, Smyth, GK (2013). Voom! Precision weights unlock linear model analysis tools for RNA-seq read counts. -Technical Report 1 May 2013, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia. -http://www.statsci.org/smyth/pubs/VoomPreprint.pdf - -See Also - -A voom case study is given in the edgeR User's Guide. - -vooma is a similar function but for microarrays instead of RNA-seq.
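
[Editor's note] For orientation, a minimal self-contained voom run along the lines the quoted help describes. The counts, group labels and design below are invented for illustration and are not taken from the tool:

    library(limma)
    set.seed(1)
    counts = matrix(rpois(600, lambda=20), nrow=100)  # 100 invented tags x 6 samples
    design = model.matrix(~ factor(c("C","C","C","T","T","T")))
    v = voom(counts, design, plot=FALSE)  # estimate the mean-variance trend, compute observation weights
    fit = eBayes(lmFit(v, design))        # weighted linear modelling, as in the tool's VOOM branch
    topTable(fit, coef=2)                 # top tags for the group effect
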
- - -***old rant on changes to Bioconductor package variable names between versions*** - -The edgeR authors made a small cosmetic change in the name of one important variable (from p.value to PValue), -breaking this and all other code that assumed the old name for this variable, -between edgeR 2.4.4 and 2.4.6 (the version for R 2.14 as at the time of writing). -This means that all code using edgeR is sensitive to the version. I think this was a very unwise thing -to do because it wasted hours of my time to track down and will similarly cost other edgeR users dearly -when their old scripts break. This tool currently works with 2.4.6. - -**Note on prior.N** - -http://seqanswers.com/forums/showthread.php?t=5591 says: - -*prior.n* - -The value for prior.n determines the amount of smoothing of tagwise dispersions towards the common dispersion. -You can think of it as like a "weight" for the common value. (It is actually the weight for the common likelihood -in the weighted likelihood equation). The larger the value for prior.n, the more smoothing, i.e. the closer your -tagwise dispersion estimates will be to the common dispersion. If you use a prior.n of 1, then that gives the -common likelihood the weight of one observation. - -In answer to your question, it is a good thing to squeeze the tagwise dispersions towards a common value, -or else you will be using very unreliable estimates of the dispersion. I would not recommend using the value that -you obtained from estimateSmoothing() - this is far too small and would result in virtually no moderation -(squeezing) of the tagwise dispersions. How many samples do you have in your experiment? -What is the experimental design? If you have few samples (less than 6) then I would suggest a prior.n of at least 10. -If you have more samples, then the tagwise dispersion estimates will be more reliable, -so you could consider using a smaller prior.n, although I would hesitate to use a prior.n less than 5. - - -From Bioconductor Digest, Vol 118, Issue 5, Gordon writes: - -Dear Dorota, - -The important settings are prior.df and trend. - -prior.n and prior.df are related through prior.df = prior.n * residual.df, -and your experiment has residual.df = 36 - 12 = 24. So the old setting of -prior.n=10 is equivalent for your data to prior.df = 240, a very large -value. Going the other way, the new setting of prior.df=10 is equivalent -to prior.n=10/24. - -To recover old results with the current software you would use - - estimateTagwiseDisp(object, prior.df=240, trend="none") - -To get the new default from old software you would use - - estimateTagwiseDisp(object, prior.n=10/24, trend=TRUE) - -Actually the old trend method is equivalent to trend="loess" in the new -software. You should use plotBCV(object) to see whether a trend is -required. - -Note you could also use - - prior.n = getPriorN(object, prior.df=10) - -to map between prior.df and prior.n. - ---- - -**Attributions** - -edgeR - edgeR_ - -VOOM/limma - limma_VOOM_ - -DESeq2 - DESeq2_ - -See the Bioconductor package documentation above for the packages exposed in Galaxy by this tool and app store package. - -Galaxy_ (that's what you are using right now!) for gluing everything together - -Otherwise, all code and documentation comprising this tool was written by Ross Lazarus and is -licensed to you under the LGPL_ like other rgenetics artefacts. - -.. _LGPL: http://www.gnu.org/copyleft/lesser.html -.. _HTSeq: http://www-huber.embl.de/users/anders/HTSeq/doc/index.html -.. _edgeR: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html -.. _DESeq2: http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html -.. _limma_VOOM: http://www.bioconductor.org/packages/release/bioc/html/limma.html -.. _Galaxy: http://getgalaxy.org -</help> - -</tool> - - This diff is so big that we needed to truncate the remainder.
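
[Editor's note] The prior.df to prior.n mapping quoted in the help text above is plain arithmetic; restated in R, with the 36 samples and 12 estimated parameters taken from Gordon's reply:

    residual.df = 36 - 12        # 24 residual degrees of freedom
    prior.df = 10 * residual.df  # old prior.n=10 is equivalent to prior.df=240
    prior.n = 10 / residual.df   # new prior.df=10 is equivalent to prior.n=10/24
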
https://bitbucket.org/galaxy/galaxy-central/commits/2c281898882a/ Changeset: 2c281898882a User: fubar Date: 2013-10-08 01:52:21 Summary: backout -r 10800 Affected #: 4 files diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 lib/galaxy/config.py --- a/lib/galaxy/config.py +++ b/lib/galaxy/config.py @@ -109,7 +109,6 @@ self.allow_user_creation = string_as_bool( kwargs.get( "allow_user_creation", "True" ) ) self.allow_user_deletion = string_as_bool( kwargs.get( "allow_user_deletion", "False" ) ) self.allow_user_dataset_purge = string_as_bool( kwargs.get( "allow_user_dataset_purge", "False" ) ) - self.use_data_id_on_string = string_as_bool( kwargs.get( "use_data_id_on_string", "False" ) ) self.allow_user_impersonation = string_as_bool( kwargs.get( "allow_user_impersonation", "False" ) ) self.new_user_dataset_access_role_default_private = string_as_bool( kwargs.get( "new_user_dataset_access_role_default_private", "False" ) ) self.collect_outputs_from = [ x.strip() for x in kwargs.get( 'collect_outputs_from', 'new_file_path,job_working_directory' ).lower().split(',') ] diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 lib/galaxy/tools/actions/__init__.py --- a/lib/galaxy/tools/actions/__init__.py +++ b/lib/galaxy/tools/actions/__init__.py @@ -1,6 +1,5 @@ import os import galaxy.tools -import re from galaxy.exceptions import ObjectInvalid from galaxy.model import LibraryDatasetDatasetAssociation @@ -192,20 +191,14 @@ data = data.to_history_dataset_association( None ) inp_data[name] = data -# else: # HDA -# if data.hid: -# input_names.append( 'data %s' % data.hid ) + else: # HDA + if data.hid: + input_names.append( 'data %s' % data.hid ) input_ext = data.ext if data.dbkey not in [None, '?']: input_dbkey = data.dbkey - data_name_sane = re.sub('[^a-zA-Z0-9_]+', '', data.name) - if not trans.app.config.use_data_id_on_string: - # we want names in our on_strings not numbers - input_names.append(data_name_sane) - else: - if data.hid: - input_names.append('data %s' % data.hid) + # Collect chromInfo dataset and add as parameters to incoming db_datasets = {} db_dataset = trans.db_dataset_for( input_dbkey ) @@ -239,14 +232,11 @@ if len( input_names ) == 1: on_text = input_names[0] elif len( input_names ) == 2: - #on_text = '%s and %s' % tuple(input_names[0:2]) - on_text = '%s_%s' % tuple(input_names[0:2]) + on_text = '%s and %s' % tuple(input_names[0:2]) elif len( input_names ) == 3: - #on_text = '%s, %s, and %s' % tuple(input_names[0:3]) - on_text = '%s_%s_%s' % tuple(input_names[0:3]) + on_text = '%s, %s, and %s' % tuple(input_names[0:3]) elif len( input_names ) > 3: - #on_text = '%s, %s, and others' % tuple(input_names[0:2]) - on_text = '%s_%s_and_others' % tuple(input_names[0:2]) + on_text = '%s, %s, and others' % tuple(input_names[0:2]) else: on_text = "" # Add the dbkey to the incoming parameters diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 tools/ngs_rna/tophat2_wrapper.xml --- a/tools/ngs_rna/tophat2_wrapper.xml +++ b/tools/ngs_rna/tophat2_wrapper.xml @@ -273,19 +273,19 @@ </stdio><outputs> - <data format="tabular" name="fusions" label="${tool.name} on ${on_string}: fusions" from_work_dir="tophat_out/fusions.out"> + <data format="tabular" name="fusions" label="${on_string}_tophat2fus.xls" from_work_dir="tophat_out/fusions.out"><filter>(params['settingsType'] == 'full' and params['fusion_search']['do_search'] == 'Yes')</filter></data> - <data format="bed" 
name="insertions" label="${tool.name} on ${on_string}: insertions" from_work_dir="tophat_out/insertions.bed"> + <data format="bed" name="insertions" label="${on_string}_tophat2ins.bed" from_work_dir="tophat_out/insertions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="deletions" label="${tool.name} on ${on_string}: deletions" from_work_dir="tophat_out/deletions.bed"> + <data format="bed" name="deletions" label="${on_string}_tophhat2del.bed" from_work_dir="tophat_out/deletions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="junctions" label="${tool.name} on ${on_string}: splice junctions" from_work_dir="tophat_out/junctions.bed"> + <data format="bed" name="junctions" label="${on_string}tophat2sj.bed" from_work_dir="tophat_out/junctions.bed"><expand macro="dbKeyActions" /></data> - <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits" from_work_dir="tophat_out/accepted_hits.bam"> + <data format="bam" name="accepted_hits" label="${on_string}tophat2hits.bam" from_work_dir="tophat_out/accepted_hits.bam"><expand macro="dbKeyActions" /></data></outputs> diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -113,6 +113,8 @@ </stdio><inputs> + <param name="jobtitle" type="text" value="bowtie2" label="Memorable short reminder of this job's importance for outputs" /> + <!-- single/paired --><conditional name="library"><param name="type" type="select" label="Is this library mate-paired?"> @@ -216,7 +218,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${tool.name} on ${on_string}: unaligned reads (L)" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_{jobtitle}_unaligL.fastq" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -224,7 +226,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${tool.name} on ${on_string}: unaligned reads (R)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_{jobtitle}_unaligR.fastq)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -232,7 +234,7 @@ </action></actions></data> - <data format="bam" name="output" label="${tool.name} on ${on_string}: aligned reads"> + <data format="bam" name="output" label="${tool.name} on ${on_string}_{jobtitle}_aligned.bam"><actions><conditional name="reference_genome.source"><when value="indexed"> https://bitbucket.org/galaxy/galaxy-central/commits/6a39fac59cc0/ Changeset: 6a39fac59cc0 User: fubar Date: 2013-10-08 01:53:45 Summary: backout -r 10802 Affected #: 2 files diff -r 2c281898882a3c35713166e489667fe084ce5413 -r 6a39fac59cc0cd2d0485ec5c6d5af273e8b7d78d lib/galaxy/tools/parameters/output.py --- a/lib/galaxy/tools/parameters/output.py +++ b/lib/galaxy/tools/parameters/output.py @@ -65,7 +65,7 @@ def is_case( self, output_dataset, other_values ): ref = self.get_ref( output_dataset, other_values ) return bool( str( ref ) == self.value ) - + class DatatypeIsInstanceToolOutputActionConditionalWhen( ToolOutputActionConditionalWhen ): tag = "when datatype_isinstance" def __init__( self, parent, config_elem, value ): @@ -215,11 +215,12 @@ self.offset = elem.get( 'offset', -1 ) self.offset = int( self.offset ) else: + self.options = [] 
self.missing_tool_data_table_name = self.name def get_value( self, other_values ): - try: + if self.options: options = self.options - except: + else: options = [] for filter in self.filters: options = filter.filter_options( options, other_values ) @@ -248,7 +249,7 @@ def __init__( self, parent, elem ): super( FormatToolOutputAction, self ).__init__( parent, elem ) self.default = elem.get( 'default', None ) - + def apply_action( self, output_dataset, other_values ): value = self.option.get_value( other_values ) if value is None and self.default is not None: @@ -431,7 +432,7 @@ value = str( getattr( ref.metadata, self.name ) ) rval = [] for fields in options: - if self.keep == ( self.compare( fields[self.column], value ) ): + if self.keep == ( self.compare( fields[self.column], value ) ): rval.append( fields ) return rval @@ -452,7 +453,7 @@ value = self.cast( value ) except: value = False #unable to cast or access value; treat as false - if self.keep == bool( value ): + if self.keep == bool( value ): rval.append( fields ) return rval @@ -480,7 +481,7 @@ option_types = {} for option_type in [ NullToolOutputActionOption, FromFileToolOutputActionOption, FromParamToolOutputActionOption, FromDataTableOutputActionOption ]: option_types[ option_type.tag ] = option_type - + filter_types = {} for filter_type in [ ParamValueToolOutputActionOptionFilter, InsertColumnToolOutputActionOptionFilter, MultipleSplitterFilter, ColumnStripFilter, MetadataValueFilter, BooleanFilter, StringFunctionFilter, ColumnReplaceFilter ]: filter_types[ filter_type.tag ] = filter_type @@ -524,5 +525,5 @@ def compare_re_search( value1, value2 ): #checks pattern=value2 in value1 return bool( re.search( value2, value1 ) ) - + compare_types = { 'eq':compare_eq, 'neq':compare_neq, 'gt':compare_gt, 'gte':compare_gte, 'lt':compare_lt, 'lte':compare_lte, 'in':compare_in, 'startswith':compare_startswith, 'endswith':compare_endswith, "re_search":compare_re_search } diff -r 2c281898882a3c35713166e489667fe084ce5413 -r 6a39fac59cc0cd2d0485ec5c6d5af273e8b7d78d tool-data/bowtie2_indices.loc.sample --- a/tool-data/bowtie2_indices.loc.sample +++ b/tool-data/bowtie2_indices.loc.sample @@ -1,37 +1,37 @@ -#This is a sample file distributed with Galaxy that enables tools -#to use a directory of Bowtie2 indexed sequences data files. You will -#need to create these data files and then create a bowtie_indices.loc -#file similar to this one (store it in this directory) that points to -#the directories in which those files are stored. The bowtie2_indices.loc -#file has this format (longer white space characters are TAB characters): +# bowtie2_indices.loc.sample +# This is a *.loc.sample file distributed with Galaxy that enables tools +# to use a directory of indexed data files. This one is for Bowtie2 and Tophat2. +# See the wiki: http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup +# First create these data files and save them in your own data directory structure. +# Then, create a bowtie_indices.loc file to use those indexes with tools. +# Copy this file, save it with the same name (minus the .sample), +# follow the format examples, and store the result in this directory. +# The file should include a one-line entry for each index set. +# The path points to the "basename" for the set, not a specific file. +# It has four text columns separated by TABS.
# -#<unique_build_id><dbkey><display_name><file_base_path> +# <unique_build_id><dbkey><display_name><file_base_path> # -#So, for example, if you had hg18 indexed stored in -#/depot/data2/galaxy/bowtie2/hg18/, -#then the bowtie2_indices.loc entry would look like this: +# So, for example, if you had hg18 indexes stored in: # -#hg18 hg18 hg18 /depot/data2/galaxy/bowtie2/hg18/hg18 +# /depot/data2/galaxy/hg19/bowtie2/ # -#and your /depot/data2/galaxy/bowtie2/hg18/ directory -#would contain hg18.*.ebwt files: +# containing hg19 genome and hg19.*.bt2 files, such as: +# -rw-rw-r-- 1 james james 914M Feb 10 18:56 hg19canon.fa +# -rw-rw-r-- 1 james james 914M Feb 10 18:56 hg19canon.1.bt2 +# -rw-rw-r-- 1 james james 683M Feb 10 18:56 hg19canon.2.bt2 +# -rw-rw-r-- 1 james james 3.3K Feb 10 16:54 hg19canon.3.bt2 +# -rw-rw-r-- 1 james james 683M Feb 10 16:54 hg19canon.4.bt2 +# -rw-rw-r-- 1 james james 914M Feb 10 20:45 hg19canon.rev.1.bt2 +# -rw-rw-r-- 1 james james 683M Feb 10 20:45 hg19canon.rev.2.bt2 # -#-rw-r--r-- 1 james universe 830134 2005-09-13 10:12 hg18.1.ebwt -#-rw-r--r-- 1 james universe 527388 2005-09-13 10:12 hg18.2.ebwt -#-rw-r--r-- 1 james universe 269808 2005-09-13 10:12 hg18.3.ebwt -#...etc... +# then the bowtie2_indices.loc entry could look like this: # -#Your bowtie2_indices.loc file should include an entry per line for each -#index set you have stored. The "file" in the path does not actually -#exist, but it is the prefix for the actual index files. For example: +#hg19 hg19 Human (hg19) /depot/data2/galaxy/hg19/bowtie2/hg19canon # -#hg18canon hg18 hg18 Canonical /depot/data2/galaxy/bowtie2/hg18/hg18canon -#hg18full hg18 hg18 Full /depot/data2/galaxy/bowtie2/hg18/hg18full -#/orig/path/hg19 hg19 hg19 /depot/data2/galaxy/bowtie2/hg19/hg19 -#...etc... +#More examples: # -#Note that for backwards compatibility with workflows, the unique ID of -#an entry must be the path that was in the original loc file, because that -#is the value stored in the workflow for that parameter. That is why the -#hg19 entry above looks odd. New genomes can be better-looking. +#mm10 mm10 Mouse (mm10) /depot/data2/galaxy/mm10/bowtie2/mm10 +#dm3 dm3 D. 
melanogaster (dm3) /depot/data2/galaxy/mm10/bowtie2/dm3 # +# https://bitbucket.org/galaxy/galaxy-central/commits/99be46d34966/ Changeset: 99be46d34966 User: fubar Date: 2013-10-08 01:54:45 Summary: backout -r 10806 Affected #: 1 file diff -r 6a39fac59cc0cd2d0485ec5c6d5af273e8b7d78d -r 99be46d34966236412d0cf69198143ce71a8958d tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -87,23 +87,22 @@ ## read group information #if str($read_group.selection) == "yes": #if $read_group.rgid and $read_group.rglb and $read_group.rgpl and $read_group.rgsm: - --rg-id $read_group.rgid - --rg LB:$read_group.rglb - --rg PL:$read_group.rgpl - --rg SM:$read_group.rgsm + --rg-id "$read_group.rgid" + --rg "LB:$read_group.rglb" + --rg "PL:$read_group.rgpl" + --rg "SM:$read_group.rgsm" #end if #end if ## view/sort and output file - | samtools view -Su - | samtools sort -o - - > $output; + | samtools view -Su - | samtools sort -o - - > $output ## rename unaligned sequence files #if $library.type == "paired" and $output_unaligned_reads_l and $output_unaligned_reads_r: #set left = str($output_unaligned_reads_l).replace( '.dat', '.1.dat' ) #set right = str($output_unaligned_reads_l).replace( '.dat', '.2.dat' ) - - mv $left $output_unaligned_reads_l; - mv $right $output_unaligned_reads_r; + ;mv $left $output_unaligned_reads_l; + mv $right $output_unaligned_reads_r #end if </command> @@ -218,7 +217,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_{jobtitle}_unaligL.fastq" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_${jobtitle}_unaligL.fastq" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -226,7 +225,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_{jobtitle}_unaligR.fastq)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_${jobtitle}_unaligR.fastq)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -234,7 +233,7 @@ </action></actions></data> - <data format="bam" name="output" label="${tool.name} on ${on_string}_{jobtitle}_aligned.bam"> + <data format="bam" name="output" label="${tool.name} on ${on_string}_${jobtitle}_aligned.bam"><actions><conditional name="reference_genome.source"><when value="indexed"> @@ -256,6 +255,17 @@ </outputs><tests> + <test> + <!-- basic test on single paired default run --> + <param name="type" value="single"/> + <param name="selection" value="no"/> + <param name="full" value="no"/> + <param name="unaligned_file" value="false"/> + <param name="source" value="history" /> + <param name="input_1" value="bowtie2/phix_reads.fastq" ftype="fastqsanger"/> + <param name="own_file" value="bowtie2/phix_genome.fasta" /> + <output name="output" file="bowtie2/phix_mapped.bam" /> + </test></tests><help> https://bitbucket.org/galaxy/galaxy-central/commits/99404250db29/ Changeset: 99404250db29 User: fubar Date: 2013-10-08 01:55:24 Summary: backout -r 10808 Affected #: 3 files diff -r 99be46d34966236412d0cf69198143ce71a8958d -r 99404250db290eafa86f1e97e13100d9f705284a lib/galaxy/tools/actions/__init__.py --- a/lib/galaxy/tools/actions/__init__.py +++ b/lib/galaxy/tools/actions/__init__.py @@ -1,5 +1,6 @@ import os import galaxy.tools +import re from galaxy.exceptions import ObjectInvalid from galaxy.model import LibraryDatasetDatasetAssociation @@ 
-191,14 +192,20 @@ data = data.to_history_dataset_association( None ) inp_data[name] = data - else: # HDA - if data.hid: - input_names.append( 'data %s' % data.hid ) +# else: # HDA +# if data.hid: +# input_names.append( 'data %s' % data.hid ) input_ext = data.ext if data.dbkey not in [None, '?']: input_dbkey = data.dbkey - + data_name_sane = re.sub('[^a-zA-Z0-9_]+', '', data.name) + if trans.app.config.use_data_id_on_string: + # we want names in our on_strings not numbers + input_names.append(data_name_sane) + else: + if data.hid: + input_names.append('data %s' % data.hid) # Collect chromInfo dataset and add as parameters to incoming db_datasets = {} db_dataset = trans.db_dataset_for( input_dbkey ) @@ -232,11 +239,14 @@ if len( input_names ) == 1: on_text = input_names[0] elif len( input_names ) == 2: - on_text = '%s and %s' % tuple(input_names[0:2]) + #on_text = '%s and %s' % tuple(input_names[0:2]) + on_text = '%s_%s' % tuple(input_names[0:2]) elif len( input_names ) == 3: - on_text = '%s, %s, and %s' % tuple(input_names[0:3]) + #on_text = '%s, %s, and %s' % tuple(input_names[0:3]) + on_text = '%s_%s_%s' % tuple(input_names[0:3]) elif len( input_names ) > 3: - on_text = '%s, %s, and others' % tuple(input_names[0:2]) + #on_text = '%s, %s, and others' % tuple(input_names[0:2]) + on_text = '%s_%s_and_others' % tuple(input_names[0:2]) else: on_text = "" # Add the dbkey to the incoming parameters diff -r 99be46d34966236412d0cf69198143ce71a8958d -r 99404250db290eafa86f1e97e13100d9f705284a scripts/functional_tests.py --- a/scripts/functional_tests.py +++ b/scripts/functional_tests.py @@ -454,5 +454,4 @@ return 1 if __name__ == "__main__": - print '\n\n\n\n#### SGE_ROOT=', os.environ.get('SGE_ROOT','##### no SGE_ROOT!'),'\n\n\n' sys.exit( main() ) diff -r 99be46d34966236412d0cf69198143ce71a8958d -r 99404250db290eafa86f1e97e13100d9f705284a tool_conf.xml.sample --- a/tool_conf.xml.sample +++ b/tool_conf.xml.sample @@ -1,7 +1,6 @@ <?xml version="1.0"?><toolbox><section name="Get Data" id="getext"> - <tool file="rlGAT/rlGAT.xml"/><tool file="data_source/upload.xml"/><tool file="data_source/ucsc_tablebrowser.xml" /><tool file="data_source/ucsc_tablebrowser_test.xml" /> https://bitbucket.org/galaxy/galaxy-central/commits/68b694a22c59/ Changeset: 68b694a22c59 User: fubar Date: 2013-10-08 01:56:03 Summary: backout -r 10811 Affected #: 2 files diff -r 99404250db290eafa86f1e97e13100d9f705284a -r 68b694a22c59b6419a568330bbd0a02186068526 run.sh --- a/run.sh +++ b/run.sh @@ -1,6 +1,7 @@ #!/bin/sh cd `dirname $0` + python ./scripts/check_python.py [ $? -ne 0 ] && exit 1 diff -r 99404250db290eafa86f1e97e13100d9f705284a -r 68b694a22c59b6419a568330bbd0a02186068526 tool-data/shared/ucsc/ucsc_build_sites.txt --- a/tool-data/shared/ucsc/ucsc_build_sites.txt +++ b/tool-data/shared/ucsc/ucsc_build_sites.txt @@ -1,6 +1,8 @@ #Harvested from http://genome.ucsc.edu/cgi-bin/das/dsn -main http://genome.ucsc.edu/cgi-bin/hgTracks? 
latCha1,eriEur1,petMar2,papAnu2,ce6,anoCar2,ce2,rn3,loxAfr3,rn5,rn4,droYak1,oreNil2,droYak2,dp3,dp2,vicPac1,vicPac2,caeRem2,caeRem3,geoFor1,sorAra1,caePb2,gadMor1,droAna1,droAna2,triMan1,sacCer2,sacCer3,oryCun2,dasNov3,droGri1,caeJap1,anoCar1,choHof1,taeGut1,sacCer1,tupBel1,ce4,macEug2,ochPri2,gorGor3,mm8,mm7,pteVam1,micMur1,galGal3,galGal2,proCap1,ornAna1,sarHar1,equCab2,equCab1,myoLuc2,rheMac3,rheMac2,droPer1,apiMel1,droVir2,droVir1,cerSim1,dm1,ailMel1,cb3,calJac1,calJac3,melUnd1,droSim1,hg16,hg17,hg18,hg19,monDom5,monDom4,oviAri1,petMar1,oviAri3,droSec1,saiBol1,aplCal1,strPur1,mm9,braFlo1,oryLat2,susScr2,susScr3,canFam1,canFam3,canFam2,echTel2,caePb1,echTel1,ci2,ci1,dm3,papHam1,ponAbe2,ce10,mm10,xenTro1,xenTro3,xenTro2,fr3,fr2,fr1,galGal4,gasAcu1,dm2,nomLeu3,nomLeu2,nomLeu1,apiMel2,priPac1,cavPor3,panTro4,panTro1,panTro3,panTro2,danRer7,danRer6,danRer5,danRer4,danRer3,dipOrd1,tetNig2,tetNig1,droMoj2,bosTau3,bosTau2,bosTau4,musFur1,bosTau6,melGal1,droEre1,hetGla2,turTru2,hetGla1,monDom1,cb1,droMoj1,speTri2,allMis1,strPur2,otoGar3,bosTau7,tarSyr1,felCat3,chrPic1,anoGam1,felCat4,felCat5 +main http://genome.ucsc.edu/cgi-bin/hgTracks? priPac1,danRer4,mm9,mm8,droAna1,mm5,caeRem2,mm7,mm6,panTro1,dm3,panTro2,anoCar1,ce4,galGal3,galGal2,ce1,rn3,rn2,droMoj1,droMoj2,rn4,droYak1,droYak2,dp3,dp2,dm1,canFam1,danRer5,canFam2,danRer3,danRer2,ornAna1,ci2,ci1,tetNig1,bosTau1,bosTau3,bosTau2,equCab1,oryLat1,droAna2,droEre1,ponAbe2,rheMac2,sacCer1,droPer1,droSim1,monDom1,cb1,dm2,droSec1,strPur1,droVir2,droVir1,strPur2,sc1,xenTro1,droGri1,xenTro2,cb3,gasAcu1,caePb1,anoGam1,fr2,fr1,hg15,hg16,hg17,hg18,hg19,felCat3,apiMel2,monDom4,apiMel1,ce2 #Harvested from http://archaea.ucsc.edu/cgi-bin/das/dsn -archaea http://archaea.ucsc.edu/cgi-bin/hgTracks? therSibi1,symbTher_IAM14863,haloBori1,zymoMobi_ZM4,therFusc_YX,methIgne1,methPalus1,nocaJS61,his2Viru,acidCryp_JF_5,lactLact,paraSp_UWE25,geobKaus_HTA426,rhizEtli_CFN_42,jannCCS1,syntFuma_MPOB,dichNodo_VCS1703A,eschColi_O157H7,burk383,alteMacl_ATCC27126,bradJapo,lebron_Phage,aerPer1,acinSp_ADP1,anapMarg_ST_MARIES,sulfIsla_L_D_8_5,desuPsyc_LSV54,sulfIsla_ROD_SHAPED_VIRUS_2,shewDeni,methBoon1,shewLoihPV4,igniHosp1,therTher_HB8,ferrAcid2,sulfIsla_REY15A,acidCell_11B,stapHell1,acidBoon1,salmTyph_LT2,methBurt2,pyroYaya1,haloHisp1,glucOxyd_621H,geobSulf,acidFila_VIRUS_6,soliUsit_ELLIN6076,methAeol1,procMari_CCMP1375,oenoOeni_PSU_1,alcaBork_SK2,wiggBrev,therUzon1,peloProp_DSM2379,orieTsut_BORYONG,salmEnte_PARATYPI_ATC,therOnnu1,onioYell_PHYTOPLASMA,mariAqua_VT8,heliPylo_J99,pf5,psycIngr_37,methMahi1,candPela_UBIQUE_HTCC1,myxoXant_DK_1622,salmTyph_TY2,chloChlo_CAD3,mesoFlor_L1,shewW318,pyro1860_1,closDiff_630,pyroFuri_COM1,methMaze1,shewWood,haloPhag_HF2,sulfViru_KAMCHATKA_1,ureaUrea,streCoel,polaJS66,rhodRubr_ATCC11170,rhodRHA1,pireSp,geobBemi_BEM,enteCloa_ATCC13047_uid48363,franTula_TULARENSIS,syneSp_WH8102,archProf1,methPetr_PM1,heliAcin_SHEEBA,thioCrun_XCL_2,haloHalo1,idioLoih_L2TR,xantCamp,sodaGlos_MORSITANS,coryEffi_YS_314,mycoTube_H37RV,polyQLWP,peloLute_DSM273,burkCeno_AU_1054,vibrVuln_CMCP6_1,ente638,neorSenn_MIYAYAMA,burkCepa_AMMD,methMari_C5_1,methZhil1,pyroFuma1,methBark1,geobMeta_GS15,eschColi_CFT073,metMar1,enteFaec_V583,hodgCica_DSEM,erwiCaro_ATROSEPTICA,methFerv1,igniAggr1,campJeju_81_176,pseuHalo_TAC125,chlaTrac,heliPylo_SHI470,chroViol,mycoGeni,methSp_FS406_1,therTher_HB27,sulfIsla_Y_G_57_14,uncuMeth_RCI,haloUtah1,sulfSolf_98_2,propAcne_KPA171202,acidJS42,chroSale_DSM3043,tropWhip_TW08_27,acidFila_VIRUS_2,natrPhar1,salmEnte_CHOLERAESUIS,rubrXyla_DSM9941,met
hVulc1,franCcI3,persMariEXH1,caldSacc_DSM8903,haloSali1,lactPlan,strePneu_R6_uid57859,siliPome_DSS_3,nitrWino_NB_255,mesoLoti,coxiBurn,acidFila_VIRUS_7,acidBact_ELLIN345,caldMaqu1,acidFila_VIRUS_3,amycMedi_U32,acidFila_VIRUS_1,therAggr1,acidFila_VIRUS_8,acidFila_VIRUS_9,methKand1,methCaps_BATH,burkCeno_HI2424,psycArct_273_4,buchSp,pyroCali1,chloTepi_TLS,bifiLong,picrTorr1,shewANA3,bdelBact,gramFors_KT0803,peloCarb,sulfIsla_ROD_SHAPED_V,haloHF1,sulfViru_STSV1,clavMich_NCPPB_382,baciSubt2,methPhag_PSIM100,therNeut1,saliTrop_CNB_440,campJeju,blocFlor,pastMult,saliRube_DSM13855,carbHydrZ2901,mycoPhag_BPS,halaJeot1,dehaEthe_195,mycoPhag_CORNDOG,pyroAbys_VIRUS_1,shewMR7,shewMR4,desuMuco1,geobUran_RF4,desuRedu_MI_1,methStad1,legiPneu_PHILADELPHIA,methSWAN1,gloeViol,moorTher_ATCC39073,acidRods_VIRUS_1,roseDeni_OCH_114,natrMaga1,xantOryz_MAFF_311018,shewHali,burkXeno_LB400,heliPylo_26695,geobFRC_32,eschColi_K_12_SUBSTR_W3110,haloArch_DL31,leptInte,methEves1,hydrY04A,tricEryt_IMS101,photLumi,peloTher_SI,therKoda1,desuAuda,lawsIntr_PHE_MN1_00,mycoSmeg_MC2_155,therBaro1,methOkin1,metaCupr1,therElon,shewPutrCN32,eschColi_O157H7_EDL93,mariMari_MCS10,methInfe1,shigFlex_2A,magnMagn_AMB_1,sulfIsla_FILAMENTOUS,erytLito_HTCC2594,coryGlut_Bielefeld,pyrAer1,thioDeni_ATCC25259,deinRadi,yersPest_CO92,therVolc1,wolbEndo_OF_DROSOPHIL,acidHosp1,shewSedi_HAW_EB3,vulcMout1,ralsSola,crimd_Phage,natrPhag_PHICH1,neisMeni_Z2491_1,shewPeal,anabVari_ATCC29413,colwPsyc_34H,xantAuto_PY2,hyphNept_ATCC15444,vibrChol1,deinGeot_DSM11300,salmEnte_ENTERITIDIS,metaSedu,mycoPhag_GILES,lactSali_UCC118,pyroOgun2,flavJohn_UW101,mannSucc_MBEL55E,haemSomn_129PT,therTena2,haloXana1,methFerv_DSM2088,koraCryp1,oceaIhey,haloTurk1,haloWals1,baciHalo,hypeButy1,agroTume_C58_UWASH,pyroIsla1,cytoHutc_ATCC33406,nitrEuro,therMari,pyroArse1,nitrMari1,vibrFisc_ES114_1,methJann1,heliPylo_G27,vibrPara1,metAce1,neisGono_FA1090_1,methConc1,chlaPneu_CWL029,strePyog_M1_GAS,haloLacu1,novoArom_DSM12444,eschColi_536,azoaSp_EBN1,sulfToko1,listInno,sulfAcid1,geobTher_NG80_2,sulfSpin_VIRUS_4,arthFB24,porpGing_W83,leifXyli_XYLI_CTCB0,campFetu_82_40,mycoPhag_Firecracker,mycoPhag_Dori,methTher1,nostSp,sphiAlas_RB2256,saccDegr_2_40,bordBron,rickBell_RML369_C,thioDeni_ATCC33889,methSmit1,alkaMeta_QYMF,sulfIsla_M_16_27,halMar1,vermEise_EF01_2,granBeth_CGDNIH1,therPend1,sulfIsla_L_S_2_15,vibrChol_O395_1,nitrOcea_ATCC19707,campJeju_RM1221,pyroSphe_VIRUS,yersPseu_IP32953,woliSucc,caldSubt1,sulfTurr_ICOSAHEDRAL,methMarb1,aquiAeol,acidSacc1,burkPseu_1106A,tremPrin_PCIT,azorCaul2,synePCC6,hydrSp_128_5R1,baumCica_HOMALODISCA,shewBalt,therTeng,alkaEhrl_MLHE_1,sulSol1,eschColi_UTI89,methHung1,desuKamc1,pediPent_ATCC25745,therGamm1,nanEqu1,shewAmaz,hydr3684,desuVulg_HILDENBOROUG,haheChej_KCTC_2396,sulfIsla_Y_N_15_51,candBloc_FLORIDANUS,haloPhag_SH1,candCars_RUDDII,burkMall_ATCC23344,burkViet_G4,acidTwot_VIRUS,methVann1,archFulg1,fusoNucl,heliHepa,sulfViru_1,sulfViru_2,syntAcid_SB,salmTyph,trepPall,neisMeni_MC58_1,syntWolf_GOETTINGEN,syneElon_PCC_7942,haemInfl_KW20,haloHalo_SL1,plasFalc1,rhodSpha_2_4_1,archVene1,rhodPalu_CGA009,stapMari1,sinoMeli,sulfYell_SS_5,archBJ1_VIRUS,bartHens_HOUSTON_1,sulfAzorFu1,methMari_C6,methMari_C7,caulCres,crocWats_WH8501_0,rhodSpha_ATCC17025,magnMC1,rhodFerr_DSM15236,hydrTher1,heliPylo_HPAG1,pyrHor1,acidBott_VIRUS,sulfIsla_M_16_4,actiPleu_L20,paraDeni_PD1222,borrBurg,pseuAeru,methRumi1,methVolt1,cenaSymb1,geobLovl_SZ,pyrFur2,saccEryt_NRRL_2338,vibrChol_MO10_1,methPetr1,pyroNA2,yersPseu_IP31758,ehrlRumi_WELGEVONDEN,methLabrZ_1,vibrVuln_YJ016_1,vulcD
ist1,sulfIsla_M_14_25,his1Viru,methTherPT1,methFlag_KT,therAcid1,therPetr_RKU_1,shewFrig,photProf_SS9,burkThai_E264,nitrMult_ATCC25196,methMari_XI,neisMeni_FAM18_1,hermArse,haloVolc1,desuHafn_Y51,nocaFarc_IFM10152,anaeDeha_2CP_C,methPalud1,eschColi_APEC_O1,xyleFast,leucMese_ATCC8293,bactThet_VPI_5482,brucMeli,ferrPlac1,ralsEutr_JMP134,sulfViru_RAGGED_HILLS,pyrAby1,mculMari1,dechArom_RCB,stapAure_MU50,methPhag_PSIM2,eschColi_K12,sulfYO3A,haloMuko1,aeroHydr_ATCC7966,ther4557,baciAnth_AMES,shewOnei +archaea http://archaea.ucsc.edu/cgi-bin/hgTracks? alkaEhrl_MLHE_1,shewW318,idioLoih_L2TR,sulSol1,erwiCaro_ATROSEPTICA,symbTher_IAM14863,moorTher_ATCC39073,therFusc_YX,methHung1,bradJapo,therElon,shewPutrCN32,pediPent_ATCC25745,mariMari_MCS10,nanEqu1,baciSubt,chlaTrac,magnMagn_AMB_1,chroViol,ralsSola,acidCryp_JF_5,erytLito_HTCC2594,desuVulg_HILDENBOROUG,pyrAer1,sulfToko1,shewANA3,paraSp_UWE25,geobKaus_HTA426,rhizEtli_CFN_42,uncuMeth_RCI,candBloc_FLORIDANUS,deinRadi,yersPest_CO92,saccEryt_NRRL_2338,rhodRHA1,candCars_RUDDII,burkMall_ATCC23344,eschColi_O157H7,burk383,psycIngr_37,rhodSpha_2_4_1,wolbEndo_OF_DROSOPHIL,burkViet_G4,propAcne_KPA171202,enteFaec_V583,campJeju_81_176,acidJS42,heliPylo_26695,pseuHalo_TAC125,chroSale_DSM3043,methVann1,archFulg1,neisMeni_Z2491_1,fusoNucl,vermEise_EF01_2,anabVari_ATCC29413,tropWhip_TW08_27,heliHepa,acinSp_ADP1,anapMarg_ST_MARIES,natrPhar1,haheChej_KCTC_2396,therPetr_RKU_1,neisGono_FA1090_1,colwPsyc_34H,desuPsyc_LSV54,hyphNept_ATCC15444,vibrChol1,deinGeot_DSM11300,strePyog_M1_GAS,franCcI3,salmTyph,metaSedu,lactSali_UCC118,trepPall,neisMeni_MC58_1,syntWolf_GOETTINGEN,flavJohn_UW101,methBoon1,haemSomn_129PT,shewLoihPV4,igniHosp1,haemInfl_KW20,haloHalo_SL1,ferrAcid1,sphiAlas_RB2256,candPela_UBIQUE_HTCC1,caldSacc_DSM8903,aerPer1,lactPlan,carbHydr_Z_2901,therTher_HB8,vibrVuln_YJ016_1,rhodPalu_CGA009,acidCell_11B,siliPome_DSS_3,therVolc1,haloWals1,rubrXyla_DSM9941,shewAmaz,nocaJS61,vibrVuln_CMCP6_1,sinoMeli,ureaUrea,baciHalo,bartHens_HOUSTON_1,nitrWino_NB_255,hypeButy1,methBurt2,polaJS66,mesoLoti,methMari_C7,caulCres,neisMeni_FAM18_1,acidBact_ELLIN345,caldMaqu1,salmEnte_PARATYPI_ATC,glucOxyd_621H,cytoHutc_ATCC33406,nitrEuro,therMari,coxiBurn,woliSucc,heliPylo_HPAG1,mesoFlor_L1,pyrHor1,methAeol1,procMari_CCMP1375,pyroArse1,oenoOeni_PSU_1,alcaBork_SK2,wiggBrev,actiPleu_L20,lactLact,methJann1,paraDeni_PD1222,borrBurg,pyroIsla1,orieTsut_BORYONG,shewMR4,methKand1,methCaps_BATH,onioYell_PHYTOPLASMA,bordBron,cenaSymb1,burkCeno_HI2424,franTula_TULARENSIS,pyrFur2,mariAqua_VT8,heliPylo_J99,psycArct_273_4,vibrChol_MO10_1,vibrPara1,rickBell_RML369_C,metAce1,buchSp,ehrlRumi_WELGEVONDEN,methLabrZ_1,chlaPneu_CWL029,thioCrun_XCL_2,pyroCali1,chloTepi_TLS,stapAure_MU50,novoArom_DSM12444,magnMC1,zymoMobi_ZM4,salmTyph_TY2,chloChlo_CAD3,azoaSp_EBN1,therTher_HB27,bifiLong,picrTorr1,listInno,bdelBact,gramFors_KT0803,sulfAcid1,geobTher_NG80_2,peloCarb,ralsEutr_JMP134,mannSucc_MBEL55E,syneSp_WH8102,methTherPT1,clavMich_NCPPB_382,therAcid1,syntAcid_SB,porpGing_W83,therNeut0,leifXyli_XYLI_CTCB0,shewFrig,photProf_SS9,thioDeni_ATCC25259,methMaze1,desuRedu_MI_1,burkThai_E264,campFetu_82_40,blocFlor,jannCCS1,nitrMult_ATCC25196,streCoel,soliUsit_ELLIN6076,pastMult,saliRube_DSM13855,methTher1,nostSp,shigFlex_2A,saccDegr_2_40,oceaIhey,dehaEthe_195,rhodRubr_ATCC11170,arthFB24,shewMR7,pireSp,anaeDeha_2CP_C,haloVolc1,dichNodo_VCS1703A,tricEryt_IMS101,mycoGeni,thioDeni_ATCC33889,methSmit1,geobUran_RF4,shewDeni,halMar1,desuHafn_Y51,methStad1,granBeth_CGDNIH1,therPend1,legiPneu_PHILADELPHIA,vibrChol_O395_1,nitr
Ocea_ATCC19707,campJeju_RM1221,methPetr_PM1,heliAcin_SHEEBA,eschColi_APEC_O1,peloTher_SI,haloHalo1,syntFuma_MPOB,xyleFast,gloeViol,leucMese_ATCC8293,bactThet_VPI_5482,xantCamp,sodaGlos_MORSITANS,geobSulf,roseDeni_OCH_114,coryEffi_YS_314,brucMeli,mycoTube_H37RV,vibrFisc_ES114_1,pyrAby1,burkXeno_LB400,polyQLWP,stapMari1,peloLute_DSM273,burkCeno_AU_1054,shewBalt,nocaFarc_IFM10152,ente638,mculMari1,saliTrop_CNB_440,neorSenn_MIYAYAMA,aquiAeol,dechArom_RCB,myxoXant_DK_1622,burkPseu_1106A,burkCepa_AMMD,methMari_C5_1,azorCaul2,methFlag_KT,leptInte,eschColi_K12,synePCC6,baumCica_HOMALODISCA,methBark1,pseuAeru,geobMeta_GS15,eschColi_CFT073,photLumi,metMar1,hermArse,campJeju,therKoda1,aeroHydr_ATCC7966,baciAnth_AMES,shewOnei,therTeng,lawsIntr_PHE_MN1_00 #Harvested from http://genome-test.cse.ucsc.edu/cgi-bin/das/dsn -test http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks? punNye1,homIni20,danRer4,amaVit1,latCha1,eriEur1,criGri1,eriEur2,odoRosDiv1,lepWed1,papAnu2,ce6,ce5,anoCar2,ce3,ce2,chiLan1,rn3,loxAfr3,loxAfr2,simHuman,droMoj2,rn5,rn4,haeCon1,tetNig2,oreNil2,droYak2,dp3,dp2,mhcRef,vicPac1,eptFus1,caePb3,vicPac2,strPur2,caeRem3,geoFor1,caeRem1,sorAra1,oviAri1,galGal4,astMex1,musFur1,borEut13,homIni14,gadMor1,cacTus1,cacTus2,cacTus3,oryCun1,sacCer3,oryCun2,micOch1,colLiv1,caeJap3,melInc1,chlSab1,kirRefV3,kirRefV2,choHof1,droSim1,saiBol1,kirRefV5,kirRefV4,octDeg1,centromers1,droGri1,sc1,CHM1,mhcRefnoNs8,mhcRefnoNs9,dasNov2,dasNov3,taeGut1,caeJap4,taeGut2,caeSp71,equCab2,mhcRefnoNs2,mhcRefnoNs3,mhcRefnoNs4,mhcRefnoNs5,tupBel1,hlaRef2,galGal2,caeSp91,ce4,croPor0,bosTau7,bosTau6,ficAlb2,caePb2,panHod1,gorGor3,gorGor2,gorGor1,mayZeb1,mm7,pteVam1,macEug1,loxAfr1,venter1,micMur1,galGal3,mhcRefnoNs6,priPac2,proCap1,priPac1,mhcRefnoNs7,droEre1,sorAra2,mm10Strains1,ornAna1,xipMac1,anaPla1,mmtv,eutHer13,braNey1,droPer1,bioGla0,ce9,sarHar1,mhcRefnoNs,hlaRef3,simHumanMammal,hlaRef1,equCab1,simHumanPrim,cioSav2,myoLuc1,rheMac3,rheMac2,oreNil1,ailMel1,droSec1,canHg12,simRefMam,apiMel1,homIni13,nemVec1,turTru1,sacCer2,droVir2,droVir1,ce7,myoLuc2,repBase0,repBase1,dm1,susScr2,cb3,calJac1,macFas5,calJac3,dm3,triMan1,neoBri1,canFam1,conCri1,caeAng1,hg15,hg16,hg17,hg18,hg19,lotGig0,monDom5,monDom4,euaGli13,pteAle1,panPan1,petMar1,oviAri3,cb4,caeRem2,cioSav1,aplCal0,aplCal1,gliRes13,strPur1,anoCar1,dasNov1,mm9,tetNig1,mm8,braFlo1,oryLat2,caeJap2,braFlo2,lauRas13,susScr1,jacJac1,susScr3,hg19LggInv,caeJap1,canFam3,canFam2,echTel2,caePb1,echTel1,simHumanMam,ci2,ci1,canFamPoodle1,camFer1,papHam1,ce8,cerSim1,capHir1,droMoj1,oryLat1,ponAbe2,ce10,pseHum1,mm10Patch1,calMil0,caeSp111,hapBur1,cheMyd1,hg19Haps,mm10,falciparum,mhcRefnoNs24,mhcRefnoNs23,mhcRefnoNs22,mhcRefnoNs21,mhcRefnoNs20,xenTro1,xenTro3,xenTro2,xenTro7,falPer1,nonAfr13,homPan20,fr3,fr2,fr1,macEug2,caeRem4,gasAcu1,dm2,nomLeu3,nomLeu2,nomLeu1,apiMel2,melHap1,zonAlb1,tupChi1,ecoRef1,ecoRef3,ecoRef2,chrAsi1,ecoRef4,cavPor3,triCas2,panTro4,pelSin1,panTro1,droAna2,panTro3,panTro2,mhcRefnoNs18,mhcRefnoNs19,lepOcu1,hg19Patch10,strPur3,mhcRefnoNs12,mhcRefnoNs13,mhcRefnoNs10,mhcRefnoNs11,mhcRefnoNs16,mhcRefnoNs17,mhcRefnoNs14,mhcRefnoNs15,sacCer1,melUnd1,apaSpi1,danRer6,danRer5,mesAur1,danRer3,dipOrd1,eleEdw1,hg19Patch2,hg19Patch5,droAna1,hg19Patch9,bruMal1,bosTau3,bosTau2,bosTau5,bosTau4,ochPri2,ochPri3,melGal1,petMar2,rodEnt13,oryAfe1,evoSim1,priMat13,hetGla2,turTru2,repeats2,hetGla1,cb2,monDom1,cb1,speTri1,speTri2,catArr1,allMis1,allMis0,plasFalc3D7_2,araMac1,strPur4,otoGar1,otoGar3,kirRef,myoDav1,falChe1,tarSyr1,letCam1,afrOth13,danRer7,takFla1,droYak1,felCat3,orcOrc1,c
hrPic1,anoGam1,felCat4,felCat5 +test http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks? anoCar1,ce4,ce3,ce2,ce1,loxAfr1,rn2,eschColi_O157H7_1,rn4,droYak1,heliPylo_J99_1,droYak2,dp3,dp2,caeRem2,caeRem1,oryLat1,eschColi_K12_1,homIni13,homIni14,droAna1,droAna2,oryCun1,sacCer1,heliHepa1,droGri1,sc1,dasNov1,choHof1,tupBel1,mm9,mm8,vibrChol1,mm5,mm4,mm7,mm6,mm3,mm2,rn3,venter1,galGal3,galGal2,ornAna1,equCab1,cioSav2,rheMac2,eutHer13,droPer1,droVir2,droVir1,heliPylo_26695_1,euaGli13,calJac1,campJeju1,droSim1,hg13,hg15,hg16,hg17,monDom1,monDom4,droMoj1,petMar1,droMoj2,vibrChol_MO10_1,vibrPara1,gliRes13,vibrVuln_YJ016_1,braFlo1,cioSav1,lauRas13,dm1,canFam1,canFam2,ci1,echTel1,ci2,caePb1,dm3,ponAbe2,falciparum,xenTro1,xenTro2,nonAfr13,fr2,fr1,gasAcu1,dm2,apiMel1,apiMel2,eschColi_O157H7EDL933_1,priPac1,panTro1,hg18,panTro2,campJeju_RM1221_1,canHg12,vibrChol_O395_1,vibrFisc_ES114_1,danRer5,danRer4,danRer3,danRer2,danRer1,tetNig1,afrOth13,bosTau1,eschColi_CFT073_1,bosTau3,bosTau2,bosTau4,rodEnt13,droEre1,priMat13,vibrVuln_CMCP6_1,cb2,cb3,cb1,borEut13,droSec1,felCat3,strPur1,strPur2,otoGar1,catArr1,anoGam1,triCas2 +ucla http://epigenomics.mcdb.ucla.edu/cgi-bin/hgTracks? araTha1 +psu bx main http://main.genome-browser.bx.psu.edu/cgi-bin/hgTracks? hg18,hg19,mm8,mm9 https://bitbucket.org/galaxy/galaxy-central/commits/c2561f65d6f9/ Changeset: c2561f65d6f9 User: fubar Date: 2013-10-08 04:34:54 Summary: revert bowtie2 and tophat2 wrappers Affected #: 2 files diff -r 68b694a22c59b6419a568330bbd0a02186068526 -r c2561f65d6f9d181e76d60d0f7ed60e1ce94c513 tools/ngs_rna/tophat2_wrapper.xml --- a/tools/ngs_rna/tophat2_wrapper.xml +++ b/tools/ngs_rna/tophat2_wrapper.xml @@ -1,5 +1,5 @@ -<tool id="tophat2" name="Tophat2" version="0.5"> - <!-- Wrapper compatible with Tophat version 2.0.0 --> +<tool id="tophat2" name="Tophat2" version="0.6"> + <!-- Wrapper compatible with Tophat version 2.0.0+ --><description>Gapped-read mapper for RNA-seq data</description><version_command>tophat2 --version</version_command><requirements> @@ -268,24 +268,25 @@ </inputs><stdio> - <regex match="Exception" source="both" level="fatal" description="Tool exception"/> + <regex match="Exception|Error" source="both" level="fatal" description="Tool execution failed"/><regex match=".*" source="both" level="log" description="tool progress"/></stdio><outputs> - <data format="tabular" name="fusions" label="${on_string}_tophat2fus.xls" from_work_dir="tophat_out/fusions.out"> + <data format="txt" name="align_summary" label="${tool.name} on ${on_string}: align_summary" from_work_dir="tophat_out/align_summary.txt"/> + <data format="tabular" name="fusions" label="${tool.name} on ${on_string}: fusions" from_work_dir="tophat_out/fusions.out"><filter>(params['settingsType'] == 'full' and params['fusion_search']['do_search'] == 'Yes')</filter></data> - <data format="bed" name="insertions" label="${on_string}_tophat2ins.bed" from_work_dir="tophat_out/insertions.bed"> + <data format="bed" name="insertions" label="${tool.name} on ${on_string}: insertions" from_work_dir="tophat_out/insertions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="deletions" label="${on_string}_tophhat2del.bed" from_work_dir="tophat_out/deletions.bed"> + <data format="bed" name="deletions" label="${tool.name} on ${on_string}: deletions" from_work_dir="tophat_out/deletions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="junctions" label="${on_string}tophat2sj.bed" from_work_dir="tophat_out/junctions.bed"> + <data format="bed" name="junctions" 
label="${tool.name} on ${on_string}: splice junctions" from_work_dir="tophat_out/junctions.bed"><expand macro="dbKeyActions" /></data> - <data format="bam" name="accepted_hits" label="${on_string}tophat2hits.bam" from_work_dir="tophat_out/accepted_hits.bam"> + <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits" from_work_dir="tophat_out/accepted_hits.bam"><expand macro="dbKeyActions" /></data></outputs> @@ -312,6 +313,7 @@ </actions></macro></macros> + <tests><!-- Test base-space single-end reads with pre-built index and preset parameters --><test> @@ -450,7 +452,8 @@ <help> **Tophat Overview** -TopHat_ is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. Please cite: Trapnell, C., Pachter, L. and Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111 (2009). +TopHat_ is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie(2), and then analyzes the mapping results to identify splice junctions between exons. Please cite: Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, and Salzberg SL. TopHat2: accurate alignment +of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36, 2013. .. _Tophat: http://tophat.cbcb.umd.edu/ diff -r 68b694a22c59b6419a568330bbd0a02186068526 -r c2561f65d6f9d181e76d60d0f7ed60e1ce94c513 tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -95,14 +95,15 @@ #end if ## view/sort and output file - | samtools view -Su - | samtools sort -o - - > $output + | samtools view -Su - | samtools sort -o - - > $output; ## rename unaligned sequence files #if $library.type == "paired" and $output_unaligned_reads_l and $output_unaligned_reads_r: #set left = str($output_unaligned_reads_l).replace( '.dat', '.1.dat' ) #set right = str($output_unaligned_reads_l).replace( '.dat', '.2.dat' ) - ;mv $left $output_unaligned_reads_l; - mv $right $output_unaligned_reads_r + + mv $left $output_unaligned_reads_l; + mv $right $output_unaligned_reads_r; #end if </command> @@ -112,8 +113,6 @@ </stdio><inputs> - <param name="jobtitle" type="text" value="bowtie2" label="Memorable short reminder of this job's importance for outputs" /> - <!-- single/paired --><conditional name="library"><param name="type" type="select" label="Is this library mate-paired?"> @@ -217,7 +216,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_${jobtitle}_unaligL.fastq" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${tool.name} on ${on_string}: unaligned reads (L)" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -225,7 +224,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_${jobtitle}_unaligR.fastq)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${tool.name} on ${on_string}: unaligned reads (R)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -233,7 +232,7 @@ </action></actions></data> - <data format="bam" name="output" label="${tool.name} on ${on_string}_${jobtitle}_aligned.bam"> + <data 
format="bam" name="output" label="${tool.name} on ${on_string}: aligned reads"><actions><conditional name="reference_genome.source"><when value="indexed"> https://bitbucket.org/galaxy/galaxy-central/commits/a2ffcbc614d4/ Changeset: a2ffcbc614d4 User: fubar Date: 2013-10-08 04:39:22 Summary: restore original tools/actions/__init__.py Affected #: 1 file diff -r c2561f65d6f9d181e76d60d0f7ed60e1ce94c513 -r a2ffcbc614d42f343f4bab09035e771bb50b0dcc lib/galaxy/tools/actions/__init__.py --- a/lib/galaxy/tools/actions/__init__.py +++ b/lib/galaxy/tools/actions/__init__.py @@ -1,6 +1,5 @@ import os import galaxy.tools -import re from galaxy.exceptions import ObjectInvalid from galaxy.model import LibraryDatasetDatasetAssociation @@ -23,13 +22,13 @@ """ def execute( self, tool, trans, incoming={}, set_output_hid=True ): raise TypeError("Abstract method") - + class DefaultToolAction( object ): """Default tool action is to run an external command""" - + def collect_input_datasets( self, tool, param_values, trans ): """ - Collect any dataset inputs from incoming. Returns a mapping from + Collect any dataset inputs from incoming. Returns a mapping from parameter name to Dataset instance for each tool parameter that is of the DataToolParameter type. """ @@ -118,9 +117,9 @@ def make_dict_copy( from_dict ): """ Makes a copy of input dictionary from_dict such that all values that are dictionaries - result in creation of a new dictionary ( a sort of deepcopy ). We may need to handle - other complex types ( e.g., lists, etc ), but not sure... - Yes, we need to handle lists (and now are)... + result in creation of a new dictionary ( a sort of deepcopy ). We may need to handle + other complex types ( e.g., lists, etc ), but not sure... + Yes, we need to handle lists (and now are)... """ copy_from_dict = {} for key, value in from_dict.items(): @@ -169,11 +168,11 @@ input_values[ input.name ] = galaxy.tools.SelectToolParameterWrapper( input, input_values[ input.name ], tool.app, other_values = incoming ) else: input_values[ input.name ] = galaxy.tools.InputValueWrapper( input, input_values[ input.name ], incoming ) - + # Set history. if not history: history = tool.get_default_history_by_trans( trans, create=True ) - + out_data = odict() # Collect any input datasets from the incoming parameters inp_data = self.collect_input_datasets( tool, incoming, trans ) @@ -186,26 +185,20 @@ if not data: data = NoneDataset( datatypes_registry = trans.app.datatypes_registry ) continue - + # Convert LDDA to an HDA. 
if isinstance(data, LibraryDatasetDatasetAssociation): data = data.to_history_dataset_association( None ) inp_data[name] = data - -# else: # HDA -# if data.hid: -# input_names.append( 'data %s' % data.hid ) + + else: # HDA + if data.hid: + input_names.append( 'data %s' % data.hid ) input_ext = data.ext - + if data.dbkey not in [None, '?']: input_dbkey = data.dbkey - data_name_sane = re.sub('[^a-zA-Z0-9_]+', '', data.name) - if trans.app.config.use_data_id_on_string: - # we want names in our on_strings not numbers - input_names.append(data_name_sane) - else: - if data.hid: - input_names.append('data %s' % data.hid) + # Collect chromInfo dataset and add as parameters to incoming db_datasets = {} db_dataset = trans.db_dataset_for( input_dbkey ) @@ -221,13 +214,13 @@ if 'fasta' in custom_build_dict: build_fasta_dataset = trans.sa_session.query( trans.app.model.HistoryDatasetAssociation ).get( custom_build_dict[ 'fasta' ] ) chrom_info = build_fasta_dataset.get_converted_dataset( trans, 'len' ).file_name - + if not chrom_info: # Default to built-in build. chrom_info = os.path.join( trans.app.config.len_file_path, "%s.len" % input_dbkey ) incoming[ "chromInfo" ] = chrom_info inp_data.update( db_datasets ) - + # Determine output dataset permission/roles list existing_datasets = [ inp for inp in inp_data.values() if inp ] if existing_datasets: @@ -239,20 +232,17 @@ if len( input_names ) == 1: on_text = input_names[0] elif len( input_names ) == 2: - #on_text = '%s and %s' % tuple(input_names[0:2]) - on_text = '%s_%s' % tuple(input_names[0:2]) + on_text = '%s and %s' % tuple(input_names[0:2]) elif len( input_names ) == 3: - #on_text = '%s, %s, and %s' % tuple(input_names[0:3]) - on_text = '%s_%s_%s' % tuple(input_names[0:3]) + on_text = '%s, %s, and %s' % tuple(input_names[0:3]) elif len( input_names ) > 3: - #on_text = '%s, %s, and others' % tuple(input_names[0:2]) - on_text = '%s_%s_and_others' % tuple(input_names[0:2]) + on_text = '%s, %s, and others' % tuple(input_names[0:2]) else: on_text = "" # Add the dbkey to the incoming parameters incoming[ "dbkey" ] = input_dbkey params = None #wrapped params are used by change_format action and by output.label; only perform this wrapping once, as needed - # Keep track of parent / child relationships, we'll create all the + # Keep track of parent / child relationships, we'll create all the # datasets first, then create the associations parent_to_child_pairs = [] child_dataset_names = set() @@ -268,7 +258,7 @@ if output.parent: parent_to_child_pairs.append( ( output.parent, name ) ) child_dataset_names.add( name ) - ## What is the following hack for? Need to document under what + ## What is the following hack for? Need to document under what ## conditions can the following occur? (james@bx.psu.edu) # HACK: the output data has already been created # this happens i.e. as a result of the async controller @@ -289,7 +279,7 @@ ext = input_extension except Exception, e: pass - + #process change_format tags if output.change_format: if params is None: @@ -332,14 +322,14 @@ object_store_id = data.dataset.object_store_id # these will be the same thing after the first output # This may not be neccesary with the new parent/child associations data.designation = name - # Copy metadata from one of the inputs if requested. + # Copy metadata from one of the inputs if requested. 
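The hunk above restores the natural-language joining for the on_string label that the backed-out commit had replaced with underscore-separated names. As a quick illustration of the restored behaviour (the helper name here is illustrative, not the Galaxy code path):

    def on_text_for(input_names):
        # Restored labelling: 'data 1 and data 2', 'data 1, data 2, and data 3', ...
        if len(input_names) == 1:
            return input_names[0]
        elif len(input_names) == 2:
            return '%s and %s' % tuple(input_names[0:2])
        elif len(input_names) == 3:
            return '%s, %s, and %s' % tuple(input_names[0:3])
        elif len(input_names) > 3:
            return '%s, %s, and others' % tuple(input_names[0:2])
        return ''

    # on_text_for(['data 1', 'data 2']) -> 'data 1 and data 2'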
if output.metadata_source: - data.init_meta( copy_from=inp_data[output.metadata_source] ) - else: - data.init_meta() - # Take dbkey from LAST input - data.dbkey = str(input_dbkey) - # Set state + data.init_meta( copy_from=inp_data[output.metadata_source] ) + else: + data.init_meta() + # Take dbkey from LAST input + data.dbkey = str(input_dbkey) + # Set state # FIXME: shouldn't this be NEW until the job runner changes it? data.state = data.states.QUEUED data.blurb = "queued" @@ -361,7 +351,7 @@ params = make_dict_copy( incoming ) wrap_values( tool.inputs, params, skip_missing_values = not tool.check_values ) data.name = self._get_default_data_name( data, tool, on_text=on_text, trans=trans, incoming=incoming, history=history, params=params, job_params=job_params ) - # Store output + # Store output out_data[ name ] = data if output.actions: #Apply pre-job tool-output-dataset actions; e.g. setting metadata, changing format @@ -383,7 +373,7 @@ parent_dataset = out_data[ parent_name ] child_dataset = out_data[ child_name ] parent_dataset.children.append( child_dataset ) - # Store data after custom code runs + # Store data after custom code runs trans.sa_session.flush() # Create the job object job = trans.app.model.Job() @@ -464,7 +454,7 @@ for name in inp_data.keys(): dataset = inp_data[ name ] redirect_url = tool.parse_redirect_url( dataset, incoming ) - # GALAXY_URL should be include in the tool params to enable the external application + # GALAXY_URL should be include in the tool params to enable the external application # to send back to the current Galaxy instance GALAXY_URL = incoming.get( 'GALAXY_URL', None ) assert GALAXY_URL is not None, "GALAXY_URL parameter missing in tool config." https://bitbucket.org/galaxy/galaxy-central/commits/3df68022beef/ Changeset: 3df68022beef User: fubar Date: 2013-10-08 07:15:56 Summary: Big cleanup - remove material in tools/rgenetics accidentally added in August when a push was made from a BakerIDI repository - apologies to all. This push reverts all those changes and also removes the old rgenetics SNP tools, which are now deprecated. Affected #: 1 file diff -r a2ffcbc614d42f343f4bab09035e771bb50b0dcc -r 3df68022beef2077798ed50a33afb03be75a841a tools/rgenetics/rgFastQC.py --- a/tools/rgenetics/rgFastQC.py +++ b/tools/rgenetics/rgFastQC.py @@ -24,10 +24,29 @@ import optparse import shutil import tempfile -from rgutils import getFileString import zipfile import gzip + +def getFileString(fpath, outpath): + """ + format a nice file size string + """ + size = '' + fp = os.path.join(outpath, fpath) + s = '? ?'
+ if os.path.isfile(fp): + n = float(os.path.getsize(fp)) + if n > 2**20: + size = ' (%1.1f MB)' % (n/2**20) + elif n > 2**10: + size = ' (%1.1f KB)' % (n/2**10) + elif n > 0: + size = ' (%d B)' % (int(n)) + s = '%s %s' % (fpath, size) + return s + + class FastQC(): """wrapper """ https://bitbucket.org/galaxy/galaxy-central/commits/c8b55344e779/ Changeset: c8b55344e779 User: fubar Date: 2013-10-08 07:30:54 Summary: Proper removal of rgenetics deprecated tool wrappers Affected #: 41 files diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/listFiles.py --- a/tools/rgenetics/listFiles.py +++ /dev/null @@ -1,227 +0,0 @@ -#Provides Upload tool with access to list of available files -import glob,sys -import galaxy.app as thisapp -import galaxy.util - -from elementtree.ElementTree import XML - -librepos = '/usr/local/galaxy/data/rg' -myrepos = '/home/rerla/galaxy' -marchinirepos = '/usr/local/galaxy/data/rg/snptest' - -from galaxy.tools.parameters import DataToolParameter - -#Provides Upload tool with access to list of available builds - -builds = [] -#Read build names and keys from galaxy.util -for dbkey, build_name in galaxy.util.dbnames: - builds.append((build_name,dbkey,False)) - -#Return available builds -def get_available_builds(defval='hg18'): - for i,x in enumerate(builds): - if x[1] == defval: - x = list(x) - x[2] = True - builds[i] = tuple(x) - return builds - - - -def get_tabular_cols( input, outformat='gg' ): - """numeric only other than rs for strict genome graphs - otherwise tabular. Derived from galaxy tool source around August 2007 by Ross""" - columns = [] - seenCnames = {} - elems = [] - colnames = ['Col%d' % x for x in range(input.metadata.columns+1)] - strict = (outformat=='gg') - for i, line in enumerate( file ( input.file_name ) ): - if line and not line.startswith( '#' ): - line = line.rstrip('\r\n') - elems = line.split( '\t' ) - - """ - Strict gg note: - Since this tool requires users to select only those columns - that contain numerical values, we'll restrict the column select - list appropriately other than the first column which must be a marker - """ - if len(elems) > 0: - for col in range(1, input.metadata.columns+1): - isFloat = False # short circuit common result - try: - val = float(elems[col-1]) - isFloat = True - except: - val = elems[col-1] - if val: - if i == 0: # header row - colnames[col] = val - if isFloat or (not strict) or (col == 1): # all in if not GG - option = colnames[col] - if not seenCnames.get(option,None): # new - columns.append((option,str(col),False)) - seenCnames[option] = option - #print 'get_tab: %d=%s. Columns=%s' % (i,line,str(columns)) - if len(columns) > 0 and i > 10: - """ - We have our select list built, so we can break out of the outer most for loop - """ - break - if i == 30: - break # Hopefully we never get here... 
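The deleted get_tabular_cols above probes each cell with float() to decide whether a column is numeric enough for Genome Graphs; the same try/except idiom reappears as is_number in the deleted rgCaCo.py further down this digest. Isolated, it is simply:

    def is_number(s):
        # True when s parses as a float ('0.25', '1e-3'); False for 'rs123'.
        try:
            float(s)
            return True
        except ValueError:
            return False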
- for option in range(min(5,len(columns))): - (x,y,z) = columns[option] - columns[option] = (x,y,True) - return columns # sorted select options - -def get_marchini_dir(): - """return the filesystem directory for snptest style files""" - return marchinirepos - - -def get_lib_SNPTESTCaCofiles(): - """return a list of file names - without extensions - available for caco studies - These have a common file name with both _1 and _2 suffixes""" - d = get_marchini_dir() - testsuffix = '.gen_1' # glob these - flist = glob.glob('%s/*%s' % (d,testsuffix)) - flist = [x.split(testsuffix)[0] for x in flist] # leaves with a list of file set names - if len(flist) > 0: - dat = [(flist[0],flist[0],True),] - dat += [(x,x,False) for x in flist[1:]] - else: - dat = [('No Marchini CaCo files found in %s - convert some using the Marchini converter tool' % d,'None',True),] - return dat - -def getChropt(): - """return dynamic chromosome select options - """ - c = ['X','Y'] - c += ['%d' % x for x in range(1,23)] - dat = [(x,x,False) for x in c] - x,y,z = dat[3] - dat[3] = (x,y,True) - return dat - - -def get_phecols(fname=''): - """ return a list of phenotype columns for a multi-select list - prototype: - foo = ('fake - not yet implemented','not implemented','False') - dat = [foo for x in range(5)] - return dat - """ - try: - header = file(fname,'r').next().split() - except: - return [('get_phecols unable to open file %s' % fname,'None',False),] - dat = [(x,x,False) for x in header] - return dat - -#Return various kinds of files - -def get_lib_pedfiles(): - dat = glob.glob('%s/ped/*.ped' % librepos) - dat += glob.glob('%s/ped/*.ped' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [x.split('.ped')[0] for x in dat] - dat = [(x,x,'True') for x in dat] - else: - dat = [('No ped files - add some to %s/ped or %s/ped' % (librepos,myrepos),'None',True),] - return dat - -def get_lib_phefiles(): - ext = 'phe' - dat = glob.glob('%s/pheno/*.%s' % (librepos,ext)) - dat += glob.glob('%s/pheno/*.%s' % (myrepos,ext)) - dat.sort() - if len(dat) > 0: - dat = [(x,x,'False') for x in dat] - else: - dat = [('No %s files - add some to %s/pheno or %s/pheno' % (ext,librepos,myrepos),'None',True),] - return dat - -def get_lib_bedfiles(): - dat = glob.glob('%s/plinkbed/*.bed' % librepos) - dat += glob.glob('%s/plinkbed/*.bed' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [x.split('.bed')[0] for x in dat] - dat = [(x,x,False) for x in dat] - else: - dat = [('No bed files - Please import some to %s/plinkbed or %s/plinkbed' % (librepos,myrepos),'None',True),] - return dat - -def get_lib_fbatfiles(): - dat = glob.glob('%s/plinkfbat/*.ped' % librepos) - dat += glob.glob('%s/plinkfbat/*.ped' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No fbat bed files - Please import some to %s/plinkfbat or %s/plinkfbat' % (librepos,myrepos),'None',True),] - return dat - -def get_lib_mapfiles(): - dat = glob.glob('%s/ped/*.map' % librepos) - dat += glob.glob('%s/ped/*.map' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No map files - add some to %s/ped' % librepos,'None',True),] - return dat - -def get_my_pedfiles(): - dat = glob.glob('%s/*.ped' % myrepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_my_mapfiles(): - dat = glob.glob('%s/*.map' % myrepos) - if len(dat) > 0: - dat = [(x,x,'True') for x in dat] - else: - dat = [('No ped files - add 
some to %s' % librepos,'None',True),] - return dat - -def get_lib_xlsfiles(): - dat = glob.glob('%s/*.xls' % librepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_lib_htmlfiles(): - dat = glob.glob('%s/*.html' % librepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_my_xlsfiles(): - dat = glob.glob('%s/*.xls' % myrepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_my_htmlfiles(): - dat = glob.glob('%s/*.html' % myrepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - - diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/plinkbinJZ.py --- a/tools/rgenetics/plinkbinJZ.py +++ /dev/null @@ -1,868 +0,0 @@ -#!/usr/bin/env python2.4 -""" -""" - -import optparse,os,subprocess,gzip,struct,time,commands -from array import array - -#from AIMS import util -#from pga import util as pgautil - -__FILE_ID__ = '$Id: plinkbinJZ.py,v 1.14 2009/07/13 20:16:50 rejpz Exp $' - -VERBOSE = True - -MISSING_ALLELES = set(['N', '0', '.', '-','']) - -AUTOSOMES = set(range(1, 23) + [str(c) for c in range(1, 23)]) - -MAGIC_BYTE1 = '00110110' -MAGIC_BYTE2 = '11011000' -FORMAT_SNP_MAJOR_BYTE = '10000000' -FORMAT_IND_MAJOR_BYTE = '00000000' -MAGIC1 = (0, 3, 1, 2) -MAGIC2 = (3, 1, 2, 0) -FORMAT_SNP_MAJOR = (2, 0, 0, 0) -FORMAT_IND_MAJOR = (0, 0, 0, 0) -HEADER_LENGTH = 3 - -HOM0 = 3 -HOM1 = 0 -MISS = 2 -HET = 1 -HOM0_GENO = (0, 0) -HOM1_GENO = (1, 1) -HET_GENO = (0, 1) -MISS_GENO = (-9, -9) - -GENO_TO_GCODE = { - HOM0_GENO: HOM0, - HET_GENO: HET, - HOM1_GENO: HOM1, - MISS_GENO: MISS, - } - -CHROM_REPLACE = { - 'X': '23', - 'Y': '24', - 'XY': '25', - 'MT': '26', - 'M': '26', -} - -MAP_LINE_EXCEPTION_TEXT = """ -One or more lines in the *.map file has only three fields. -The line was: - -%s - -If you are running rgGRR through EPMP, this is usually a -sign that you are using an old version of the map file. -You can correct the problem by re-running Subject QC. If -you have already tried this, please contact the developers, -or file a bug. 
-""" - -INT_TO_GCODE = { - 0: array('i', (0, 0, 0, 0)), 1: array('i', (2, 0, 0, 0)), 2: array('i', (1, 0, 0, 0)), 3: array('i', (3, 0, 0, 0)), - 4: array('i', (0, 2, 0, 0)), 5: array('i', (2, 2, 0, 0)), 6: array('i', (1, 2, 0, 0)), 7: array('i', (3, 2, 0, 0)), - 8: array('i', (0, 1, 0, 0)), 9: array('i', (2, 1, 0, 0)), 10: array('i', (1, 1, 0, 0)), 11: array('i', (3, 1, 0, 0)), - 12: array('i', (0, 3, 0, 0)), 13: array('i', (2, 3, 0, 0)), 14: array('i', (1, 3, 0, 0)), 15: array('i', (3, 3, 0, 0)), - 16: array('i', (0, 0, 2, 0)), 17: array('i', (2, 0, 2, 0)), 18: array('i', (1, 0, 2, 0)), 19: array('i', (3, 0, 2, 0)), - 20: array('i', (0, 2, 2, 0)), 21: array('i', (2, 2, 2, 0)), 22: array('i', (1, 2, 2, 0)), 23: array('i', (3, 2, 2, 0)), - 24: array('i', (0, 1, 2, 0)), 25: array('i', (2, 1, 2, 0)), 26: array('i', (1, 1, 2, 0)), 27: array('i', (3, 1, 2, 0)), - 28: array('i', (0, 3, 2, 0)), 29: array('i', (2, 3, 2, 0)), 30: array('i', (1, 3, 2, 0)), 31: array('i', (3, 3, 2, 0)), - 32: array('i', (0, 0, 1, 0)), 33: array('i', (2, 0, 1, 0)), 34: array('i', (1, 0, 1, 0)), 35: array('i', (3, 0, 1, 0)), - 36: array('i', (0, 2, 1, 0)), 37: array('i', (2, 2, 1, 0)), 38: array('i', (1, 2, 1, 0)), 39: array('i', (3, 2, 1, 0)), - 40: array('i', (0, 1, 1, 0)), 41: array('i', (2, 1, 1, 0)), 42: array('i', (1, 1, 1, 0)), 43: array('i', (3, 1, 1, 0)), - 44: array('i', (0, 3, 1, 0)), 45: array('i', (2, 3, 1, 0)), 46: array('i', (1, 3, 1, 0)), 47: array('i', (3, 3, 1, 0)), - 48: array('i', (0, 0, 3, 0)), 49: array('i', (2, 0, 3, 0)), 50: array('i', (1, 0, 3, 0)), 51: array('i', (3, 0, 3, 0)), - 52: array('i', (0, 2, 3, 0)), 53: array('i', (2, 2, 3, 0)), 54: array('i', (1, 2, 3, 0)), 55: array('i', (3, 2, 3, 0)), - 56: array('i', (0, 1, 3, 0)), 57: array('i', (2, 1, 3, 0)), 58: array('i', (1, 1, 3, 0)), 59: array('i', (3, 1, 3, 0)), - 60: array('i', (0, 3, 3, 0)), 61: array('i', (2, 3, 3, 0)), 62: array('i', (1, 3, 3, 0)), 63: array('i', (3, 3, 3, 0)), - 64: array('i', (0, 0, 0, 2)), 65: array('i', (2, 0, 0, 2)), 66: array('i', (1, 0, 0, 2)), 67: array('i', (3, 0, 0, 2)), - 68: array('i', (0, 2, 0, 2)), 69: array('i', (2, 2, 0, 2)), 70: array('i', (1, 2, 0, 2)), 71: array('i', (3, 2, 0, 2)), - 72: array('i', (0, 1, 0, 2)), 73: array('i', (2, 1, 0, 2)), 74: array('i', (1, 1, 0, 2)), 75: array('i', (3, 1, 0, 2)), - 76: array('i', (0, 3, 0, 2)), 77: array('i', (2, 3, 0, 2)), 78: array('i', (1, 3, 0, 2)), 79: array('i', (3, 3, 0, 2)), - 80: array('i', (0, 0, 2, 2)), 81: array('i', (2, 0, 2, 2)), 82: array('i', (1, 0, 2, 2)), 83: array('i', (3, 0, 2, 2)), - 84: array('i', (0, 2, 2, 2)), 85: array('i', (2, 2, 2, 2)), 86: array('i', (1, 2, 2, 2)), 87: array('i', (3, 2, 2, 2)), - 88: array('i', (0, 1, 2, 2)), 89: array('i', (2, 1, 2, 2)), 90: array('i', (1, 1, 2, 2)), 91: array('i', (3, 1, 2, 2)), - 92: array('i', (0, 3, 2, 2)), 93: array('i', (2, 3, 2, 2)), 94: array('i', (1, 3, 2, 2)), 95: array('i', (3, 3, 2, 2)), - 96: array('i', (0, 0, 1, 2)), 97: array('i', (2, 0, 1, 2)), 98: array('i', (1, 0, 1, 2)), 99: array('i', (3, 0, 1, 2)), - 100: array('i', (0, 2, 1, 2)), 101: array('i', (2, 2, 1, 2)), 102: array('i', (1, 2, 1, 2)), 103: array('i', (3, 2, 1, 2)), - 104: array('i', (0, 1, 1, 2)), 105: array('i', (2, 1, 1, 2)), 106: array('i', (1, 1, 1, 2)), 107: array('i', (3, 1, 1, 2)), - 108: array('i', (0, 3, 1, 2)), 109: array('i', (2, 3, 1, 2)), 110: array('i', (1, 3, 1, 2)), 111: array('i', (3, 3, 1, 2)), - 112: array('i', (0, 0, 3, 2)), 113: array('i', (2, 0, 3, 2)), 114: array('i', (1, 0, 3, 2)), 115: array('i', 
(3, 0, 3, 2)), - 116: array('i', (0, 2, 3, 2)), 117: array('i', (2, 2, 3, 2)), 118: array('i', (1, 2, 3, 2)), 119: array('i', (3, 2, 3, 2)), - 120: array('i', (0, 1, 3, 2)), 121: array('i', (2, 1, 3, 2)), 122: array('i', (1, 1, 3, 2)), 123: array('i', (3, 1, 3, 2)), - 124: array('i', (0, 3, 3, 2)), 125: array('i', (2, 3, 3, 2)), 126: array('i', (1, 3, 3, 2)), 127: array('i', (3, 3, 3, 2)), - 128: array('i', (0, 0, 0, 1)), 129: array('i', (2, 0, 0, 1)), 130: array('i', (1, 0, 0, 1)), 131: array('i', (3, 0, 0, 1)), - 132: array('i', (0, 2, 0, 1)), 133: array('i', (2, 2, 0, 1)), 134: array('i', (1, 2, 0, 1)), 135: array('i', (3, 2, 0, 1)), - 136: array('i', (0, 1, 0, 1)), 137: array('i', (2, 1, 0, 1)), 138: array('i', (1, 1, 0, 1)), 139: array('i', (3, 1, 0, 1)), - 140: array('i', (0, 3, 0, 1)), 141: array('i', (2, 3, 0, 1)), 142: array('i', (1, 3, 0, 1)), 143: array('i', (3, 3, 0, 1)), - 144: array('i', (0, 0, 2, 1)), 145: array('i', (2, 0, 2, 1)), 146: array('i', (1, 0, 2, 1)), 147: array('i', (3, 0, 2, 1)), - 148: array('i', (0, 2, 2, 1)), 149: array('i', (2, 2, 2, 1)), 150: array('i', (1, 2, 2, 1)), 151: array('i', (3, 2, 2, 1)), - 152: array('i', (0, 1, 2, 1)), 153: array('i', (2, 1, 2, 1)), 154: array('i', (1, 1, 2, 1)), 155: array('i', (3, 1, 2, 1)), - 156: array('i', (0, 3, 2, 1)), 157: array('i', (2, 3, 2, 1)), 158: array('i', (1, 3, 2, 1)), 159: array('i', (3, 3, 2, 1)), - 160: array('i', (0, 0, 1, 1)), 161: array('i', (2, 0, 1, 1)), 162: array('i', (1, 0, 1, 1)), 163: array('i', (3, 0, 1, 1)), - 164: array('i', (0, 2, 1, 1)), 165: array('i', (2, 2, 1, 1)), 166: array('i', (1, 2, 1, 1)), 167: array('i', (3, 2, 1, 1)), - 168: array('i', (0, 1, 1, 1)), 169: array('i', (2, 1, 1, 1)), 170: array('i', (1, 1, 1, 1)), 171: array('i', (3, 1, 1, 1)), - 172: array('i', (0, 3, 1, 1)), 173: array('i', (2, 3, 1, 1)), 174: array('i', (1, 3, 1, 1)), 175: array('i', (3, 3, 1, 1)), - 176: array('i', (0, 0, 3, 1)), 177: array('i', (2, 0, 3, 1)), 178: array('i', (1, 0, 3, 1)), 179: array('i', (3, 0, 3, 1)), - 180: array('i', (0, 2, 3, 1)), 181: array('i', (2, 2, 3, 1)), 182: array('i', (1, 2, 3, 1)), 183: array('i', (3, 2, 3, 1)), - 184: array('i', (0, 1, 3, 1)), 185: array('i', (2, 1, 3, 1)), 186: array('i', (1, 1, 3, 1)), 187: array('i', (3, 1, 3, 1)), - 188: array('i', (0, 3, 3, 1)), 189: array('i', (2, 3, 3, 1)), 190: array('i', (1, 3, 3, 1)), 191: array('i', (3, 3, 3, 1)), - 192: array('i', (0, 0, 0, 3)), 193: array('i', (2, 0, 0, 3)), 194: array('i', (1, 0, 0, 3)), 195: array('i', (3, 0, 0, 3)), - 196: array('i', (0, 2, 0, 3)), 197: array('i', (2, 2, 0, 3)), 198: array('i', (1, 2, 0, 3)), 199: array('i', (3, 2, 0, 3)), - 200: array('i', (0, 1, 0, 3)), 201: array('i', (2, 1, 0, 3)), 202: array('i', (1, 1, 0, 3)), 203: array('i', (3, 1, 0, 3)), - 204: array('i', (0, 3, 0, 3)), 205: array('i', (2, 3, 0, 3)), 206: array('i', (1, 3, 0, 3)), 207: array('i', (3, 3, 0, 3)), - 208: array('i', (0, 0, 2, 3)), 209: array('i', (2, 0, 2, 3)), 210: array('i', (1, 0, 2, 3)), 211: array('i', (3, 0, 2, 3)), - 212: array('i', (0, 2, 2, 3)), 213: array('i', (2, 2, 2, 3)), 214: array('i', (1, 2, 2, 3)), 215: array('i', (3, 2, 2, 3)), - 216: array('i', (0, 1, 2, 3)), 217: array('i', (2, 1, 2, 3)), 218: array('i', (1, 1, 2, 3)), 219: array('i', (3, 1, 2, 3)), - 220: array('i', (0, 3, 2, 3)), 221: array('i', (2, 3, 2, 3)), 222: array('i', (1, 3, 2, 3)), 223: array('i', (3, 3, 2, 3)), - 224: array('i', (0, 0, 1, 3)), 225: array('i', (2, 0, 1, 3)), 226: array('i', (1, 0, 1, 3)), 227: array('i', (3, 0, 1, 3)), - 228: 
array('i', (0, 2, 1, 3)), 229: array('i', (2, 2, 1, 3)), 230: array('i', (1, 2, 1, 3)), 231: array('i', (3, 2, 1, 3)), - 232: array('i', (0, 1, 1, 3)), 233: array('i', (2, 1, 1, 3)), 234: array('i', (1, 1, 1, 3)), 235: array('i', (3, 1, 1, 3)), - 236: array('i', (0, 3, 1, 3)), 237: array('i', (2, 3, 1, 3)), 238: array('i', (1, 3, 1, 3)), 239: array('i', (3, 3, 1, 3)), - 240: array('i', (0, 0, 3, 3)), 241: array('i', (2, 0, 3, 3)), 242: array('i', (1, 0, 3, 3)), 243: array('i', (3, 0, 3, 3)), - 244: array('i', (0, 2, 3, 3)), 245: array('i', (2, 2, 3, 3)), 246: array('i', (1, 2, 3, 3)), 247: array('i', (3, 2, 3, 3)), - 248: array('i', (0, 1, 3, 3)), 249: array('i', (2, 1, 3, 3)), 250: array('i', (1, 1, 3, 3)), 251: array('i', (3, 1, 3, 3)), - 252: array('i', (0, 3, 3, 3)), 253: array('i', (2, 3, 3, 3)), 254: array('i', (1, 3, 3, 3)), 255: array('i', (3, 3, 3, 3)), - } - -GCODE_TO_INT = dict([(tuple(v),k) for (k,v) in INT_TO_GCODE.items()]) - -### Exceptions -class DuplicateMarkerInMapFile(Exception): pass -class MapLineTooShort(Exception): pass -class ThirdAllele(Exception): pass -class PedError(Exception): pass -class BadMagic(Exception): - """ Raised when one of the MAGIC bytes in a bed file does not match - """ - pass -class BedError(Exception): - """ Raised when parsing a bed file runs into problems - """ - pass -class UnknownGenocode(Exception): - """ Raised when we get a 2-bit genotype that is undecipherable (is it possible?) - """ - pass -class UnknownGeno(Exception): pass - -### Utility functions - -def timenow(): - """return current time as a string - """ - return time.strftime('%d/%m/%Y %H:%M:%S', time.localtime(time.time())) - -def ceiling(n, k): - ''' Return the least multiple of k which is greater than n - ''' - m = n % k - if m == 0: - return n - else: - return n + k - m - -def nbytes(n): - ''' Return the number of bytes required for n subjects - ''' - return 2*ceiling(n, 4)/8 - -### Primary module functionality -class LPed: - """ The uber-class for processing the Linkage-format *.ped/*.map files - """ - def __init__(self, base): - self.base = base - self._ped = Ped('%s.ped' % (self.base)) - self._map = Map('%s.map' % (self.base)) - - self._markers = {} - self._ordered_markers = [] - self._marker_allele_lookup = {} - self._autosomal_indices = set() - - self._subjects = {} - self._ordered_subjects = [] - - self._genotypes = [] - - def parse(self): - """ - """ - if VERBOSE: print 'plinkbinJZ: Analysis started: %s' % (timenow()) - self._map.parse() - self._markers = self._map._markers - self._ordered_markers = self._map._ordered_markers - self._autosomal_indices = self._map._autosomal_indices - - self._ped.parse(self._ordered_markers) - self._subjects = self._ped._subjects - self._ordered_subjects = self._ped._ordered_subjects - self._genotypes = self._ped._genotypes - self._marker_allele_lookup = self._ped._marker_allele_lookup - - ### Adjust self._markers based on the allele information - ### we got from parsing the ped file - for m, name in enumerate(self._ordered_markers): - a1, a2 = self._marker_allele_lookup[m][HET] - self._markers[name][-2] = a1 - self._markers[name][-1] = a2 - if VERBOSE: print 'plinkbinJZ: Analysis finished: %s' % (timenow()) - - def getSubjectInfo(self, fid, oiid): - """ - """ - return self._subject_info[(fid, oiid)] - - def getSubjectInfoByLine(self, line): - """ - """ - return self._subject_info[self._ordered_subjects[line]] - - def getGenotypesByIndices(self, s, mlist, format): - """ needed for grr if lped - deprecated but.. 
- """ - mlist = dict(zip(mlist,[True,]*len(mlist))) # hash quicker than 'in' ? - raw_array = array('i', [row[s] for m,row in enumerate(self._genotypes) if mlist.get(m,None)]) - if format == 'raw': - return raw_array - elif format == 'ref': - result = array('i', [0]*len(mlist)) - for m, gcode in enumerate(raw_array): - if gcode == HOM0: - nref = 3 - elif gcode == HET: - nref = 2 - elif gcode == HOM1: - nref = 1 - else: - nref = 0 - result[m] = nref - return result - else: - result = [] - for m, gcode in enumerate(raw_array): - result.append(self._marker_allele_lookup[m][gcode]) - return result - - def writebed(self, base): - """ - """ - dst_name = '%s.fam' % (base) - print 'Writing pedigree information to [ %s ]' % (dst_name) - dst = open(dst_name, 'w') - for skey in self._ordered_subjects: - (fid, iid, did, mid, sex, phe, sid, d_sid, m_sid) = self._subjects[skey] - dst.write('%s %s %s %s %s %s\n' % (fid, iid, did, mid, sex, phe)) - dst.close() - - dst_name = '%s.bim' % (base) - print 'Writing map (extended format) information to [ %s ]' % (dst_name) - dst = open(dst_name, 'w') - for m, marker in enumerate(self._ordered_markers): - chrom, name, genpos, abspos, a1, a2 = self._markers[marker] - dst.write('%s\t%s\t%s\t%s\t%s\t%s\n' % (chrom, name, genpos, abspos, a1, a2)) - dst.close() - - bed_name = '%s.bed' % (base) - print 'Writing genotype bitfile to [ %s ]' % (bed_name) - print 'Using (default) SNP-major mode' - bed = open(bed_name, 'w') - - ### Write the 3 header bytes - bed.write(struct.pack('B', int(''.join(reversed(MAGIC_BYTE1)), 2))) - bed.write(struct.pack('B', int(''.join(reversed(MAGIC_BYTE2)), 2))) - bed.write(struct.pack('B', int(''.join(reversed(FORMAT_SNP_MAJOR_BYTE)), 2))) - - ### Calculate how many "pad bits" we should add after the last subject - nsubjects = len(self._ordered_subjects) - nmarkers = len(self._ordered_markers) - total_bytes = nbytes(nsubjects) - nbits = nsubjects * 2 - pad_nibbles = ((total_bytes * 8) - nbits)/2 - pad = array('i', [0]*pad_nibbles) - - ### And now write genotypes to the file - for m in xrange(nmarkers): - geno = self._genotypes[m] - geno.extend(pad) - bytes = len(geno)/4 - for b in range(bytes): - idx = b*4 - gcode = tuple(geno[idx:idx+4]) - try: - byte = struct.pack('B', GCODE_TO_INT[gcode]) - except KeyError: - print m, b, gcode - raise - bed.write(byte) - bed.close() - - def autosomal_indices(self): - """ Return the indices of markers in this ped/map that are autosomal. - This is used by rgGRR so that it can select a random set of markers - from the autosomes (sex chroms screw up the plot) - """ - return self._autosomal_indices - -class Ped: - def __init__(self, path): - self.path = path - self._subjects = {} - self._ordered_subjects = [] - self._genotypes = [] - self._marker_allele_lookup = {} - - def lineCount(self,infile): - """ count the number of lines in a file - efficiently using wget - """ - return int(commands.getoutput('wc -l %s' % (infile)).split()[0]) - - - def parse(self, markers): - """ Parse a given file -- this needs to be memory-efficient so that large - files can be parsed (~1 million markers on ~5000 subjects?). It - should also be fast, if possible. - """ - - ### Find out how many lines are in the file so we can ... - nsubjects = self.lineCount(self.path) - ### ... 
Pre-allocate the genotype arrays - nmarkers = len(markers) - _marker_alleles = [['0', '0'] for _ in xrange(nmarkers)] - self._genotypes = [array('i', [-1]*nsubjects) for _ in xrange(nmarkers)] - - if self.path.endswith('.gz'): - pfile = gzip.open(self.path, 'r') - else: - pfile = open(self.path, 'r') - - for s, line in enumerate(pfile): - line = line.strip() - if not line: - continue - - fid, iid, did, mid, sex, phe, genos = line.split(None, 6) - sid = iid.split('.')[0] - d_sid = did.split('.')[0] - m_sid = mid.split('.')[0] - - skey = (fid, iid) - self._subjects[skey] = (fid, iid, did, mid, sex, phe, sid, d_sid, m_sid) - self._ordered_subjects.append(skey) - - genotypes = genos.split() - - for m, marker in enumerate(markers): - idx = m*2 - a1, a2 = genotypes[idx:idx+2] # Alleles for subject s, marker m - s1, s2 = seen = _marker_alleles[m] # Alleles seen for marker m - - ### FIXME: I think this can still be faster, and simpler to read - # Two pieces of logic intertwined here: first, we need to code - # this genotype as HOM0, HOM1, HET or MISS. Second, we need to - # keep an ongoing record of the genotypes seen for this marker - if a1 == a2: - if a1 in MISSING_ALLELES: - geno = MISS_GENO - else: - if s1 == '0': - seen[0] = a1 - elif s1 == a1 or s2 == a2: - pass - elif s2 == '0': - seen[1] = a1 - else: - raise ThirdAllele('a1=a2=%s, seen=%s?' % (a1, str(seen))) - - if a1 == seen[0]: - geno = HOM0_GENO - elif a1 == seen[1]: - geno = HOM1_GENO - else: - raise PedError('Cannot assign geno for a1=a2=%s from seen=%s' % (a1, str(seen))) - elif a1 in MISSING_ALLELES or a2 in MISSING_ALLELES: - geno = MISS_GENO - else: - geno = HET_GENO - if s1 == '0': - seen[0] = a1 - seen[1] = a2 - elif s2 == '0': - if s1 == a1: - seen[1] = a2 - elif s1 == a2: - seen[1] = a1 - else: - raise ThirdAllele('a1=%s, a2=%s, seen=%s?' % (a1, a2, str(seen))) - else: - if sorted(seen) != sorted((a1, a2)): - raise ThirdAllele('a1=%s, a2=%s, seen=%s?' 
% (a1, a2, str(seen))) - - gcode = GENO_TO_GCODE.get(geno, None) - if gcode is None: - raise UnknownGeno(str(geno)) - self._genotypes[m][s] = gcode - - # Build the _marker_allele_lookup table - for m, alleles in enumerate(_marker_alleles): - if len(alleles) == 2: - a1, a2 = alleles - elif len(alleles) == 1: - a1 = alleles[0] - a2 = '0' - else: - print 'All alleles blank for %s: %s' % (m, str(alleles)) - raise - - self._marker_allele_lookup[m] = { - HOM0: (a2, a2), - HOM1: (a1, a1), - HET : (a1, a2), - MISS: ('0','0'), - } - - if VERBOSE: print '%s(%s) individuals read from [ %s ]' % (len(self._subjects), nsubjects, self.path) - -class Map: - def __init__(self, path=None): - self.path = path - self._markers = {} - self._ordered_markers = [] - self._autosomal_indices = set() - - def __len__(self): - return len(self._markers) - - def parse(self): - """ Parse a Linkage-format map file - """ - if self.path.endswith('.gz'): - fh = gzip.open(self.path, 'r') - else: - fh = open(self.path, 'r') - - for i, line in enumerate(fh): - line = line.strip() - if not line: - continue - - fields = line.split() - if len(fields) < 4: - raise MapLineTooShort(MAP_LINE_EXCEPTION_TEXT % (str(line), len(fields))) - else: - chrom, name, genpos, abspos = fields - if name in self._markers: - raise DuplicateMarkerInMapFile('Marker %s was found twice in map file %s' % (name, self.path)) - abspos = int(abspos) - if abspos < 0: - continue - if chrom in AUTOSOMES: - self._autosomal_indices.add(i) - chrom = CHROM_REPLACE.get(chrom, chrom) - self._markers[name] = [chrom, name, genpos, abspos, None, None] - self._ordered_markers.append(name) - fh.close() - if VERBOSE: print '%s (of %s) markers to be included from [ %s ]' % (len(self._ordered_markers), i, self.path) - -class BPed: - """ The uber-class for processing Plink's Binary Ped file format *.bed/*.bim/*.fam - """ - def __init__(self, base): - self.base = base - self._bed = Bed('%s.bed' % (self.base)) - self._bim = Bim('%s.bim' % (self.base)) - self._fam = Fam('%s.fam' % (self.base)) - - self._markers = {} - self._ordered_markers = [] - self._marker_allele_lookup = {} - self._autosomal_indices = set() - - self._subjects = {} - self._ordered_subjects = [] - - self._genotypes = [] - - def parse(self, quick=False): - """ - """ - self._quick = quick - - self._bim.parse() - self._markers = self._bim._markers - self._ordered_markers = self._bim._ordered_markers - self._marker_allele_lookup = self._bim._marker_allele_lookup - self._autosomal_indices = self._bim._autosomal_indices - - self._fam.parse() - self._subjects = self._fam._subjects - self._ordered_subjects = self._fam._ordered_subjects - - self._bed.parse(self._ordered_subjects, self._ordered_markers, quick=quick) - self._bedf = self._bed._fh - self._genotypes = self._bed._genotypes - self.nsubjects = len(self._ordered_subjects) - self.nmarkers = len(self._ordered_markers) - self._bytes_per_marker = nbytes(self.nsubjects) - - def writeped(self, path=None): - """ - """ - path = self.path = path or self.path - - map_name = self.path.replace('.bed', '.map') - print 'Writing map file [ %s ]' % (map_name) - dst = open(map_name, 'w') - for m in self._ordered_markers: - chrom, snp, genpos, abspos, a1, a2 = self._markers[m] - dst.write('%s\t%s\t%s\t%s\n' % (chrom, snp, genpos, abspos)) - dst.close() - - ped_name = self.path.replace('.bed', '.ped') - print 'Writing ped file [ %s ]' % (ped_name) - ped = open(ped_name, 'w') - firstyikes = False - for s, skey in enumerate(self._ordered_subjects): - idx = s*2 - (fid, iid, did, mid, 
sex, phe, oiid, odid, omid) = self._subjects[skey] - ped.write('%s %s %s %s %s %s' % (fid, iid, odid, omid, sex, phe)) - genotypes_for_subject = self.getGenotypesForSubject(s) - for m, snp in enumerate(self._ordered_markers): - #a1, a2 = self.getGenotypeByIndices(s, m) - a1,a2 = genotypes_for_subject[m] - ped.write(' %s %s' % (a1, a2)) - ped.write('\n') - ped.close() - - def getGenotype(self, subject, marker): - """ Retrieve a genotype for a particular subject/marker pair - """ - m = self._ordered_markers.index(marker) - s = self._ordered_subjects.index(subject) - return self.getGenotypeByIndices(s, m) - - def getGenotypesForSubject(self, s, raw=False): - """ Returns list of genotypes for all m markers - for subject s. If raw==True, then an array - of raw integer gcodes is returned instead - """ - if self._quick: - nmarkers = len(self._markers) - raw_array = array('i', [0]*nmarkers) - seek_nibble = s % 4 - for m in xrange(nmarkers): - seek_byte = m * self._bytes_per_marker + s/4 + HEADER_LENGTH - self._bedf.seek(seek_byte) - geno = struct.unpack('B', self._bedf.read(1))[0] - quartet = INT_TO_GCODE[geno] - gcode = quartet[seek_nibble] - raw_array[m] = gcode - else: - raw_array = array('i', [row[s] for row in self._genotypes]) - - if raw: - return raw_array - else: - result = [] - for m, gcode in enumerate(raw_array): - result.append(self._marker_allele_lookup[m][gcode]) - return result - - def getGenotypeByIndices(self, s, m): - """ - """ - if self._quick: - # Determine which byte we need to seek to, and - # which nibble within the byte we need - seek_byte = m * self._bytes_per_marker + s/4 + HEADER_LENGTH - seek_nibble = s % 4 - self._bedf.seek(seek_byte) - geno = struct.unpack('B', self._bedf.read(1))[0] - quartet = INT_TO_GCODE[geno] - gcode = quartet[seek_nibble] - else: - # Otherwise, just grab the genotypes from the - # list of arrays - genos_for_marker = self._genotypes[m] - gcode = genos_for_marker[s] - - return self._marker_allele_lookup[m][gcode] - - def getGenotypesByIndices(self, s, mlist, format): - """ - """ - if self._quick: - raw_array = array('i', [0]*len(mlist)) - seek_nibble = s % 4 - for i,m in enumerate(mlist): - seek_byte = m * self._bytes_per_marker + s/4 + HEADER_LENGTH - self._bedf.seek(seek_byte) - geno = struct.unpack('B', self._bedf.read(1))[0] - quartet = INT_TO_GCODE[geno] - gcode = quartet[seek_nibble] - raw_array[i] = gcode - mlist = set(mlist) - else: - mlist = set(mlist) - raw_array = array('i', [row[s] for m,row in enumerate(self._genotypes) if m in mlist]) - - if format == 'raw': - return raw_array - elif format == 'ref': - result = array('i', [0]*len(mlist)) - for m, gcode in enumerate(raw_array): - if gcode == HOM0: - nref = 3 - elif gcode == HET: - nref = 2 - elif gcode == HOM1: - nref = 1 - else: - nref = 0 - result[m] = nref - return result - else: - result = [] - for m, gcode in enumerate(raw_array): - result.append(self._marker_allele_lookup[m][gcode]) - return result - - def getSubject(self, s): - """ - """ - skey = self._ordered_subjects[s] - return self._subjects[skey] - - def autosomal_indices(self): - """ Return the indices of markers in this ped/map that are autosomal. 
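The quick-mode getters above seek straight into the .bed file: genotype (s, m) lives at byte HEADER_LENGTH + m * bytes_per_marker + s/4, in 2-bit slot s % 4 of that byte. A standalone sketch of the per-byte decode that the INT_TO_GCODE table precomputes (within each 2-bit pair, the lower-order file bit becomes the high bit of the resulting gcode):

    def decode_bed_byte(b):
        # Four subjects per byte; gcodes: 0=HOM1, 1=HET, 2=MISS, 3=HOM0.
        out = []
        for k in range(4):
            lo = (b >> (2 * k)) & 1
            hi = (b >> (2 * k + 1)) & 1
            out.append(lo * 2 + hi)
        return out

    # Sanity check against the header constants above:
    # decode_bed_byte(0x6C) == [0, 3, 1, 2], i.e. MAGIC1, the first
    # magic byte of a PLINK binary ped file.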
- This is used by rgGRR so that it can select a random set of markers - from the autosomes (sex chroms screw up the plot) - """ - return self._autosomal_indices - -class Bed: - - def __init__(self, path): - self.path = path - self._genotypes = [] - self._fh = None - - def parse(self, subjects, markers, quick=False): - """ Parse the bed file, indicated either by the path parameter, - or as the self.path indicated in __init__. If quick is - True, then just parse the bim and fam, then genotypes will - be looked up dynamically by indices - """ - self._quick = quick - - ordered_markers = markers - ordered_subjects = subjects - nsubjects = len(ordered_subjects) - nmarkers = len(ordered_markers) - - bed = open(self.path, 'rb') - self._fh = bed - - byte1 = bed.read(1) - byte2 = bed.read(1) - byte3 = bed.read(1) - format_flag = struct.unpack('B', byte3)[0] - - h1 = tuple(INT_TO_GCODE[struct.unpack('B', byte1)[0]]) - h2 = tuple(INT_TO_GCODE[struct.unpack('B', byte2)[0]]) - h3 = tuple(INT_TO_GCODE[format_flag]) - - if h1 != MAGIC1 or h2 != MAGIC2: - raise BadMagic('One or both MAGIC bytes is wrong: %s==%s or %s==%s' % (h1, MAGIC1, h2, MAGIC2)) - if format_flag: - print 'Detected that binary PED file is v1.00 SNP-major mode (%s, "%s")\n' % (format_flag, h3) - else: - raise 'BAD_FORMAT_FLAG? (%s, "%s")\n' % (format_flag, h3) - - print 'Parsing binary ped file for %s markers and %s subjects' % (nmarkers, nsubjects) - - ### If quick mode was specified, we're done ... - self._quick = quick - if quick: - return - - ### ... Otherwise, parse genotypes into an array, and append that - ### array to self._genotypes - ngcodes = ceiling(nsubjects, 4) - bytes_per_marker = nbytes(nsubjects) - for m in xrange(nmarkers): - genotype_array = array('i', [-1]*(ngcodes)) - for byte in xrange(bytes_per_marker): - intval = struct.unpack('B', bed.read(1))[0] - idx = byte*4 - genotype_array[idx:idx+4] = INT_TO_GCODE[intval] - self._genotypes.append(genotype_array) - -class Bim: - def __init__(self, path): - """ - """ - self.path = path - self._markers = {} - self._ordered_markers = [] - self._marker_allele_lookup = {} - self._autosomal_indices = set() - - def parse(self): - """ - """ - print 'Reading map (extended format) from [ %s ]' % (self.path) - bim = open(self.path, 'r') - for m, line in enumerate(bim): - chrom, snp, gpos, apos, a1, a2 = line.strip().split() - self._markers[snp] = (chrom, snp, gpos, apos, a1, a2) - self._marker_allele_lookup[m] = { - HOM0: (a2, a2), - HOM1: (a1, a1), - HET : (a1, a2), - MISS: ('0','0'), - } - self._ordered_markers.append(snp) - if chrom in AUTOSOMES: - self._autosomal_indices.add(m) - bim.close() - print '%s markers to be included from [ %s ]' % (m+1, self.path) - -class Fam: - def __init__(self, path): - """ - """ - self.path = path - self._subjects = {} - self._ordered_subjects = [] - - def parse(self): - """ - """ - print 'Reading pedigree information from [ %s ]' % (self.path) - fam = open(self.path, 'r') - for s, line in enumerate(fam): - fid, iid, did, mid, sex, phe = line.strip().split() - sid = iid.split('.')[0] - d_sid = did.split('.')[0] - m_sid = mid.split('.')[0] - skey = (fid, iid) - self._ordered_subjects.append(skey) - self._subjects[skey] = (fid, iid, did, mid, sex, phe, sid, d_sid, m_sid) - fam.close() - print '%s individuals read from [ %s ]' % (s+1, self.path) - -### Command-line functionality and testing -def test(arg): - ''' - ''' - - import time - - if arg == 'CAMP_AFFY.ped': - print 'Testing bed.parse(quick=True)' - s = time.time() - bed = Bed(arg.replace('.ped', 
'.bed')) - bed.parse(quick=True) - print bed.getGenotype(('400118', '10300283'), 'rs2000467') - print bed.getGenotype(('400118', '10101384'), 'rs2294019') - print bed.getGenotype(('400121', '10101149'), 'rs2294019') - print bed.getGenotype(('400123', '10200290'), 'rs2294019') - assert bed.getGenotype(('400118', '10101384'), 'rs2294019') == ('4','4') - e = time.time() - print 'e-s = %s\n' % (e-s) - - print 'Testing bed.parse' - s = time.time() - bed = BPed(arg) - bed.parse(quick=False) - e = time.time() - print 'e-s = %s\n' % (e-s) - - print 'Testing bed.writeped' - s = time.time() - outname = '%s_BEDTEST' % (arg) - bed.writeped(outname) - e = time.time() - print 'e-s = %s\n' % (e-s) - del(bed) - - print 'Testing ped.parse' - s = time.time() - ped = LPed(arg) - ped.parse() - e = time.time() - print 'e-s = %s\n' % (e-s) - - print 'Testing ped.writebed' - s = time.time() - outname = '%s_PEDTEST' % (arg) - ped.writebed(outname) - e = time.time() - print 'e-s = %s\n' % (e-s) - del(ped) - -def profile_bed(arg): - """ - """ - bed = BPed(arg) - bed.parse(quick=False) - outname = '%s_BEDPROFILE' % (arg) - bed.writeped(outname) - -def profile_ped(arg): - """ - """ - ped = LPed(arg) - ped.parse() - outname = '%s_PEDPROFILE' % (arg) - ped.writebed(outname) - -if __name__ == '__main__': - """ Run as a command-line, this script should get one or more arguments, - each one a ped file to be parsed with the PedParser (unit tests?) - """ - op = optparse.OptionParser() - op.add_option('--profile-bed', action='store_true', default=False) - op.add_option('--profile-ped', action='store_true', default=False) - opts, args = op.parse_args() - - if opts.profile_bed: - import profile - import pstats - profile.run('profile_bed(args[0])', 'fooprof') - p = pstats.Stats('fooprof') - p.sort_stats('cumulative').print_stats(10) - elif opts.profile_ped: - import profile - import pstats - profile.run('profile_ped(args[0])', 'fooprof') - p = pstats.Stats('fooprof') - p.sort_stats('cumulative').print_stats(10) - else: - for arg in args: - test(arg) - - ### Code used to generate the INT_TO_GCODE dictionary - #print '{\n ', - #for i in range(256): - # b = INT2BIN[i] - # ints = [] - # s = str(i).rjust(3) - # #print b - # for j in range(4): - # idx = j*2 - # #print i, j, idx, b[idx:idx+2], int(b[idx:idx+2], 2) - # ints.append(int(b[idx:idx+2], 2)) - # print '%s: array(\'i\', %s),' % (s,tuple(ints)), - # if i > 0 and (i+1) % 4 == 0: - # print '\n ', - #print '}' - - diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgCaCo.py --- a/tools/rgenetics/rgCaCo.py +++ /dev/null @@ -1,271 +0,0 @@ -#!/usr/local/bin/python -# hack to run and process a plink case control association -# expects args as -# bfilepath outname jobname outformat (wig,xls) -# ross lazarus -# for wig files, we need annotation so look for map file or complain -""" -Parameters for wiggle track definition lines -All options are placed in a single line separated by spaces: - - track type=wiggle_0 name=track_label description=center_label \ - visibility=display_mode color=r,g,b altColor=r,g,b \ - priority=priority autoScale=on|off \ - gridDefault=on|off maxHeightPixels=max:default:min \ - graphType=bar|points viewLimits=lower:upper \ - yLineMark=real-value yLineOnOff=on|off \ - windowingFunction=maximum|mean|minimum smoothingWindow=off|2-16 -""" - -import sys,math,shutil,subprocess,os,time,tempfile,string -from os.path import abspath -from rgutils import timenow, plinke -imagedir = '/static/rg' # if needed for 
images -myversion = 'V000.1 April 2007' -verbose = False - -def makeGFF(resf='',outfname='',logf=None,twd='.',name='track name',description='track description',topn=1000): - """ - score must be scaled to 0-1000 - - Want to make some wig tracks from each analysis - Best n -log10(p). Make top hit the window. - we use our tab output which has - rs chrom offset ADD_stat ADD_p ADD_log10p - rs3094315 1 792429 1.151 0.2528 0.597223 - - """ - - def is_number(s): - try: - float(s) - return True - except ValueError: - return False - header = 'track name=%s description="%s" visibility=2 useScore=1 color=0,60,120\n' % (name,description) - column_names = [ 'Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Group' ] - halfwidth=100 - resfpath = os.path.join(twd,resf) - resf = open(resfpath,'r') - resfl = resf.readlines() # dumb but convenient for millions of rows - resfl = [x.split() for x in resfl] - headl = resfl[0] - resfl = resfl[1:] - headl = [x.strip().upper() for x in headl] - headIndex = dict(zip(headl,range(0,len(headl)))) - whatwewant = ['CHR','RS','OFFSET','LOG10ARMITAGEP'] - wewant = [headIndex.get(x,None) for x in whatwewant] - if None in wewant: # missing something - logf.write('### Error missing a required header from %s in makeGFF - headIndex=%s\n' % (whatwewant,headIndex)) - return - ppos = wewant[3] # last in list - resfl = [x for x in resfl if x[ppos] > '' and x[ppos] <> 'NA'] - resfl = [(float(x[ppos]),x) for x in resfl] # decorate - resfl.sort() - resfl.reverse() # using -log10 so larger is better - pvals = [x[0] for x in resfl] # need to scale - resfl = [x[1] for x in resfl] # drop decoration - resfl = resfl[:topn] # truncate - maxp = max(pvals) # need to scale - minp = min(pvals) - prange = abs(maxp-minp) + 0.5 # fudge - scalefact = 1000.0/prange - logf.write('###maxp=%f,minp=%f,prange=%f,scalefact=%f\n' % (maxp,minp,prange,scalefact)) - for i,row in enumerate(resfl): - row[ppos] = '%d' % (int(scalefact*pvals[i])) - resfl[i] = row # replace - outf = file(outfname,'w') - outf.write(header) - outres = [] # need to resort into chrom offset order - for i,lrow in enumerate(resfl): - chrom,snp,offset,p, = [lrow[x] for x in wewant] - gff = ('chr%s' % chrom,'rgCaCo','variation','%d' % (int(offset)-halfwidth), - '%d' % (int(offset)+halfwidth),p,'.','.','%s logp=%1.2f' % (snp,pvals[i])) - outres.append(gff) - outres = [(x[0],int(x[3]),x) for x in outres] # decorate - outres.sort() # into chrom offset - outres=[x[2] for x in outres] # undecorate - outres = ['\t'.join(x) for x in outres] - outf.write('\n'.join(outres)) - outf.write('\n') - outf.close() - - -def plink_assocToGG(plinkout="hm",tag='test'): - """ plink --assoc output looks like this - # CHR SNP A1 F_A F_U A2 CHISQ P OR - # 1 rs3094315 G 0.6685 0.1364 A 104.1 1.929e-24 12.77 - # write as a genegraph input file - """ - inf = file('%s.assoc' % plinkout,'r') - outf = file('%sassoc.xls' % plinkout,'w') - res = ['rs\tlog10p%s\tFakeInvOR%s\tRealOR%s' % (tag,tag,tag),] # output header for ucsc genome graphs - head = inf.next() - for l in inf: - ll = l.split() - if len(ll) >= 8: - p = float(ll[7]) - if p <> 'NA': # eesh - logp = '%9.9f' % -math.log10(p) - else: - logp = 'NA' - try: - orat = ll[8] - except: - orat = 'NA' - orat2 = orat - # invert large negative odds ratios - if float(orat) < 1 and float(orat) > 0.0: - orat2 = '%9.9f' % (1.0/float(orat)) - outl = [ll[1],logp, orat2, orat] - res.append('\t'.join(outl)) - outf.write('\n'.join(res)) - outf.write('\n') - outf.close() - inf.close() - -def 
xformModel(infname='',resf='',outfname='', - name='foo',mapf='/usr/local/galaxy/data/rg/ped/x.bim',flog=None): - """munge a plink .model file into either a ucsc track or an xls file - rerla@meme ~/plink]$ head hmYRI_CEU.model - CHR SNP TEST AFF UNAFF CHISQ DF P - 1 rs3094315 GENO 41/37/11 0/24/64 NA NA NA - 1 rs3094315 TREND 119/59 24/152 81.05 1 2.201e-19 - 1 rs3094315 ALLELIC 119/59 24/152 104.1 1 1.929e-24 - 1 rs3094315 DOM 78/11 24/64 NA NA NA - - bim file has -[rerla@beast pbed]$ head plink_wgas1_example.bim -1 rs3094315 0.792429 792429 G A -1 rs6672353 0.817376 817376 A G - """ - if verbose: - print 'Rgenetics rgCaCo.xformModel got resf=%s, outfname=%s' % (resf,outfname) - res = [] - rsdict = {} - map = file(mapf,'r') - for l in map: # plink map - ll = l.strip().split() - if len(ll) >= 3: - rs=ll[1].strip() - chrom = ll[0] - if chrom.lower() == 'x': - chrom='23' - elif chrom.lower() == 'y': - chrom = 24 - elif chrom.lower() == 'mito': - chrom = 25 - offset = ll[3] - rsdict[rs] = (chrom,offset) - res.append('rs\tChr\tOffset\tGenop\tlog10Genop\tArmitagep\tlog10Armitagep\tAllelep\tlog10Allelep\tDomp\tlog10Domp') - f = open(resf,'r') - headl = f.readline() - if headl.find('\t') <> -1: - headl = headl.split('\t') - delim = '\t' - else: - headl = headl.split() - delim = None - whatwewant = ['CHR','SNP','TEST','AFF','UNAFF','CHISQ','P'] - wewant = [headl.index(x) for x in whatwewant] - llen = len(headl) - lnum = anum = 0 - lastsnp = None # so we know when to write out a gg line - outl = {} - f.seek(0) - for lnum,l in enumerate(f): - if lnum == 0: - continue - ll = l.split() - if delim: - ll = l.split(delim) - if len(ll) >= llen: # valid line - chr,snp,test,naff,nuaff,chi,p = [ll[x] for x in wewant] - snp = snp.strip() - chrom,offset = rsdict.get(snp,(None,None)) - anum += 1 - fp = 1.0 # if NA - lp = 0.0 - try: - fp = float(p) - if fp > 0: - lp = -math.log10(fp) - else: - fp = 9e-100 - flog.write('### WARNING - Plink calculated %s for %s p value!!! 9e-100 substituted!\n' % (p,test)) - flog.write('### offending line #%d in %s = %s' % (lnum,l)) - except: - pass - if snp <> lastsnp: - if len(outl.keys()) > 3: - sl = [outl.get(x,'?') for x in ('snp','chrom','offset','GENO','TREND','ALLELIC','DOM')] - res.append('\t'.join(sl)) # last snp line - outl = {'snp':snp,'chrom':chrom,'offset':offset} # first 3 cols for gg line - lastsnp = snp # reset for next marker - #if p == 'NA': - # p = 1.0 - # let's pass downstream for handling R is fine? 
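xformModel, like makeGFF before it, converts PLINK p-values to -log10 scores while tolerating 'NA' and zero. Condensed into one helper (a sketch; the original also logs a warning on zero p-values and keeps slightly different bookkeeping):

    import math

    def neg_log10_p(p_str):
        # 'NA' or unparseable -> 0.0; an exact zero is replaced with 9e-100,
        # the same substitution the deleted code makes, to avoid -log10(0).
        try:
            p = float(p_str)
        except ValueError:
            return 0.0
        if p <= 0:
            p = 9e-100
        return -math.log10(p)

    # neg_log10_p('1.929e-24') -> 23.71...   neg_log10_p('NA') -> 0.0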
- outl[test] = '%s\t%f' % (p,lp) - if len(outl.keys()) > 3: - l = [outl.get(x,'?') for x in ('snp','chrom','offset','GENO','TREND','ALLELIC','DOM')] - res.append('\t'.join(l)) # last snp line - f = file(outfname,'w') - res.append('') - f.write('\n'.join(res)) - f.close() - - - - -if __name__ == "__main__": - """ - # called as - <command interpreter="python"> - rgCaCo.py '$i.extra_files_path/$i.metadata.base_name' "$name" - '$out_file1' '$logf' '$logf.files_path' '$gffout' - </command></command> - """ - if len(sys.argv) < 7: - s = 'rgCaCo.py needs 6 params - got %s \n' % (sys.argv) - print >> sys.stdout, s - sys.exit(0) - bfname = sys.argv[1] - name = sys.argv[2] - killme = string.punctuation + string.whitespace - trantab = string.maketrans(killme,'_'*len(killme)) - name = name.translate(trantab) - outfname = sys.argv[3] - logf = sys.argv[4] - logoutdir = sys.argv[5] - gffout = sys.argv[6] - topn = 1000 - try: - os.makedirs(logoutdir) - except: - pass - map_file = None - me = sys.argv[0] - amapf = '%s.bim' % bfname # to decode map in xformModel - flog = file(logf,'w') - logme = [] - cdir = os.getcwd() - s = 'Rgenetics %s http://rgenetics.org Galaxy Tools, rgCaCo.py started %s\n' % (myversion,timenow()) - print >> sys.stdout, s # so will appear as blurb for file - logme.append(s) - if verbose: - s = 'rgCaCo.py: bfname=%s, logf=%s, argv = %s\n' % (bfname, logf, sys.argv) - print >> sys.stdout, s # so will appear as blurb for file - logme.append(s) - twd = tempfile.mkdtemp(suffix='rgCaCo') # make sure plink doesn't spew log file into the root! - tname = os.path.join(twd,name) - vcl = [plinke,'--noweb','--bfile',bfname,'--out',name,'--model'] - p=subprocess.Popen(' '.join(vcl),shell=True,stdout=flog,cwd=twd) - retval = p.wait() - resf = '%s.model' % tname # plink output is here we hope - xformModel(bfname,resf,outfname,name,amapf,flog) # leaves the desired summary file - makeGFF(resf=outfname,outfname=gffout,logf=flog,twd=twd,name='rgCaCo_TopTable',description=name,topn=topn) - flog.write('\n'.join(logme)) - flog.close() # close the log used - #shutil.copytree(twd,logoutdir) - shutil.rmtree(twd) # clean up - diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgCaCo.xml --- a/tools/rgenetics/rgCaCo.xml +++ /dev/null @@ -1,104 +0,0 @@ -<tool id="rgCaCo1" name="Case Control:"> - <requirements><requirement type="package">plink</requirement></requirements> - <description>for unrelated subjects</description> - <command interpreter="python"> - rgCaCo.py '$i.extra_files_path/$i.metadata.base_name' "$title" '$out_file1' '$logf' '$logf.files_path' '$gffout' - </command> - <inputs> - <param name="i" type="data" label="RGenetics genotype data from your current history" - format="pbed" /> - <param name='title' type='text' size="132" value='CaseControl' label="Title for this job"/> - - </inputs> - - <outputs> - <data format="tabular" name="out_file1" label="${title}_rgCaCo.xls" /> - <data format="txt" name="logf" label="${title}_rgCaCo.log"/> - <data format="gff" name="gffout" label="${title}_rgCaCoTop.gff" /> - </outputs> -<tests> - <test> - <param name='i' value='tinywga' ftype='pbed' > - <metadata name='base_name' value='tinywga' /> - <composite_data value='tinywga.bim' /> - <composite_data value='tinywga.bed' /> - <composite_data value='tinywga.fam' /> - <edit_attributes type='name' value='tinywga' /> - </param> - <param name='title' value='rgCaCotest1' /> - <output name='out_file1' file='rgCaCotest1_CaCo.xls' ftype='tabular' compare='diff' /> - 
<output name='logf' file='rgCaCotest1_CaCo_log.txt' ftype='txt' compare='diff' lines_diff='20' /> - <output name='gffout' file='rgCaCotest1_CaCo_topTable.gff' ftype='gff' compare='diff' /> - </test> -</tests> -<help> - -.. class:: infomark - -**Syntax** - -- **Genotype file** is the input case control data chosen from available library Plink binary files -- **Map file** is the linkage format .map file corresponding to the genotypes in the Genotype file -- **Type of test** is the kind of test statistic to report such as Armitage trend test or genotype test -- **Format** determines how your data will be returned to your Galaxy workspace - ------ - -**Summary** - -This tool will perform some standard statistical tests comparing subjects designated as -affected (cases) and unaffected subjects (controls). To avoid bias, it is important that -controls who had been affected would have been eligible for sampling as cases. This may seem -odd, but it requires that the cases and controls are drawn from the same sampling frame. - -The armitage trend test is robust to departure from HWE and so very attractive - after all, a real disease -mutation may well result in distorted HWE at least in cases. All the others are susceptible to -bias in the presence of HWE departures. - -All of these tests are exquisitely sensitive to non-differential population stratification in cases -compared to controls and this must be tested before believing any results here. Use the PCA method for -100k markers or more. - -If you don't see the genotype data set you want here, it can be imported using one of the methods available from -the Galaxy Get Data tool page. - -Output format can be UCSC .bed if you want to see your -results as a fully fledged UCSC track. A map file containing the chromosome and offset for each marker is required for -writing this kind of output. -Alternatively you can use .gg for the UCSC Genome Graphs tool which has all of the advantages -of the the .bed track, plus a neat, visual front end that displays a lot of useful clues. -Either of these are a very useful way of quickly getting a look -at your data in full genomic context. - -Finally, if you can't live without -spreadsheet data, choose the .xls tab delimited format. It's not a stupid binary excel file. Just a plain old tab delimited -one with a header. Fortunately excel is dumb enough to open these without much protest. - - ------ - -.. class:: infomark - -**Attribution** - -This Galaxy tool relies on Plink (see Plinksrc_) to test Casae Control association models. - -So, we rely on the author (Shaun Purcell) for the documentation you need specific to those settings - they are very nicely documented - see -DOC_ - -Tool and Galaxy datatypes originally designed and written for the Rgenetics -series of whole genome scale statistical genetics tools by ross lazarus (ross.lazarus@gmail.com) - -Copyright Ross Lazarus March 2007 -This Galaxy wrapper is released licensed under the LGPL_ but is about as useful as a chocolate teapot without Plink which is GPL. - -I'm no lawyer, but it looks like you got GPL if you use this software. Good luck. - -.. _Plinksrc: http://pngu.mgh.harvard.edu/~purcell/plink/ - -.. _LGPL: http://www.gnu.org/copyleft/lesser.html - -.. 
_DOC: http://pngu.mgh.harvard.edu/~purcell/plink/anal.shtml#cc - -</help> -</tool> diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgClean.py --- a/tools/rgenetics/rgClean.py +++ /dev/null @@ -1,160 +0,0 @@ -""" -# galaxy tool xml files can define a galaxy supplied output filename -# that must be passed to the tool and used to return output -# here, the plink log file is copied to that file and removed -# took a while to figure this out! -# use exec_before_job to give files sensible names -# -# ross april 14 2007 -# plink cleanup script -# ross lazarus March 2007 for camp illumina whole genome data -# note problems with multiple commands being ignored - eg --freq --missing --mendel -# only the first seems to get done... -# -##Summary statistics versus inclusion criteria -## -##Feature As summary statistic As inclusion criteria -##Missingness per individual --missing --mind N -##Missingness per marker --missing --geno N -##Allele frequency --freq --maf N -##Hardy-Weinberg equilibrium --hardy --hwe N -##Mendel error rates --mendel --me N M -# -# call as plinkClean.py $i $o $mind $geno $hwe $maf $mef $mei $outfile -# note plinkClean_code.py does some renaming before the job starts - - - <command interpreter="python2.4"> - rgClean.py '$input_file.extra_files_path' '$input_file.metadata.base_name' '$title' '$mind' '$geno' '$hwe' '$maf' - '$mef' '$mei' '$out_file1' '$out_file1.files_path' '$userId' - - -""" -import sys,shutil,os,subprocess, glob, string, tempfile, time -from rgutils import galhtmlprefix, timenow, plinke -prog = os.path.split(sys.argv[0])[-1] -myversion = 'January 4 2010' -verbose=False - - -def fixoutaff(outpath='',newaff='1'): - """ quick way to create test data sets - set all aff to 1 or 2 for - some hapmap data and then merge - [rerla@beast galaxy]$ head tool-data/rg/library/pbed/affyHM_CEU.fam - 1341 14 0 0 2 1 - 1341 2 13 14 2 1 - 1341 13 0 0 1 1 - 1340 9 0 0 1 1 - 1340 10 0 0 2 1 - """ - nchanged = 0 - fam = '%s.fam' % outpath - famf = open(fam,'r') - fl = famf.readlines() - famf.close() - for i,row in enumerate(fl): - lrow = row.split() - if lrow[-1] <> newaff: - lrow[-1] = newaff - fl[i] = ' '.join(lrow) - fl[i] += '\n' - nchanged += 1 - fo = open(fam,'w') - fo.write(''.join(fl)) - fo.close() - return nchanged - - - -def clean(): - """ - """ - if len(sys.argv) < 16: - print >> sys.stdout, '## %s expected 15 params in sys.argv, got %d - %s' % (prog,len(sys.argv),sys.argv) - print >> sys.stdout, """this script will filter a linkage format ped - and map file containing genotypes.
It takes 15 parameters - the plink --bfile input path and - a new filename root for the output clean data followed by the mind, geno, hwe, maf, mef and mei thresholds - documented in the plink docs plus the file to be returned to Galaxy - called as: - <command interpreter="python"> - rgClean.py '$input_file.extra_files_path' '$input_file.metadata.base_name' '$title' '$mind' - '$geno' '$hwe' '$maf' '$mef' '$mei' '$out_file1' '$out_file1.files_path' - '$relfilter' '$afffilter' '$sexfilter' '$fixaff' - </command> - - """ - sys.exit(1) - plog = [] - inpath = sys.argv[1] - inbase = sys.argv[2] - killme = string.punctuation + string.whitespace - trantab = string.maketrans(killme,'_'*len(killme)) - title = sys.argv[3].translate(trantab) - mind = sys.argv[4] - geno = sys.argv[5] - hwe = sys.argv[6] - maf = sys.argv[7] - me1 = sys.argv[8] - me2 = sys.argv[9] - outfname = sys.argv[10] - outfpath = sys.argv[11] - relf = sys.argv[12] - afff = sys.argv[13] - sexf = sys.argv[14] - fixaff = sys.argv[15] - output = os.path.join(outfpath,outfname) - outpath = os.path.join(outfpath,title) - outprunepath = os.path.join(outfpath,'ldprune_%s' % title) - try: - os.makedirs(outfpath) - except: - pass - bfile = os.path.join(inpath,inbase) - outf = file(outfname,'w') - vcl = [plinke,'--noweb','--bfile',bfile,'--make-bed','--out', - outpath,'--set-hh-missing','--mind',mind, - '--geno',geno,'--maf',maf,'--hwe',hwe,'--me',me1,me2] - # yes - the --me parameter takes 2 values - Mendel error limits per family and per marker - if relf == 'oo': # plink filters are what they leave... - vcl.append('--filter-nonfounders') # leave only offspring - elif relf == 'fo': - vcl.append('--filter-founders') - if afff == 'affonly': - vcl.append('--filter-controls') - elif afff == 'unaffonly': - vcl.append('--filter-cases') - if sexf == 'fsex': - vcl.append('--filter-females') - elif sexf == 'msex': - vcl.append('--filter-males') - p=subprocess.Popen(' '.join(vcl),shell=True,cwd=outfpath) - retval = p.wait() - plog.append('%s started, called as %s' % (prog,' '.join(sys.argv))) - outf.write(galhtmlprefix % prog) - outf.write('<ul>\n') - plogf = '%s.log' % os.path.join(outfpath,title) - try: - plogl = file(plogf,'r').readlines() - plog += [x.strip() for x in plogl] - except: - plog += ['###Cannot open plink log file %s' % plogf,] - # if fixaff, want to 'fix' the fam file - if fixaff <> '0': - nchanged = fixoutaff(outpath=outpath,newaff=fixaff) - plog += ['## fixaff was requested %d subjects affection status changed to %s' % (nchanged,fixaff)] - pf = file(plogf,'w') - pf.write('\n'.join(plog)) - pf.close() - globme = os.path.join(outfpath,'*') - flist = glob.glob(globme) - flist.sort() - for i, data in enumerate( flist ): - outf.write('<li><a href="%s">%s</a></li>\n' % (os.path.split(data)[-1],os.path.split(data)[-1])) - outf.write('</ul>\n') - outf.write("</ul></br></div></body></html>") - outf.close() - - -if __name__ == "__main__": - clean() - diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgClean.xml --- a/tools/rgenetics/rgClean.xml +++ /dev/null @@ -1,155 +0,0 @@ -<tool id="rgClean1" name="Clean genotypes:"> - <requirements><requirement type="package">plink</requirement></requirements> - <description>filter markers, subjects</description> - - <command interpreter="python"> - rgClean.py '$input_file.extra_files_path' '$input_file.metadata.base_name' '$title' '$mind' - '$geno' '$hwe' '$maf' '$mef' '$mei' '$out_file1' '$out_file1.files_path' - '$relfilter' '$afffilter' '$sexfilter' '$fixaff' - </command> -
- <inputs> - <param name="input_file" type="data" label="RGenetics genotype library file in compressed Plink format" - size="120" format="pbed" /> - <param name="title" type="text" size="80" label="Descriptive title for cleaned genotype file" value="Cleaned_data"/> - <param name="geno" type="text" label="Maximum Missing Fraction: Markers" value="0.05" /> - <param name="mind" type="text" value="0.1" label="Maximum Missing Fraction: Subjects"/> - <param name="mef" type="text" label="Maximum Mendel Error Rate: Family" value="0.05"/> - <param name="mei" type="text" label="Maximum Mendel Error Rate: Marker" value="0.05"/> - <param name="hwe" type="text" value="0" label="Smallest HWE p value (set to 0 for all)" /> - <param name="maf" type="text" value="0.01" - label="Smallest Minor Allele Frequency (set to 0 for all)"/> - <param name='relfilter' label = "Filter on pedigree relatedness" type="select" - optional="false" size="132" - help="Optionally remove related subjects if pedigree identifies founders and their offspring"> - <option value="all" selected='true'>No filter on relatedness</option> - <option value="fo" >Keep Founders only (pedigree m/f ID = "0")</option> - <option value="oo" >Keep Offspring only (one randomly chosen if >1 sibs in family)</option> - </param> - <param name='afffilter' label = "Filter on affection status" type="select" - optional="false" size="132" - help="Optionally remove affected or non affected subjects"> - <option value="allaff" selected='true'>No filter on affection status</option> - <option value="affonly" >Keep Controls only (affection='1')</option> - <option value="unaffonly" >Keep Cases only (affection='2')</option> - </param> - <param name='sexfilter' label = "Filter on gender" type="select" - optional="false" size="132" - help="Optionally remove all male or all female subjects"> - <option value="allsex" selected='true'>No filter on gender status</option> - <option value="msex" >Keep Males only (pedigree gender='1')</option> - <option value="fsex" >Keep Females only (pedigree gender='2')</option> - </param> - <param name="fixaff" type="text" value="0" - label = "Change ALL subjects affection status to (0=no change,1=unaff,2=aff)" - help="Use this option to switch the affection status to a new value for all output subjects" /> - </inputs> - - <outputs> - <data format="pbed" name="out_file1" metadata_source="input_file" label="${title}_rgClean.pbed" /> - </outputs> - -<tests> - <test> - <param name='input_file' value='tinywga' ftype='pbed' > - <metadata name='base_name' value='tinywga' /> - <composite_data value='tinywga.bim' /> - <composite_data value='tinywga.bed' /> - <composite_data value='tinywga.fam' /> - <edit_attributes type='name' value='tinywga' /> - </param> - <param name='title' value='rgCleantest1' /> - <param name="geno" value="1" /> - <param name="mind" value="1" /> - <param name="mef" value="0" /> - <param name="mei" value="0" /> - <param name="hwe" value="0" /> - <param name="maf" value="0" /> - <param name="relfilter" value="all" /> - <param name="afffilter" value="allaff" /> - <param name="sexfilter" value="allsex" /> - <param name="fixaff" value="0" /> - <output name='out_file1' file='rgtestouts/rgClean/rgCleantest1.pbed' compare="diff" lines_diff="25" > - <extra_files type="file" name='rgCleantest1.bim' value="rgtestouts/rgClean/rgCleantest1.bim" compare="diff" /> - <extra_files type="file" name='rgCleantest1.fam' value="rgtestouts/rgClean/rgCleantest1.fam" compare="diff" /> - <extra_files type="file" name='rgCleantest1.bed' 
value="rgtestouts/rgClean/rgCleantest1.bed" compare="diff" /> - </output> - </test> -</tests> -<help> - -.. class:: infomark - -**Syntax** - -- **Genotype data** is the input genotype file chosen from your current history -- **Descriptive title** is the name to use for the filtered output file -- **Missfrac threshold: subjects** is the threshold for missingness by subject. Subjects with more than this fraction missing will be excluded from the import -- **Missfrac threshold: markers** is the threshold for missingness by marker. Markers with more than this fraction missing will be excluded from the import -- **MaxMendel Individuals** Mendel error fraction above which to exclude subjects with more than the specified fraction of mendelian errors in transmission (for family data only) -- **MaxMendel Families** Mendel error fraction above which to exclude families with more than the specified fraction of mendelian errors in transmission (for family data only) -- **HWE** is the threshold for HWE test p values below which the marker will not be imported. Set this to -1 and all markers will be imported regardless of HWE p value -- **MAF** is the threshold for minor allele frequency - SNPs with lower MAF will be excluded -- **Filters** for founders/offspring or affected/unaffected or males/females are optionally available if needed -- **Change Affection** is only needed if you want to change the affection status for creating new analysis datasets - ------ - -**Attribution** - -This tool relies on the work of many people. It uses Plink http://pngu.mgh.harvard.edu/~purcell/plink/, -and the R http://cran.r-project.org/ and -Bioconductor http://www.bioconductor.org/ projects. -respectively. - -In particular, http://pngu.mgh.harvard.edu/~purcell/plink/ -has excellent documentation describing the parameters you can set here. - -This implementation is a Galaxy tool wrapper around these third party applications. -It was originally designed and written for family based data from the CAMP Illumina run of 2007 by -ross lazarus (ross.lazarus@gmail.com) and incorporated into the rgenetics toolkit. - -Rgenetics merely exposes them, wrapping Plink so you can use it in Galaxy. - ------ - -**Summary** - -Reliable statistical inference depends on reliable data. Poor quality samples and markers -may add more noise than signal, decreasing statistical power. Removing the worst of them -can be done by setting thresholds for some of the commonly used technical quality measures -for genotype data. Of course discordant replicate calls are also very informative but are not -in scope here. - -Marker cleaning: Filters are available to remove markers below a specific minor allele -frequency, beyond a Hardy Wienberg threshold, below a minor allele frequency threshold, -or above a threshold for missingness. If family data are available, thresholds for Mendelian -error can be set. - -Subject cleaning: Filters are available to remove subjects with many missing calls. Subjects and markers for family data can be filtered by proportions -of Mendelian errors in observed transmission. Use the QC reporting tool to -generate a comprehensive series of reports for quality control. - -Note that ancestry and cryptic relatedness should also be checked using the relevant tools. - ------ - -.. class:: infomark - -**Tip** - -You can check that you got what you asked for by running the QC tool to ensure that the distributions -are truncated the way you expect. 
Note that the output will not match the thresholds exactly - some bad assays and subjects -fall outside multiple QC measures at once, so you sometimes have -more samples or markers remaining than each threshold alone would suggest. Finally, the ordering of -operations matters and Plink is somewhat restrictive about what it will do on each pass -of the data. At least the ordering is fixed. - ------ - -This Galaxy tool was written by Ross Lazarus for the Rgenetics project. -It uses Plink for most calculations - for full Plink attribution, source code and documentation, -please see http://pngu.mgh.harvard.edu/~purcell/plink/ plus some custom Python code - -</help> -</tool> This diff is so big that we needed to truncate the remainder. Repository URL: https://bitbucket.org/galaxy/galaxy-central/ -- This is a commit notification from bitbucket.org. You are receiving this because you have the service enabled, addressing the recipient of this email.
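Editor's note: for anyone mining these deleted wrappers for the pattern rather than the history, both rgCaCo.py and rgClean.py do the same three things - sanitize the user-supplied title, assemble a plink argument list, and run it in a scratch directory. Below is a minimal sketch of that pattern, recast in Python 3 and assuming only that a plink 1.x binary is on the PATH; the function name run_plink_model is ours, not the wrappers'.

    import os
    import string
    import subprocess
    import tempfile

    def run_plink_model(bfile, title):
        """Run 'plink --model' roughly the way the deleted rgCaCo.py did:
        sanitize the user-supplied job title, then run plink inside a
        scratch directory so its log does not land in the working dir."""
        # the wrappers' sanitizer: punctuation and whitespace become '_'
        killme = string.punctuation + string.whitespace
        name = title.translate(str.maketrans(killme, '_' * len(killme)))
        twd = tempfile.mkdtemp(suffix='rgCaCo')
        # an argument list with the default shell=False avoids the quoting
        # pitfalls of the original ' '.join(vcl) / shell=True invocation
        subprocess.run(['plink', '--noweb', '--bfile', bfile,
                        '--out', name, '--model'], cwd=twd, check=True)
        return os.path.join(twd, '%s.model' % name)  # plink writes name.model

Passing the argument list directly, rather than joining it into a shell string, sidesteps the metacharacter problems that the trantab sanitizer above was partly guarding against.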
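Similarly, the rgClean.py docstring's table of summary statistics versus inclusion criteria collapses into a single argument list. A sketch of that mapping, under the same assumptions (plink on PATH; defaults copied from the deleted rgClean.xml form; the helper name plink_clean_args is ours):

    def plink_clean_args(bfile, out_name, mind='0.1', geno='0.05',
                         maf='0.01', hwe='0', mef='0.05', mei='0.05'):
        """Build the filtering command the deleted rgClean.py assembled:
        each summary statistic has a matching inclusion-criteria flag
        (--missing/--mind and --geno, --freq/--maf, --hardy/--hwe,
        --mendel/--me)."""
        return ['plink', '--noweb', '--bfile', bfile, '--make-bed',
                '--out', out_name, '--set-hh-missing',
                '--mind', mind,    # max missing fraction per subject
                '--geno', geno,    # max missing fraction per marker
                '--maf', maf,      # smallest minor allele frequency kept
                '--hwe', hwe,      # smallest HWE p value kept (0 keeps all)
                '--me', mef, mei]  # Mendel error limits: per family, per marker

Feeding the result to subprocess.run(..., check=True) reproduces the core of clean() without the shell quoting caveat noted above.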