11 new commits in galaxy-central: https://bitbucket.org/galaxy/galaxy-central/commits/26e7555cd499/ Changeset: 26e7555cd499 User: fubar Date: 2013-10-08 01:24:59 Summary: hg backout -r 1092b0d backout unintended commits to tophat and bowtie2 wrappers Affected #: 3 files diff -r f76c280cdee539b65315ac583f93bfa52feb790d -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 tools/ngs_rna/tophat2_wrapper.xml --- a/tools/ngs_rna/tophat2_wrapper.xml +++ b/tools/ngs_rna/tophat2_wrapper.xml @@ -126,14 +126,6 @@ </command><inputs> - <param name="jobname" type="text" value="Tophat2" size="80" label="Job title for outputs" - help="Output name to remind you what this was for"> - <sanitizer invalid_char=""> - <valid initial="string.letters,string.digits"> - <add value="_" /> - </valid> - </sanitizer> - </param><conditional name="singlePaired"><param name="sPaired" type="select" label="Is this library mate-paired?"><option value="single">Single-end</option> @@ -281,19 +273,19 @@ </stdio><outputs> - <data format="tabular" name="fusions" label="${on_string}_${jobname}_fusions.xls" from_work_dir="tophat_out/fusions.out"> + <data format="tabular" name="fusions" label="${tool.name} on ${on_string}: fusions" from_work_dir="tophat_out/fusions.out"><filter>(params['settingsType'] == 'full' and params['fusion_search']['do_search'] == 'Yes')</filter></data> - <data format="bed" name="insertions" label="${on_string}_${jobname}_ins.bed" from_work_dir="tophat_out/insertions.bed"> + <data format="bed" name="insertions" label="${tool.name} on ${on_string}: insertions" from_work_dir="tophat_out/insertions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="deletions" label="${on_string}_${jobname}_del.bed" from_work_dir="tophat_out/deletions.bed"> + <data format="bed" name="deletions" label="${tool.name} on ${on_string}: deletions" from_work_dir="tophat_out/deletions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="junctions" label="${on_string}_${jobname}_splicejunc.bed" from_work_dir="tophat_out/junctions.bed"> + <data format="bed" name="junctions" label="${tool.name} on ${on_string}: splice junctions" from_work_dir="tophat_out/junctions.bed"><expand macro="dbKeyActions" /></data> - <data format="bam" name="accepted_hits" label="${on_string}_${jobname}_hits.bam" from_work_dir="tophat_out/accepted_hits.bam"> + <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits" from_work_dir="tophat_out/accepted_hits.bam"><expand macro="dbKeyActions" /></data></outputs> diff -r f76c280cdee539b65315ac583f93bfa52feb790d -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 tools/rgenetics/rgFastQC.py --- a/tools/rgenetics/rgFastQC.py +++ b/tools/rgenetics/rgFastQC.py @@ -27,7 +27,6 @@ from rgutils import getFileString import zipfile import gzip -import bz2 class FastQC(): """wrapper @@ -63,43 +62,27 @@ infname = self.opts.inputfilename linf = infname.lower() trimext = False - isgz = False - isbz2 = False - iszip = False # decompression at upload currently does NOT remove this now bogus ending - fastqc will barf # patched may 29 2013 until this is fixed properly - f = gzip.open(self.opts.input) - try: + if ( linf.endswith('.gz') or linf.endswith('.gzip') ): + f = gzip.open(self.opts.input) + try: testrow = f.readline() - if not (linf.endswith('.gz') or linf.endswith('.gzip')): - isgz = True - except: - if linf.endswith('.gz') or linf.endswith('.gzip'): - trimext = True - f.close() - f = bz2.BZ2File(self.opts.input) - try: - x = read(f,2) # will work if a real bz2 - if not 
linf.endswith('bz2'): - isbz2 = True - except: - if linf.endswith('bz2'): - trimext = True - f.close() - if zipfile.is_zipfile(self.opts.input): - if not linf.endswith('.zip'): - iszip = True - else: - if linf.endswith('.zip'): - trimext = True + except: + trimext = True + f.close() + elif linf.endswith('bz2'): + f = bz2.open(self.opts.input,'rb') + try: + f.readline() + except: + trimext = True + f.close() + elif linf.endswith('.zip'): + if not zipfile.is_zipfile(self.opts.input): + trimext = True if trimext: infname = os.path.splitext(infname)[0] - elif isgz: - infname = '%s.gz' % infname - elif isbz2: - infname = '%s.bzip2' % infname - elif iszip: - infname = '%s.zip' % infname fastqinfilename = re.sub(ur'[^a-zA-Z0-9_\-\.]', '_', os.path.basename(infname)) link_name = os.path.join(self.opts.outputdir, fastqinfilename) os.symlink(self.opts.input, link_name) diff -r f76c280cdee539b65315ac583f93bfa52feb790d -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -87,22 +87,23 @@ ## read group information #if str($read_group.selection) == "yes": #if $read_group.rgid and $read_group.rglb and $read_group.rgpl and $read_group.rgsm: - --rg-id "$read_group.rgid" - --rg "LB:$read_group.rglb" - --rg "PL:$read_group.rgpl" - --rg "SM:$read_group.rgsm" + --rg-id $read_group.rgid + --rg LB:$read_group.rglb + --rg PL:$read_group.rgpl + --rg SM:$read_group.rgsm #end if #end if ## view/sort and output file - | samtools view -Su - | samtools sort -o - - > $output + | samtools view -Su - | samtools sort -o - - > $output; ## rename unaligned sequence files #if $library.type == "paired" and $output_unaligned_reads_l and $output_unaligned_reads_r: #set left = str($output_unaligned_reads_l).replace( '.dat', '.1.dat' ) #set right = str($output_unaligned_reads_l).replace( '.dat', '.2.dat' ) - ;mv $left $output_unaligned_reads_l; - mv $right $output_unaligned_reads_r + + mv $left $output_unaligned_reads_l; + mv $right $output_unaligned_reads_r; #end if </command> @@ -112,8 +113,6 @@ </stdio><inputs> - <param name="jobtitle" type="text" value="bowtie2" label="Memorable short reminder of this job's importance for outputs" /> - <!-- single/paired --><conditional name="library"><param name="type" type="select" label="Is this library mate-paired?"> @@ -217,7 +216,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_${jobtitle}_unaligL.fastq" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${tool.name} on ${on_string}: unaligned reads (L)" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -225,7 +224,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_${jobtitle}_unaligR.fastq)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${tool.name} on ${on_string}: unaligned reads (R)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -233,7 +232,7 @@ </action></actions></data> - <data format="bam" name="output" label="${on_string}_${jobtitle}_aligned.bam"> + <data format="bam" name="output" label="${tool.name} on ${on_string}: aligned reads"><actions><conditional name="reference_genome.source"><when value="indexed"> @@ -255,17 +254,6 @@ </outputs><tests> - <test> - <!-- basic test on single paired default run --> - <param name="type" value="single"/> - <param 
name="selection" value="no"/> - <param name="full" value="no"/> - <param name="unaligned_file" value="false"/> - <param name="source" value="history" /> - <param name="input_1" value="bowtie2/phix_reads.fastq" ftype="fastqsanger"/> - <param name="own_file" value="bowtie2/phix_genome.fasta" /> - <output name="output" file="bowtie2/phix_mapped.bam" /> - </test></tests><help> https://bitbucket.org/galaxy/galaxy-central/commits/962d981799cf/ Changeset: 962d981799cf User: fubar Date: 2013-10-08 01:50:10 Summary: backout -r 10799 Affected #: 191 files diff -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 tools/rgedgeR/rgToolFactory.py --- a/tools/rgedgeR/rgToolFactory.py +++ /dev/null @@ -1,605 +0,0 @@ -# rgToolFactory.py -# see https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home -# -# copyright ross lazarus (ross stop lazarus at gmail stop com) May 2012 -# -# all rights reserved -# Licensed under the LGPL -# suggestions for improvement and bug fixes welcome at https://bitbucket.org/fubar/galaxytoolfactory/wiki/Home -# -# july 2013 -# added ability to combine images and individual log files into html output -# just make sure there's a log file foo.log and it will be output -# together with all images named like "foo_*.pdf -# otherwise old format for html -# -# January 2013 -# problem pointed out by Carlos Borroto -# added escaping for <>$ - thought I did that ages ago... -# -# August 11 2012 -# changed to use shell=False and cl as a sequence - -# This is a Galaxy tool factory for simple scripts in python, R or whatever ails ye. -# It also serves as the wrapper for the new tool. -# -# you paste and run your script -# Only works for simple scripts that read one input from the history. -# Optionally can write one new history dataset, -# and optionally collect any number of outputs into links on an autogenerated HTML page. - -# DO NOT install on a public or important site - please. - -# installed generated tools are fine if the script is safe. -# They just run normally and their user cannot do anything unusually insecure -# but please, practice safe toolshed. -# Read the fucking code before you install any tool -# especially this one - -# After you get the script working on some test data, you can -# optionally generate a toolshed compatible gzip file -# containing your script safely wrapped as an ordinary Galaxy script in your local toolshed for -# safe and largely automated installation in a production Galaxy. - -# If you opt for an HTML output, you get all the script outputs arranged -# as a single Html history item - all output files are linked, thumbnails for all the pdfs. -# Ugly but really inexpensive. -# -# Patches appreciated please. -# -# -# long route to June 2012 product -# Behold the awesome power of Galaxy and the toolshed with the tool factory to bind them -# derived from an integrated script model -# called rgBaseScriptWrapper.py -# Note to the unwary: -# This tool allows arbitrary scripting on your Galaxy as the Galaxy user -# There is nothing stopping a malicious user doing whatever they choose -# Extremely dangerous!! -# Totally insecure. So, trusted users only -# -# preferred model is a developer using their throw away workstation instance - ie a private site. -# no real risk. The universe_wsgi.ini admin_users string is checked - only admin users are permitted to run this tool. 
-# - -import sys -import shutil -import subprocess -import os -import time -import tempfile -import optparse -import tarfile -import re -import shutil -import math - -progname = os.path.split(sys.argv[0])[1] -myversion = 'V000.2 June 2012' -verbose = False -debug = False -toolFactoryURL = 'https://bitbucket.org/fubar/galaxytoolfactory' - -def timenow(): - """return current time as a string - """ - return time.strftime('%d/%m/%Y %H:%M:%S', time.localtime(time.time())) - -html_escape_table = { - "&": "&", - ">": ">", - "<": "<", - "$": "\$" - } - -def html_escape(text): - """Produce entities within text.""" - return "".join(html_escape_table.get(c,c) for c in text) - -def cmd_exists(cmd): - return subprocess.call("type " + cmd, shell=True, - stdout=subprocess.PIPE, stderr=subprocess.PIPE) == 0 - - -class ScriptRunner: - """class is a wrapper for an arbitrary script - """ - - def __init__(self,opts=None,treatbashSpecial=True): - """ - cleanup inputs, setup some outputs - - """ - self.useGM = cmd_exists('gm') - self.useIM = cmd_exists('convert') - self.useGS = cmd_exists('gs') - self.treatbashSpecial = treatbashSpecial - if opts.output_dir: # simplify for the tool tarball - os.chdir(opts.output_dir) - self.thumbformat = 'png' - self.opts = opts - self.toolname = re.sub('[^a-zA-Z0-9_]+', '', opts.tool_name) # a sanitizer now does this but.. - self.toolid = self.toolname - self.myname = sys.argv[0] # get our name because we write ourselves out as a tool later - self.pyfile = self.myname # crude but efficient - the cruft won't hurt much - self.xmlfile = '%s.xml' % self.toolname - s = open(self.opts.script_path,'r').readlines() - s = [x.rstrip() for x in s] # remove pesky dos line endings if needed - self.script = '\n'.join(s) - fhandle,self.sfile = tempfile.mkstemp(prefix=self.toolname,suffix=".%s" % (opts.interpreter)) - tscript = open(self.sfile,'w') # use self.sfile as script source for Popen - tscript.write(self.script) - tscript.close() - self.indentedScript = '\n'.join([' %s' % x for x in s]) # for restructured text in help - self.escapedScript = '\n'.join([html_escape(x) for x in s]) - self.elog = os.path.join(self.opts.output_dir,"%s_error.log" % self.toolname) - if opts.output_dir: # may not want these complexities - self.tlog = os.path.join(self.opts.output_dir,"%s_runner.log" % self.toolname) - art = '%s.%s' % (self.toolname,opts.interpreter) - artpath = os.path.join(self.opts.output_dir,art) # need full path - artifact = open(artpath,'w') # use self.sfile as script source for Popen - artifact.write(self.script) - artifact.close() - self.cl = [] - self.html = [] - a = self.cl.append - a(opts.interpreter) - if self.treatbashSpecial and opts.interpreter in ['bash','sh']: - a(self.sfile) - else: - a('-') # stdin - a(opts.input_tab) - a(opts.output_tab) - self.outFormats = 'tabular' # TODO make this an option at tool generation time - self.inputFormats = 'tabular' # TODO make this an option at tool generation time - self.test1Input = '%s_test1_input.xls' % self.toolname - self.test1Output = '%s_test1_output.xls' % self.toolname - self.test1HTML = '%s_test1_output.html' % self.toolname - - def makeXML(self): - """ - Create a Galaxy xml tool wrapper for the new script as a string to write out - fixme - use templating or something less fugly than this example of what we produce - - <tool id="reverse" name="reverse" version="0.01"> - <description>a tabular file</description> - <command interpreter="python"> - reverse.py --script_path "$runMe" --interpreter "python" - --tool_name "reverse" 
--input_tab "$input1" --output_tab "$tab_file" - </command> - <inputs> - <param name="input1" type="data" format="tabular" label="Select a suitable input file from your history"/><param name="job_name" type="text" label="Supply a name for the outputs to remind you what they contain" value="reverse"/> - - </inputs> - <outputs> - <data format="tabular" name="tab_file" label="${job_name}"/> - - </outputs> - <help> - -**What it Does** - -Reverse the columns in a tabular file - - </help> - <configfiles> - <configfile name="runMe"> - -# reverse order of columns in a tabular file -import sys -inp = sys.argv[1] -outp = sys.argv[2] -i = open(inp,'r') -o = open(outp,'w') -for row in i: - rs = row.rstrip().split('\t') - rs.reverse() - o.write('\t'.join(rs)) - o.write('\n') -i.close() -o.close() - - - </configfile> - </configfiles> - </tool> - - """ - newXML="""<tool id="%(toolid)s" name="%(toolname)s" version="%(tool_version)s"> - %(tooldesc)s - %(command)s - <inputs> - %(inputs)s - </inputs> - <outputs> - %(outputs)s - </outputs> - <configfiles> - <configfile name="runMe"> - %(script)s - </configfile> - </configfiles> - %(tooltests)s - <help> - %(help)s - </help> - </tool>""" # needs a dict with toolname, toolid, interpreter, scriptname, command, inputs as a multi line string ready to write, outputs ditto, help ditto - - newCommand="""<command interpreter="python"> - %(toolname)s.py --script_path "$runMe" --interpreter "%(interpreter)s" - --tool_name "%(toolname)s" %(command_inputs)s %(command_outputs)s - </command>""" # may NOT be an input or htmlout - tooltestsTabOnly = """<tests><test> - <param name="input1" value="%(test1Input)s" ftype="tabular"/> - <param name="job_name" value="test1"/> - <param name="runMe" value="$runMe"/> - <output name="tab_file" file="%(test1Output)s" ftype="tabular"/> - </test></tests>""" - tooltestsHTMLOnly = """<tests><test> - <param name="input1" value="%(test1Input)s" ftype="tabular"/> - <param name="job_name" value="test1"/> - <param name="runMe" value="$runMe"/> - <output name="html_file" file="%(test1HTML)s" ftype="html" lines_diff="5"/> - </test></tests>""" - tooltestsBoth = """<tests><test> - <param name="input1" value="%(test1Input)s" ftype="tabular"/> - <param name="job_name" value="test1"/> - <param name="runMe" value="$runMe"/> - <output name="tab_file" file="%(test1Output)s" ftype="tabular" /> - <output name="html_file" file="%(test1HTML)s" ftype="html" lines_diff="10"/> - </test></tests>""" - xdict = {} - xdict['tool_version'] = self.opts.tool_version - xdict['test1Input'] = self.test1Input - xdict['test1HTML'] = self.test1HTML - xdict['test1Output'] = self.test1Output - if self.opts.make_HTML and self.opts.output_tab <> 'None': - xdict['tooltests'] = tooltestsBoth % xdict - elif self.opts.make_HTML: - xdict['tooltests'] = tooltestsHTMLOnly % xdict - else: - xdict['tooltests'] = tooltestsTabOnly % xdict - xdict['script'] = self.escapedScript - # configfile is least painful way to embed script to avoid external dependencies - # but requires escaping of <, > and $ to avoid Mako parsing - if self.opts.help_text: - xdict['help'] = open(self.opts.help_text,'r').read() - else: - xdict['help'] = 'Please ask the tool author for help as none was supplied at tool generation' - coda = ['**Script**','Pressing execute will run the following code over your input file and generate some outputs in your history::'] - coda.append(self.indentedScript) - coda.append('**Attribution** This Galaxy tool was created by %s at %s\nusing the Galaxy Tool Factory.' 
% (self.opts.user_email,timenow())) - coda.append('See %s for details of that project' % (toolFactoryURL)) - coda.append('Please cite: Creating re-usable tools from scripts: The Galaxy Tool Factory. Ross Lazarus; Antony Kaspi; Mark Ziemann; The Galaxy Team. ') - coda.append('Bioinformatics 2012; doi: 10.1093/bioinformatics/bts573') - xdict['help'] = '%s\n%s' % (xdict['help'],'\n'.join(coda)) - if self.opts.tool_desc: - xdict['tooldesc'] = '<description>%s</description>' % self.opts.tool_desc - else: - xdict['tooldesc'] = '' - xdict['command_outputs'] = '' - xdict['outputs'] = '' - if self.opts.input_tab <> 'None': - xdict['command_inputs'] = '--input_tab "$input1" ' # the space may matter a lot if we append something - xdict['inputs'] = '<param name="input1" type="data" format="%s" label="Select a suitable input file from your history"/> \n' % self.inputFormats - else: - xdict['command_inputs'] = '' # assume no input - eg a random data generator - xdict['inputs'] = '' - xdict['inputs'] += '<param name="job_name" type="text" label="Supply a name for the outputs to remind you what they contain" value="%s"/> \n' % self.toolname - xdict['toolname'] = self.toolname - xdict['toolid'] = self.toolid - xdict['interpreter'] = self.opts.interpreter - xdict['scriptname'] = self.sfile - if self.opts.make_HTML: - xdict['command_outputs'] += ' --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes" ' - xdict['outputs'] += ' <data format="html" name="html_file" label="${job_name}.html"/>\n' - if self.opts.output_tab <> 'None': - xdict['command_outputs'] += ' --output_tab "$tab_file"' - xdict['outputs'] += ' <data format="%s" name="tab_file" label="${job_name}"/>\n' % self.outFormats - xdict['command'] = newCommand % xdict - xmls = newXML % xdict - xf = open(self.xmlfile,'w') - xf.write(xmls) - xf.write('\n') - xf.close() - # ready for the tarball - - - def makeTooltar(self): - """ - a tool is a gz tarball with eg - /toolname/tool.xml /toolname/tool.py /toolname/test-data/test1_in.foo ... - """ - retval = self.run() - if retval: - print >> sys.stderr,'## Run failed. Cannot build yet. Please fix and retry' - sys.exit(1) - self.makeXML() - tdir = self.toolname - os.mkdir(tdir) - if self.opts.input_tab <> 'None': # no reproducible test otherwise? TODO: maybe.. - testdir = os.path.join(tdir,'test-data') - os.mkdir(testdir) # make tests directory - shutil.copyfile(self.opts.input_tab,os.path.join(testdir,self.test1Input)) - if self.opts.output_tab <> 'None': - shutil.copyfile(self.opts.output_tab,os.path.join(testdir,self.test1Output)) - if self.opts.make_HTML: - shutil.copyfile(self.opts.output_html,os.path.join(testdir,self.test1HTML)) - if self.opts.output_dir: - shutil.copyfile(self.tlog,os.path.join(testdir,'test1_out.log')) - op = '%s.py' % self.toolname # new name - outpiname = os.path.join(tdir,op) # path for the tool tarball - pyin = os.path.basename(self.pyfile) # our name - we rewrite ourselves (TM) - notes = ['# %s - a self annotated version of %s generated by running %s\n' % (op,pyin,pyin),] - notes.append('# to make a new Galaxy tool called %s\n' % self.toolname) - notes.append('# User %s at %s\n' % (self.opts.user_email,timenow())) - pi = open(self.pyfile,'r').readlines() # our code becomes new tool wrapper (!) 
- first Galaxy worm - notes += pi - outpi = open(outpiname,'w') - outpi.write(''.join(notes)) - outpi.write('\n') - outpi.close() - stname = os.path.join(tdir,self.sfile) - if not os.path.exists(stname): - shutil.copyfile(self.sfile, stname) - xtname = os.path.join(tdir,self.xmlfile) - if not os.path.exists(xtname): - shutil.copyfile(self.xmlfile,xtname) - tarpath = "%s.gz" % self.toolname - tar = tarfile.open(tarpath, "w:gz") - tar.add(tdir,arcname=self.toolname) - tar.close() - shutil.copyfile(tarpath,self.opts.new_tool) - shutil.rmtree(tdir) - ## TODO: replace with optional direct upload to local toolshed? - return retval - - - def compressPDF(self,inpdf=None,thumbformat='png'): - """need absolute path to pdf - """ - assert os.path.isfile(inpdf), "## Input %s supplied to %s compressPDF not found" % (inpdf,self.myName) - hf,hlog = tempfile.mkstemp(suffix="%s.log" % self.toolname) - sto = open(hlog,'w') - outpdf = '%s_compressed' % inpdf - cl = ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH","-dPDFSETTINGS=/printer", "-sOutputFile=%s" % outpdf,inpdf] - x = subprocess.Popen(cl,stdout=sto,stderr=sto,cwd=self.opts.output_dir) - retval1 = x.wait() - if retval1 == 0: - os.unlink(inpdf) - shutil.move(outpdf,inpdf) - outpng = '%s.%s' % (os.path.splitext(inpdf)[0],thumbformat) - if self.useGM: - cl2 = ['gm convert', inpdf, outpng] - else: # assume imagemagick - cl2 = ['convert', inpdf, outpng] - x = subprocess.Popen(cl2,stdout=sto,stderr=sto,cwd=self.opts.output_dir) - retval2 = x.wait() - sto.close() - retval = retval1 or retval2 - return retval - - - def getfSize(self,fpath,outpath): - """ - format a nice file size string - """ - size = '' - fp = os.path.join(outpath,fpath) - if os.path.isfile(fp): - size = '0 B' - n = float(os.path.getsize(fp)) - if n > 2**20: - size = '%1.1f MB' % (n/2**20) - elif n > 2**10: - size = '%1.1f KB' % (n/2**10) - elif n > 0: - size = '%d B' % (int(n)) - return size - - def makeHtml(self): - """ Create an HTML file content to list all the artifacts found in the output_dir - """ - - galhtmlprefix = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> - <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> - <head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> - <meta name="generator" content="Galaxy %s tool output - see http://g2.trac.bx.psu.edu/" /> - <title></title> - <link rel="stylesheet" href="/static/style/base.css" type="text/css" /> - </head> - <body> - <div class="toolFormBody"> - """ - galhtmlattr = """<hr/><div class="infomessage">This tool (%s) was generated by the <a href="https://bitbucket.org/fubar/galaxytoolfactory/overview">Galaxy Tool Factory</a></div><br/>""" - galhtmlpostfix = """</div></body></html>\n""" - - flist = os.listdir(self.opts.output_dir) - flist = [x for x in flist if x <> 'Rplots.pdf'] - flist.sort() - html = [] - html.append(galhtmlprefix % progname) - html.append('<div class="infomessage">Galaxy Tool "%s" run at %s</div><br/>' % (self.toolname,timenow())) - fhtml = [] - if len(flist) > 0: - logfiles = [x for x in flist if x.lower().endswith('.log')] # log file names determine sections - logfiles.sort() - logfiles = [x for x in logfiles if os.path.abspath(x) <> os.path.abspath(self.tlog)] - logfiles.append(os.path.abspath(self.tlog)) # make it the last one - pdflist = [] - npdf = len([x for x in flist if os.path.splitext(x)[-1].lower() == '.pdf']) - for rownum,fname in enumerate(flist): - dname,e = os.path.splitext(fname) 
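The makeTooltar() method above packs the generated tool the way a toolshed upload expects: a gzipped tarball whose single top-level directory is named for the tool and holds the XML wrapper, the rewritten script, and a test-data/ directory of fixtures. A minimal sketch of that layout, assuming a hypothetical make_tool_tarball helper (note the factory names the archive toolname.gz rather than the more usual .tar.gz):

import os
import tarfile

def make_tool_tarball(toolname, files, workdir='.'):
    # files maps archive-relative paths (e.g. 'test-data/test1_in.foo')
    # to local source paths; everything lands under <toolname>/ in the tar.
    tarpath = os.path.join(workdir, '%s.gz' % toolname)
    with tarfile.open(tarpath, 'w:gz') as tar:
        for relpath, src in files.items():
            tar.add(src, arcname=os.path.join(toolname, relpath))
    return tarpath

# e.g. make_tool_tarball('reverse', {'reverse.xml': 'reverse.xml',
#                                    'reverse.py': 'reverse.py',
#                                    'test-data/test1_in.foo': 'in.dat'})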
- sfsize = self.getfSize(fname,self.opts.output_dir) - if e.lower() == '.pdf' : # compress and make a thumbnail - thumb = '%s.%s' % (dname,self.thumbformat) - pdff = os.path.join(self.opts.output_dir,fname) - retval = self.compressPDF(inpdf=pdff,thumbformat=self.thumbformat) - if retval == 0: - pdflist.append((fname,thumb)) - if (rownum+1) % 2 == 0: - fhtml.append('<tr class="odd_row"><td><a href="%s">%s</a></td><td>%s</td></tr>' % (fname,fname,sfsize)) - else: - fhtml.append('<tr><td><a href="%s">%s</a></td><td>%s</td></tr>' % (fname,fname,sfsize)) - for logfname in logfiles: # expect at least tlog - if more - if os.path.abspath(logfname) == os.path.abspath(self.tlog): # handled later - sectionname = 'All tool run' - if (len(logfiles) > 1): - sectionname = 'Other' - ourpdfs = pdflist - else: - realname = os.path.basename(logfname) - sectionname = os.path.splitext(realname)[0].split('_')[0] # break in case _ added to log - ourpdfs = [x for x in pdflist if os.path.basename(x[0]).split('_')[0] == sectionname] - pdflist = [x for x in pdflist if os.path.basename(x[0]).split('_')[0] <> sectionname] # remove - nacross = 1 - npdf = len(ourpdfs) - - if npdf > 0: - nacross = math.sqrt(npdf) ## int(round(math.log(npdf,2))) - if int(nacross)**2 != npdf: - nacross += 1 - nacross = int(nacross) - width = min(400,int(1200/nacross)) - html.append('<div class="toolFormTitle">%s images and outputs</div>' % sectionname) - html.append('(Click on a thumbnail image to download the corresponding original PDF image)<br/>') - ntogo = nacross # counter for table row padding with empty cells - html.append('<div><table class="simple" cellpadding="2" cellspacing="2">\n<tr>') - for i,paths in enumerate(ourpdfs): - fname,thumb = paths - s= """<td><a href="%s"><img src="%s" title="Click to download a PDF of %s" hspace="5" width="%d" - alt="Image called %s"/></a></td>\n""" % (fname,thumb,fname,width,fname) - if ((i+1) % nacross == 0): - s += '</tr>\n' - ntogo = 0 - if i < (npdf - 1): # more to come - s += '<tr>' - ntogo = nacross - else: - ntogo -= 1 - html.append(s) - if html[-1].strip().endswith('</tr>'): - html.append('</table></div>\n') - else: - if ntogo > 0: # pad - html.append('<td>&nbsp;</td>'*ntogo) - html.append('</tr></table></div>\n') - logt = open(logfname,'r').readlines() - logtext = [x for x in logt if x.strip() > ''] - html.append('<div class="toolFormTitle">%s log output</div>' % sectionname) - if len(logtext) > 1: - html.append('\n<pre>\n') - html += logtext - html.append('\n</pre>\n') - else: - html.append('%s is empty<br/>' % logfname) - if len(fhtml) > 0: - fhtml.insert(0,'<div><table class="colored" cellpadding="3" cellspacing="3"><tr><th>Output File Name (click to view)</th><th>Size</th></tr>\n') - fhtml.append('</table></div><br/>') - html.append('<div class="toolFormTitle">All output files available for downloading</div>\n') - html += fhtml # add all non-pdf files to the end of the display - else: - html.append('<div class="warningmessagelarge">### Error - %s returned no files - please confirm that parameters are sane</div>' % self.opts.interpreter) - html.append(galhtmlpostfix) - htmlf = file(self.opts.output_html,'w') - htmlf.write('\n'.join(html)) - htmlf.write('\n') - htmlf.close() - self.html = html - - - def run(self): - """ - scripts must be small enough not to fill the pipe! 
- """ - if self.treatbashSpecial and self.opts.interpreter in ['bash','sh']: - retval = self.runBash() - else: - if self.opts.output_dir: - ste = open(self.elog,'w') - sto = open(self.tlog,'w') - sto.write('## Toolfactory generated command line = %s\n' % ' '.join(self.cl)) - sto.flush() - p = subprocess.Popen(self.cl,shell=False,stdout=sto,stderr=ste,stdin=subprocess.PIPE,cwd=self.opts.output_dir) - else: - p = subprocess.Popen(self.cl,shell=False,stdin=subprocess.PIPE) - p.stdin.write(self.script) - p.stdin.close() - retval = p.wait() - if self.opts.output_dir: - sto.close() - ste.close() - err = open(self.elog,'r').readlines() - if retval <> 0 and err: # problem - print >> sys.stderr,err - if self.opts.make_HTML: - self.makeHtml() - return retval - - def runBash(self): - """ - cannot use - for bash so use self.sfile - """ - if self.opts.output_dir: - s = '## Toolfactory generated command line = %s\n' % ' '.join(self.cl) - sto = open(self.tlog,'w') - sto.write(s) - sto.flush() - p = subprocess.Popen(self.cl,shell=False,stdout=sto,stderr=sto,cwd=self.opts.output_dir) - else: - p = subprocess.Popen(self.cl,shell=False) - retval = p.wait() - if self.opts.output_dir: - sto.close() - if self.opts.make_HTML: - self.makeHtml() - return retval - - -def main(): - u = """ - This is a Galaxy wrapper. It expects to be called by a special purpose tool.xml as: - <command interpreter="python">rgBaseScriptWrapper.py --script_path "$scriptPath" --tool_name "foo" --interpreter "Rscript" - </command> - """ - op = optparse.OptionParser() - a = op.add_option - a('--script_path',default=None) - a('--tool_name',default=None) - a('--interpreter',default=None) - a('--output_dir',default=None) - a('--output_html',default=None) - a('--input_tab',default="None") - a('--output_tab',default="None") - a('--user_email',default='Unknown') - a('--bad_user',default=None) - a('--make_Tool',default=None) - a('--make_HTML',default=None) - a('--help_text',default=None) - a('--tool_desc',default=None) - a('--new_tool',default=None) - a('--tool_version',default=None) - opts, args = op.parse_args() - assert not opts.bad_user,'UNAUTHORISED: %s is NOT authorized to use this tool until Galaxy admin adds %s to admin_users in universe_wsgi.ini' % (opts.bad_user,opts.bad_user) - assert opts.tool_name,'## Tool Factory expects a tool name - eg --tool_name=DESeq' - assert opts.interpreter,'## Tool Factory wrapper expects an interpreter - eg --interpreter=Rscript' - assert os.path.isfile(opts.script_path),'## Tool Factory wrapper expects a script path - eg --script_path=foo.R' - if opts.output_dir: - try: - os.makedirs(opts.output_dir) - except: - pass - r = ScriptRunner(opts) - if opts.make_Tool: - retcode = r.makeTooltar() - else: - retcode = r.run() - os.unlink(r.sfile) - if retcode: - sys.exit(retcode) # indicate failure to job runner - - -if __name__ == "__main__": - main() - - diff -r 26e7555cd49967ab01f003abc6168f885bbe0dd1 -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 tools/rgedgeR/rgedgeRpaired.xml --- a/tools/rgedgeR/rgedgeRpaired.xml +++ /dev/null @@ -1,1080 +0,0 @@ -<tool id="rgDifferentialCount" name="Differential_Count" version="0.20"> - <description>models using BioConductor packages</description> - <requirements> - <requirement type="package" version="2.12">biocbasics</requirement> - <requirement type="package" version="3.0.1">r3</requirement> - <requirement type="package" version="1.3.18">graphicsmagick</requirement> - <requirement type="package" version="9.07">ghostscript</requirement> - </requirements> - - <command 
interpreter="python"> - rgToolFactory.py --script_path "$runme" --interpreter "Rscript" --tool_name "DifferentialCounts" - --output_dir "$html_file.files_path" --output_html "$html_file" --make_HTML "yes" - </command> - <inputs> - <param name="input1" type="data" format="tabular" label="Select an input matrix - rows are contigs, columns are counts for each sample" - help="Use the HTSeq based count matrix preparation tool to create these matrices from BAM/SAM files and a GTF file of genomic features"/> - <param name="title" type="text" value="Differential Counts" size="80" label="Title for job outputs" - help="Supply a meaningful name here to remind you what the outputs contain"> - <sanitizer invalid_char=""> - <valid initial="string.letters,string.digits"><add value="_" /></valid> - </sanitizer> - </param> - <param name="treatment_name" type="text" value="Treatment" size="50" label="Treatment Name"/> - <param name="Treat_cols" label="Select columns containing treatment." type="data_column" data_ref="input1" numerical="True" - multiple="true" use_header_names="true" size="120" display="checkboxes"> - <validator type="no_options" message="Please select at least one column."/> - </param> - <param name="control_name" type="text" value="Control" size="50" label="Control Name"/> - <param name="Control_cols" label="Select columns containing control." type="data_column" data_ref="input1" numerical="True" - multiple="true" use_header_names="true" size="120" display="checkboxes" optional="true"> - </param> - <param name="subjectids" type="text" optional="true" size="120" value = "" - label="IF SUBJECTS NOT ALL INDEPENDENT! Enter comma separated strings to indicate sample labels for (eg) pairing - must be one for every column in input" - help="Leave blank if no pairing, but eg if data from sample id A99 is in columns 2,4 and id C21 is in 3,5 then enter 'A99,C21,A99,C21'"> - <sanitizer> - <valid initial="string.letters,string.digits"><add value="," /></valid> - </sanitizer> - </param> - <param name="fQ" type="float" value="0.3" size="5" label="Non-differential contig count quantile threshold - zero to analyze all non-zero read count contigs" - help="May be a good or a bad idea depending on the biology and the question. EG 0.3 = sparsest 30% of contigs with at least one read are removed before analysis"/> - <param name="useNDF" type="boolean" truevalue="T" falsevalue="F" checked="false" size="1" - label="Non differential filter - remove contigs below a threshold (1 per million) for half or more samples" - help="May be a good or a bad idea depending on the biology and the question. This was the old default. Quantile based is available as an alternative"/> - - <conditional name="edgeR"> - <param name="doedgeR" type="select" - label="Run this model using edgeR" - help="edgeR uses a negative binomial model and seems to be powerful, even with few replicates"> - <option value="F">Do not run edgeR</option> - <option value="T" selected="true">Run edgeR</option> - </param> - <when value="T"> - <param name="edgeR_priordf" type="integer" value="20" size="3" - label="prior.df for tagwise dispersion - lower value = more emphasis on each tag's variance. Replaces prior.n and prior.df = prior.n * residual.df" - help="0 = Use edgeR default. Use a small value to 'smooth' small samples. 
See edgeR docs and note below"/> - </when> - <when value="F"></when> - </conditional> - <conditional name="DESeq2"> - <param name="doDESeq2" type="select" - label="Run the same model with DESeq2 and compare findings" - help="DESeq2 is an update to the DESeq package. It uses different assumptions and methods to edgeR"> - <option value="F" selected="true">Do not run DESeq2</option> - <option value="T">Run DESeq2</option> - </param> - <when value="T"> - <param name="DESeq_fitType" type="select"> - <option value="parametric" selected="true">Parametric (default) fit for dispersions</option> - <option value="local">Local fit - this will automagically be used if parametric fit fails</option> - <option value="mean">Mean dispersion fit- use this if you really understand what you're doing - read the fine manual linked below in the documentation</option> - </param> - </when> - <when value="F"></when> - </conditional> - <param name="doVoom" type="select" - label="Run the same model with Voom/limma and compare findings" - help="Voom uses counts per million and a precise transformation of variance so count data can be analysed using limma"> - <option value="F" selected="true">Do not run VOOM</option> - <option value="T">Run VOOM</option> - </param> - <conditional name="camera"> - <param name="doCamera" type="select" label="Run the edgeR implementation of Camera GSEA for up/down gene sets" - help="If yes, you can choose a set of genesets to test and/or supply a gmt format geneset collection from your history"> - <option value="F" selected="true">Do not run GSEA tests with the Camera algorithm</option> - <option value="T">Run GSEA tests with the Camera algorithm</option> - </param> - <when value="T"> - <conditional name="gmtSource"> - <param name="refgmtSource" type="select" - label="Use a gene set (.gmt) from your history and/or use a built-in (MSigDB etc) gene set"> - <option value="indexed" selected="true">Use a built-in gene set</option> - <option value="history">Use a gene set from my history</option> - <option value="both">Add a gene set from my history to a built in gene set</option> - </param> - <when value="indexed"> - <param name="builtinGMT" type="select" label="Select a gene set matrix (.gmt) file to use for the analysis"> - <options from_data_table="gseaGMT_3.1"> - <filter type="sort_by" column="2" /> - <validator type="no_options" message="No GMT v3.1 files are available - please install them"/> - </options> - </param> - </when> - <when value="history"> - <param name="ownGMT" type="data" format="gmt" label="Select a Gene Set from your history" /> - </when> - <when value="both"> - <param name="ownGMT" type="data" format="gseagmt" label="Select a Gene Set from your history" /> - <param name="builtinGMT" type="select" label="Select a gene set matrix (.gmt) file to use for the analysis"> - <options from_data_table="gseaGMT_4"> - <filter type="sort_by" column="2" /> - <validator type="no_options" message="No GMT v4 files are available - please fix tool_data_table and loc files"/> - </options> - </param> - </when> - </conditional> - </when> - <when value="F"> - </when> - </conditional> - <param name="fdrthresh" type="float" value="0.05" size="5" label="P value threshold for FDR filtering for amily wise error rate control" - help="Conventional default value of 0.05 recommended"/> - <param name="fdrtype" type="select" label="FDR (Type II error) control method" - help="Use fdr or bh typically to control for the number of tests in a reliable way"> - <option value="fdr" selected="true">fdr</option> - 
<option value="BH">Benjamini Hochberg</option> - <option value="BY">Benjamini Yukateli</option> - <option value="bonferroni">Bonferroni</option> - <option value="hochberg">Hochberg</option> - <option value="holm">Holm</option> - <option value="hommel">Hommel</option> - <option value="none">no control for multiple tests</option> - </param> - </inputs> - <outputs> - <data format="tabular" name="out_edgeR" label="${title}_topTable_edgeR.xls"> - <filter>edgeR['doedgeR'] == "T"</filter> - </data> - <data format="tabular" name="out_DESeq2" label="${title}_topTable_DESeq2.xls"> - <filter>DESeq2['doDESeq2'] == "T"</filter> - </data> - <data format="tabular" name="out_VOOM" label="${title}_topTable_VOOM.xls"> - <filter>doVoom == "T"</filter> - </data> - <data format="html" name="html_file" label="${title}.html"/> - </outputs> - <stdio> - <exit_code range="4" level="fatal" description="Number of subject ids must match total number of samples in the input matrix" /> - </stdio> - <tests> -<test> -<param name='input1' value='test_bams2mx.xls' ftype='tabular' /> - <param name='treatment_name' value='case' /> - <param name='title' value='edgeRtest' /> - <param name='useNDF' value='' /> - <param name='doedgeR' value='T' /> - <param name='doVoom' value='T' /> - <param name='doDESeq2' value='T' /> - <param name='fdrtype' value='fdr' /> - <param name='edgeR_priordf' value="8" /> - <param name='fdrthresh' value="0.05" /> - <param name='control_name' value='control' /> - <param name='subjectids' value='' /> - <param name='Treat_cols' value='3,4,5,9' /> - <param name='Control_cols' value='2,6,7,8' /> - <output name='out_edgeR' file='edgeRtest1out.xls' compare='diff' /> - <output name='html_file' file='edgeRtest1out.html' compare='diff' lines_diff='20' /> -</test> -</tests> - -<configfiles> -<configfile name="runme"> -<![CDATA[ -# -# edgeR.Rscript -# updated npv 2011 for R 2.14.0 and edgeR 2.4.0 by ross -# Performs DGE on a count table containing n replicates of two conditions -# -# Parameters -# -# 1 - Output Dir - -# Original edgeR code by: S.Lunke and A.Kaspi -reallybig = log10(.Machine\$double.xmax) -reallysmall = log10(.Machine\$double.xmin) -library('stringr') -library('gplots') -library('edgeR') -hmap2 = function(cmat,nsamp=100,outpdfname='heatmap2.pdf', TName='Treatment',group=NA,myTitle='title goes here') -{ -# Perform clustering for significant pvalues after controlling FWER - samples = colnames(cmat) - gu = unique(group) - gn = rownames(cmat) - if (length(gu) == 2) { - col.map = function(g) {if (g==gu[1]) "#FF0000" else "#0000FF"} - pcols = unlist(lapply(group,col.map)) - } else { - colours = rainbow(length(gu),start=0,end=4/6) - pcols = colours[match(group,gu)] } - dm = cmat[(! 
is.na(gn)),] - # remove unlabelled hm rows - nprobes = nrow(dm) - # sub = paste('Showing',nprobes,'contigs ranked for evidence of differential abundance') - if (nprobes > nsamp) { - dm =dm[1:nsamp,] - #sub = paste('Showing',nsamp,'contigs ranked for evidence for differential abundance out of',nprobes,'total') - } - newcolnames = substr(colnames(dm),1,20) - colnames(dm) = newcolnames - pdf(outpdfname) - heatmap.2(dm,main=myTitle,ColSideColors=pcols,col=topo.colors(100),dendrogram="col",key=T,density.info='none', - Rowv=F,scale='row',trace='none',margins=c(8,8),cexRow=0.4,cexCol=0.5) - dev.off() -} - -hmap = function(cmat,nmeans=4,outpdfname="heatMap.pdf",nsamp=250,TName='Treatment',group=NA,myTitle="Title goes here") -{ - # for 2 groups only was - #col.map = function(g) {if (g==TName) "#FF0000" else "#0000FF"} - #pcols = unlist(lapply(group,col.map)) - gu = unique(group) - colours = rainbow(length(gu),start=0.3,end=0.6) - pcols = colours[match(group,gu)] - nrows = nrow(cmat) - mtitle = paste(myTitle,'Heatmap: n contigs =',nrows) - if (nrows > nsamp) { - cmat = cmat[c(1:nsamp),] - mtitle = paste('Heatmap: Top ',nsamp,' DE contigs (of ',nrows,')',sep='') - } - newcolnames = substr(colnames(cmat),1,20) - colnames(cmat) = newcolnames - pdf(outpdfname) - heatmap(cmat,scale='row',main=mtitle,cexRow=0.3,cexCol=0.4,Rowv=NA,ColSideColors=pcols) - dev.off() -} - -qqPlot = function(descr='qqplot',pvector, outpdf='qqplot.pdf',...) -# stolen from https://gist.github.com/703512 -{ - o = -log10(sort(pvector,decreasing=F)) - e = -log10( 1:length(o)/length(o) ) - o[o==-Inf] = reallysmall - o[o==Inf] = reallybig - maint = descr - pdf(outpdf) - plot(e,o,pch=19,cex=1, main=maint, ..., - xlab=expression(Expected~~-log[10](italic(p))), - ylab=expression(Observed~~-log[10](italic(p))), - xlim=c(0,max(e)), ylim=c(0,max(o))) - lines(e,e,col="red") - grid(col = "lightgray", lty = "dotted") - dev.off() -} - -smearPlot = function(DGEList,deTags, outSmear, outMain) - { - pdf(outSmear) - plotSmear(DGEList,de.tags=deTags,main=outMain) - grid(col="lightgray", lty="dotted") - dev.off() - } - -boxPlot = function(rawrs,cleanrs,maint,myTitle,pdfname) -{ # - nc = ncol(rawrs) - for (i in c(1:nc)) {rawrs[(rawrs[,i] < 0),i] = NA} - fullnames = colnames(rawrs) - newcolnames = substr(colnames(rawrs),1,20) - colnames(rawrs) = newcolnames - newcolnames = substr(colnames(cleanrs),1,20) - colnames(cleanrs) = newcolnames - defpar = par(no.readonly=T) - print.noquote('raw contig counts by sample:') - print.noquote(summary(rawrs)) - print.noquote('normalised contig counts by sample:') - print.noquote(summary(cleanrs)) - pdf(pdfname) - par(mfrow=c(1,2)) - boxplot(rawrs,varwidth=T,notch=T,ylab='log contig count',col="maroon",las=3,cex.axis=0.35,main=paste('Raw:',maint)) - grid(col="lightgray",lty="dotted") - boxplot(cleanrs,varwidth=T,notch=T,ylab='log contig count',col="maroon",las=3,cex.axis=0.35,main=paste('After ',maint)) - grid(col="lightgray",lty="dotted") - dev.off() - pdfname = "sample_counts_histogram.pdf" - nc = ncol(rawrs) - print.noquote(paste('Using ncol rawrs=',nc)) - ncroot = round(sqrt(nc)) - if (ncroot*ncroot < nc) { ncroot = ncroot + 1 } - m = c() - for (i in c(1:nc)) { - rhist = hist(rawrs[,i],breaks=100,plot=F) - m = append(m,max(rhist\$counts)) - } - ymax = max(m) - ncols = length(fullnames) - if (ncols > 20) - { - scale = 7*ncols/20 - pdf(pdfname,width=scale,height=scale) - } else { - pdf(pdfname) - } - par(mfrow=c(ncroot,ncroot)) - for (i in c(1:nc)) { - hist(rawrs[,i], main=paste("Contig logcount",i), xlab='log raw 
count', col="maroon", - breaks=100,sub=fullnames[i],cex=0.8,ylim=c(0,ymax)) - } - dev.off() - par(defpar) - -} - -cumPlot = function(rawrs,cleanrs,maint,myTitle) -{ # updated to use ecdf - pdfname = "Filtering_rowsum_bar_charts.pdf" - defpar = par(no.readonly=T) - lrs = log(rawrs,10) - lim = max(lrs) - pdf(pdfname) - par(mfrow=c(2,1)) - hist(lrs,breaks=100,main=paste('Before:',maint),xlab="# Reads (log)", - ylab="Count",col="maroon",sub=myTitle, xlim=c(0,lim),las=1) - grid(col="lightgray", lty="dotted") - lrs = log(cleanrs,10) - hist(lrs,breaks=100,main=paste('After:',maint),xlab="# Reads (log)", - ylab="Count",col="maroon",sub=myTitle,xlim=c(0,lim),las=1) - grid(col="lightgray", lty="dotted") - dev.off() - par(defpar) -} - -cumPlot1 = function(rawrs,cleanrs,maint,myTitle) -{ # updated to use ecdf - pdfname = paste(gsub(" ","", myTitle , fixed=TRUE),"RowsumCum.pdf",sep='_') - pdf(pdfname) - par(mfrow=c(2,1)) - lastx = max(rawrs) - rawe = knots(ecdf(rawrs)) - cleane = knots(ecdf(cleanrs)) - cy = 1:length(cleane)/length(cleane) - ry = 1:length(rawe)/length(rawe) - plot(rawe,ry,type='l',main=paste('Before',maint),xlab="Log Contig Total Reads", - ylab="Cumulative proportion",col="maroon",log='x',xlim=c(1,lastx),sub=myTitle) - grid(col="blue") - plot(cleane,cy,type='l',main=paste('After',maint),xlab="Log Contig Total Reads", - ylab="Cumulative proportion",col="maroon",log='x',xlim=c(1,lastx),sub=myTitle) - grid(col="blue") - dev.off() -} - - - -doGSEAold = function(y=NULL,design=NULL,histgmt="", - bigmt="/data/genomes/gsea/3.1/Abetterchoice_nocgp_c2_c3_c5_symbols_all.gmt", - ntest=0, myTitle="myTitle", outfname="GSEA.xls", minnin=5, maxnin=2000,fdrthresh=0.05,fdrtype="BH") -{ - sink('Camera.log') - genesets = c() - if (bigmt > "") - { - bigenesets = readLines(bigmt) - genesets = bigenesets - } - if (histgmt > "") - { - hgenesets = readLines(histgmt) - if (bigmt > "") { - genesets = rbind(genesets,hgenesets) - } else { - genesets = hgenesets - } # use only history if no bi - } - print.noquote(paste("@@@read",length(genesets), 'genesets from',histgmt,bigmt)) - genesets = strsplit(genesets,'\t') # tabular. genesetid\tURLorwhatever\tgene_1\t..\tgene_n - outf = outfname - head=paste(myTitle,'edgeR GSEA') - write(head,file=outfname,append=F) - ntest=length(genesets) - urownames = toupper(rownames(y)) - upcam = c() - downcam = c() - for (i in 1:ntest) { - gs = unlist(genesets[i]) - g = gs[1] # geneset_id - u = gs[2] - if (u > "") { u = paste("<a href=\'",u,"\'>",u,"</a>",sep="") } - glist = gs[3:length(gs)] # member gene symbols - glist = toupper(glist) - inglist = urownames %in% glist - nin = sum(inglist) - if ((nin > minnin) && (nin < maxnin)) { - ### print(paste('@@found',sum(inglist),'genes in glist')) - camres = camera(y=y,index=inglist,design=design) - if (! 
is.null(camres)) { - rownames(camres) = g # gene set name - camres = cbind(GeneSet=g,URL=u,camres) - if (camres\$Direction == "Up") - { - upcam = rbind(upcam,camres) } else { - downcam = rbind(downcam,camres) - } - } - } - } - uscam = upcam[order(upcam\$PValue),] - unadjp = uscam\$PValue - uscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - nup = max(10,sum((uscam\$adjPValue < fdrthresh))) - dscam = downcam[order(downcam\$PValue),] - unadjp = dscam\$PValue - dscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - ndown = max(10,sum((dscam\$adjPValue < fdrthresh))) - write.table(uscam,file=paste('camera_up',outfname,sep='_'),quote=F,sep='\t',row.names=F) - write.table(dscam,file=paste('camera_down',outfname,sep='_'),quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ Camera up top',nup,'gene sets:')) - write.table(head(uscam,nup),file="",quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ Camera down top',ndown,'gene sets:')) - write.table(head(dscam,ndown),file="",quote=F,sep='\t',row.names=F) - sink() -} - - - - -doGSEA = function(y=NULL,design=NULL,histgmt="", - bigmt="/data/genomes/gsea/3.1/Abetterchoice_nocgp_c2_c3_c5_symbols_all.gmt", - ntest=0, myTitle="myTitle", outfname="GSEA.xls", minnin=5, maxnin=2000,fdrthresh=0.05,fdrtype="BH") -{ - sink('Camera.log') - genesets = c() - if (bigmt > "") - { - bigenesets = readLines(bigmt) - genesets = bigenesets - } - if (histgmt > "") - { - hgenesets = readLines(histgmt) - if (bigmt > "") { - genesets = rbind(genesets,hgenesets) - } else { - genesets = hgenesets - } # use only history if no bi - } - print.noquote(paste("@@@read",length(genesets), 'genesets from',histgmt,bigmt)) - genesets = strsplit(genesets,'\t') # tabular. genesetid\tURLorwhatever\tgene_1\t..\tgene_n - outf = outfname - head=paste(myTitle,'edgeR GSEA') - write(head,file=outfname,append=F) - ntest=length(genesets) - urownames = toupper(rownames(y)) - upcam = c() - downcam = c() - incam = c() - urls = c() - gsids = c() - for (i in 1:ntest) { - gs = unlist(genesets[i]) - gsid = gs[1] # geneset_id - url = gs[2] - if (url > "") { url = paste("<a href=\'",url,"\'>",url,"</a>",sep="") } - glist = gs[3:length(gs)] # member gene symbols - glist = toupper(glist) - inglist = urownames %in% glist - nin = sum(inglist) - if ((nin > minnin) && (nin < maxnin)) { - incam = c(incam,inglist) - gsids = c(gsids,gsid) - urls = c(urls,url) - } - } - incam = as.list(incam) - names(incam) = gsids - allcam = camera(y=y,index=incam,design=design) - allcamres = cbind(geneset=gsids,allcam,URL=urls) - for (i in 1:ntest) { - camres = allcamres[i] - res = try(test = (camres\$Direction == "Up")) - if ("try-error" %in% class(res)) { - cat("test failed, camres = :") - print.noquote(camres) - } else { if (camres\$Direction == "Up") - { upcam = rbind(upcam,camres) - } else { downcam = rbind(downcam,camres) - } - - } - } - uscam = upcam[order(upcam\$PValue),] - unadjp = uscam\$PValue - uscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - nup = max(10,sum((uscam\$adjPValue < fdrthresh))) - dscam = downcam[order(downcam\$PValue),] - unadjp = dscam\$PValue - dscam\$adjPValue = p.adjust(unadjp,method=fdrtype) - ndown = max(10,sum((dscam\$adjPValue < fdrthresh))) - write.table(uscam,file=paste('camera_up',outfname,sep='_'),quote=F,sep='\t',row.names=F) - write.table(dscam,file=paste('camera_down',outfname,sep='_'),quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ Camera up top',nup,'gene sets:')) - write.table(head(uscam,nup),file="",quote=F,sep='\t',row.names=F) - print.noquote(paste('@@@@@ 
Camera down top',ndown,'gene sets:')) - write.table(head(dscam,ndown),file="",quote=F,sep='\t',row.names=F) - sink() - } - - -edgeIt = function (Count_Matrix=c(),group=c(),out_edgeR=F,out_VOOM=F,out_DESeq2=F,fdrtype='fdr',priordf=5, - fdrthresh=0.05,outputdir='.', myTitle='Differential Counts',libSize=c(),useNDF=F, - filterquantile=0.2, subjects=c(),mydesign=NULL, - doDESeq2=T,doVoom=T,doCamera=T,doedgeR=T,org='hg19', - histgmt="", bigmt="/data/genomes/gsea/3.1/Abetterchoice_nocgp_c2_c3_c5_symbols_all.gmt", - doCook=F,DESeq_fitType="parameteric") -{ - # Error handling - if (length(unique(group))!=2){ - print("Number of conditions identified in experiment does not equal 2") - q() - } - require(edgeR) - options(width = 512) - mt = paste(unlist(strsplit(myTitle,'_')),collapse=" ") - allN = nrow(Count_Matrix) - nscut = round(ncol(Count_Matrix)/2) - colTotmillionreads = colSums(Count_Matrix)/1e6 - counts.dataframe = as.data.frame(c()) - rawrs = rowSums(Count_Matrix) - nonzerod = Count_Matrix[(rawrs > 0),] # remove all zero count genes - nzN = nrow(nonzerod) - nzrs = rowSums(nonzerod) - zN = allN - nzN - print('# Quantiles for non-zero row counts:',quote=F) - print(quantile(nzrs,probs=seq(0,1,0.1)),quote=F) - if (useNDF == T) - { - gt1rpin3 = rowSums(Count_Matrix/expandAsMatrix(colTotmillionreads,dim(Count_Matrix)) >= 1) >= nscut - lo = colSums(Count_Matrix[!gt1rpin3,]) - workCM = Count_Matrix[gt1rpin3,] - cleanrs = rowSums(workCM) - cleanN = length(cleanrs) - meth = paste( "After removing",length(lo),"contigs with fewer than ",nscut," sample read counts >= 1 per million, there are",sep="") - print(paste("Read",allN,"contigs. Removed",zN,"contigs with no reads.",meth,cleanN,"contigs"),quote=F) - maint = paste('Filter >=1/million reads in >=',nscut,'samples') - } else { - useme = (nzrs > quantile(nzrs,filterquantile)) - workCM = nonzerod[useme,] - lo = colSums(nonzerod[!useme,]) - cleanrs = rowSums(workCM) - cleanN = length(cleanrs) - meth = paste("After filtering at count quantile =",filterquantile,", there are",sep="") - print(paste('Read',allN,"contigs. 
Removed",zN,"with no reads.",meth,cleanN,"contigs"),quote=F) - maint = paste('Filter below',filterquantile,'quantile') - } - cumPlot(rawrs=rawrs,cleanrs=cleanrs,maint=maint,myTitle=myTitle) - allgenes = rownames(workCM) - reg = "^chr([0-9]+):([0-9]+)-([0-9]+)" - genecards="<a href=\'http://www.genecards.org/index.php?path=/Search/keyword/" - ucsc = paste("<a href=\'http://genome.ucsc.edu/cgi-bin/hgTracks?db=",org,sep='') - testreg = str_match(allgenes,reg) - if (sum(!is.na(testreg[,1]))/length(testreg[,1]) > 0.8) # is ucsc style string - { - print("@@ using ucsc substitution for urls") - contigurls = paste0(ucsc,"&position=chr",testreg[,2],":",testreg[,3],"-",testreg[,4],"\'>",allgenes,"</a>") - } else { - print("@@ using genecards substitution for urls") - contigurls = paste0(genecards,allgenes,"\'>",allgenes,"</a>") - } - print.noquote("# urls") - print.noquote(head(contigurls)) - print(paste("# Total low count contigs per sample = ",paste(lo,collapse=',')),quote=F) - cmrowsums = rowSums(workCM) - TName=unique(group)[1] - CName=unique(group)[2] - if (is.null(mydesign)) { - if (length(subjects) == 0) - { - mydesign = model.matrix(~group) - } - else { - subjf = factor(subjects) - mydesign = model.matrix(~subjf+group) # we block on subject so make group last to simplify finding it - } - } - print.noquote(paste('Using samples:',paste(colnames(workCM),collapse=','))) - print.noquote('Using design matrix:') - print.noquote(mydesign) - if (doedgeR) { - sink('edgeR.log') - #### Setup DGEList object - DGEList = DGEList(counts=workCM, group = group) - DGEList = calcNormFactors(DGEList) - - DGEList = estimateGLMCommonDisp(DGEList,mydesign) - comdisp = DGEList\$common.dispersion - DGEList = estimateGLMTrendedDisp(DGEList,mydesign) - if (edgeR_priordf > 0) { - print.noquote(paste("prior.df =",edgeR_priordf)) - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign,prior.df = edgeR_priordf) - } else { - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign) - } - DGLM = glmFit(DGEList,design=mydesign) - DE = glmLRT(DGLM,coef=ncol(DGLM\$design)) # always last one - subject is first if needed - efflib = DGEList\$samples\$lib.size*DGEList\$samples\$norm.factors - normData = (1e+06*DGEList\$counts/efflib) - uoutput = cbind( - Name=as.character(rownames(DGEList\$counts)), - DE\$table, - adj.p.value=p.adjust(DE\$table\$PValue, method=fdrtype), - Dispersion=DGEList\$tagwise.dispersion,totreads=cmrowsums,normData, - DGEList\$counts - ) - soutput = uoutput[order(DE\$table\$PValue),] # sorted into p value order - for quick toptable - goodness = gof(DGLM, pcutoff=fdrthresh) - if (sum(goodness\$outlier) > 0) { - print.noquote('GLM outliers:') - print(paste(rownames(DGLM)[(goodness\$outlier)],collapse=','),quote=F) - } else { - print('No GLM fit outlier genes found\n') - } - z = limma::zscoreGamma(goodness\$gof.statistic, shape=goodness\$df/2, scale=2) - pdf("edgeR_GoodnessofFit.pdf") - qq = qqnorm(z, panel.first=grid(), main="tagwise dispersion") - abline(0,1,lwd=3) - points(qq\$x[goodness\$outlier],qq\$y[goodness\$outlier], pch=16, col="maroon") - dev.off() - estpriorn = getPriorN(DGEList) - print(paste("Common Dispersion =",comdisp,"CV = ",sqrt(comdisp),"getPriorN = ",estpriorn),quote=F) - efflib = DGEList\$samples\$lib.size*DGEList\$samples\$norm.factors - normData = (1e+06*DGEList\$counts/efflib) - uniqueg = unique(group) - #### Plot MDS - sample_colors = match(group,levels(group)) - sampleTypes = levels(factor(group)) - print.noquote(sampleTypes) - pdf("edgeR_MDSplot.pdf") - plotMDS.DGEList(DGEList,main=paste("edgeR 
MDS for",myTitle),cex=0.5,col=sample_colors,pch=sample_colors) - legend(x="topleft", legend = sampleTypes,col=c(1:length(sampleTypes)), pch=19) - grid(col="blue") - dev.off() - colnames(normData) = paste( colnames(normData),'N',sep="_") - print(paste('Raw sample read totals',paste(colSums(nonzerod,na.rm=T),collapse=','))) - nzd = data.frame(log(nonzerod + 1e-2,10)) - try( boxPlot(rawrs=nzd,cleanrs=log(normData,10),maint='TMM Normalisation',myTitle=myTitle,pdfname="edgeR_raw_norm_counts_box.pdf") ) - write.table(soutput,file=out_edgeR, quote=FALSE, sep="\t",row.names=F) - tt = cbind( - Name=as.character(rownames(DGEList\$counts)), - DE\$table, - adj.p.value=p.adjust(DE\$table\$PValue, method=fdrtype), - Dispersion=DGEList\$tagwise.dispersion,totreads=cmrowsums - ) - print.noquote("# edgeR Top tags\n") - tt = cbind(tt,URL=contigurls) # add to end so table isn't laid out strangely - tt = tt[order(DE\$table\$PValue),] - print.noquote(tt[1:50,]) - deTags = rownames(uoutput[uoutput\$adj.p.value < fdrthresh,]) - nsig = length(deTags) - print(paste('#',nsig,'tags significant at adj p=',fdrthresh),quote=F) - deColours = ifelse(deTags,'red','black') - pdf("edgeR_BCV_vs_abundance.pdf") - plotBCV(DGEList, cex=0.3, main="Biological CV vs abundance") - dev.off() - dg = DGEList[order(DE\$table\$PValue),] - #normData = (1e+06 * dg\$counts/expandAsMatrix(dg\$samples\$lib.size, dim(dg))) - efflib = dg\$samples\$lib.size*dg\$samples\$norm.factors - normData = (1e+06*dg\$counts/efflib) - outpdfname="edgeR_top_100_heatmap.pdf" - hmap2(normData,nsamp=100,TName=TName,group=group,outpdfname=outpdfname,myTitle=paste('edgeR Heatmap',myTitle)) - outSmear = "edgeR_smearplot.pdf" - outMain = paste("Smear Plot for ",TName,' Vs ',CName,' (FDR@',fdrthresh,' N = ',nsig,')',sep='') - smearPlot(DGEList=DGEList,deTags=deTags, outSmear=outSmear, outMain = outMain) - qqPlot(descr=paste(myTitle,'edgeR adj p QQ plot'),pvector=tt\$adj.p.value,outpdf='edgeR_qqplot.pdf') - norm.factor = DGEList\$samples\$norm.factors - topresults.edgeR = soutput[which(soutput\$adj.p.value < fdrthresh), ] - edgeRcountsindex = which(allgenes %in% rownames(topresults.edgeR)) - edgeRcounts = rep(0, length(allgenes)) - edgeRcounts[edgeRcountsindex] = 1 # Create venn diagram of hits - sink() - } ### doedgeR - if (doDESeq2 == T) - { - sink("DESeq2.log") - # DESeq2 - require('DESeq2') - library('RColorBrewer') - if (length(subjects) == 0) - { - pdata = data.frame(Name=colnames(workCM),Rx=group,row.names=colnames(workCM)) - deSEQds = DESeqDataSetFromMatrix(countData = workCM, colData = pdata, design = formula(~ Rx)) - } else { - pdata = data.frame(Name=colnames(workCM),Rx=group,subjects=subjects,row.names=colnames(workCM)) - deSEQds = DESeqDataSetFromMatrix(countData = workCM, colData = pdata, design = formula(~ subjects + Rx)) - } - #DESeq2 = DESeq(deSEQds,fitType='local',pAdjustMethod=fdrtype) - #rDESeq = results(DESeq2) - #newCountDataSet(workCM, group) - deSeqDatsizefac = estimateSizeFactors(deSEQds) - deSeqDatdisp = estimateDispersions(deSeqDatsizefac,fitType=DESeq_fitType) - resDESeq = nbinomWaldTest(deSeqDatdisp, pAdjustMethod=fdrtype) - rDESeq = as.data.frame(results(resDESeq)) - rDESeq = cbind(Contig=rownames(workCM),rDESeq,NReads=cmrowsums,URL=contigurls) - srDESeq = rDESeq[order(rDESeq\$pvalue),] - qqPlot(descr=paste(myTitle,'DESeq2 adj p qq plot'),pvector=rDESeq\$padj,outpdf='DESeq2_qqplot.pdf') - cat("# DESeq top 50\n") - print.noquote(srDESeq[1:50,]) - write.table(srDESeq,file=out_DESeq2, quote=FALSE, sep="\t",row.names=F) - topresults.DESeq = 
rDESeq[which(rDESeq\$padj < fdrthresh), ] - DESeqcountsindex = which(allgenes %in% rownames(topresults.DESeq)) - DESeqcounts = rep(0, length(allgenes)) - DESeqcounts[DESeqcountsindex] = 1 - pdf("DESeq2_dispersion_estimates.pdf") - plotDispEsts(resDESeq) - dev.off() - ysmall = abs(min(rDESeq\$log2FoldChange)) - ybig = abs(max(rDESeq\$log2FoldChange)) - ylimit = min(4,ysmall,ybig) - pdf("DESeq2_MA_plot.pdf") - plotMA(resDESeq,main=paste(myTitle,"DESeq2 MA plot"),ylim=c(-ylimit,ylimit)) - dev.off() - rlogres = rlogTransformation(resDESeq) - sampledists = dist( t( assay(rlogres) ) ) - sdmat = as.matrix(sampledists) - pdf("DESeq2_sample_distance_plot.pdf") - heatmap.2(sdmat,trace="none",main=paste(myTitle,"DESeq2 sample distances"), - col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255)) - dev.off() - ###outpdfname="DESeq2_top50_heatmap.pdf" - ###hmap2(sresDESeq,nsamp=50,TName=TName,group=group,outpdfname=outpdfname,myTitle=paste('DESeq2 vst rlog Heatmap',myTitle)) - sink() - result = try( (ppca = plotPCA( varianceStabilizingTransformation(deSeqDatdisp,blind=T), intgroup=c("Rx","Name")) ) ) - if ("try-error" %in% class(result)) { - print.noquote('DESeq2 plotPCA failed.') - } else { - pdf("DESeq2_PCA_plot.pdf") - #### wtf - print? Seems needed to get this to work - print(ppca) - dev.off() - } - } - - if (doVoom == T) { - sink('VOOM.log') - if (doedgeR == F) { - #### Setup DGEList object - DGEList = DGEList(counts=workCM, group = group) - DGEList = calcNormFactors(DGEList) - DGEList = estimateGLMCommonDisp(DGEList,mydesign) - DGEList = estimateGLMTrendedDisp(DGEList,mydesign) - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign) - DGEList = estimateGLMTagwiseDisp(DGEList,mydesign) - norm.factor = DGEList\$samples\$norm.factors - } - pdf("VOOM_mean_variance_plot.pdf") - dat.voomed = voom(DGEList, mydesign, plot = TRUE, lib.size = colSums(workCM) * norm.factor) - dev.off() - # Use limma to fit data - fit = lmFit(dat.voomed, mydesign) - fit = eBayes(fit) - rvoom = topTable(fit, coef = length(colnames(mydesign)), adj = fdrtype, n = Inf, sort="none") - qqPlot(descr=paste(myTitle,'VOOM-limma adj p QQ plot'),pvector=rvoom\$adj.P.Val,outpdf='VOOM_qqplot.pdf') - rownames(rvoom) = rownames(workCM) - rvoom = cbind(rvoom,NReads=cmrowsums,URL=contigurls) - srvoom = rvoom[order(rvoom\$P.Value),] - cat("# VOOM top 50\n") - print(srvoom[1:50,]) - write.table(srvoom,file=out_VOOM, quote=FALSE, sep="\t",row.names=F) - # Use an FDR cutoff to find interesting samples for edgeR, DESeq and voom/limma - topresults.voom = rvoom[which(rvoom\$adj.P.Val < fdrthresh), ] - voomcountsindex = which(allgenes %in% topresults.voom\$ID) - voomcounts = rep(0, length(allgenes)) - voomcounts[voomcountsindex] = 1 - sink() - } - - if (doCamera) { - doGSEA(y=DGEList,design=mydesign,histgmt=histgmt,bigmt=bigmt,ntest=20,myTitle=myTitle, - outfname=paste(mt,"GSEA.xls",sep="_"),fdrthresh=fdrthresh,fdrtype=fdrtype) - } - - if ((doDESeq2==T) || (doVoom==T) || (doedgeR==T)) { - if ((doVoom==T) && (doDESeq2==T) && (doedgeR==T)) { - vennmain = paste(mt,'Voom,edgeR and DESeq2 overlap at FDR=',fdrthresh) - counts.dataframe = data.frame(edgeR = edgeRcounts, DESeq2 = DESeqcounts, - VOOM_limma = voomcounts, row.names = allgenes) - } else if ((doDESeq2==T) && (doedgeR==T)) { - vennmain = paste(mt,'DESeq2 and edgeR overlap at FDR=',fdrthresh) - counts.dataframe = data.frame(edgeR = edgeRcounts, DESeq2 = DESeqcounts, row.names = allgenes) - } else if ((doVoom==T) && (doedgeR==T)) { - vennmain = paste(mt,'Voom and edgeR overlap at FDR=',fdrthresh) - 
counts.dataframe = data.frame(edgeR = edgeRcounts, VOOM_limma = voomcounts, row.names = allgenes) - } - - if (nrow(counts.dataframe) > 1) { - counts.venn = vennCounts(counts.dataframe) - vennf = "Venn_significant_genes_overlap.pdf" - pdf(vennf) - vennDiagram(counts.venn,main=vennmain,col="maroon") - dev.off() - } - } #### doDESeq2 or doVoom - -} -#### Done - -###sink(stdout(),append=T,type="message") -builtin_gmt = "" -history_gmt = "" -history_gmt_name = "" -out_edgeR = F -out_DESeq2 = F -out_VOOM = "$out_VOOM" -doDESeq2 = $DESeq2.doDESeq2 # make these T or F -doVoom = $doVoom -doCamera = $camera.doCamera -doedgeR = $edgeR.doedgeR -edgeR_priordf = 0 - - -#if $doVoom == "T": - out_VOOM = "$out_VOOM" -#end if - -#if $DESeq2.doDESeq2 == "T": - out_DESeq2 = "$out_DESeq2" - DESeq_fitType = "$DESeq2.DESeq_fitType" -#end if - -#if $edgeR.doedgeR == "T": - out_edgeR = "$out_edgeR" - edgeR_priordf = $edgeR.edgeR_priordf -#end if - -#if $camera.doCamera == 'T': - #if $camera.gmtSource.refgmtSource == "indexed" or $camera.gmtSource.refgmtSource == "both": - builtin_gmt = "${camera.gmtSource.builtinGMT.fields.path}" - #end if - #if $camera.gmtSource.refgmtSource == "history" or $camera.gmtSource.refgmtSource == "both": - history_gmt = "${camera.gmtSource.ownGMT}" - history_gmt_name = "${camera.gmtSource.ownGMT.name}" - #end if -#end if - - -if (sum(c(doedgeR,doVoom,doDESeq2)) == 0) -{ -write("No methods chosen - nothing to do! Please try again after choosing one or more methods", stderr()) -quit(save="no",status=2) -} - -Out_Dir = "$html_file.files_path" -Input = "$input1" -TreatmentName = "$treatment_name" -TreatmentCols = "$Treat_cols" -ControlName = "$control_name" -ControlCols= "$Control_cols" -org = "$input1.dbkey" -if (org == "") { org = "hg19"} -fdrtype = "$fdrtype" -fdrthresh = $fdrthresh -useNDF = $useNDF -fQ = $fQ # non-differential centile cutoff -myTitle = "$title" -sids = strsplit("$subjectids",',') -subjects = unlist(sids) -nsubj = length(subjects) -TCols = as.numeric(strsplit(TreatmentCols,",")[[1]])-1 -CCols = as.numeric(strsplit(ControlCols,",")[[1]])-1 -cat('Got TCols=') -cat(TCols) -cat('; CCols=') -cat(CCols) -cat('\n') -useCols = c(TCols,CCols) -if (file.exists(Out_Dir) == F) dir.create(Out_Dir) -Count_Matrix = read.table(Input,header=T,row.names=1,sep='\t') # Load tab file; assume a header row -snames = colnames(Count_Matrix) -nsamples = length(snames) -if (nsubj > 0 & nsubj != nsamples) { -options("show.error.messages"=T) -mess = paste('Fatal error: Supplied subject id list',paste(subjects,collapse=','), - 'has length',nsubj,'but there are',nsamples,'samples',paste(snames,collapse=',')) -write(mess, stderr()) -quit(save="no",status=4) -} -if (length(subjects) != 0) {subjects = subjects[useCols]} -Count_Matrix = Count_Matrix[,useCols] ### reorder columns -rn = rownames(Count_Matrix) -islib = rn %in% c('librarySize','NotInBedRegions') -LibSizes = Count_Matrix[subset(rn,islib),][1] # take first -Count_Matrix = Count_Matrix[subset(rn,!
islib),] - group = c(rep(TreatmentName,length(TCols)), rep(ControlName,length(CCols)) ) #Build a group descriptor - group = factor(group, levels=c(ControlName,TreatmentName)) - colnames(Count_Matrix) = paste(group,colnames(Count_Matrix),sep="_") #Relabel columns - results = edgeIt(Count_Matrix=Count_Matrix,group=group, out_edgeR=out_edgeR, out_VOOM=out_VOOM, out_DESeq2=out_DESeq2, - fdrtype='BH',mydesign=NULL,priordf=edgeR_priordf,fdrthresh=fdrthresh,outputdir='.', - myTitle=myTitle,useNDF=F,libSize=c(),filterquantile=fQ,subjects=subjects, - doDESeq2=doDESeq2,doVoom=doVoom,doCamera=doCamera,doedgeR=doedgeR,org=org, - histgmt=history_gmt,bigmt=builtin_gmt,DESeq_fitType=DESeq_fitType) -sessionInfo() -]]> -</configfile> -</configfiles> -<help> - -**What it does** - -Allows short read sequence counts from controlled experiments to be analysed for differentially expressed genes. -Optionally adds a term for subject if not all samples are independent, or if some other factor needs to be blocked in the design. - -**Input** - -Requires a count matrix as a tabular file. These are best made using the companion HTSeq_ based counter Galaxy wrapper -and your favourite gene model to generate inputs. Each row is a genomic feature (e.g. a gene or exon) and each column the -non-negative integer count of reads from one sample overlapping the feature. -The matrix must have a header row uniquely identifying the source samples, and unique row names in -the first column. Typically the row names are gene symbols or probe ids for downstream use in GSEA and other methods. - -**Specifying comparisons** - -This is basically dumbed down for two factors - case vs control. - -More complex interfaces are possible but painful at present. -Probably need to specify a phenotype file to do this better. -Work in progress. Send code. - -If you have (e.g.) paired samples and wish to include a term in the GLM to account for some other factor (subject in the case of paired samples), -supply a comma-separated list of indicators, one for every sample (whether modelled or not!), indicating (e.g.) the subject number: -a list of integers, one for each sample, or an empty string if samples are all independent. -If not empty, there must be exactly as many integers in the supplied list as there are columns (samples) in the count matrix. -Integers for samples that are not in the analysis *must* be present in the string as filler even if not used. - -So if you have 2 pairs out of 6 samples, you need to put in unique integers for the unpaired ones. -E.g. if you had 6 samples with the first two independent but the second and third pairs each from a different subject, you might use -8,9,1,1,2,2 -as subject IDs to indicate a pair from one subject in columns 3/4 and a pair from another subject in columns 5/6. - -**Methods available** - -You can run 3 popular Bioconductor packages available for count data. - -edgeR - see edgeR_ for details - -VOOM/limma - see limma_VOOM_ for details - -DESeq2 - see DESeq2_ for details - -and optionally camera in edgeR, which works better if MSigDB is installed. - -**Outputs** - -Some helpful plots and analysis results. Note that most of these are produced using R code -suggested by the excellent documentation and vignettes for the Bioconductor -packages invoked. The Tool Factory is used to automatically lay these out for you to enjoy. - -**Note on Voom** - -The voom help in limma version 3.16.6 for R includes this from the authors - but you should read the paper to interpret this method.
- -This function is intended to process RNA-Seq or ChIP-Seq data prior to linear modelling in limma. - -voom is an acronym for mean-variance modelling at the observational level. -The key concern is to estimate the mean-variance relationship in the data, then use this to compute appropriate weights for each observation. -Count data almost always show non-trivial mean-variance relationships. Raw counts show increasing variance with increasing count size, while log-counts typically show a decreasing mean-variance trend. -This function estimates the mean-variance trend for log-counts, then assigns a weight to each observation based on its predicted variance. -The weights are then used in the linear modelling process to adjust for heteroscedasticity. - -In an experiment, a count value is observed for each tag in each sample. A tag-wise mean-variance trend is computed using lowess. -The tag-wise mean is the mean log2 count with an offset of 0.5, across samples for a given tag. -The tag-wise variance is the quarter-root-variance of normalized log2 counts per million values with an offset of 0.5, across samples for a given tag. -Tags with zero counts across all samples are not included in the lowess fit. Optional normalization is performed using normalizeBetweenArrays. -Using fitted values of log2 counts from a linear model fit by lmFit, variances from the mean-variance trend were interpolated for each observation. -This was carried out by approxfun. Inverse variance weights can be used to correct for mean-variance trend in the count data. - - -Author(s) - -Charity Law and Gordon Smyth - -References - -Law, CW (2013). Precision weights for gene expression analysis. PhD Thesis. University of Melbourne, Australia. - -Law, CW, Chen, Y, Shi, W, Smyth, GK (2013). Voom! Precision weights unlock linear model analysis tools for RNA-seq read counts. -Technical Report 1 May 2013, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia. -http://www.statsci.org/smyth/pubs/VoomPreprint.pdf - -See Also - -A voom case study is given in the edgeR User's Guide. - -vooma is a similar function but for microarrays instead of RNA-seq. - - -***old rant on changes to Bioconductor package variable names between versions*** - -The edgeR authors made a small cosmetic change in the name of one important variable (from p.value to PValue) -between edgeR 2.4.4 and 2.4.6 (the version for R 2.14 as at the time of writing), -breaking this and all other code that assumed the old name for this variable. -This means that all code using edgeR is sensitive to the version. I think this was a very unwise thing -to do because it wasted hours of my time to track down and will similarly cost other edgeR users dearly -when their old scripts break. This tool now works with 2.4.6. - -**Note on prior.N** - -http://seqanswers.com/forums/showthread.php?t=5591 says: - -*prior.n* - -The value for prior.n determines the amount of smoothing of tagwise dispersions towards the common dispersion. -You can think of it as like a "weight" for the common value. (It is actually the weight for the common likelihood -in the weighted likelihood equation.) The larger the value for prior.n, the more smoothing, i.e. the closer your -tagwise dispersion estimates will be to the common dispersion. If you use a prior.n of 1, then that gives the -common likelihood the weight of one observation.
- -In answer to your question, it is a good thing to squeeze the tagwise dispersions towards a common value, -or else you will be using very unreliable estimates of the dispersion. I would not recommend using the value that -you obtained from estimateSmoothing()---this is far too small and would result in virtually no moderation -(squeezing) of the tagwise dispersions. How many samples do you have in your experiment? -What is the experimental design? If you have few samples (less than 6) then I would suggest a prior.n of at least 10. -If you have more samples, then the tagwise dispersion estimates will be more reliable, -so you could consider using a smaller prior.n, although I would hesitate to use a prior.n less than 5. - - -From Bioconductor Digest, Vol 118, Issue 5, Gordon writes: - -Dear Dorota, - -The important settings are prior.df and trend. - -prior.n and prior.df are related through prior.df = prior.n * residual.df, -and your experiment has residual.df = 36 - 12 = 24. So the old setting of -prior.n=10 is equivalent for your data to prior.df = 240, a very large -value. Going the other way, the new setting of prior.df=10 is equivalent -to prior.n=10/24. - -To recover old results with the current software you would use - - estimateTagwiseDisp(object, prior.df=240, trend="none") - -To get the new default from old software you would use - - estimateTagwiseDisp(object, prior.n=10/24, trend=TRUE) - -Actually the old trend method is equivalent to trend="loess" in the new -software. You should use plotBCV(object) to see whether a trend is -required. - -Note you could also use - - prior.n = getPriorN(object, prior.df=10) - -to map between prior.df and prior.n. - ---- - -**Attributions** - -edgeR - edgeR_ - -VOOM/limma - limma_VOOM_ - -DESeq2 - DESeq2_ - -See above for the Bioconductor package documentation for the packages exposed in Galaxy by this tool and its app store package. - -Galaxy_ (that's what you are using right now!) for gluing everything together. - -Otherwise, all code and documentation comprising this tool was written by Ross Lazarus and is -licensed to you under the LGPL_ like other rgenetics artefacts. - -.. _LGPL: http://www.gnu.org/copyleft/lesser.html -.. _HTSeq: http://www-huber.embl.de/users/anders/HTSeq/doc/index.html -.. _edgeR: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html -.. _DESeq2: http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html -.. _limma_VOOM: http://www.bioconductor.org/packages/release/bioc/html/limma.html -.. _Galaxy: http://getgalaxy.org -</help> - -</tool> - - This diff is so big that we needed to truncate the remainder.
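In practice, the voom workflow described in the tool help above (and driven by the configfile's own voom/lmFit/eBayes/topTable calls) boils down to a few lines of R - a minimal sketch, assuming limma is installed, where `counts` (a count matrix) and `design` (a model matrix) are stand-ins for your own data:

    library(limma)
    v = voom(counts, design, plot=TRUE)           # estimate the mean-variance trend and compute observation weights
    fit = eBayes(lmFit(v, design))                # weighted linear model fit plus empirical Bayes moderation
    tt = topTable(fit, coef=ncol(design), n=Inf)  # ranked top table; p-values are BH-adjusted by default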
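And to make the prior.df / prior.n arithmetic quoted above concrete - a sketch of the mapping for Gordon's 36-sample, 12-coefficient example (`object`, a fitted DGEList, is hypothetical):

    library(edgeR)
    residual.df = 36 - 12             # samples minus design coefficients = 24
    old.prior.df = 10 * residual.df   # old default prior.n=10 corresponds to prior.df=240
    new.prior.n = 10 / residual.df    # new default prior.df=10 corresponds to prior.n=10/24
    # getPriorN() performs the same mapping directly on a real DGEList:
    # prior.n = getPriorN(object, prior.df=10)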
https://bitbucket.org/galaxy/galaxy-central/commits/2c281898882a/ Changeset: 2c281898882a User: fubar Date: 2013-10-08 01:52:21 Summary: backout -r 10800 Affected #: 4 files diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 lib/galaxy/config.py --- a/lib/galaxy/config.py +++ b/lib/galaxy/config.py @@ -109,7 +109,6 @@ self.allow_user_creation = string_as_bool( kwargs.get( "allow_user_creation", "True" ) ) self.allow_user_deletion = string_as_bool( kwargs.get( "allow_user_deletion", "False" ) ) self.allow_user_dataset_purge = string_as_bool( kwargs.get( "allow_user_dataset_purge", "False" ) ) - self.use_data_id_on_string = string_as_bool( kwargs.get( "use_data_id_on_string", "False" ) ) self.allow_user_impersonation = string_as_bool( kwargs.get( "allow_user_impersonation", "False" ) ) self.new_user_dataset_access_role_default_private = string_as_bool( kwargs.get( "new_user_dataset_access_role_default_private", "False" ) ) self.collect_outputs_from = [ x.strip() for x in kwargs.get( 'collect_outputs_from', 'new_file_path,job_working_directory' ).lower().split(',') ] diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 lib/galaxy/tools/actions/__init__.py --- a/lib/galaxy/tools/actions/__init__.py +++ b/lib/galaxy/tools/actions/__init__.py @@ -1,6 +1,5 @@ import os import galaxy.tools -import re from galaxy.exceptions import ObjectInvalid from galaxy.model import LibraryDatasetDatasetAssociation @@ -192,20 +191,14 @@ data = data.to_history_dataset_association( None ) inp_data[name] = data -# else: # HDA -# if data.hid: -# input_names.append( 'data %s' % data.hid ) + else: # HDA + if data.hid: + input_names.append( 'data %s' % data.hid ) input_ext = data.ext if data.dbkey not in [None, '?']: input_dbkey = data.dbkey - data_name_sane = re.sub('[^a-zA-Z0-9_]+', '', data.name) - if not trans.app.config.use_data_id_on_string: - # we want names in our on_strings not numbers - input_names.append(data_name_sane) - else: - if data.hid: - input_names.append('data %s' % data.hid) + # Collect chromInfo dataset and add as parameters to incoming db_datasets = {} db_dataset = trans.db_dataset_for( input_dbkey ) @@ -239,14 +232,11 @@ if len( input_names ) == 1: on_text = input_names[0] elif len( input_names ) == 2: - #on_text = '%s and %s' % tuple(input_names[0:2]) - on_text = '%s_%s' % tuple(input_names[0:2]) + on_text = '%s and %s' % tuple(input_names[0:2]) elif len( input_names ) == 3: - #on_text = '%s, %s, and %s' % tuple(input_names[0:3]) - on_text = '%s_%s_%s' % tuple(input_names[0:3]) + on_text = '%s, %s, and %s' % tuple(input_names[0:3]) elif len( input_names ) > 3: - #on_text = '%s, %s, and others' % tuple(input_names[0:2]) - on_text = '%s_%s_and_others' % tuple(input_names[0:2]) + on_text = '%s, %s, and others' % tuple(input_names[0:2]) else: on_text = "" # Add the dbkey to the incoming parameters diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 tools/ngs_rna/tophat2_wrapper.xml --- a/tools/ngs_rna/tophat2_wrapper.xml +++ b/tools/ngs_rna/tophat2_wrapper.xml @@ -273,19 +273,19 @@ </stdio><outputs> - <data format="tabular" name="fusions" label="${tool.name} on ${on_string}: fusions" from_work_dir="tophat_out/fusions.out"> + <data format="tabular" name="fusions" label="${on_string}_tophat2fus.xls" from_work_dir="tophat_out/fusions.out"><filter>(params['settingsType'] == 'full' and params['fusion_search']['do_search'] == 'Yes')</filter></data> - <data format="bed" 
name="insertions" label="${tool.name} on ${on_string}: insertions" from_work_dir="tophat_out/insertions.bed"> + <data format="bed" name="insertions" label="${on_string}_tophat2ins.bed" from_work_dir="tophat_out/insertions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="deletions" label="${tool.name} on ${on_string}: deletions" from_work_dir="tophat_out/deletions.bed"> + <data format="bed" name="deletions" label="${on_string}_tophhat2del.bed" from_work_dir="tophat_out/deletions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="junctions" label="${tool.name} on ${on_string}: splice junctions" from_work_dir="tophat_out/junctions.bed"> + <data format="bed" name="junctions" label="${on_string}tophat2sj.bed" from_work_dir="tophat_out/junctions.bed"><expand macro="dbKeyActions" /></data> - <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits" from_work_dir="tophat_out/accepted_hits.bam"> + <data format="bam" name="accepted_hits" label="${on_string}tophat2hits.bam" from_work_dir="tophat_out/accepted_hits.bam"><expand macro="dbKeyActions" /></data></outputs> diff -r 962d981799cf7db1fbc5dbd0e964a93771703bb7 -r 2c281898882a3c35713166e489667fe084ce5413 tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -113,6 +113,8 @@ </stdio><inputs> + <param name="jobtitle" type="text" value="bowtie2" label="Memorable short reminder of this job's importance for outputs" /> + <!-- single/paired --><conditional name="library"><param name="type" type="select" label="Is this library mate-paired?"> @@ -216,7 +218,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${tool.name} on ${on_string}: unaligned reads (L)" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_{jobtitle}_unaligL.fastq" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -224,7 +226,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${tool.name} on ${on_string}: unaligned reads (R)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_{jobtitle}_unaligR.fastq)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -232,7 +234,7 @@ </action></actions></data> - <data format="bam" name="output" label="${tool.name} on ${on_string}: aligned reads"> + <data format="bam" name="output" label="${tool.name} on ${on_string}_{jobtitle}_aligned.bam"><actions><conditional name="reference_genome.source"><when value="indexed"> https://bitbucket.org/galaxy/galaxy-central/commits/6a39fac59cc0/ Changeset: 6a39fac59cc0 User: fubar Date: 2013-10-08 01:53:45 Summary: backout -r 10802 Affected #: 2 files diff -r 2c281898882a3c35713166e489667fe084ce5413 -r 6a39fac59cc0cd2d0485ec5c6d5af273e8b7d78d lib/galaxy/tools/parameters/output.py --- a/lib/galaxy/tools/parameters/output.py +++ b/lib/galaxy/tools/parameters/output.py @@ -65,7 +65,7 @@ def is_case( self, output_dataset, other_values ): ref = self.get_ref( output_dataset, other_values ) return bool( str( ref ) == self.value ) - + class DatatypeIsInstanceToolOutputActionConditionalWhen( ToolOutputActionConditionalWhen ): tag = "when datatype_isinstance" def __init__( self, parent, config_elem, value ): @@ -215,11 +215,12 @@ self.offset = elem.get( 'offset', -1 ) self.offset = int( self.offset ) else: + self.options = [] 
self.missing_tool_data_table_name = self.name def get_value( self, other_values ): - try: + if self.options: options = self.options - except: + else: options = [] for filter in self.filters: options = filter.filter_options( options, other_values ) @@ -248,7 +249,7 @@ def __init__( self, parent, elem ): super( FormatToolOutputAction, self ).__init__( parent, elem ) self.default = elem.get( 'default', None ) - + def apply_action( self, output_dataset, other_values ): value = self.option.get_value( other_values ) if value is None and self.default is not None: @@ -431,7 +432,7 @@ value = str( getattr( ref.metadata, self.name ) ) rval = [] for fields in options: - if self.keep == ( self.compare( fields[self.column], value ) ): + if self.keep == ( self.compare( fields[self.column], value ) ): rval.append( fields ) return rval @@ -452,7 +453,7 @@ value = self.cast( value ) except: value = False #unable to cast or access value; treat as false - if self.keep == bool( value ): + if self.keep == bool( value ): rval.append( fields ) return rval @@ -480,7 +481,7 @@ option_types = {} for option_type in [ NullToolOutputActionOption, FromFileToolOutputActionOption, FromParamToolOutputActionOption, FromDataTableOutputActionOption ]: option_types[ option_type.tag ] = option_type - + filter_types = {} for filter_type in [ ParamValueToolOutputActionOptionFilter, InsertColumnToolOutputActionOptionFilter, MultipleSplitterFilter, ColumnStripFilter, MetadataValueFilter, BooleanFilter, StringFunctionFilter, ColumnReplaceFilter ]: filter_types[ filter_type.tag ] = filter_type @@ -524,5 +525,5 @@ def compare_re_search( value1, value2 ): #checks pattern=value2 in value1 return bool( re.search( value2, value1 ) ) - + compare_types = { 'eq':compare_eq, 'neq':compare_neq, 'gt':compare_gt, 'gte':compare_gte, 'lt':compare_lt, 'lte':compare_lte, 'in':compare_in, 'startswith':compare_startswith, 'endswith':compare_endswith, "re_search":compare_re_search } diff -r 2c281898882a3c35713166e489667fe084ce5413 -r 6a39fac59cc0cd2d0485ec5c6d5af273e8b7d78d tool-data/bowtie2_indices.loc.sample --- a/tool-data/bowtie2_indices.loc.sample +++ b/tool-data/bowtie2_indices.loc.sample @@ -1,37 +1,37 @@ -#This is a sample file distributed with Galaxy that enables tools -#to use a directory of Bowtie2 indexed sequences data files. You will -#need to create these data files and then create a bowtie_indices.loc -#file similar to this one (store it in this directory) that points to -#the directories in which those files are stored. The bowtie2_indices.loc -#file has this format (longer white space characters are TAB characters): +# bowtie2_indices.loc.sample +# This is a *.loc.sample file distributed with Galaxy that enables tools +# to use a directory of indexed data files. This one is for Bowtie2 and Tophat2. +# See the wiki: http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup +# First create these data files and save them in your own data directory structure. +# Then, create a bowtie_indices.loc file to use those indexes with tools. +# Copy this file, save it with the same name (minus the .sample), +# follow the format examples, and store the result in this directory. +# The file should include a one-line entry for each index set. +# The path points to the "basename" for the set, not a specific file. +# It has four text columns separated by TABS.
# -#<unique_build_id><dbkey><display_name><file_base_path> # -#So, for example, if you had hg18 indexed stored in -#/depot/data2/galaxy/bowtie2/hg18/, -#then the bowtie2_indices.loc entry would look like this: +# So, for example, if you had hg19 indexes stored in: # -#hg18 hg18 hg18 /depot/data2/galaxy/bowtie2/hg18/hg18 +# /depot/data2/galaxy/hg19/bowtie2/ # -#and your /depot/data2/galaxy/bowtie2/hg18/ directory -#would contain hg18.*.ebwt files: +# containing hg19 genome and hg19.*.bt2 files, such as: +# -rw-rw-r-- 1 james james 914M Feb 10 18:56 hg19canon.fa +# -rw-rw-r-- 1 james james 914M Feb 10 18:56 hg19canon.1.bt2 +# -rw-rw-r-- 1 james james 683M Feb 10 18:56 hg19canon.2.bt2 +# -rw-rw-r-- 1 james james 3.3K Feb 10 16:54 hg19canon.3.bt2 +# -rw-rw-r-- 1 james james 683M Feb 10 16:54 hg19canon.4.bt2 +# -rw-rw-r-- 1 james james 914M Feb 10 20:45 hg19canon.rev.1.bt2 +# -rw-rw-r-- 1 james james 683M Feb 10 20:45 hg19canon.rev.2.bt2 # -#-rw-r--r-- 1 james universe 830134 2005-09-13 10:12 hg18.1.ebwt -#-rw-r--r-- 1 james universe 527388 2005-09-13 10:12 hg18.2.ebwt -#-rw-r--r-- 1 james universe 269808 2005-09-13 10:12 hg18.3.ebwt -#...etc... +# then the bowtie2_indices.loc entry could look like this: # -#Your bowtie2_indices.loc file should include an entry per line for each -#index set you have stored. The "file" in the path does not actually -#exist, but it is the prefix for the actual index files. For example: +#hg19 hg19 Human (hg19) /depot/data2/galaxy/hg19/bowtie2/hg19canon # -#hg18canon hg18 hg18 Canonical /depot/data2/galaxy/bowtie2/hg18/hg18canon -#hg18full hg18 hg18 Full /depot/data2/galaxy/bowtie2/hg18/hg18full -#/orig/path/hg19 hg19 hg19 /depot/data2/galaxy/bowtie2/hg19/hg19 -#...etc... +#More examples: # -#Note that for backwards compatibility with workflows, the unique ID of -#an entry must be the path that was in the original loc file, because that -#is the value stored in the workflow for that parameter. That is why the -#hg19 entry above looks odd. New genomes can be better-looking. +#mm10 mm10 Mouse (mm10) /depot/data2/galaxy/mm10/bowtie2/mm10 +#dm3 dm3 D.
melanogaster (dm3) /depot/data2/galaxy/mm10/bowtie2/dm3 # +# https://bitbucket.org/galaxy/galaxy-central/commits/99be46d34966/ Changeset: 99be46d34966 User: fubar Date: 2013-10-08 01:54:45 Summary: backout -r 10806 Affected #: 1 file diff -r 6a39fac59cc0cd2d0485ec5c6d5af273e8b7d78d -r 99be46d34966236412d0cf69198143ce71a8958d tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -87,23 +87,22 @@ ## read group information #if str($read_group.selection) == "yes": #if $read_group.rgid and $read_group.rglb and $read_group.rgpl and $read_group.rgsm: - --rg-id $read_group.rgid - --rg LB:$read_group.rglb - --rg PL:$read_group.rgpl - --rg SM:$read_group.rgsm + --rg-id "$read_group.rgid" + --rg "LB:$read_group.rglb" + --rg "PL:$read_group.rgpl" + --rg "SM:$read_group.rgsm" #end if #end if ## view/sort and output file - | samtools view -Su - | samtools sort -o - - > $output; + | samtools view -Su - | samtools sort -o - - > $output ## rename unaligned sequence files #if $library.type == "paired" and $output_unaligned_reads_l and $output_unaligned_reads_r: #set left = str($output_unaligned_reads_l).replace( '.dat', '.1.dat' ) #set right = str($output_unaligned_reads_l).replace( '.dat', '.2.dat' ) - - mv $left $output_unaligned_reads_l; - mv $right $output_unaligned_reads_r; + ;mv $left $output_unaligned_reads_l; + mv $right $output_unaligned_reads_r #end if </command> @@ -218,7 +217,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_{jobtitle}_unaligL.fastq" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_${jobtitle}_unaligL.fastq" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -226,7 +225,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_{jobtitle}_unaligR.fastq)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_${jobtitle}_unaligR.fastq)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -234,7 +233,7 @@ </action></actions></data> - <data format="bam" name="output" label="${tool.name} on ${on_string}_{jobtitle}_aligned.bam"> + <data format="bam" name="output" label="${tool.name} on ${on_string}_${jobtitle}_aligned.bam"><actions><conditional name="reference_genome.source"><when value="indexed"> @@ -256,6 +255,17 @@ </outputs><tests> + <test> + <!-- basic test on single paired default run --> + <param name="type" value="single"/> + <param name="selection" value="no"/> + <param name="full" value="no"/> + <param name="unaligned_file" value="false"/> + <param name="source" value="history" /> + <param name="input_1" value="bowtie2/phix_reads.fastq" ftype="fastqsanger"/> + <param name="own_file" value="bowtie2/phix_genome.fasta" /> + <output name="output" file="bowtie2/phix_mapped.bam" /> + </test></tests><help> https://bitbucket.org/galaxy/galaxy-central/commits/99404250db29/ Changeset: 99404250db29 User: fubar Date: 2013-10-08 01:55:24 Summary: backout -r 10808 Affected #: 3 files diff -r 99be46d34966236412d0cf69198143ce71a8958d -r 99404250db290eafa86f1e97e13100d9f705284a lib/galaxy/tools/actions/__init__.py --- a/lib/galaxy/tools/actions/__init__.py +++ b/lib/galaxy/tools/actions/__init__.py @@ -1,5 +1,6 @@ import os import galaxy.tools +import re from galaxy.exceptions import ObjectInvalid from galaxy.model import LibraryDatasetDatasetAssociation @@ 
-191,14 +192,20 @@ data = data.to_history_dataset_association( None ) inp_data[name] = data - else: # HDA - if data.hid: - input_names.append( 'data %s' % data.hid ) +# else: # HDA +# if data.hid: +# input_names.append( 'data %s' % data.hid ) input_ext = data.ext if data.dbkey not in [None, '?']: input_dbkey = data.dbkey - + data_name_sane = re.sub('[^a-zA-Z0-9_]+', '', data.name) + if trans.app.config.use_data_id_on_string: + # we want names in our on_strings not numbers + input_names.append(data_name_sane) + else: + if data.hid: + input_names.append('data %s' % data.hid) # Collect chromInfo dataset and add as parameters to incoming db_datasets = {} db_dataset = trans.db_dataset_for( input_dbkey ) @@ -232,11 +239,14 @@ if len( input_names ) == 1: on_text = input_names[0] elif len( input_names ) == 2: - on_text = '%s and %s' % tuple(input_names[0:2]) + #on_text = '%s and %s' % tuple(input_names[0:2]) + on_text = '%s_%s' % tuple(input_names[0:2]) elif len( input_names ) == 3: - on_text = '%s, %s, and %s' % tuple(input_names[0:3]) + #on_text = '%s, %s, and %s' % tuple(input_names[0:3]) + on_text = '%s_%s_%s' % tuple(input_names[0:3]) elif len( input_names ) > 3: - on_text = '%s, %s, and others' % tuple(input_names[0:2]) + #on_text = '%s, %s, and others' % tuple(input_names[0:2]) + on_text = '%s_%s_and_others' % tuple(input_names[0:2]) else: on_text = "" # Add the dbkey to the incoming parameters diff -r 99be46d34966236412d0cf69198143ce71a8958d -r 99404250db290eafa86f1e97e13100d9f705284a scripts/functional_tests.py --- a/scripts/functional_tests.py +++ b/scripts/functional_tests.py @@ -454,5 +454,4 @@ return 1 if __name__ == "__main__": - print '\n\n\n\n#### SGE_ROOT=', os.environ.get('SGE_ROOT','##### no SGE_ROOT!'),'\n\n\n' sys.exit( main() ) diff -r 99be46d34966236412d0cf69198143ce71a8958d -r 99404250db290eafa86f1e97e13100d9f705284a tool_conf.xml.sample --- a/tool_conf.xml.sample +++ b/tool_conf.xml.sample @@ -1,7 +1,6 @@ <?xml version="1.0"?><toolbox><section name="Get Data" id="getext"> - <tool file="rlGAT/rlGAT.xml"/><tool file="data_source/upload.xml"/><tool file="data_source/ucsc_tablebrowser.xml" /><tool file="data_source/ucsc_tablebrowser_test.xml" /> https://bitbucket.org/galaxy/galaxy-central/commits/68b694a22c59/ Changeset: 68b694a22c59 User: fubar Date: 2013-10-08 01:56:03 Summary: backout -r 10811 Affected #: 2 files diff -r 99404250db290eafa86f1e97e13100d9f705284a -r 68b694a22c59b6419a568330bbd0a02186068526 run.sh --- a/run.sh +++ b/run.sh @@ -1,6 +1,7 @@ #!/bin/sh cd `dirname $0` + python ./scripts/check_python.py [ $? -ne 0 ] && exit 1 diff -r 99404250db290eafa86f1e97e13100d9f705284a -r 68b694a22c59b6419a568330bbd0a02186068526 tool-data/shared/ucsc/ucsc_build_sites.txt --- a/tool-data/shared/ucsc/ucsc_build_sites.txt +++ b/tool-data/shared/ucsc/ucsc_build_sites.txt @@ -1,6 +1,8 @@ #Harvested from http://genome.ucsc.edu/cgi-bin/das/dsn -main http://genome.ucsc.edu/cgi-bin/hgTracks? 
latCha1,eriEur1,petMar2,papAnu2,ce6,anoCar2,ce2,rn3,loxAfr3,rn5,rn4,droYak1,oreNil2,droYak2,dp3,dp2,vicPac1,vicPac2,caeRem2,caeRem3,geoFor1,sorAra1,caePb2,gadMor1,droAna1,droAna2,triMan1,sacCer2,sacCer3,oryCun2,dasNov3,droGri1,caeJap1,anoCar1,choHof1,taeGut1,sacCer1,tupBel1,ce4,macEug2,ochPri2,gorGor3,mm8,mm7,pteVam1,micMur1,galGal3,galGal2,proCap1,ornAna1,sarHar1,equCab2,equCab1,myoLuc2,rheMac3,rheMac2,droPer1,apiMel1,droVir2,droVir1,cerSim1,dm1,ailMel1,cb3,calJac1,calJac3,melUnd1,droSim1,hg16,hg17,hg18,hg19,monDom5,monDom4,oviAri1,petMar1,oviAri3,droSec1,saiBol1,aplCal1,strPur1,mm9,braFlo1,oryLat2,susScr2,susScr3,canFam1,canFam3,canFam2,echTel2,caePb1,echTel1,ci2,ci1,dm3,papHam1,ponAbe2,ce10,mm10,xenTro1,xenTro3,xenTro2,fr3,fr2,fr1,galGal4,gasAcu1,dm2,nomLeu3,nomLeu2,nomLeu1,apiMel2,priPac1,cavPor3,panTro4,panTro1,panTro3,panTro2,danRer7,danRer6,danRer5,danRer4,danRer3,dipOrd1,tetNig2,tetNig1,droMoj2,bosTau3,bosTau2,bosTau4,musFur1,bosTau6,melGal1,droEre1,hetGla2,turTru2,hetGla1,monDom1,cb1,droMoj1,speTri2,allMis1,strPur2,otoGar3,bosTau7,tarSyr1,felCat3,chrPic1,anoGam1,felCat4,felCat5 +main http://genome.ucsc.edu/cgi-bin/hgTracks? priPac1,danRer4,mm9,mm8,droAna1,mm5,caeRem2,mm7,mm6,panTro1,dm3,panTro2,anoCar1,ce4,galGal3,galGal2,ce1,rn3,rn2,droMoj1,droMoj2,rn4,droYak1,droYak2,dp3,dp2,dm1,canFam1,danRer5,canFam2,danRer3,danRer2,ornAna1,ci2,ci1,tetNig1,bosTau1,bosTau3,bosTau2,equCab1,oryLat1,droAna2,droEre1,ponAbe2,rheMac2,sacCer1,droPer1,droSim1,monDom1,cb1,dm2,droSec1,strPur1,droVir2,droVir1,strPur2,sc1,xenTro1,droGri1,xenTro2,cb3,gasAcu1,caePb1,anoGam1,fr2,fr1,hg15,hg16,hg17,hg18,hg19,felCat3,apiMel2,monDom4,apiMel1,ce2 #Harvested from http://archaea.ucsc.edu/cgi-bin/das/dsn -archaea http://archaea.ucsc.edu/cgi-bin/hgTracks? therSibi1,symbTher_IAM14863,haloBori1,zymoMobi_ZM4,therFusc_YX,methIgne1,methPalus1,nocaJS61,his2Viru,acidCryp_JF_5,lactLact,paraSp_UWE25,geobKaus_HTA426,rhizEtli_CFN_42,jannCCS1,syntFuma_MPOB,dichNodo_VCS1703A,eschColi_O157H7,burk383,alteMacl_ATCC27126,bradJapo,lebron_Phage,aerPer1,acinSp_ADP1,anapMarg_ST_MARIES,sulfIsla_L_D_8_5,desuPsyc_LSV54,sulfIsla_ROD_SHAPED_VIRUS_2,shewDeni,methBoon1,shewLoihPV4,igniHosp1,therTher_HB8,ferrAcid2,sulfIsla_REY15A,acidCell_11B,stapHell1,acidBoon1,salmTyph_LT2,methBurt2,pyroYaya1,haloHisp1,glucOxyd_621H,geobSulf,acidFila_VIRUS_6,soliUsit_ELLIN6076,methAeol1,procMari_CCMP1375,oenoOeni_PSU_1,alcaBork_SK2,wiggBrev,therUzon1,peloProp_DSM2379,orieTsut_BORYONG,salmEnte_PARATYPI_ATC,therOnnu1,onioYell_PHYTOPLASMA,mariAqua_VT8,heliPylo_J99,pf5,psycIngr_37,methMahi1,candPela_UBIQUE_HTCC1,myxoXant_DK_1622,salmTyph_TY2,chloChlo_CAD3,mesoFlor_L1,shewW318,pyro1860_1,closDiff_630,pyroFuri_COM1,methMaze1,shewWood,haloPhag_HF2,sulfViru_KAMCHATKA_1,ureaUrea,streCoel,polaJS66,rhodRubr_ATCC11170,rhodRHA1,pireSp,geobBemi_BEM,enteCloa_ATCC13047_uid48363,franTula_TULARENSIS,syneSp_WH8102,archProf1,methPetr_PM1,heliAcin_SHEEBA,thioCrun_XCL_2,haloHalo1,idioLoih_L2TR,xantCamp,sodaGlos_MORSITANS,coryEffi_YS_314,mycoTube_H37RV,polyQLWP,peloLute_DSM273,burkCeno_AU_1054,vibrVuln_CMCP6_1,ente638,neorSenn_MIYAYAMA,burkCepa_AMMD,methMari_C5_1,methZhil1,pyroFuma1,methBark1,geobMeta_GS15,eschColi_CFT073,metMar1,enteFaec_V583,hodgCica_DSEM,erwiCaro_ATROSEPTICA,methFerv1,igniAggr1,campJeju_81_176,pseuHalo_TAC125,chlaTrac,heliPylo_SHI470,chroViol,mycoGeni,methSp_FS406_1,therTher_HB27,sulfIsla_Y_G_57_14,uncuMeth_RCI,haloUtah1,sulfSolf_98_2,propAcne_KPA171202,acidJS42,chroSale_DSM3043,tropWhip_TW08_27,acidFila_VIRUS_2,natrPhar1,salmEnte_CHOLERAESUIS,rubrXyla_DSM9941,met
hVulc1,franCcI3,persMariEXH1,caldSacc_DSM8903,haloSali1,lactPlan,strePneu_R6_uid57859,siliPome_DSS_3,nitrWino_NB_255,mesoLoti,coxiBurn,acidFila_VIRUS_7,acidBact_ELLIN345,caldMaqu1,acidFila_VIRUS_3,amycMedi_U32,acidFila_VIRUS_1,therAggr1,acidFila_VIRUS_8,acidFila_VIRUS_9,methKand1,methCaps_BATH,burkCeno_HI2424,psycArct_273_4,buchSp,pyroCali1,chloTepi_TLS,bifiLong,picrTorr1,shewANA3,bdelBact,gramFors_KT0803,peloCarb,sulfIsla_ROD_SHAPED_V,haloHF1,sulfViru_STSV1,clavMich_NCPPB_382,baciSubt2,methPhag_PSIM100,therNeut1,saliTrop_CNB_440,campJeju,blocFlor,pastMult,saliRube_DSM13855,carbHydrZ2901,mycoPhag_BPS,halaJeot1,dehaEthe_195,mycoPhag_CORNDOG,pyroAbys_VIRUS_1,shewMR7,shewMR4,desuMuco1,geobUran_RF4,desuRedu_MI_1,methStad1,legiPneu_PHILADELPHIA,methSWAN1,gloeViol,moorTher_ATCC39073,acidRods_VIRUS_1,roseDeni_OCH_114,natrMaga1,xantOryz_MAFF_311018,shewHali,burkXeno_LB400,heliPylo_26695,geobFRC_32,eschColi_K_12_SUBSTR_W3110,haloArch_DL31,leptInte,methEves1,hydrY04A,tricEryt_IMS101,photLumi,peloTher_SI,therKoda1,desuAuda,lawsIntr_PHE_MN1_00,mycoSmeg_MC2_155,therBaro1,methOkin1,metaCupr1,therElon,shewPutrCN32,eschColi_O157H7_EDL93,mariMari_MCS10,methInfe1,shigFlex_2A,magnMagn_AMB_1,sulfIsla_FILAMENTOUS,erytLito_HTCC2594,coryGlut_Bielefeld,pyrAer1,thioDeni_ATCC25259,deinRadi,yersPest_CO92,therVolc1,wolbEndo_OF_DROSOPHIL,acidHosp1,shewSedi_HAW_EB3,vulcMout1,ralsSola,crimd_Phage,natrPhag_PHICH1,neisMeni_Z2491_1,shewPeal,anabVari_ATCC29413,colwPsyc_34H,xantAuto_PY2,hyphNept_ATCC15444,vibrChol1,deinGeot_DSM11300,salmEnte_ENTERITIDIS,metaSedu,mycoPhag_GILES,lactSali_UCC118,pyroOgun2,flavJohn_UW101,mannSucc_MBEL55E,haemSomn_129PT,therTena2,haloXana1,methFerv_DSM2088,koraCryp1,oceaIhey,haloTurk1,haloWals1,baciHalo,hypeButy1,agroTume_C58_UWASH,pyroIsla1,cytoHutc_ATCC33406,nitrEuro,therMari,pyroArse1,nitrMari1,vibrFisc_ES114_1,methJann1,heliPylo_G27,vibrPara1,metAce1,neisGono_FA1090_1,methConc1,chlaPneu_CWL029,strePyog_M1_GAS,haloLacu1,novoArom_DSM12444,eschColi_536,azoaSp_EBN1,sulfToko1,listInno,sulfAcid1,geobTher_NG80_2,sulfSpin_VIRUS_4,arthFB24,porpGing_W83,leifXyli_XYLI_CTCB0,campFetu_82_40,mycoPhag_Firecracker,mycoPhag_Dori,methTher1,nostSp,sphiAlas_RB2256,saccDegr_2_40,bordBron,rickBell_RML369_C,thioDeni_ATCC33889,methSmit1,alkaMeta_QYMF,sulfIsla_M_16_27,halMar1,vermEise_EF01_2,granBeth_CGDNIH1,therPend1,sulfIsla_L_S_2_15,vibrChol_O395_1,nitrOcea_ATCC19707,campJeju_RM1221,pyroSphe_VIRUS,yersPseu_IP32953,woliSucc,caldSubt1,sulfTurr_ICOSAHEDRAL,methMarb1,aquiAeol,acidSacc1,burkPseu_1106A,tremPrin_PCIT,azorCaul2,synePCC6,hydrSp_128_5R1,baumCica_HOMALODISCA,shewBalt,therTeng,alkaEhrl_MLHE_1,sulSol1,eschColi_UTI89,methHung1,desuKamc1,pediPent_ATCC25745,therGamm1,nanEqu1,shewAmaz,hydr3684,desuVulg_HILDENBOROUG,haheChej_KCTC_2396,sulfIsla_Y_N_15_51,candBloc_FLORIDANUS,haloPhag_SH1,candCars_RUDDII,burkMall_ATCC23344,burkViet_G4,acidTwot_VIRUS,methVann1,archFulg1,fusoNucl,heliHepa,sulfViru_1,sulfViru_2,syntAcid_SB,salmTyph,trepPall,neisMeni_MC58_1,syntWolf_GOETTINGEN,syneElon_PCC_7942,haemInfl_KW20,haloHalo_SL1,plasFalc1,rhodSpha_2_4_1,archVene1,rhodPalu_CGA009,stapMari1,sinoMeli,sulfYell_SS_5,archBJ1_VIRUS,bartHens_HOUSTON_1,sulfAzorFu1,methMari_C6,methMari_C7,caulCres,crocWats_WH8501_0,rhodSpha_ATCC17025,magnMC1,rhodFerr_DSM15236,hydrTher1,heliPylo_HPAG1,pyrHor1,acidBott_VIRUS,sulfIsla_M_16_4,actiPleu_L20,paraDeni_PD1222,borrBurg,pseuAeru,methRumi1,methVolt1,cenaSymb1,geobLovl_SZ,pyrFur2,saccEryt_NRRL_2338,vibrChol_MO10_1,methPetr1,pyroNA2,yersPseu_IP31758,ehrlRumi_WELGEVONDEN,methLabrZ_1,vibrVuln_YJ016_1,vulcD
ist1,sulfIsla_M_14_25,his1Viru,methTherPT1,methFlag_KT,therAcid1,therPetr_RKU_1,shewFrig,photProf_SS9,burkThai_E264,nitrMult_ATCC25196,methMari_XI,neisMeni_FAM18_1,hermArse,haloVolc1,desuHafn_Y51,nocaFarc_IFM10152,anaeDeha_2CP_C,methPalud1,eschColi_APEC_O1,xyleFast,leucMese_ATCC8293,bactThet_VPI_5482,brucMeli,ferrPlac1,ralsEutr_JMP134,sulfViru_RAGGED_HILLS,pyrAby1,mculMari1,dechArom_RCB,stapAure_MU50,methPhag_PSIM2,eschColi_K12,sulfYO3A,haloMuko1,aeroHydr_ATCC7966,ther4557,baciAnth_AMES,shewOnei +archaea http://archaea.ucsc.edu/cgi-bin/hgTracks? alkaEhrl_MLHE_1,shewW318,idioLoih_L2TR,sulSol1,erwiCaro_ATROSEPTICA,symbTher_IAM14863,moorTher_ATCC39073,therFusc_YX,methHung1,bradJapo,therElon,shewPutrCN32,pediPent_ATCC25745,mariMari_MCS10,nanEqu1,baciSubt,chlaTrac,magnMagn_AMB_1,chroViol,ralsSola,acidCryp_JF_5,erytLito_HTCC2594,desuVulg_HILDENBOROUG,pyrAer1,sulfToko1,shewANA3,paraSp_UWE25,geobKaus_HTA426,rhizEtli_CFN_42,uncuMeth_RCI,candBloc_FLORIDANUS,deinRadi,yersPest_CO92,saccEryt_NRRL_2338,rhodRHA1,candCars_RUDDII,burkMall_ATCC23344,eschColi_O157H7,burk383,psycIngr_37,rhodSpha_2_4_1,wolbEndo_OF_DROSOPHIL,burkViet_G4,propAcne_KPA171202,enteFaec_V583,campJeju_81_176,acidJS42,heliPylo_26695,pseuHalo_TAC125,chroSale_DSM3043,methVann1,archFulg1,neisMeni_Z2491_1,fusoNucl,vermEise_EF01_2,anabVari_ATCC29413,tropWhip_TW08_27,heliHepa,acinSp_ADP1,anapMarg_ST_MARIES,natrPhar1,haheChej_KCTC_2396,therPetr_RKU_1,neisGono_FA1090_1,colwPsyc_34H,desuPsyc_LSV54,hyphNept_ATCC15444,vibrChol1,deinGeot_DSM11300,strePyog_M1_GAS,franCcI3,salmTyph,metaSedu,lactSali_UCC118,trepPall,neisMeni_MC58_1,syntWolf_GOETTINGEN,flavJohn_UW101,methBoon1,haemSomn_129PT,shewLoihPV4,igniHosp1,haemInfl_KW20,haloHalo_SL1,ferrAcid1,sphiAlas_RB2256,candPela_UBIQUE_HTCC1,caldSacc_DSM8903,aerPer1,lactPlan,carbHydr_Z_2901,therTher_HB8,vibrVuln_YJ016_1,rhodPalu_CGA009,acidCell_11B,siliPome_DSS_3,therVolc1,haloWals1,rubrXyla_DSM9941,shewAmaz,nocaJS61,vibrVuln_CMCP6_1,sinoMeli,ureaUrea,baciHalo,bartHens_HOUSTON_1,nitrWino_NB_255,hypeButy1,methBurt2,polaJS66,mesoLoti,methMari_C7,caulCres,neisMeni_FAM18_1,acidBact_ELLIN345,caldMaqu1,salmEnte_PARATYPI_ATC,glucOxyd_621H,cytoHutc_ATCC33406,nitrEuro,therMari,coxiBurn,woliSucc,heliPylo_HPAG1,mesoFlor_L1,pyrHor1,methAeol1,procMari_CCMP1375,pyroArse1,oenoOeni_PSU_1,alcaBork_SK2,wiggBrev,actiPleu_L20,lactLact,methJann1,paraDeni_PD1222,borrBurg,pyroIsla1,orieTsut_BORYONG,shewMR4,methKand1,methCaps_BATH,onioYell_PHYTOPLASMA,bordBron,cenaSymb1,burkCeno_HI2424,franTula_TULARENSIS,pyrFur2,mariAqua_VT8,heliPylo_J99,psycArct_273_4,vibrChol_MO10_1,vibrPara1,rickBell_RML369_C,metAce1,buchSp,ehrlRumi_WELGEVONDEN,methLabrZ_1,chlaPneu_CWL029,thioCrun_XCL_2,pyroCali1,chloTepi_TLS,stapAure_MU50,novoArom_DSM12444,magnMC1,zymoMobi_ZM4,salmTyph_TY2,chloChlo_CAD3,azoaSp_EBN1,therTher_HB27,bifiLong,picrTorr1,listInno,bdelBact,gramFors_KT0803,sulfAcid1,geobTher_NG80_2,peloCarb,ralsEutr_JMP134,mannSucc_MBEL55E,syneSp_WH8102,methTherPT1,clavMich_NCPPB_382,therAcid1,syntAcid_SB,porpGing_W83,therNeut0,leifXyli_XYLI_CTCB0,shewFrig,photProf_SS9,thioDeni_ATCC25259,methMaze1,desuRedu_MI_1,burkThai_E264,campFetu_82_40,blocFlor,jannCCS1,nitrMult_ATCC25196,streCoel,soliUsit_ELLIN6076,pastMult,saliRube_DSM13855,methTher1,nostSp,shigFlex_2A,saccDegr_2_40,oceaIhey,dehaEthe_195,rhodRubr_ATCC11170,arthFB24,shewMR7,pireSp,anaeDeha_2CP_C,haloVolc1,dichNodo_VCS1703A,tricEryt_IMS101,mycoGeni,thioDeni_ATCC33889,methSmit1,geobUran_RF4,shewDeni,halMar1,desuHafn_Y51,methStad1,granBeth_CGDNIH1,therPend1,legiPneu_PHILADELPHIA,vibrChol_O395_1,nitr
Ocea_ATCC19707,campJeju_RM1221,methPetr_PM1,heliAcin_SHEEBA,eschColi_APEC_O1,peloTher_SI,haloHalo1,syntFuma_MPOB,xyleFast,gloeViol,leucMese_ATCC8293,bactThet_VPI_5482,xantCamp,sodaGlos_MORSITANS,geobSulf,roseDeni_OCH_114,coryEffi_YS_314,brucMeli,mycoTube_H37RV,vibrFisc_ES114_1,pyrAby1,burkXeno_LB400,polyQLWP,stapMari1,peloLute_DSM273,burkCeno_AU_1054,shewBalt,nocaFarc_IFM10152,ente638,mculMari1,saliTrop_CNB_440,neorSenn_MIYAYAMA,aquiAeol,dechArom_RCB,myxoXant_DK_1622,burkPseu_1106A,burkCepa_AMMD,methMari_C5_1,azorCaul2,methFlag_KT,leptInte,eschColi_K12,synePCC6,baumCica_HOMALODISCA,methBark1,pseuAeru,geobMeta_GS15,eschColi_CFT073,photLumi,metMar1,hermArse,campJeju,therKoda1,aeroHydr_ATCC7966,baciAnth_AMES,shewOnei,therTeng,lawsIntr_PHE_MN1_00 #Harvested from http://genome-test.cse.ucsc.edu/cgi-bin/das/dsn -test http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks? punNye1,homIni20,danRer4,amaVit1,latCha1,eriEur1,criGri1,eriEur2,odoRosDiv1,lepWed1,papAnu2,ce6,ce5,anoCar2,ce3,ce2,chiLan1,rn3,loxAfr3,loxAfr2,simHuman,droMoj2,rn5,rn4,haeCon1,tetNig2,oreNil2,droYak2,dp3,dp2,mhcRef,vicPac1,eptFus1,caePb3,vicPac2,strPur2,caeRem3,geoFor1,caeRem1,sorAra1,oviAri1,galGal4,astMex1,musFur1,borEut13,homIni14,gadMor1,cacTus1,cacTus2,cacTus3,oryCun1,sacCer3,oryCun2,micOch1,colLiv1,caeJap3,melInc1,chlSab1,kirRefV3,kirRefV2,choHof1,droSim1,saiBol1,kirRefV5,kirRefV4,octDeg1,centromers1,droGri1,sc1,CHM1,mhcRefnoNs8,mhcRefnoNs9,dasNov2,dasNov3,taeGut1,caeJap4,taeGut2,caeSp71,equCab2,mhcRefnoNs2,mhcRefnoNs3,mhcRefnoNs4,mhcRefnoNs5,tupBel1,hlaRef2,galGal2,caeSp91,ce4,croPor0,bosTau7,bosTau6,ficAlb2,caePb2,panHod1,gorGor3,gorGor2,gorGor1,mayZeb1,mm7,pteVam1,macEug1,loxAfr1,venter1,micMur1,galGal3,mhcRefnoNs6,priPac2,proCap1,priPac1,mhcRefnoNs7,droEre1,sorAra2,mm10Strains1,ornAna1,xipMac1,anaPla1,mmtv,eutHer13,braNey1,droPer1,bioGla0,ce9,sarHar1,mhcRefnoNs,hlaRef3,simHumanMammal,hlaRef1,equCab1,simHumanPrim,cioSav2,myoLuc1,rheMac3,rheMac2,oreNil1,ailMel1,droSec1,canHg12,simRefMam,apiMel1,homIni13,nemVec1,turTru1,sacCer2,droVir2,droVir1,ce7,myoLuc2,repBase0,repBase1,dm1,susScr2,cb3,calJac1,macFas5,calJac3,dm3,triMan1,neoBri1,canFam1,conCri1,caeAng1,hg15,hg16,hg17,hg18,hg19,lotGig0,monDom5,monDom4,euaGli13,pteAle1,panPan1,petMar1,oviAri3,cb4,caeRem2,cioSav1,aplCal0,aplCal1,gliRes13,strPur1,anoCar1,dasNov1,mm9,tetNig1,mm8,braFlo1,oryLat2,caeJap2,braFlo2,lauRas13,susScr1,jacJac1,susScr3,hg19LggInv,caeJap1,canFam3,canFam2,echTel2,caePb1,echTel1,simHumanMam,ci2,ci1,canFamPoodle1,camFer1,papHam1,ce8,cerSim1,capHir1,droMoj1,oryLat1,ponAbe2,ce10,pseHum1,mm10Patch1,calMil0,caeSp111,hapBur1,cheMyd1,hg19Haps,mm10,falciparum,mhcRefnoNs24,mhcRefnoNs23,mhcRefnoNs22,mhcRefnoNs21,mhcRefnoNs20,xenTro1,xenTro3,xenTro2,xenTro7,falPer1,nonAfr13,homPan20,fr3,fr2,fr1,macEug2,caeRem4,gasAcu1,dm2,nomLeu3,nomLeu2,nomLeu1,apiMel2,melHap1,zonAlb1,tupChi1,ecoRef1,ecoRef3,ecoRef2,chrAsi1,ecoRef4,cavPor3,triCas2,panTro4,pelSin1,panTro1,droAna2,panTro3,panTro2,mhcRefnoNs18,mhcRefnoNs19,lepOcu1,hg19Patch10,strPur3,mhcRefnoNs12,mhcRefnoNs13,mhcRefnoNs10,mhcRefnoNs11,mhcRefnoNs16,mhcRefnoNs17,mhcRefnoNs14,mhcRefnoNs15,sacCer1,melUnd1,apaSpi1,danRer6,danRer5,mesAur1,danRer3,dipOrd1,eleEdw1,hg19Patch2,hg19Patch5,droAna1,hg19Patch9,bruMal1,bosTau3,bosTau2,bosTau5,bosTau4,ochPri2,ochPri3,melGal1,petMar2,rodEnt13,oryAfe1,evoSim1,priMat13,hetGla2,turTru2,repeats2,hetGla1,cb2,monDom1,cb1,speTri1,speTri2,catArr1,allMis1,allMis0,plasFalc3D7_2,araMac1,strPur4,otoGar1,otoGar3,kirRef,myoDav1,falChe1,tarSyr1,letCam1,afrOth13,danRer7,takFla1,droYak1,felCat3,orcOrc1,c
hrPic1,anoGam1,felCat4,felCat5 +test http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks? anoCar1,ce4,ce3,ce2,ce1,loxAfr1,rn2,eschColi_O157H7_1,rn4,droYak1,heliPylo_J99_1,droYak2,dp3,dp2,caeRem2,caeRem1,oryLat1,eschColi_K12_1,homIni13,homIni14,droAna1,droAna2,oryCun1,sacCer1,heliHepa1,droGri1,sc1,dasNov1,choHof1,tupBel1,mm9,mm8,vibrChol1,mm5,mm4,mm7,mm6,mm3,mm2,rn3,venter1,galGal3,galGal2,ornAna1,equCab1,cioSav2,rheMac2,eutHer13,droPer1,droVir2,droVir1,heliPylo_26695_1,euaGli13,calJac1,campJeju1,droSim1,hg13,hg15,hg16,hg17,monDom1,monDom4,droMoj1,petMar1,droMoj2,vibrChol_MO10_1,vibrPara1,gliRes13,vibrVuln_YJ016_1,braFlo1,cioSav1,lauRas13,dm1,canFam1,canFam2,ci1,echTel1,ci2,caePb1,dm3,ponAbe2,falciparum,xenTro1,xenTro2,nonAfr13,fr2,fr1,gasAcu1,dm2,apiMel1,apiMel2,eschColi_O157H7EDL933_1,priPac1,panTro1,hg18,panTro2,campJeju_RM1221_1,canHg12,vibrChol_O395_1,vibrFisc_ES114_1,danRer5,danRer4,danRer3,danRer2,danRer1,tetNig1,afrOth13,bosTau1,eschColi_CFT073_1,bosTau3,bosTau2,bosTau4,rodEnt13,droEre1,priMat13,vibrVuln_CMCP6_1,cb2,cb3,cb1,borEut13,droSec1,felCat3,strPur1,strPur2,otoGar1,catArr1,anoGam1,triCas2 +ucla http://epigenomics.mcdb.ucla.edu/cgi-bin/hgTracks? araTha1 +psu bx main http://main.genome-browser.bx.psu.edu/cgi-bin/hgTracks? hg18,hg19,mm8,mm9 https://bitbucket.org/galaxy/galaxy-central/commits/c2561f65d6f9/ Changeset: c2561f65d6f9 User: fubar Date: 2013-10-08 04:34:54 Summary: revert bowtie2 and tophat2 wrappers Affected #: 2 files diff -r 68b694a22c59b6419a568330bbd0a02186068526 -r c2561f65d6f9d181e76d60d0f7ed60e1ce94c513 tools/ngs_rna/tophat2_wrapper.xml --- a/tools/ngs_rna/tophat2_wrapper.xml +++ b/tools/ngs_rna/tophat2_wrapper.xml @@ -1,5 +1,5 @@ -<tool id="tophat2" name="Tophat2" version="0.5"> - <!-- Wrapper compatible with Tophat version 2.0.0 --> +<tool id="tophat2" name="Tophat2" version="0.6"> + <!-- Wrapper compatible with Tophat version 2.0.0+ --><description>Gapped-read mapper for RNA-seq data</description><version_command>tophat2 --version</version_command><requirements> @@ -268,24 +268,25 @@ </inputs><stdio> - <regex match="Exception" source="both" level="fatal" description="Tool exception"/> + <regex match="Exception|Error" source="both" level="fatal" description="Tool execution failed"/><regex match=".*" source="both" level="log" description="tool progress"/></stdio><outputs> - <data format="tabular" name="fusions" label="${on_string}_tophat2fus.xls" from_work_dir="tophat_out/fusions.out"> + <data format="txt" name="align_summary" label="${tool.name} on ${on_string}: align_summary" from_work_dir="tophat_out/align_summary.txt"/> + <data format="tabular" name="fusions" label="${tool.name} on ${on_string}: fusions" from_work_dir="tophat_out/fusions.out"><filter>(params['settingsType'] == 'full' and params['fusion_search']['do_search'] == 'Yes')</filter></data> - <data format="bed" name="insertions" label="${on_string}_tophat2ins.bed" from_work_dir="tophat_out/insertions.bed"> + <data format="bed" name="insertions" label="${tool.name} on ${on_string}: insertions" from_work_dir="tophat_out/insertions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="deletions" label="${on_string}_tophhat2del.bed" from_work_dir="tophat_out/deletions.bed"> + <data format="bed" name="deletions" label="${tool.name} on ${on_string}: deletions" from_work_dir="tophat_out/deletions.bed"><expand macro="dbKeyActions" /></data> - <data format="bed" name="junctions" label="${on_string}tophat2sj.bed" from_work_dir="tophat_out/junctions.bed"> + <data format="bed" name="junctions" 
label="${tool.name} on ${on_string}: splice junctions" from_work_dir="tophat_out/junctions.bed"><expand macro="dbKeyActions" /></data> - <data format="bam" name="accepted_hits" label="${on_string}tophat2hits.bam" from_work_dir="tophat_out/accepted_hits.bam"> + <data format="bam" name="accepted_hits" label="${tool.name} on ${on_string}: accepted_hits" from_work_dir="tophat_out/accepted_hits.bam"><expand macro="dbKeyActions" /></data></outputs> @@ -312,6 +313,7 @@ </actions></macro></macros> + <tests><!-- Test base-space single-end reads with pre-built index and preset parameters --><test> @@ -450,7 +452,8 @@ <help> **Tophat Overview** -TopHat_ is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. Please cite: Trapnell, C., Pachter, L. and Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111 (2009). +TopHat_ is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie(2), and then analyzes the mapping results to identify splice junctions between exons. Please cite: Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, and Salzberg SL. TopHat2: accurate alignment +of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36, 2013. .. _Tophat: http://tophat.cbcb.umd.edu/ diff -r 68b694a22c59b6419a568330bbd0a02186068526 -r c2561f65d6f9d181e76d60d0f7ed60e1ce94c513 tools/sr_mapping/bowtie2_wrapper.xml --- a/tools/sr_mapping/bowtie2_wrapper.xml +++ b/tools/sr_mapping/bowtie2_wrapper.xml @@ -95,14 +95,15 @@ #end if ## view/sort and output file - | samtools view -Su - | samtools sort -o - - > $output + | samtools view -Su - | samtools sort -o - - > $output; ## rename unaligned sequence files #if $library.type == "paired" and $output_unaligned_reads_l and $output_unaligned_reads_r: #set left = str($output_unaligned_reads_l).replace( '.dat', '.1.dat' ) #set right = str($output_unaligned_reads_l).replace( '.dat', '.2.dat' ) - ;mv $left $output_unaligned_reads_l; - mv $right $output_unaligned_reads_r + + mv $left $output_unaligned_reads_l; + mv $right $output_unaligned_reads_r; #end if </command> @@ -112,8 +113,6 @@ </stdio><inputs> - <param name="jobtitle" type="text" value="bowtie2" label="Memorable short reminder of this job's importance for outputs" /> - <!-- single/paired --><conditional name="library"><param name="type" type="select" label="Is this library mate-paired?"> @@ -217,7 +216,7 @@ <!-- define outputs --><outputs> - <data format="fastqsanger" name="output_unaligned_reads_l" label="${on_string}_${jobtitle}_unaligL.fastq" > + <data format="fastqsanger" name="output_unaligned_reads_l" label="${tool.name} on ${on_string}: unaligned reads (L)" ><filter>unaligned_file is True</filter><actions><action type="format"> @@ -225,7 +224,7 @@ </action></actions></data> - <data format="fastqsanger" name="output_unaligned_reads_r" label="${on_string}_${jobtitle}_unaligR.fastq)"> + <data format="fastqsanger" name="output_unaligned_reads_r" label="${tool.name} on ${on_string}: unaligned reads (R)"><filter>library['type'] == "paired" and unaligned_file is True</filter><actions><action type="format"> @@ -233,7 +232,7 @@ </action></actions></data> - <data format="bam" name="output" label="${tool.name} on ${on_string}_${jobtitle}_aligned.bam"> + <data 
format="bam" name="output" label="${tool.name} on ${on_string}: aligned reads"><actions><conditional name="reference_genome.source"><when value="indexed"> https://bitbucket.org/galaxy/galaxy-central/commits/a2ffcbc614d4/ Changeset: a2ffcbc614d4 User: fubar Date: 2013-10-08 04:39:22 Summary: restore original tools/actions/__init__.py Affected #: 1 file diff -r c2561f65d6f9d181e76d60d0f7ed60e1ce94c513 -r a2ffcbc614d42f343f4bab09035e771bb50b0dcc lib/galaxy/tools/actions/__init__.py --- a/lib/galaxy/tools/actions/__init__.py +++ b/lib/galaxy/tools/actions/__init__.py @@ -1,6 +1,5 @@ import os import galaxy.tools -import re from galaxy.exceptions import ObjectInvalid from galaxy.model import LibraryDatasetDatasetAssociation @@ -23,13 +22,13 @@ """ def execute( self, tool, trans, incoming={}, set_output_hid=True ): raise TypeError("Abstract method") - + class DefaultToolAction( object ): """Default tool action is to run an external command""" - + def collect_input_datasets( self, tool, param_values, trans ): """ - Collect any dataset inputs from incoming. Returns a mapping from + Collect any dataset inputs from incoming. Returns a mapping from parameter name to Dataset instance for each tool parameter that is of the DataToolParameter type. """ @@ -118,9 +117,9 @@ def make_dict_copy( from_dict ): """ Makes a copy of input dictionary from_dict such that all values that are dictionaries - result in creation of a new dictionary ( a sort of deepcopy ). We may need to handle - other complex types ( e.g., lists, etc ), but not sure... - Yes, we need to handle lists (and now are)... + result in creation of a new dictionary ( a sort of deepcopy ). We may need to handle + other complex types ( e.g., lists, etc ), but not sure... + Yes, we need to handle lists (and now are)... """ copy_from_dict = {} for key, value in from_dict.items(): @@ -169,11 +168,11 @@ input_values[ input.name ] = galaxy.tools.SelectToolParameterWrapper( input, input_values[ input.name ], tool.app, other_values = incoming ) else: input_values[ input.name ] = galaxy.tools.InputValueWrapper( input, input_values[ input.name ], incoming ) - + # Set history. if not history: history = tool.get_default_history_by_trans( trans, create=True ) - + out_data = odict() # Collect any input datasets from the incoming parameters inp_data = self.collect_input_datasets( tool, incoming, trans ) @@ -186,26 +185,20 @@ if not data: data = NoneDataset( datatypes_registry = trans.app.datatypes_registry ) continue - + # Convert LDDA to an HDA. 
if isinstance(data, LibraryDatasetDatasetAssociation): data = data.to_history_dataset_association( None ) inp_data[name] = data - -# else: # HDA -# if data.hid: -# input_names.append( 'data %s' % data.hid ) + + else: # HDA + if data.hid: + input_names.append( 'data %s' % data.hid ) input_ext = data.ext - + if data.dbkey not in [None, '?']: input_dbkey = data.dbkey - data_name_sane = re.sub('[^a-zA-Z0-9_]+', '', data.name) - if trans.app.config.use_data_id_on_string: - # we want names in our on_strings not numbers - input_names.append(data_name_sane) - else: - if data.hid: - input_names.append('data %s' % data.hid) + # Collect chromInfo dataset and add as parameters to incoming db_datasets = {} db_dataset = trans.db_dataset_for( input_dbkey ) @@ -221,13 +214,13 @@ if 'fasta' in custom_build_dict: build_fasta_dataset = trans.sa_session.query( trans.app.model.HistoryDatasetAssociation ).get( custom_build_dict[ 'fasta' ] ) chrom_info = build_fasta_dataset.get_converted_dataset( trans, 'len' ).file_name - + if not chrom_info: # Default to built-in build. chrom_info = os.path.join( trans.app.config.len_file_path, "%s.len" % input_dbkey ) incoming[ "chromInfo" ] = chrom_info inp_data.update( db_datasets ) - + # Determine output dataset permission/roles list existing_datasets = [ inp for inp in inp_data.values() if inp ] if existing_datasets: @@ -239,20 +232,17 @@ if len( input_names ) == 1: on_text = input_names[0] elif len( input_names ) == 2: - #on_text = '%s and %s' % tuple(input_names[0:2]) - on_text = '%s_%s' % tuple(input_names[0:2]) + on_text = '%s and %s' % tuple(input_names[0:2]) elif len( input_names ) == 3: - #on_text = '%s, %s, and %s' % tuple(input_names[0:3]) - on_text = '%s_%s_%s' % tuple(input_names[0:3]) + on_text = '%s, %s, and %s' % tuple(input_names[0:3]) elif len( input_names ) > 3: - #on_text = '%s, %s, and others' % tuple(input_names[0:2]) - on_text = '%s_%s_and_others' % tuple(input_names[0:2]) + on_text = '%s, %s, and others' % tuple(input_names[0:2]) else: on_text = "" # Add the dbkey to the incoming parameters incoming[ "dbkey" ] = input_dbkey params = None #wrapped params are used by change_format action and by output.label; only perform this wrapping once, as needed - # Keep track of parent / child relationships, we'll create all the + # Keep track of parent / child relationships, we'll create all the # datasets first, then create the associations parent_to_child_pairs = [] child_dataset_names = set() @@ -268,7 +258,7 @@ if output.parent: parent_to_child_pairs.append( ( output.parent, name ) ) child_dataset_names.add( name ) - ## What is the following hack for? Need to document under what + ## What is the following hack for? Need to document under what ## conditions can the following occur? (james@bx.psu.edu) # HACK: the output data has already been created # this happens i.e. as a result of the async controller @@ -289,7 +279,7 @@ ext = input_extension except Exception, e: pass - + #process change_format tags if output.change_format: if params is None: @@ -332,14 +322,14 @@ object_store_id = data.dataset.object_store_id # these will be the same thing after the first output # This may not be neccesary with the new parent/child associations data.designation = name - # Copy metadata from one of the inputs if requested. + # Copy metadata from one of the inputs if requested. 
if output.metadata_source: data.init_meta( copy_from=inp_data[output.metadata_source] ) else: data.init_meta() # Take dbkey from LAST input data.dbkey = str(input_dbkey) - # Set state + # Set state # FIXME: shouldn't this be NEW until the job runner changes it? data.state = data.states.QUEUED data.blurb = "queued" @@ -361,7 +351,7 @@ params = make_dict_copy( incoming ) wrap_values( tool.inputs, params, skip_missing_values = not tool.check_values ) data.name = self._get_default_data_name( data, tool, on_text=on_text, trans=trans, incoming=incoming, history=history, params=params, job_params=job_params ) - # Store output + # Store output out_data[ name ] = data if output.actions: #Apply pre-job tool-output-dataset actions; e.g. setting metadata, changing format @@ -383,7 +373,7 @@ parent_dataset = out_data[ parent_name ] child_dataset = out_data[ child_name ] parent_dataset.children.append( child_dataset ) - # Store data after custom code runs + # Store data after custom code runs trans.sa_session.flush() # Create the job object job = trans.app.model.Job() @@ -464,7 +454,7 @@ for name in inp_data.keys(): dataset = inp_data[ name ] redirect_url = tool.parse_redirect_url( dataset, incoming ) - # GALAXY_URL should be include in the tool params to enable the external application + # GALAXY_URL should be include in the tool params to enable the external application # to send back to the current Galaxy instance GALAXY_URL = incoming.get( 'GALAXY_URL', None ) assert GALAXY_URL is not None, "GALAXY_URL parameter missing in tool config." https://bitbucket.org/galaxy/galaxy-central/commits/3df68022beef/ Changeset: 3df68022beef User: fubar Date: 2013-10-08 07:15:56 Summary: Big cleanup - remove material in tools/rgenetics accidentally added in August when a push was made from a BakerIDI repository - apologies to all. This push reverts all those changes and also removes the old rgenetics SNP tools, which are now deprecated. Affected #: 1 file diff -r a2ffcbc614d42f343f4bab09035e771bb50b0dcc -r 3df68022beef2077798ed50a33afb03be75a841a tools/rgenetics/rgFastQC.py --- a/tools/rgenetics/rgFastQC.py +++ b/tools/rgenetics/rgFastQC.py @@ -24,10 +24,29 @@ import optparse import shutil import tempfile -from rgutils import getFileString import zipfile import gzip + +def getFileString(fpath, outpath): + """ + format a nice file size string + """ + size = '' + fp = os.path.join(outpath, fpath) + s = '? ?'
+ if os.path.isfile(fp): + n = float(os.path.getsize(fp)) + if n > 2**20: + size = ' (%1.1f MB)' % (n/2**20) + elif n > 2**10: + size = ' (%1.1f KB)' % (n/2**10) + elif n > 0: + size = ' (%d B)' % (int(n)) + s = '%s %s' % (fpath, size) + return s + + class FastQC(): """wrapper """ https://bitbucket.org/galaxy/galaxy-central/commits/c8b55344e779/ Changeset: c8b55344e779 User: fubar Date: 2013-10-08 07:30:54 Summary: Proper removal of rgenetics deprecated tool wrappers Affected #: 41 files diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/listFiles.py --- a/tools/rgenetics/listFiles.py +++ /dev/null @@ -1,227 +0,0 @@ -#Provides Upload tool with access to list of available files -import glob,sys -import galaxy.app as thisapp -import galaxy.util - -from elementtree.ElementTree import XML - -librepos = '/usr/local/galaxy/data/rg' -myrepos = '/home/rerla/galaxy' -marchinirepos = '/usr/local/galaxy/data/rg/snptest' - -from galaxy.tools.parameters import DataToolParameter - -#Provides Upload tool with access to list of available builds - -builds = [] -#Read build names and keys from galaxy.util -for dbkey, build_name in galaxy.util.dbnames: - builds.append((build_name,dbkey,False)) - -#Return available builds -def get_available_builds(defval='hg18'): - for i,x in enumerate(builds): - if x[1] == defval: - x = list(x) - x[2] = True - builds[i] = tuple(x) - return builds - - - -def get_tabular_cols( input, outformat='gg' ): - """numeric only other than rs for strict genome graphs - otherwise tabular. Derived from galaxy tool source around August 2007 by Ross""" - columns = [] - seenCnames = {} - elems = [] - colnames = ['Col%d' % x for x in range(input.metadata.columns+1)] - strict = (outformat=='gg') - for i, line in enumerate( file ( input.file_name ) ): - if line and not line.startswith( '#' ): - line = line.rstrip('\r\n') - elems = line.split( '\t' ) - - """ - Strict gg note: - Since this tool requires users to select only those columns - that contain numerical values, we'll restrict the column select - list appropriately other than the first column which must be a marker - """ - if len(elems) > 0: - for col in range(1, input.metadata.columns+1): - isFloat = False # short circuit common result - try: - val = float(elems[col-1]) - isFloat = True - except: - val = elems[col-1] - if val: - if i == 0: # header row - colnames[col] = val - if isFloat or (not strict) or (col == 1): # all in if not GG - option = colnames[col] - if not seenCnames.get(option,None): # new - columns.append((option,str(col),False)) - seenCnames[option] = option - #print 'get_tab: %d=%s. Columns=%s' % (i,line,str(columns)) - if len(columns) > 0 and i > 10: - """ - We have our select list built, so we can break out of the outer most for loop - """ - break - if i == 30: - break # Hopefully we never get here... 
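The getFileString helper restored in rgFastQC.py above picks a size unit by comparing the byte count against powers of two. A minimal standalone sketch of that thresholding, using a made-up byte count rather than a real file on disk, assuming only the standard library:

    def format_size(nbytes):
        # Same thresholds as getFileString above: MB above 2**20 bytes,
        # KB above 2**10, otherwise plain bytes.
        n = float(nbytes)
        if n > 2**20:
            return ' (%1.1f MB)' % (n / 2**20)
        elif n > 2**10:
            return ' (%1.1f KB)' % (n / 2**10)
        elif n > 0:
            return ' (%d B)' % int(n)
        return ''

    print('foo.fastq%s' % format_size(3 * 2**20))  # -> foo.fastq (3.0 MB)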
- for option in range(min(5,len(columns))): - (x,y,z) = columns[option] - columns[option] = (x,y,True) - return columns # sorted select options - -def get_marchini_dir(): - """return the filesystem directory for snptest style files""" - return marchinirepos - - -def get_lib_SNPTESTCaCofiles(): - """return a list of file names - without extensions - available for caco studies - These have a common file name with both _1 and _2 suffixes""" - d = get_marchini_dir() - testsuffix = '.gen_1' # glob these - flist = glob.glob('%s/*%s' % (d,testsuffix)) - flist = [x.split(testsuffix)[0] for x in flist] # leaves with a list of file set names - if len(flist) > 0: - dat = [(flist[0],flist[0],True),] - dat += [(x,x,False) for x in flist[1:]] - else: - dat = [('No Marchini CaCo files found in %s - convert some using the Marchini converter tool' % d,'None',True),] - return dat - -def getChropt(): - """return dynamic chromosome select options - """ - c = ['X','Y'] - c += ['%d' % x for x in range(1,23)] - dat = [(x,x,False) for x in c] - x,y,z = dat[3] - dat[3] = (x,y,True) - return dat - - -def get_phecols(fname=''): - """ return a list of phenotype columns for a multi-select list - prototype: - foo = ('fake - not yet implemented','not implemented','False') - dat = [foo for x in range(5)] - return dat - """ - try: - header = file(fname,'r').next().split() - except: - return [('get_phecols unable to open file %s' % fname,'None',False),] - dat = [(x,x,False) for x in header] - return dat - -#Return various kinds of files - -def get_lib_pedfiles(): - dat = glob.glob('%s/ped/*.ped' % librepos) - dat += glob.glob('%s/ped/*.ped' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [x.split('.ped')[0] for x in dat] - dat = [(x,x,'True') for x in dat] - else: - dat = [('No ped files - add some to %s/ped or %s/ped' % (librepos,myrepos),'None',True),] - return dat - -def get_lib_phefiles(): - ext = 'phe' - dat = glob.glob('%s/pheno/*.%s' % (librepos,ext)) - dat += glob.glob('%s/pheno/*.%s' % (myrepos,ext)) - dat.sort() - if len(dat) > 0: - dat = [(x,x,'False') for x in dat] - else: - dat = [('No %s files - add some to %s/pheno or %s/pheno' % (ext,librepos,myrepos),'None',True),] - return dat - -def get_lib_bedfiles(): - dat = glob.glob('%s/plinkbed/*.bed' % librepos) - dat += glob.glob('%s/plinkbed/*.bed' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [x.split('.bed')[0] for x in dat] - dat = [(x,x,False) for x in dat] - else: - dat = [('No bed files - Please import some to %s/plinkbed or %s/plinkbed' % (librepos,myrepos),'None',True),] - return dat - -def get_lib_fbatfiles(): - dat = glob.glob('%s/plinkfbat/*.ped' % librepos) - dat += glob.glob('%s/plinkfbat/*.ped' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No fbat bed files - Please import some to %s/plinkfbat or %s/plinkfbat' % (librepos,myrepos),'None',True),] - return dat - -def get_lib_mapfiles(): - dat = glob.glob('%s/ped/*.map' % librepos) - dat += glob.glob('%s/ped/*.map' % myrepos) - dat.sort() - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No map files - add some to %s/ped' % librepos,'None',True),] - return dat - -def get_my_pedfiles(): - dat = glob.glob('%s/*.ped' % myrepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_my_mapfiles(): - dat = glob.glob('%s/*.map' % myrepos) - if len(dat) > 0: - dat = [(x,x,'True') for x in dat] - else: - dat = [('No ped files - add 
some to %s' % librepos,'None',True),] - return dat - -def get_lib_xlsfiles(): - dat = glob.glob('%s/*.xls' % librepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_lib_htmlfiles(): - dat = glob.glob('%s/*.html' % librepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_my_xlsfiles(): - dat = glob.glob('%s/*.xls' % myrepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - -def get_my_htmlfiles(): - dat = glob.glob('%s/*.html' % myrepos) - if len(dat) > 0: - dat = [(x,x,False) for x in dat] - else: - dat = [('No ped files - add some to %s' % librepos,'None',True),] - return dat - - diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/plinkbinJZ.py --- a/tools/rgenetics/plinkbinJZ.py +++ /dev/null @@ -1,868 +0,0 @@ -#!/usr/bin/env python2.4 -""" -""" - -import optparse,os,subprocess,gzip,struct,time,commands -from array import array - -#from AIMS import util -#from pga import util as pgautil - -__FILE_ID__ = '$Id: plinkbinJZ.py,v 1.14 2009/07/13 20:16:50 rejpz Exp $' - -VERBOSE = True - -MISSING_ALLELES = set(['N', '0', '.', '-','']) - -AUTOSOMES = set(range(1, 23) + [str(c) for c in range(1, 23)]) - -MAGIC_BYTE1 = '00110110' -MAGIC_BYTE2 = '11011000' -FORMAT_SNP_MAJOR_BYTE = '10000000' -FORMAT_IND_MAJOR_BYTE = '00000000' -MAGIC1 = (0, 3, 1, 2) -MAGIC2 = (3, 1, 2, 0) -FORMAT_SNP_MAJOR = (2, 0, 0, 0) -FORMAT_IND_MAJOR = (0, 0, 0, 0) -HEADER_LENGTH = 3 - -HOM0 = 3 -HOM1 = 0 -MISS = 2 -HET = 1 -HOM0_GENO = (0, 0) -HOM1_GENO = (1, 1) -HET_GENO = (0, 1) -MISS_GENO = (-9, -9) - -GENO_TO_GCODE = { - HOM0_GENO: HOM0, - HET_GENO: HET, - HOM1_GENO: HOM1, - MISS_GENO: MISS, - } - -CHROM_REPLACE = { - 'X': '23', - 'Y': '24', - 'XY': '25', - 'MT': '26', - 'M': '26', -} - -MAP_LINE_EXCEPTION_TEXT = """ -One or more lines in the *.map file has only three fields. -The line was: - -%s - -If you are running rgGRR through EPMP, this is usually a -sign that you are using an old version of the map file. -You can correct the problem by re-running Subject QC. If -you have already tried this, please contact the developers, -or file a bug. 
-""" - -INT_TO_GCODE = { - 0: array('i', (0, 0, 0, 0)), 1: array('i', (2, 0, 0, 0)), 2: array('i', (1, 0, 0, 0)), 3: array('i', (3, 0, 0, 0)), - 4: array('i', (0, 2, 0, 0)), 5: array('i', (2, 2, 0, 0)), 6: array('i', (1, 2, 0, 0)), 7: array('i', (3, 2, 0, 0)), - 8: array('i', (0, 1, 0, 0)), 9: array('i', (2, 1, 0, 0)), 10: array('i', (1, 1, 0, 0)), 11: array('i', (3, 1, 0, 0)), - 12: array('i', (0, 3, 0, 0)), 13: array('i', (2, 3, 0, 0)), 14: array('i', (1, 3, 0, 0)), 15: array('i', (3, 3, 0, 0)), - 16: array('i', (0, 0, 2, 0)), 17: array('i', (2, 0, 2, 0)), 18: array('i', (1, 0, 2, 0)), 19: array('i', (3, 0, 2, 0)), - 20: array('i', (0, 2, 2, 0)), 21: array('i', (2, 2, 2, 0)), 22: array('i', (1, 2, 2, 0)), 23: array('i', (3, 2, 2, 0)), - 24: array('i', (0, 1, 2, 0)), 25: array('i', (2, 1, 2, 0)), 26: array('i', (1, 1, 2, 0)), 27: array('i', (3, 1, 2, 0)), - 28: array('i', (0, 3, 2, 0)), 29: array('i', (2, 3, 2, 0)), 30: array('i', (1, 3, 2, 0)), 31: array('i', (3, 3, 2, 0)), - 32: array('i', (0, 0, 1, 0)), 33: array('i', (2, 0, 1, 0)), 34: array('i', (1, 0, 1, 0)), 35: array('i', (3, 0, 1, 0)), - 36: array('i', (0, 2, 1, 0)), 37: array('i', (2, 2, 1, 0)), 38: array('i', (1, 2, 1, 0)), 39: array('i', (3, 2, 1, 0)), - 40: array('i', (0, 1, 1, 0)), 41: array('i', (2, 1, 1, 0)), 42: array('i', (1, 1, 1, 0)), 43: array('i', (3, 1, 1, 0)), - 44: array('i', (0, 3, 1, 0)), 45: array('i', (2, 3, 1, 0)), 46: array('i', (1, 3, 1, 0)), 47: array('i', (3, 3, 1, 0)), - 48: array('i', (0, 0, 3, 0)), 49: array('i', (2, 0, 3, 0)), 50: array('i', (1, 0, 3, 0)), 51: array('i', (3, 0, 3, 0)), - 52: array('i', (0, 2, 3, 0)), 53: array('i', (2, 2, 3, 0)), 54: array('i', (1, 2, 3, 0)), 55: array('i', (3, 2, 3, 0)), - 56: array('i', (0, 1, 3, 0)), 57: array('i', (2, 1, 3, 0)), 58: array('i', (1, 1, 3, 0)), 59: array('i', (3, 1, 3, 0)), - 60: array('i', (0, 3, 3, 0)), 61: array('i', (2, 3, 3, 0)), 62: array('i', (1, 3, 3, 0)), 63: array('i', (3, 3, 3, 0)), - 64: array('i', (0, 0, 0, 2)), 65: array('i', (2, 0, 0, 2)), 66: array('i', (1, 0, 0, 2)), 67: array('i', (3, 0, 0, 2)), - 68: array('i', (0, 2, 0, 2)), 69: array('i', (2, 2, 0, 2)), 70: array('i', (1, 2, 0, 2)), 71: array('i', (3, 2, 0, 2)), - 72: array('i', (0, 1, 0, 2)), 73: array('i', (2, 1, 0, 2)), 74: array('i', (1, 1, 0, 2)), 75: array('i', (3, 1, 0, 2)), - 76: array('i', (0, 3, 0, 2)), 77: array('i', (2, 3, 0, 2)), 78: array('i', (1, 3, 0, 2)), 79: array('i', (3, 3, 0, 2)), - 80: array('i', (0, 0, 2, 2)), 81: array('i', (2, 0, 2, 2)), 82: array('i', (1, 0, 2, 2)), 83: array('i', (3, 0, 2, 2)), - 84: array('i', (0, 2, 2, 2)), 85: array('i', (2, 2, 2, 2)), 86: array('i', (1, 2, 2, 2)), 87: array('i', (3, 2, 2, 2)), - 88: array('i', (0, 1, 2, 2)), 89: array('i', (2, 1, 2, 2)), 90: array('i', (1, 1, 2, 2)), 91: array('i', (3, 1, 2, 2)), - 92: array('i', (0, 3, 2, 2)), 93: array('i', (2, 3, 2, 2)), 94: array('i', (1, 3, 2, 2)), 95: array('i', (3, 3, 2, 2)), - 96: array('i', (0, 0, 1, 2)), 97: array('i', (2, 0, 1, 2)), 98: array('i', (1, 0, 1, 2)), 99: array('i', (3, 0, 1, 2)), - 100: array('i', (0, 2, 1, 2)), 101: array('i', (2, 2, 1, 2)), 102: array('i', (1, 2, 1, 2)), 103: array('i', (3, 2, 1, 2)), - 104: array('i', (0, 1, 1, 2)), 105: array('i', (2, 1, 1, 2)), 106: array('i', (1, 1, 1, 2)), 107: array('i', (3, 1, 1, 2)), - 108: array('i', (0, 3, 1, 2)), 109: array('i', (2, 3, 1, 2)), 110: array('i', (1, 3, 1, 2)), 111: array('i', (3, 3, 1, 2)), - 112: array('i', (0, 0, 3, 2)), 113: array('i', (2, 0, 3, 2)), 114: array('i', (1, 0, 3, 2)), 115: array('i', 
(3, 0, 3, 2)), - 116: array('i', (0, 2, 3, 2)), 117: array('i', (2, 2, 3, 2)), 118: array('i', (1, 2, 3, 2)), 119: array('i', (3, 2, 3, 2)), - 120: array('i', (0, 1, 3, 2)), 121: array('i', (2, 1, 3, 2)), 122: array('i', (1, 1, 3, 2)), 123: array('i', (3, 1, 3, 2)), - 124: array('i', (0, 3, 3, 2)), 125: array('i', (2, 3, 3, 2)), 126: array('i', (1, 3, 3, 2)), 127: array('i', (3, 3, 3, 2)), - 128: array('i', (0, 0, 0, 1)), 129: array('i', (2, 0, 0, 1)), 130: array('i', (1, 0, 0, 1)), 131: array('i', (3, 0, 0, 1)), - 132: array('i', (0, 2, 0, 1)), 133: array('i', (2, 2, 0, 1)), 134: array('i', (1, 2, 0, 1)), 135: array('i', (3, 2, 0, 1)), - 136: array('i', (0, 1, 0, 1)), 137: array('i', (2, 1, 0, 1)), 138: array('i', (1, 1, 0, 1)), 139: array('i', (3, 1, 0, 1)), - 140: array('i', (0, 3, 0, 1)), 141: array('i', (2, 3, 0, 1)), 142: array('i', (1, 3, 0, 1)), 143: array('i', (3, 3, 0, 1)), - 144: array('i', (0, 0, 2, 1)), 145: array('i', (2, 0, 2, 1)), 146: array('i', (1, 0, 2, 1)), 147: array('i', (3, 0, 2, 1)), - 148: array('i', (0, 2, 2, 1)), 149: array('i', (2, 2, 2, 1)), 150: array('i', (1, 2, 2, 1)), 151: array('i', (3, 2, 2, 1)), - 152: array('i', (0, 1, 2, 1)), 153: array('i', (2, 1, 2, 1)), 154: array('i', (1, 1, 2, 1)), 155: array('i', (3, 1, 2, 1)), - 156: array('i', (0, 3, 2, 1)), 157: array('i', (2, 3, 2, 1)), 158: array('i', (1, 3, 2, 1)), 159: array('i', (3, 3, 2, 1)), - 160: array('i', (0, 0, 1, 1)), 161: array('i', (2, 0, 1, 1)), 162: array('i', (1, 0, 1, 1)), 163: array('i', (3, 0, 1, 1)), - 164: array('i', (0, 2, 1, 1)), 165: array('i', (2, 2, 1, 1)), 166: array('i', (1, 2, 1, 1)), 167: array('i', (3, 2, 1, 1)), - 168: array('i', (0, 1, 1, 1)), 169: array('i', (2, 1, 1, 1)), 170: array('i', (1, 1, 1, 1)), 171: array('i', (3, 1, 1, 1)), - 172: array('i', (0, 3, 1, 1)), 173: array('i', (2, 3, 1, 1)), 174: array('i', (1, 3, 1, 1)), 175: array('i', (3, 3, 1, 1)), - 176: array('i', (0, 0, 3, 1)), 177: array('i', (2, 0, 3, 1)), 178: array('i', (1, 0, 3, 1)), 179: array('i', (3, 0, 3, 1)), - 180: array('i', (0, 2, 3, 1)), 181: array('i', (2, 2, 3, 1)), 182: array('i', (1, 2, 3, 1)), 183: array('i', (3, 2, 3, 1)), - 184: array('i', (0, 1, 3, 1)), 185: array('i', (2, 1, 3, 1)), 186: array('i', (1, 1, 3, 1)), 187: array('i', (3, 1, 3, 1)), - 188: array('i', (0, 3, 3, 1)), 189: array('i', (2, 3, 3, 1)), 190: array('i', (1, 3, 3, 1)), 191: array('i', (3, 3, 3, 1)), - 192: array('i', (0, 0, 0, 3)), 193: array('i', (2, 0, 0, 3)), 194: array('i', (1, 0, 0, 3)), 195: array('i', (3, 0, 0, 3)), - 196: array('i', (0, 2, 0, 3)), 197: array('i', (2, 2, 0, 3)), 198: array('i', (1, 2, 0, 3)), 199: array('i', (3, 2, 0, 3)), - 200: array('i', (0, 1, 0, 3)), 201: array('i', (2, 1, 0, 3)), 202: array('i', (1, 1, 0, 3)), 203: array('i', (3, 1, 0, 3)), - 204: array('i', (0, 3, 0, 3)), 205: array('i', (2, 3, 0, 3)), 206: array('i', (1, 3, 0, 3)), 207: array('i', (3, 3, 0, 3)), - 208: array('i', (0, 0, 2, 3)), 209: array('i', (2, 0, 2, 3)), 210: array('i', (1, 0, 2, 3)), 211: array('i', (3, 0, 2, 3)), - 212: array('i', (0, 2, 2, 3)), 213: array('i', (2, 2, 2, 3)), 214: array('i', (1, 2, 2, 3)), 215: array('i', (3, 2, 2, 3)), - 216: array('i', (0, 1, 2, 3)), 217: array('i', (2, 1, 2, 3)), 218: array('i', (1, 1, 2, 3)), 219: array('i', (3, 1, 2, 3)), - 220: array('i', (0, 3, 2, 3)), 221: array('i', (2, 3, 2, 3)), 222: array('i', (1, 3, 2, 3)), 223: array('i', (3, 3, 2, 3)), - 224: array('i', (0, 0, 1, 3)), 225: array('i', (2, 0, 1, 3)), 226: array('i', (1, 0, 1, 3)), 227: array('i', (3, 0, 1, 3)), - 228: 
array('i', (0, 2, 1, 3)), 229: array('i', (2, 2, 1, 3)), 230: array('i', (1, 2, 1, 3)), 231: array('i', (3, 2, 1, 3)), - 232: array('i', (0, 1, 1, 3)), 233: array('i', (2, 1, 1, 3)), 234: array('i', (1, 1, 1, 3)), 235: array('i', (3, 1, 1, 3)), - 236: array('i', (0, 3, 1, 3)), 237: array('i', (2, 3, 1, 3)), 238: array('i', (1, 3, 1, 3)), 239: array('i', (3, 3, 1, 3)), - 240: array('i', (0, 0, 3, 3)), 241: array('i', (2, 0, 3, 3)), 242: array('i', (1, 0, 3, 3)), 243: array('i', (3, 0, 3, 3)), - 244: array('i', (0, 2, 3, 3)), 245: array('i', (2, 2, 3, 3)), 246: array('i', (1, 2, 3, 3)), 247: array('i', (3, 2, 3, 3)), - 248: array('i', (0, 1, 3, 3)), 249: array('i', (2, 1, 3, 3)), 250: array('i', (1, 1, 3, 3)), 251: array('i', (3, 1, 3, 3)), - 252: array('i', (0, 3, 3, 3)), 253: array('i', (2, 3, 3, 3)), 254: array('i', (1, 3, 3, 3)), 255: array('i', (3, 3, 3, 3)), - } - -GCODE_TO_INT = dict([(tuple(v),k) for (k,v) in INT_TO_GCODE.items()]) - -### Exceptions -class DuplicateMarkerInMapFile(Exception): pass -class MapLineTooShort(Exception): pass -class ThirdAllele(Exception): pass -class PedError(Exception): pass -class BadMagic(Exception): - """ Raised when one of the MAGIC bytes in a bed file does not match - """ - pass -class BedError(Exception): - """ Raised when parsing a bed file runs into problems - """ - pass -class UnknownGenocode(Exception): - """ Raised when we get a 2-bit genotype that is undecipherable (is it possible?) - """ - pass -class UnknownGeno(Exception): pass - -### Utility functions - -def timenow(): - """return current time as a string - """ - return time.strftime('%d/%m/%Y %H:%M:%S', time.localtime(time.time())) - -def ceiling(n, k): - ''' Return the least multiple of k which is greater than n - ''' - m = n % k - if m == 0: - return n - else: - return n + k - m - -def nbytes(n): - ''' Return the number of bytes required for n subjects - ''' - return 2*ceiling(n, 4)/8 - -### Primary module functionality -class LPed: - """ The uber-class for processing the Linkage-format *.ped/*.map files - """ - def __init__(self, base): - self.base = base - self._ped = Ped('%s.ped' % (self.base)) - self._map = Map('%s.map' % (self.base)) - - self._markers = {} - self._ordered_markers = [] - self._marker_allele_lookup = {} - self._autosomal_indices = set() - - self._subjects = {} - self._ordered_subjects = [] - - self._genotypes = [] - - def parse(self): - """ - """ - if VERBOSE: print 'plinkbinJZ: Analysis started: %s' % (timenow()) - self._map.parse() - self._markers = self._map._markers - self._ordered_markers = self._map._ordered_markers - self._autosomal_indices = self._map._autosomal_indices - - self._ped.parse(self._ordered_markers) - self._subjects = self._ped._subjects - self._ordered_subjects = self._ped._ordered_subjects - self._genotypes = self._ped._genotypes - self._marker_allele_lookup = self._ped._marker_allele_lookup - - ### Adjust self._markers based on the allele information - ### we got from parsing the ped file - for m, name in enumerate(self._ordered_markers): - a1, a2 = self._marker_allele_lookup[m][HET] - self._markers[name][-2] = a1 - self._markers[name][-1] = a2 - if VERBOSE: print 'plinkbinJZ: Analysis finished: %s' % (timenow()) - - def getSubjectInfo(self, fid, oiid): - """ - """ - return self._subject_info[(fid, oiid)] - - def getSubjectInfoByLine(self, line): - """ - """ - return self._subject_info[self._ordered_subjects[line]] - - def getGenotypesByIndices(self, s, mlist, format): - """ needed for grr if lped - deprecated but.. 
- """ - mlist = dict(zip(mlist,[True,]*len(mlist))) # hash quicker than 'in' ? - raw_array = array('i', [row[s] for m,row in enumerate(self._genotypes) if mlist.get(m,None)]) - if format == 'raw': - return raw_array - elif format == 'ref': - result = array('i', [0]*len(mlist)) - for m, gcode in enumerate(raw_array): - if gcode == HOM0: - nref = 3 - elif gcode == HET: - nref = 2 - elif gcode == HOM1: - nref = 1 - else: - nref = 0 - result[m] = nref - return result - else: - result = [] - for m, gcode in enumerate(raw_array): - result.append(self._marker_allele_lookup[m][gcode]) - return result - - def writebed(self, base): - """ - """ - dst_name = '%s.fam' % (base) - print 'Writing pedigree information to [ %s ]' % (dst_name) - dst = open(dst_name, 'w') - for skey in self._ordered_subjects: - (fid, iid, did, mid, sex, phe, sid, d_sid, m_sid) = self._subjects[skey] - dst.write('%s %s %s %s %s %s\n' % (fid, iid, did, mid, sex, phe)) - dst.close() - - dst_name = '%s.bim' % (base) - print 'Writing map (extended format) information to [ %s ]' % (dst_name) - dst = open(dst_name, 'w') - for m, marker in enumerate(self._ordered_markers): - chrom, name, genpos, abspos, a1, a2 = self._markers[marker] - dst.write('%s\t%s\t%s\t%s\t%s\t%s\n' % (chrom, name, genpos, abspos, a1, a2)) - dst.close() - - bed_name = '%s.bed' % (base) - print 'Writing genotype bitfile to [ %s ]' % (bed_name) - print 'Using (default) SNP-major mode' - bed = open(bed_name, 'w') - - ### Write the 3 header bytes - bed.write(struct.pack('B', int(''.join(reversed(MAGIC_BYTE1)), 2))) - bed.write(struct.pack('B', int(''.join(reversed(MAGIC_BYTE2)), 2))) - bed.write(struct.pack('B', int(''.join(reversed(FORMAT_SNP_MAJOR_BYTE)), 2))) - - ### Calculate how many "pad bits" we should add after the last subject - nsubjects = len(self._ordered_subjects) - nmarkers = len(self._ordered_markers) - total_bytes = nbytes(nsubjects) - nbits = nsubjects * 2 - pad_nibbles = ((total_bytes * 8) - nbits)/2 - pad = array('i', [0]*pad_nibbles) - - ### And now write genotypes to the file - for m in xrange(nmarkers): - geno = self._genotypes[m] - geno.extend(pad) - bytes = len(geno)/4 - for b in range(bytes): - idx = b*4 - gcode = tuple(geno[idx:idx+4]) - try: - byte = struct.pack('B', GCODE_TO_INT[gcode]) - except KeyError: - print m, b, gcode - raise - bed.write(byte) - bed.close() - - def autosomal_indices(self): - """ Return the indices of markers in this ped/map that are autosomal. - This is used by rgGRR so that it can select a random set of markers - from the autosomes (sex chroms screw up the plot) - """ - return self._autosomal_indices - -class Ped: - def __init__(self, path): - self.path = path - self._subjects = {} - self._ordered_subjects = [] - self._genotypes = [] - self._marker_allele_lookup = {} - - def lineCount(self,infile): - """ count the number of lines in a file - efficiently using wget - """ - return int(commands.getoutput('wc -l %s' % (infile)).split()[0]) - - - def parse(self, markers): - """ Parse a given file -- this needs to be memory-efficient so that large - files can be parsed (~1 million markers on ~5000 subjects?). It - should also be fast, if possible. - """ - - ### Find out how many lines are in the file so we can ... - nsubjects = self.lineCount(self.path) - ### ... 
Pre-allocate the genotype arrays - nmarkers = len(markers) - _marker_alleles = [['0', '0'] for _ in xrange(nmarkers)] - self._genotypes = [array('i', [-1]*nsubjects) for _ in xrange(nmarkers)] - - if self.path.endswith('.gz'): - pfile = gzip.open(self.path, 'r') - else: - pfile = open(self.path, 'r') - - for s, line in enumerate(pfile): - line = line.strip() - if not line: - continue - - fid, iid, did, mid, sex, phe, genos = line.split(None, 6) - sid = iid.split('.')[0] - d_sid = did.split('.')[0] - m_sid = mid.split('.')[0] - - skey = (fid, iid) - self._subjects[skey] = (fid, iid, did, mid, sex, phe, sid, d_sid, m_sid) - self._ordered_subjects.append(skey) - - genotypes = genos.split() - - for m, marker in enumerate(markers): - idx = m*2 - a1, a2 = genotypes[idx:idx+2] # Alleles for subject s, marker m - s1, s2 = seen = _marker_alleles[m] # Alleles seen for marker m - - ### FIXME: I think this can still be faster, and simpler to read - # Two pieces of logic intertwined here: first, we need to code - # this genotype as HOM0, HOM1, HET or MISS. Second, we need to - # keep an ongoing record of the genotypes seen for this marker - if a1 == a2: - if a1 in MISSING_ALLELES: - geno = MISS_GENO - else: - if s1 == '0': - seen[0] = a1 - elif s1 == a1 or s2 == a2: - pass - elif s2 == '0': - seen[1] = a1 - else: - raise ThirdAllele('a1=a2=%s, seen=%s?' % (a1, str(seen))) - - if a1 == seen[0]: - geno = HOM0_GENO - elif a1 == seen[1]: - geno = HOM1_GENO - else: - raise PedError('Cannot assign geno for a1=a2=%s from seen=%s' % (a1, str(seen))) - elif a1 in MISSING_ALLELES or a2 in MISSING_ALLELES: - geno = MISS_GENO - else: - geno = HET_GENO - if s1 == '0': - seen[0] = a1 - seen[1] = a2 - elif s2 == '0': - if s1 == a1: - seen[1] = a2 - elif s1 == a2: - seen[1] = a1 - else: - raise ThirdAllele('a1=%s, a2=%s, seen=%s?' % (a1, a2, str(seen))) - else: - if sorted(seen) != sorted((a1, a2)): - raise ThirdAllele('a1=%s, a2=%s, seen=%s?' 
% (a1, a2, str(seen))) - - gcode = GENO_TO_GCODE.get(geno, None) - if gcode is None: - raise UnknownGeno(str(geno)) - self._genotypes[m][s] = gcode - - # Build the _marker_allele_lookup table - for m, alleles in enumerate(_marker_alleles): - if len(alleles) == 2: - a1, a2 = alleles - elif len(alleles) == 1: - a1 = alleles[0] - a2 = '0' - else: - print 'All alleles blank for %s: %s' % (m, str(alleles)) - raise - - self._marker_allele_lookup[m] = { - HOM0: (a2, a2), - HOM1: (a1, a1), - HET : (a1, a2), - MISS: ('0','0'), - } - - if VERBOSE: print '%s(%s) individuals read from [ %s ]' % (len(self._subjects), nsubjects, self.path) - -class Map: - def __init__(self, path=None): - self.path = path - self._markers = {} - self._ordered_markers = [] - self._autosomal_indices = set() - - def __len__(self): - return len(self._markers) - - def parse(self): - """ Parse a Linkage-format map file - """ - if self.path.endswith('.gz'): - fh = gzip.open(self.path, 'r') - else: - fh = open(self.path, 'r') - - for i, line in enumerate(fh): - line = line.strip() - if not line: - continue - - fields = line.split() - if len(fields) < 4: - raise MapLineTooShort(MAP_LINE_EXCEPTION_TEXT % (str(line), len(fields))) - else: - chrom, name, genpos, abspos = fields - if name in self._markers: - raise DuplicateMarkerInMapFile('Marker %s was found twice in map file %s' % (name, self.path)) - abspos = int(abspos) - if abspos < 0: - continue - if chrom in AUTOSOMES: - self._autosomal_indices.add(i) - chrom = CHROM_REPLACE.get(chrom, chrom) - self._markers[name] = [chrom, name, genpos, abspos, None, None] - self._ordered_markers.append(name) - fh.close() - if VERBOSE: print '%s (of %s) markers to be included from [ %s ]' % (len(self._ordered_markers), i, self.path) - -class BPed: - """ The uber-class for processing Plink's Binary Ped file format *.bed/*.bim/*.fam - """ - def __init__(self, base): - self.base = base - self._bed = Bed('%s.bed' % (self.base)) - self._bim = Bim('%s.bim' % (self.base)) - self._fam = Fam('%s.fam' % (self.base)) - - self._markers = {} - self._ordered_markers = [] - self._marker_allele_lookup = {} - self._autosomal_indices = set() - - self._subjects = {} - self._ordered_subjects = [] - - self._genotypes = [] - - def parse(self, quick=False): - """ - """ - self._quick = quick - - self._bim.parse() - self._markers = self._bim._markers - self._ordered_markers = self._bim._ordered_markers - self._marker_allele_lookup = self._bim._marker_allele_lookup - self._autosomal_indices = self._bim._autosomal_indices - - self._fam.parse() - self._subjects = self._fam._subjects - self._ordered_subjects = self._fam._ordered_subjects - - self._bed.parse(self._ordered_subjects, self._ordered_markers, quick=quick) - self._bedf = self._bed._fh - self._genotypes = self._bed._genotypes - self.nsubjects = len(self._ordered_subjects) - self.nmarkers = len(self._ordered_markers) - self._bytes_per_marker = nbytes(self.nsubjects) - - def writeped(self, path=None): - """ - """ - path = self.path = path or self.path - - map_name = self.path.replace('.bed', '.map') - print 'Writing map file [ %s ]' % (map_name) - dst = open(map_name, 'w') - for m in self._ordered_markers: - chrom, snp, genpos, abspos, a1, a2 = self._markers[m] - dst.write('%s\t%s\t%s\t%s\n' % (chrom, snp, genpos, abspos)) - dst.close() - - ped_name = self.path.replace('.bed', '.ped') - print 'Writing ped file [ %s ]' % (ped_name) - ped = open(ped_name, 'w') - firstyikes = False - for s, skey in enumerate(self._ordered_subjects): - idx = s*2 - (fid, iid, did, mid, 
sex, phe, oiid, odid, omid) = self._subjects[skey] - ped.write('%s %s %s %s %s %s' % (fid, iid, odid, omid, sex, phe)) - genotypes_for_subject = self.getGenotypesForSubject(s) - for m, snp in enumerate(self._ordered_markers): - #a1, a2 = self.getGenotypeByIndices(s, m) - a1,a2 = genotypes_for_subject[m] - ped.write(' %s %s' % (a1, a2)) - ped.write('\n') - ped.close() - - def getGenotype(self, subject, marker): - """ Retrieve a genotype for a particular subject/marker pair - """ - m = self._ordered_markers.index(marker) - s = self._ordered_subjects.index(subject) - return self.getGenotypeByIndices(s, m) - - def getGenotypesForSubject(self, s, raw=False): - """ Returns list of genotypes for all m markers - for subject s. If raw==True, then an array - of raw integer gcodes is returned instead - """ - if self._quick: - nmarkers = len(self._markers) - raw_array = array('i', [0]*nmarkers) - seek_nibble = s % 4 - for m in xrange(nmarkers): - seek_byte = m * self._bytes_per_marker + s/4 + HEADER_LENGTH - self._bedf.seek(seek_byte) - geno = struct.unpack('B', self._bedf.read(1))[0] - quartet = INT_TO_GCODE[geno] - gcode = quartet[seek_nibble] - raw_array[m] = gcode - else: - raw_array = array('i', [row[s] for row in self._genotypes]) - - if raw: - return raw_array - else: - result = [] - for m, gcode in enumerate(raw_array): - result.append(self._marker_allele_lookup[m][gcode]) - return result - - def getGenotypeByIndices(self, s, m): - """ - """ - if self._quick: - # Determine which byte we need to seek to, and - # which nibble within the byte we need - seek_byte = m * self._bytes_per_marker + s/4 + HEADER_LENGTH - seek_nibble = s % 4 - self._bedf.seek(seek_byte) - geno = struct.unpack('B', self._bedf.read(1))[0] - quartet = INT_TO_GCODE[geno] - gcode = quartet[seek_nibble] - else: - # Otherwise, just grab the genotypes from the - # list of arrays - genos_for_marker = self._genotypes[m] - gcode = genos_for_marker[s] - - return self._marker_allele_lookup[m][gcode] - - def getGenotypesByIndices(self, s, mlist, format): - """ - """ - if self._quick: - raw_array = array('i', [0]*len(mlist)) - seek_nibble = s % 4 - for i,m in enumerate(mlist): - seek_byte = m * self._bytes_per_marker + s/4 + HEADER_LENGTH - self._bedf.seek(seek_byte) - geno = struct.unpack('B', self._bedf.read(1))[0] - quartet = INT_TO_GCODE[geno] - gcode = quartet[seek_nibble] - raw_array[i] = gcode - mlist = set(mlist) - else: - mlist = set(mlist) - raw_array = array('i', [row[s] for m,row in enumerate(self._genotypes) if m in mlist]) - - if format == 'raw': - return raw_array - elif format == 'ref': - result = array('i', [0]*len(mlist)) - for m, gcode in enumerate(raw_array): - if gcode == HOM0: - nref = 3 - elif gcode == HET: - nref = 2 - elif gcode == HOM1: - nref = 1 - else: - nref = 0 - result[m] = nref - return result - else: - result = [] - for m, gcode in enumerate(raw_array): - result.append(self._marker_allele_lookup[m][gcode]) - return result - - def getSubject(self, s): - """ - """ - skey = self._ordered_subjects[s] - return self._subjects[skey] - - def autosomal_indices(self): - """ Return the indices of markers in this ped/map that are autosomal. 
- This is used by rgGRR so that it can select a random set of markers - from the autosomes (sex chroms screw up the plot) - """ - return self._autosomal_indices - -class Bed: - - def __init__(self, path): - self.path = path - self._genotypes = [] - self._fh = None - - def parse(self, subjects, markers, quick=False): - """ Parse the bed file, indicated either by the path parameter, - or as the self.path indicated in __init__. If quick is - True, then just parse the bim and fam, then genotypes will - be looked up dynamically by indices - """ - self._quick = quick - - ordered_markers = markers - ordered_subjects = subjects - nsubjects = len(ordered_subjects) - nmarkers = len(ordered_markers) - - bed = open(self.path, 'rb') - self._fh = bed - - byte1 = bed.read(1) - byte2 = bed.read(1) - byte3 = bed.read(1) - format_flag = struct.unpack('B', byte3)[0] - - h1 = tuple(INT_TO_GCODE[struct.unpack('B', byte1)[0]]) - h2 = tuple(INT_TO_GCODE[struct.unpack('B', byte2)[0]]) - h3 = tuple(INT_TO_GCODE[format_flag]) - - if h1 != MAGIC1 or h2 != MAGIC2: - raise BadMagic('One or both MAGIC bytes is wrong: %s==%s or %s==%s' % (h1, MAGIC1, h2, MAGIC2)) - if format_flag: - print 'Detected that binary PED file is v1.00 SNP-major mode (%s, "%s")\n' % (format_flag, h3) - else: - raise 'BAD_FORMAT_FLAG? (%s, "%s")\n' % (format_flag, h3) - - print 'Parsing binary ped file for %s markers and %s subjects' % (nmarkers, nsubjects) - - ### If quick mode was specified, we're done ... - self._quick = quick - if quick: - return - - ### ... Otherwise, parse genotypes into an array, and append that - ### array to self._genotypes - ngcodes = ceiling(nsubjects, 4) - bytes_per_marker = nbytes(nsubjects) - for m in xrange(nmarkers): - genotype_array = array('i', [-1]*(ngcodes)) - for byte in xrange(bytes_per_marker): - intval = struct.unpack('B', bed.read(1))[0] - idx = byte*4 - genotype_array[idx:idx+4] = INT_TO_GCODE[intval] - self._genotypes.append(genotype_array) - -class Bim: - def __init__(self, path): - """ - """ - self.path = path - self._markers = {} - self._ordered_markers = [] - self._marker_allele_lookup = {} - self._autosomal_indices = set() - - def parse(self): - """ - """ - print 'Reading map (extended format) from [ %s ]' % (self.path) - bim = open(self.path, 'r') - for m, line in enumerate(bim): - chrom, snp, gpos, apos, a1, a2 = line.strip().split() - self._markers[snp] = (chrom, snp, gpos, apos, a1, a2) - self._marker_allele_lookup[m] = { - HOM0: (a2, a2), - HOM1: (a1, a1), - HET : (a1, a2), - MISS: ('0','0'), - } - self._ordered_markers.append(snp) - if chrom in AUTOSOMES: - self._autosomal_indices.add(m) - bim.close() - print '%s markers to be included from [ %s ]' % (m+1, self.path) - -class Fam: - def __init__(self, path): - """ - """ - self.path = path - self._subjects = {} - self._ordered_subjects = [] - - def parse(self): - """ - """ - print 'Reading pedigree information from [ %s ]' % (self.path) - fam = open(self.path, 'r') - for s, line in enumerate(fam): - fid, iid, did, mid, sex, phe = line.strip().split() - sid = iid.split('.')[0] - d_sid = did.split('.')[0] - m_sid = mid.split('.')[0] - skey = (fid, iid) - self._ordered_subjects.append(skey) - self._subjects[skey] = (fid, iid, did, mid, sex, phe, sid, d_sid, m_sid) - fam.close() - print '%s individuals read from [ %s ]' % (s+1, self.path) - -### Command-line functionality and testing -def test(arg): - ''' - ''' - - import time - - if arg == 'CAMP_AFFY.ped': - print 'Testing bed.parse(quick=True)' - s = time.time() - bed = Bed(arg.replace('.ped', 
'.bed')) - bed.parse(quick=True) - print bed.getGenotype(('400118', '10300283'), 'rs2000467') - print bed.getGenotype(('400118', '10101384'), 'rs2294019') - print bed.getGenotype(('400121', '10101149'), 'rs2294019') - print bed.getGenotype(('400123', '10200290'), 'rs2294019') - assert bed.getGenotype(('400118', '10101384'), 'rs2294019') == ('4','4') - e = time.time() - print 'e-s = %s\n' % (e-s) - - print 'Testing bed.parse' - s = time.time() - bed = BPed(arg) - bed.parse(quick=False) - e = time.time() - print 'e-s = %s\n' % (e-s) - - print 'Testing bed.writeped' - s = time.time() - outname = '%s_BEDTEST' % (arg) - bed.writeped(outname) - e = time.time() - print 'e-s = %s\n' % (e-s) - del(bed) - - print 'Testing ped.parse' - s = time.time() - ped = LPed(arg) - ped.parse() - e = time.time() - print 'e-s = %s\n' % (e-s) - - print 'Testing ped.writebed' - s = time.time() - outname = '%s_PEDTEST' % (arg) - ped.writebed(outname) - e = time.time() - print 'e-s = %s\n' % (e-s) - del(ped) - -def profile_bed(arg): - """ - """ - bed = BPed(arg) - bed.parse(quick=False) - outname = '%s_BEDPROFILE' % (arg) - bed.writeped(outname) - -def profile_ped(arg): - """ - """ - ped = LPed(arg) - ped.parse() - outname = '%s_PEDPROFILE' % (arg) - ped.writebed(outname) - -if __name__ == '__main__': - """ Run as a command-line, this script should get one or more arguments, - each one a ped file to be parsed with the PedParser (unit tests?) - """ - op = optparse.OptionParser() - op.add_option('--profile-bed', action='store_true', default=False) - op.add_option('--profile-ped', action='store_true', default=False) - opts, args = op.parse_args() - - if opts.profile_bed: - import profile - import pstats - profile.run('profile_bed(args[0])', 'fooprof') - p = pstats.Stats('fooprof') - p.sort_stats('cumulative').print_stats(10) - elif opts.profile_ped: - import profile - import pstats - profile.run('profile_ped(args[0])', 'fooprof') - p = pstats.Stats('fooprof') - p.sort_stats('cumulative').print_stats(10) - else: - for arg in args: - test(arg) - - ### Code used to generate the INT_TO_GCODE dictionary - #print '{\n ', - #for i in range(256): - # b = INT2BIN[i] - # ints = [] - # s = str(i).rjust(3) - # #print b - # for j in range(4): - # idx = j*2 - # #print i, j, idx, b[idx:idx+2], int(b[idx:idx+2], 2) - # ints.append(int(b[idx:idx+2], 2)) - # print '%s: array(\'i\', %s),' % (s,tuple(ints)), - # if i > 0 and (i+1) % 4 == 0: - # print '\n ', - #print '}' - - diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgCaCo.py --- a/tools/rgenetics/rgCaCo.py +++ /dev/null @@ -1,271 +0,0 @@ -#!/usr/local/bin/python -# hack to run and process a plink case control association -# expects args as -# bfilepath outname jobname outformat (wig,xls) -# ross lazarus -# for wig files, we need annotation so look for map file or complain -""" -Parameters for wiggle track definition lines -All options are placed in a single line separated by spaces: - - track type=wiggle_0 name=track_label description=center_label \ - visibility=display_mode color=r,g,b altColor=r,g,b \ - priority=priority autoScale=on|off \ - gridDefault=on|off maxHeightPixels=max:default:min \ - graphType=bar|points viewLimits=lower:upper \ - yLineMark=real-value yLineOnOff=on|off \ - windowingFunction=maximum|mean|minimum smoothingWindow=off|2-16 -""" - -import sys,math,shutil,subprocess,os,time,tempfile,string -from os.path import abspath -from rgutils import timenow, plinke -imagedir = '/static/rg' # if needed for 
images -myversion = 'V000.1 April 2007' -verbose = False - -def makeGFF(resf='',outfname='',logf=None,twd='.',name='track name',description='track description',topn=1000): - """ - score must be scaled to 0-1000 - - Want to make some wig tracks from each analysis - Best n -log10(p). Make top hit the window. - we use our tab output which has - rs chrom offset ADD_stat ADD_p ADD_log10p - rs3094315 1 792429 1.151 0.2528 0.597223 - - """ - - def is_number(s): - try: - float(s) - return True - except ValueError: - return False - header = 'track name=%s description="%s" visibility=2 useScore=1 color=0,60,120\n' % (name,description) - column_names = [ 'Seqname', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'Group' ] - halfwidth=100 - resfpath = os.path.join(twd,resf) - resf = open(resfpath,'r') - resfl = resf.readlines() # dumb but convenient for millions of rows - resfl = [x.split() for x in resfl] - headl = resfl[0] - resfl = resfl[1:] - headl = [x.strip().upper() for x in headl] - headIndex = dict(zip(headl,range(0,len(headl)))) - whatwewant = ['CHR','RS','OFFSET','LOG10ARMITAGEP'] - wewant = [headIndex.get(x,None) for x in whatwewant] - if None in wewant: # missing something - logf.write('### Error missing a required header from %s in makeGFF - headIndex=%s\n' % (whatwewant,headIndex)) - return - ppos = wewant[3] # last in list - resfl = [x for x in resfl if x[ppos] > '' and x[ppos] <> 'NA'] - resfl = [(float(x[ppos]),x) for x in resfl] # decorate - resfl.sort() - resfl.reverse() # using -log10 so larger is better - pvals = [x[0] for x in resfl] # need to scale - resfl = [x[1] for x in resfl] # drop decoration - resfl = resfl[:topn] # truncate - maxp = max(pvals) # need to scale - minp = min(pvals) - prange = abs(maxp-minp) + 0.5 # fudge - scalefact = 1000.0/prange - logf.write('###maxp=%f,minp=%f,prange=%f,scalefact=%f\n' % (maxp,minp,prange,scalefact)) - for i,row in enumerate(resfl): - row[ppos] = '%d' % (int(scalefact*pvals[i])) - resfl[i] = row # replace - outf = file(outfname,'w') - outf.write(header) - outres = [] # need to resort into chrom offset order - for i,lrow in enumerate(resfl): - chrom,snp,offset,p, = [lrow[x] for x in wewant] - gff = ('chr%s' % chrom,'rgCaCo','variation','%d' % (int(offset)-halfwidth), - '%d' % (int(offset)+halfwidth),p,'.','.','%s logp=%1.2f' % (snp,pvals[i])) - outres.append(gff) - outres = [(x[0],int(x[3]),x) for x in outres] # decorate - outres.sort() # into chrom offset - outres=[x[2] for x in outres] # undecorate - outres = ['\t'.join(x) for x in outres] - outf.write('\n'.join(outres)) - outf.write('\n') - outf.close() - - -def plink_assocToGG(plinkout="hm",tag='test'): - """ plink --assoc output looks like this - # CHR SNP A1 F_A F_U A2 CHISQ P OR - # 1 rs3094315 G 0.6685 0.1364 A 104.1 1.929e-24 12.77 - # write as a genegraph input file - """ - inf = file('%s.assoc' % plinkout,'r') - outf = file('%sassoc.xls' % plinkout,'w') - res = ['rs\tlog10p%s\tFakeInvOR%s\tRealOR%s' % (tag,tag,tag),] # output header for ucsc genome graphs - head = inf.next() - for l in inf: - ll = l.split() - if len(ll) >= 8: - p = float(ll[7]) - if p <> 'NA': # eesh - logp = '%9.9f' % -math.log10(p) - else: - logp = 'NA' - try: - orat = ll[8] - except: - orat = 'NA' - orat2 = orat - # invert large negative odds ratios - if float(orat) < 1 and float(orat) > 0.0: - orat2 = '%9.9f' % (1.0/float(orat)) - outl = [ll[1],logp, orat2, orat] - res.append('\t'.join(outl)) - outf.write('\n'.join(res)) - outf.write('\n') - outf.close() - inf.close() - -def 
xformModel(infname='',resf='',outfname='', - name='foo',mapf='/usr/local/galaxy/data/rg/ped/x.bim',flog=None): - """munge a plink .model file into either a ucsc track or an xls file - rerla@meme ~/plink]$ head hmYRI_CEU.model - CHR SNP TEST AFF UNAFF CHISQ DF P - 1 rs3094315 GENO 41/37/11 0/24/64 NA NA NA - 1 rs3094315 TREND 119/59 24/152 81.05 1 2.201e-19 - 1 rs3094315 ALLELIC 119/59 24/152 104.1 1 1.929e-24 - 1 rs3094315 DOM 78/11 24/64 NA NA NA - - bim file has -[rerla@beast pbed]$ head plink_wgas1_example.bim -1 rs3094315 0.792429 792429 G A -1 rs6672353 0.817376 817376 A G - """ - if verbose: - print 'Rgenetics rgCaCo.xformModel got resf=%s, outfname=%s' % (resf,outfname) - res = [] - rsdict = {} - map = file(mapf,'r') - for l in map: # plink map - ll = l.strip().split() - if len(ll) >= 3: - rs=ll[1].strip() - chrom = ll[0] - if chrom.lower() == 'x': - chrom='23' - elif chrom.lower() == 'y': - chrom = 24 - elif chrom.lower() == 'mito': - chrom = 25 - offset = ll[3] - rsdict[rs] = (chrom,offset) - res.append('rs\tChr\tOffset\tGenop\tlog10Genop\tArmitagep\tlog10Armitagep\tAllelep\tlog10Allelep\tDomp\tlog10Domp') - f = open(resf,'r') - headl = f.readline() - if headl.find('\t') <> -1: - headl = headl.split('\t') - delim = '\t' - else: - headl = headl.split() - delim = None - whatwewant = ['CHR','SNP','TEST','AFF','UNAFF','CHISQ','P'] - wewant = [headl.index(x) for x in whatwewant] - llen = len(headl) - lnum = anum = 0 - lastsnp = None # so we know when to write out a gg line - outl = {} - f.seek(0) - for lnum,l in enumerate(f): - if lnum == 0: - continue - ll = l.split() - if delim: - ll = l.split(delim) - if len(ll) >= llen: # valid line - chr,snp,test,naff,nuaff,chi,p = [ll[x] for x in wewant] - snp = snp.strip() - chrom,offset = rsdict.get(snp,(None,None)) - anum += 1 - fp = 1.0 # if NA - lp = 0.0 - try: - fp = float(p) - if fp > 0: - lp = -math.log10(fp) - else: - fp = 9e-100 - flog.write('### WARNING - Plink calculated %s for %s p value!!! 9e-100 substituted!\n' % (p,test)) - flog.write('### offending line #%d in %s = %s' % (lnum,l)) - except: - pass - if snp <> lastsnp: - if len(outl.keys()) > 3: - sl = [outl.get(x,'?') for x in ('snp','chrom','offset','GENO','TREND','ALLELIC','DOM')] - res.append('\t'.join(sl)) # last snp line - outl = {'snp':snp,'chrom':chrom,'offset':offset} # first 3 cols for gg line - lastsnp = snp # reset for next marker - #if p == 'NA': - # p = 1.0 - # let's pass downstream for handling R is fine? 
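The loop in xformModel above converts each Plink p value into a -log10 score for the output table, leaving 0.0 for 'NA' and substituting 9e-100 when Plink reports a non-positive p. A minimal standalone sketch of that conversion, mirroring the guard exactly as written (note the log score stays at 0.0 in the p <= 0 branch, as in the original):

    import math

    def p_to_log10(p):
        # Defaults used above when p is 'NA' or unparseable.
        fp = 1.0
        lp = 0.0
        try:
            fp = float(p)
            if fp > 0:
                lp = -math.log10(fp)
            else:
                fp = 9e-100  # substitution made above for p <= 0; lp stays 0.0
        except ValueError:
            pass
        return fp, lp

    print(p_to_log10('1.929e-24'))  # -> (1.929e-24, ~23.71)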
- outl[test] = '%s\t%f' % (p,lp) - if len(outl.keys()) > 3: - l = [outl.get(x,'?') for x in ('snp','chrom','offset','GENO','TREND','ALLELIC','DOM')] - res.append('\t'.join(l)) # last snp line - f = file(outfname,'w') - res.append('') - f.write('\n'.join(res)) - f.close() - - - - -if __name__ == "__main__": - """ - # called as - <command interpreter="python"> - rgCaCo.py '$i.extra_files_path/$i.metadata.base_name' "$name" - '$out_file1' '$logf' '$logf.files_path' '$gffout' - </command></command> - """ - if len(sys.argv) < 7: - s = 'rgCaCo.py needs 6 params - got %s \n' % (sys.argv) - print >> sys.stdout, s - sys.exit(0) - bfname = sys.argv[1] - name = sys.argv[2] - killme = string.punctuation + string.whitespace - trantab = string.maketrans(killme,'_'*len(killme)) - name = name.translate(trantab) - outfname = sys.argv[3] - logf = sys.argv[4] - logoutdir = sys.argv[5] - gffout = sys.argv[6] - topn = 1000 - try: - os.makedirs(logoutdir) - except: - pass - map_file = None - me = sys.argv[0] - amapf = '%s.bim' % bfname # to decode map in xformModel - flog = file(logf,'w') - logme = [] - cdir = os.getcwd() - s = 'Rgenetics %s http://rgenetics.org Galaxy Tools, rgCaCo.py started %s\n' % (myversion,timenow()) - print >> sys.stdout, s # so will appear as blurb for file - logme.append(s) - if verbose: - s = 'rgCaCo.py: bfname=%s, logf=%s, argv = %s\n' % (bfname, logf, sys.argv) - print >> sys.stdout, s # so will appear as blurb for file - logme.append(s) - twd = tempfile.mkdtemp(suffix='rgCaCo') # make sure plink doesn't spew log file into the root! - tname = os.path.join(twd,name) - vcl = [plinke,'--noweb','--bfile',bfname,'--out',name,'--model'] - p=subprocess.Popen(' '.join(vcl),shell=True,stdout=flog,cwd=twd) - retval = p.wait() - resf = '%s.model' % tname # plink output is here we hope - xformModel(bfname,resf,outfname,name,amapf,flog) # leaves the desired summary file - makeGFF(resf=outfname,outfname=gffout,logf=flog,twd=twd,name='rgCaCo_TopTable',description=name,topn=topn) - flog.write('\n'.join(logme)) - flog.close() # close the log used - #shutil.copytree(twd,logoutdir) - shutil.rmtree(twd) # clean up - diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgCaCo.xml --- a/tools/rgenetics/rgCaCo.xml +++ /dev/null @@ -1,104 +0,0 @@ -<tool id="rgCaCo1" name="Case Control:"> - <requirements><requirement type="package">plink</requirement></requirements> - <description>for unrelated subjects</description> - <command interpreter="python"> - rgCaCo.py '$i.extra_files_path/$i.metadata.base_name' "$title" '$out_file1' '$logf' '$logf.files_path' '$gffout' - </command> - <inputs> - <param name="i" type="data" label="RGenetics genotype data from your current history" - format="pbed" /> - <param name='title' type='text' size="132" value='CaseControl' label="Title for this job"/> - - </inputs> - - <outputs> - <data format="tabular" name="out_file1" label="${title}_rgCaCo.xls" /> - <data format="txt" name="logf" label="${title}_rgCaCo.log"/> - <data format="gff" name="gffout" label="${title}_rgCaCoTop.gff" /> - </outputs> -<tests> - <test> - <param name='i' value='tinywga' ftype='pbed' > - <metadata name='base_name' value='tinywga' /> - <composite_data value='tinywga.bim' /> - <composite_data value='tinywga.bed' /> - <composite_data value='tinywga.fam' /> - <edit_attributes type='name' value='tinywga' /> - </param> - <param name='title' value='rgCaCotest1' /> - <output name='out_file1' file='rgCaCotest1_CaCo.xls' ftype='tabular' compare='diff' /> - 
<output name='logf' file='rgCaCotest1_CaCo_log.txt' ftype='txt' compare='diff' lines_diff='20' /> - <output name='gffout' file='rgCaCotest1_CaCo_topTable.gff' ftype='gff' compare='diff' /> - </test> -</tests> -<help> - -.. class:: infomark - -**Syntax** - -- **Genotype file** is the input case control data chosen from available library Plink binary files -- **Map file** is the linkage format .map file corresponding to the genotypes in the Genotype file -- **Type of test** is the kind of test statistic to report such as Armitage trend test or genotype test -- **Format** determines how your data will be returned to your Galaxy workspace - ------ - -**Summary** - -This tool will perform some standard statistical tests comparing subjects designated as -affected (cases) and unaffected subjects (controls). To avoid bias, it is important that -controls who had been affected would have been eligible for sampling as cases. This may seem -odd, but it requires that the cases and controls are drawn from the same sampling frame. - -The Armitage trend test is robust to departure from HWE and so very attractive - after all, a real disease -mutation may well result in distorted HWE at least in cases. All the others are susceptible to -bias in the presence of HWE departures. - -All of these tests are exquisitely sensitive to non-differential population stratification in cases -compared to controls and this must be tested before believing any results here. Use the PCA method for -100k markers or more. - -If you don't see the genotype data set you want here, it can be imported using one of the methods available from -the Galaxy Get Data tool page. - -Output format can be UCSC .bed if you want to see your -results as a fully fledged UCSC track. A map file containing the chromosome and offset for each marker is required for -writing this kind of output. -Alternatively you can use .gg for the UCSC Genome Graphs tool which has all of the advantages -of the .bed track, plus a neat, visual front end that displays a lot of useful clues. -Either of these is a very useful way of quickly getting a look -at your data in full genomic context. - -Finally, if you can't live without -spreadsheet data, choose the .xls tab delimited format. It's not a stupid binary excel file. Just a plain old tab delimited -one with a header. Fortunately excel is dumb enough to open these without much protest. - - ------ - -.. class:: infomark - -**Attribution** - -This Galaxy tool relies on Plink (see Plinksrc_) to test Case Control association models. - -So, we rely on the author (Shaun Purcell) for the documentation you need specific to those settings - they are very nicely documented - see -DOC_ - -Tool and Galaxy datatypes originally designed and written for the Rgenetics -series of whole genome scale statistical genetics tools by ross lazarus (ross.lazarus@gmail.com) - -Copyright Ross Lazarus March 2007 -This Galaxy wrapper is released under the LGPL_ but is about as useful as a chocolate teapot without Plink which is GPL. - -I'm no lawyer, but it looks like you got GPL if you use this software. Good luck. - -.. _Plinksrc: http://pngu.mgh.harvard.edu/~purcell/plink/ - -.. _LGPL: http://www.gnu.org/copyleft/lesser.html - -..
diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgClean.py
--- a/tools/rgenetics/rgClean.py
+++ /dev/null
@@ -1,160 +0,0 @@
-"""
-# galaxy tool xml files can define a galaxy supplied output filename
-# that must be passed to the tool and used to return output
-# here, the plink log file is copied to that file and removed
-# took a while to figure this out!
-# use exec_before_job to give files sensible names
-#
-# ross april 14 2007
-# plink cleanup script
-# ross lazarus March 2007 for camp illumina whole genome data
-# note problems with multiple commands being ignored - eg --freq --missing --mendel
-# only the first seems to get done...
-#
-##Summary statistics versus inclusion criteria
-##
-##Feature                        As summary statistic   As inclusion criteria
-##Missingness per individual     --missing              --mind N
-##Missingness per marker         --missing              --geno N
-##Allele frequency               --freq                 --maf N
-##Hardy-Weinberg equilibrium     --hardy                --hwe N
-##Mendel error rates             --mendel               --me N M
-#
-# call as plinkClean.py $i $o $mind $geno $hwe $maf $mef $mei $outfile
-# note plinkClean_code.py does some renaming before the job starts
-
-
-    <command interpreter="python2.4">
-        rgClean.py '$input_file.extra_files_path' '$input_file.metadata.base_name' '$title' '$mind' '$geno' '$hwe' '$maf'
-        '$mef' '$mei' '$out_file1' '$out_file1.files_path' '$userId'
-    </command>
-
-"""
-import sys,shutil,os,subprocess, glob, string, tempfile, time
-from rgutils import galhtmlprefix, timenow, plinke
-prog = os.path.split(sys.argv[0])[-1]
-myversion = 'January 4 2010'
-verbose = False
-
-
-def fixoutaff(outpath='',newaff='1'):
-    """ quick way to create test data sets - set all aff to 1 or 2 for
-    some hapmap data and then merge
-    [rerla@beast galaxy]$ head tool-data/rg/library/pbed/affyHM_CEU.fam
-    1341 14 0 0 2 1
-    1341 2 13 14 2 1
-    1341 13 0 0 1 1
-    1340 9 0 0 1 1
-    1340 10 0 0 2 1
-    """
-    nchanged = 0
-    fam = '%s.fam' % outpath
-    famf = open(fam,'r')
-    fl = famf.readlines()
-    famf.close()
-    for i,row in enumerate(fl):
-        lrow = row.split()
-        if lrow[-1] != newaff:
-            lrow[-1] = newaff
-            fl[i] = ' '.join(lrow)
-            fl[i] += '\n'
-            nchanged += 1
-    fo = open(fam,'w')
-    fo.write(''.join(fl))
-    fo.close()
-    return nchanged
-
-
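[Editor's note] Both drivers in this changeset scrub user-supplied titles with Python 2's `string.maketrans` before using them in file names, an idiom that no longer exists in Python 3. A sketch of the Python 3 equivalent, with an invented function name and example, purely for illustration:

    import string

    def sanitize_title(title):
        """Replace punctuation and whitespace with '_' so the title is filename-safe.

        Python 3 version of the string.maketrans(killme, '_' * len(killme))
        idiom used by rgCaCo.py and rgClean.py below.
        """
        killme = string.punctuation + string.whitespace
        trantab = str.maketrans(killme, '_' * len(killme))
        return title.translate(trantab)

    # e.g. sanitize_title('My QC run (v2)!') -> 'My_QC_run__v2__'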
-def clean():
-    """
-    """
-    if len(sys.argv) < 16:
-        print >> sys.stdout, '## %s expected 15 params in sys.argv, got %d - %s' % (prog,len(sys.argv)-1,sys.argv)
-        print >> sys.stdout, """this script will filter a linkage format ped
-        and map file containing genotypes. It takes 15 parameters - the plink --bfile parameters and
-        a new filename root for the output clean data, followed by the mind, geno, hwe, maf, mef and mei
-        thresholds documented in the plink docs, plus the file to be returned to Galaxy,
-        called as:
-        <command interpreter="python">
-        rgClean.py '$input_file.extra_files_path' '$input_file.metadata.base_name' '$title' '$mind'
-        '$geno' '$hwe' '$maf' '$mef' '$mei' '$out_file1' '$out_file1.files_path'
-        '$relfilter' '$afffilter' '$sexfilter' '$fixaff'
-        </command>
-        """
-        sys.exit(1)
-    plog = []
-    inpath = sys.argv[1]
-    inbase = sys.argv[2]
-    killme = string.punctuation + string.whitespace
-    trantab = string.maketrans(killme,'_'*len(killme))
-    title = sys.argv[3].translate(trantab)
-    mind = sys.argv[4]
-    geno = sys.argv[5]
-    hwe = sys.argv[6]
-    maf = sys.argv[7]
-    me1 = sys.argv[8]
-    me2 = sys.argv[9]
-    outfname = sys.argv[10]
-    outfpath = sys.argv[11]
-    relf = sys.argv[12]
-    afff = sys.argv[13]
-    sexf = sys.argv[14]
-    fixaff = sys.argv[15]
-    output = os.path.join(outfpath,outfname)
-    outpath = os.path.join(outfpath,title)
-    outprunepath = os.path.join(outfpath,'ldprune_%s' % title)
-    try:
-        os.makedirs(outfpath)
-    except:
-        pass
-    bfile = os.path.join(inpath,inbase)
-    outf = file(outfname,'w')
-    vcl = [plinke,'--noweb','--bfile',bfile,'--make-bed','--out',
-           outpath,'--set-hh-missing','--mind',mind,
-           '--geno',geno,'--maf',maf,'--hwe',hwe,'--me',me1,me2]
-    # yes - the --me parameter takes 2 values - mendels per family and per snp
-    if relf == 'oo': # plink filters are what they leave...
-        vcl.append('--filter-nonfounders') # leave only offspring
-    elif relf == 'fo':
-        vcl.append('--filter-founders')
-    if afff == 'affonly':
-        vcl.append('--filter-controls')
-    elif afff == 'unaffonly':
-        vcl.append('--filter-cases')
-    if sexf == 'fsex':
-        vcl.append('--filter-females')
-    elif sexf == 'msex':
-        vcl.append('--filter-males')
-    p = subprocess.Popen(' '.join(vcl),shell=True,cwd=outfpath)
-    retval = p.wait()
-    plog.append('%s started, called as %s' % (prog,' '.join(sys.argv)))
-    outf.write(galhtmlprefix % prog)
-    outf.write('<ul>\n')
-    plogf = '%s.log' % os.path.join(outfpath,title)
-    try:
-        plogl = file(plogf,'r').readlines()
-        plog += [x.strip() for x in plogl]
-    except:
-        plog += ['###Cannot open plink log file %s' % plogf,]
-    # if fixaff, want to 'fix' the fam file
-    if fixaff != '0':
-        nchanged = fixoutaff(outpath=outpath,newaff=fixaff)
-        plog += ['## fixaff was requested - %d subjects had affection status changed to %s' % (nchanged,fixaff)]
-    pf = file(plogf,'w')
-    pf.write('\n'.join(plog))
-    pf.close()
-    globme = os.path.join(outfpath,'*')
-    flist = glob.glob(globme)
-    flist.sort()
-    for i, data in enumerate( flist ):
-        outf.write('<li><a href="%s">%s</a></li>\n' % (os.path.split(data)[-1],os.path.split(data)[-1]))
-    outf.write('</ul>\n')
-    outf.write("</div></body></html>")
-    outf.close()
-
-
-if __name__ == "__main__":
-    clean()
-
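[Editor's note] The if/elif chains in clean() translate the tool's three select values into mutually exclusive plink filter flags. A table-driven version is easier to audit; the sketch below is not part of the original tool, and the helper name is invented for illustration.

    # Each Galaxy select value maps straight to the plink flag it should add;
    # 'no filter' values ('all', 'allaff', 'allsex') simply map to nothing.
    FILTER_FLAGS = {
        'fo': '--filter-founders',       # keep founders only
        'oo': '--filter-nonfounders',    # keep offspring only
        'affonly': '--filter-controls',  # as labelled in rgClean.xml
        'unaffonly': '--filter-cases',
        'fsex': '--filter-females',
        'msex': '--filter-males',
    }

    def filter_args(relfilter, afffilter, sexfilter):
        """Return the plink filter flags implied by the three tool selects."""
        return [FILTER_FLAGS[v] for v in (relfilter, afffilter, sexfilter)
                if v in FILTER_FLAGS]

    # e.g. filter_args('fo', 'allaff', 'msex') -> ['--filter-founders', '--filter-males']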
diff -r 3df68022beef2077798ed50a33afb03be75a841a -r c8b55344e779e8be93bae5d66f52128addd83003 tools/rgenetics/rgClean.xml
--- a/tools/rgenetics/rgClean.xml
+++ /dev/null
@@ -1,155 +0,0 @@
-<tool id="rgClean1" name="Clean genotypes:">
-    <requirements><requirement type="package">plink</requirement></requirements>
-    <description>filter markers, subjects</description>
-
-    <command interpreter="python">
-        rgClean.py '$input_file.extra_files_path' '$input_file.metadata.base_name' '$title' '$mind'
-        '$geno' '$hwe' '$maf' '$mef' '$mei' '$out_file1' '$out_file1.files_path'
-        '$relfilter' '$afffilter' '$sexfilter' '$fixaff'
-    </command>
-
-    <inputs>
-       <param name="input_file" type="data" label="RGenetics genotype library file in compressed Plink format"
-          size="120" format="pbed" />
-       <param name="title" type="text" size="80" label="Descriptive title for cleaned genotype file" value="Cleaned_data"/>
-       <param name="geno" type="text" label="Maximum Missing Fraction: Markers" value="0.05" />
-       <param name="mind" type="text" value="0.1" label="Maximum Missing Fraction: Subjects"/>
-       <param name="mef" type="text" label="Maximum Mendel Error Rate: Family" value="0.05"/>
-       <param name="mei" type="text" label="Maximum Mendel Error Rate: Marker" value="0.05"/>
-       <param name="hwe" type="text" value="0" label="Smallest HWE p value (set to 0 for all)" />
-       <param name="maf" type="text" value="0.01"
-          label="Smallest Minor Allele Frequency (set to 0 for all)"/>
-       <param name='relfilter' label="Filter on pedigree relatedness" type="select"
-          optional="false" size="132"
-          help="Optionally remove related subjects if pedigree identifies founders and their offspring">
-          <option value="all" selected='true'>No filter on relatedness</option>
-          <option value="fo">Keep Founders only (pedigree m/f ID = "0")</option>
-          <option value="oo">Keep Offspring only (one randomly chosen if >1 sibs in family)</option>
-       </param>
-       <param name='afffilter' label="Filter on affection status" type="select"
-          optional="false" size="132"
-          help="Optionally remove affected or unaffected subjects">
-          <option value="allaff" selected='true'>No filter on affection status</option>
-          <option value="affonly">Keep Controls only (affection='1')</option>
-          <option value="unaffonly">Keep Cases only (affection='2')</option>
-       </param>
-       <param name='sexfilter' label="Filter on gender" type="select"
-          optional="false" size="132"
-          help="Optionally remove all male or all female subjects">
-          <option value="allsex" selected='true'>No filter on gender status</option>
-          <option value="msex">Keep Males only (pedigree gender='1')</option>
-          <option value="fsex">Keep Females only (pedigree gender='2')</option>
-       </param>
-       <param name="fixaff" type="text" value="0"
-          label="Change ALL subjects affection status to (0=no change, 1=unaff, 2=aff)"
-          help="Use this option to switch the affection status to a new value for all output subjects" />
-    </inputs>
-
-    <outputs>
-       <data format="pbed" name="out_file1" metadata_source="input_file" label="${title}_rgClean.pbed" />
-    </outputs>
-
-<tests>
- <test>
-    <param name='input_file' value='tinywga' ftype='pbed' >
-       <metadata name='base_name' value='tinywga' />
-       <composite_data value='tinywga.bim' />
-       <composite_data value='tinywga.bed' />
-       <composite_data value='tinywga.fam' />
-       <edit_attributes type='name' value='tinywga' />
-    </param>
-    <param name='title' value='rgCleantest1' />
-    <param name="geno" value="1" />
-    <param name="mind" value="1" />
-    <param name="mef" value="0" />
-    <param name="mei" value="0" />
-    <param name="hwe" value="0" />
-    <param name="maf" value="0" />
-    <param name="relfilter" value="all" />
-    <param name="afffilter" value="allaff" />
-    <param name="sexfilter" value="allsex" />
-    <param name="fixaff" value="0" />
-    <output name='out_file1' file='rgtestouts/rgClean/rgCleantest1.pbed' compare="diff" lines_diff="25" >
-       <extra_files type="file" name='rgCleantest1.bim' value="rgtestouts/rgClean/rgCleantest1.bim" compare="diff" />
-       <extra_files type="file" name='rgCleantest1.fam' value="rgtestouts/rgClean/rgCleantest1.fam" compare="diff" />
-       <extra_files type="file" name='rgCleantest1.bed' value="rgtestouts/rgClean/rgCleantest1.bed" compare="diff" />
-    </output>
- </test>
-</tests>
value="rgtestouts/rgClean/rgCleantest1.bed" compare="diff" /> - </output> - </test> -</tests> -<help> - -.. class:: infomark - -**Syntax** - -- **Genotype data** is the input genotype file chosen from your current history -- **Descriptive title** is the name to use for the filtered output file -- **Missfrac threshold: subjects** is the threshold for missingness by subject. Subjects with more than this fraction missing will be excluded from the import -- **Missfrac threshold: markers** is the threshold for missingness by marker. Markers with more than this fraction missing will be excluded from the import -- **MaxMendel Individuals** Mendel error fraction above which to exclude subjects with more than the specified fraction of mendelian errors in transmission (for family data only) -- **MaxMendel Families** Mendel error fraction above which to exclude families with more than the specified fraction of mendelian errors in transmission (for family data only) -- **HWE** is the threshold for HWE test p values below which the marker will not be imported. Set this to -1 and all markers will be imported regardless of HWE p value -- **MAF** is the threshold for minor allele frequency - SNPs with lower MAF will be excluded -- **Filters** for founders/offspring or affected/unaffected or males/females are optionally available if needed -- **Change Affection** is only needed if you want to change the affection status for creating new analysis datasets - ------ - -**Attribution** - -This tool relies on the work of many people. It uses Plink http://pngu.mgh.harvard.edu/~purcell/plink/, -and the R http://cran.r-project.org/ and -Bioconductor http://www.bioconductor.org/ projects. -respectively. - -In particular, http://pngu.mgh.harvard.edu/~purcell/plink/ -has excellent documentation describing the parameters you can set here. - -This implementation is a Galaxy tool wrapper around these third party applications. -It was originally designed and written for family based data from the CAMP Illumina run of 2007 by -ross lazarus (ross.lazarus@gmail.com) and incorporated into the rgenetics toolkit. - -Rgenetics merely exposes them, wrapping Plink so you can use it in Galaxy. - ------ - -**Summary** - -Reliable statistical inference depends on reliable data. Poor quality samples and markers -may add more noise than signal, decreasing statistical power. Removing the worst of them -can be done by setting thresholds for some of the commonly used technical quality measures -for genotype data. Of course discordant replicate calls are also very informative but are not -in scope here. - -Marker cleaning: Filters are available to remove markers below a specific minor allele -frequency, beyond a Hardy Wienberg threshold, below a minor allele frequency threshold, -or above a threshold for missingness. If family data are available, thresholds for Mendelian -error can be set. - -Subject cleaning: Filters are available to remove subjects with many missing calls. Subjects and markers for family data can be filtered by proportions -of Mendelian errors in observed transmission. Use the QC reporting tool to -generate a comprehensive series of reports for quality control. - -Note that ancestry and cryptic relatedness should also be checked using the relevant tools. - ------ - -.. class:: infomark - -**Tip** - -You can check that you got what you asked for by running the QC tool to ensure that the distributions -are truncated the way you expect. 
-.. class:: infomark
-
-**Tip**
-
-You can check that you got what you asked for by running the QC tool to ensure that the distributions
-are truncated the way you expect. Note that the final counts will not exactly match the thresholds
-you set - some bad assays and subjects fail multiple QC measures, so the exclusions overlap and you
-may end up with more samples or markers remaining than the individual thresholds suggest.
-Finally, the ordering of
-operations matters, and Plink is somewhat restrictive about what it will do on each pass
-over the data. At least the ordering is fixed.
-
------
-
-This Galaxy tool was written by Ross Lazarus for the Rgenetics project.
-It uses Plink for most calculations - for full Plink attribution, source code and documentation,
-please see http://pngu.mgh.harvard.edu/~purcell/plink/ - plus some custom python code.
-
-</help>
-</tool>
This diff is so big that we needed to truncate the remainder.

Repository URL: https://bitbucket.org/galaxy/galaxy-central/

--

This is a commit notification from bitbucket.org. You are receiving this because you have the service enabled.