[hg] galaxy 2682: Added Bowtie wrapper tool
details:   http://www.bx.psu.edu/hg/galaxy/rev/a7b1304e736f
changeset: 2682:a7b1304e736f
user:      Kelly Vincent <kpvincent@bx.psu.edu>
date:      Fri Sep 11 14:38:05 2009 -0400
description:
Added Bowtie wrapper tool

9 file(s) affected in this change:

test-data/bowtie_in1.fastq
test-data/bowtie_in2.fastq
test-data/bowtie_in3.fastq
test-data/bowtie_out1.sam
test-data/bowtie_out2.sam
tool-data/bowtie_indices.loc.sample
tool_conf.xml.sample
tools/sr_mapping/bowtie_wrapper.py
tools/sr_mapping/bowtie_wrapper.xml

diffs (819 lines):

diff -r e7b899fb4462 -r a7b1304e736f test-data/bowtie_in1.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/bowtie_in1.fastq Fri Sep 11 14:38:05 2009 -0400
@@ -0,0 +1,5 @@
+@HWI-EAS91_1_30788AAXX:1:1:1513:715/1
+GTTTTTTNNGCATAGATGTTTAGTTGTGGTAGTCAG
++/1
+IIIIIII""IIIIIIIIIIIIIIIIIIIDI?II-+I
+
diff -r e7b899fb4462 -r a7b1304e736f test-data/bowtie_in2.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/bowtie_in2.fastq Fri Sep 11 14:38:05 2009 -0400
@@ -0,0 +1,4 @@
+@HWI-EAS91_1_30788AAXX:1:2:618:346/1
+TAGACTACGAAAGTGACTTTAATACCTCTGACTACA
++
+IIIIIIIIIIIIIIIIIIIIIIIIIIIII%4II;I3
diff -r e7b899fb4462 -r a7b1304e736f test-data/bowtie_in3.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/bowtie_in3.fastq Fri Sep 11 14:38:05 2009 -0400
@@ -0,0 +1,4 @@
+@HWI-EAS91_1_30788AAXX:1:2:618:346/2
+ATAGGCTGAATTAGCAATGGATGGTGGGGTTTATCG
++
+IIIIIIIIIIIIIII9I.II5II6DFIIIIII*I2)
diff -r e7b899fb4462 -r a7b1304e736f test-data/bowtie_out1.sam
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/bowtie_out1.sam Fri Sep 11 14:38:05 2009 -0400
@@ -0,0 +1,1 @@
+HWI-EAS91_1_30788AAXX:1:1:1513:715 16 chrM 9563 25 36M * 0 0 CTGACTACCACAACTAAACATCTATGCNNAAAAAAC I+-II?IDIIIIIIIIIIIIIIIIIII""IIIIIII NM:i:1 X1:i:1 MD:Z:7N0N27
diff -r e7b899fb4462 -r a7b1304e736f test-data/bowtie_out2.sam
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/bowtie_out2.sam Fri Sep 11 14:38:05 2009 -0400
@@ -0,0 +1,2 @@
+HWI-EAS91_1_30788AAXX:1:2:618:346 0 chrM 441 25 36M * 0 0 TAGACTACGAAAGTGACTTTAATACCTCTGACTACA IIIIIIIIIIIIIIIIIIIIIIIIIIIII%4II;I3 NM:i:0 X0:i:1 MD:Z:36 +HWI-EAS91_1_30788AAXX:1:2:618:346 16 chrM 652 25 36M * 0 0 CGATAAACCCCACCATCCATTGCTAATTCAGCCTAT )2I*IIIIIIFD6II5II.I9IIIIIIIIIIIIIII NM:i:1 X1:i:1 MD:Z:17A18 diff -r e7b899fb4462 -r a7b1304e736f tool-data/bowtie_indices.loc.sample --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool-data/bowtie_indices.loc.sample Fri Sep 11 14:38:05 2009 -0400 @@ -0,0 +1,28 @@ +#This is a sample file distributed with Galaxy that enables tools +#to use a directory of Bowtie indexed sequences data files. You will need +#to create these data files and then create a bowtie_indices.loc file +#similar to this one (store it in this directory ) that points to +#the directories in which those files are stored. The bowtie_indices.loc +#file has this format (white space characters are TAB characters): +# +#<build> <file_base> +# +#So, for example, if you had hg18 indexed stored in +#/depot/data2/galaxy/bowtie/hg18/, +#then the bowtie_indices.loc entry would look like this: +# +#hg18 /depot/data2/galaxy/bowtie/hg18/hg18 +# +#and your /depot/data2/galaxy/bowtie/hg18/ directory +#would contain hg18.*.ebwt files: +# +#-rw-r--r-- 1 james universe 830134 2005-09-13 10:12 hg18.1.ebwt +#-rw-r--r-- 1 james universe 527388 2005-09-13 10:12 hg18.2.ebwt +#-rw-r--r-- 1 james universe 269808 2005-09-13 10:12 gh18.3.ebwt +#...etc... +# +#Your bowtie_indices.loc file should include an entry per line for +#each index set you have stored. The "file" in the path does not actually +#exist, but it is the prefix for the actual index files. 
For example: +# +#hg18 /depot/data2/galaxy/bowtie/hg18/hg18 diff -r e7b899fb4462 -r a7b1304e736f tool_conf.xml.sample --- a/tool_conf.xml.sample Fri Sep 11 12:48:33 2009 -0400 +++ b/tool_conf.xml.sample Fri Sep 11 14:38:05 2009 -0400 @@ -332,7 +332,8 @@ <tool file="metag_tools/megablast_xml_parser.xml" /> <tool file="metag_tools/blat_wrapper.xml" /> <tool file="metag_tools/mapping_to_ucsc.xml" /> - </section> + <tool file="sr_mapping/bowtie_wrapper.xml" /> + </section> <section name="Tracks" id="tracks"> <tool file="visualization/genetrack.xml" /> </section> diff -r e7b899fb4462 -r a7b1304e736f tools/sr_mapping/bowtie_wrapper.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/sr_mapping/bowtie_wrapper.py Fri Sep 11 14:38:05 2009 -0400 @@ -0,0 +1,174 @@ +#! /usr/bin/python + +""" +Runs Bowtie on single-end or paired-end data. +""" + +import optparse, os, sys, tempfile + +def stop_err( msg ): + sys.stderr.write( "%s\n" % msg ) + sys.exit() + +def __main__(): + #Parse Command Line + parser = optparse.OptionParser() + parser.add_option('', '--input1', dest='input1', help='The (forward or single-end) reads file in Sanger FASTQ format') + parser.add_option('', '--input2', dest='input2', help='The reverse reads file in Sanger FASTQ format') + parser.add_option('', '--output', dest='output', help='The output file') + parser.add_option('', '--paired', dest='paired', help='Whether the data is single- or paired-end') + parser.add_option('', '--genomeSource', dest='genomeSource', help='The type of reference provided') + parser.add_option('', '--ref', dest='ref', help='The reference genome to use or index') + parser.add_option('', '--skip', dest='skip', help='Skip the first n reads') + parser.add_option('', '--alignLimit', dest='alignLimit', help='Only align the first n reads') + parser.add_option('', '--trimH', dest='trimH', help='Trim n bases from high-quality (left) end of each read before alignment') + parser.add_option('', '--trimL', dest='trimL', help='Trim n 
bases from low-quality (right) end of each read before alignment') + parser.add_option('', '--mismatchSeed', dest='mismatchSeed', help='Maximum number of mismatches permitted in the seed') + parser.add_option('', '--mismatchQual', dest='mismatchQual', help='Maximum permitted total of quality values at mismatched read positions') + parser.add_option('', '--seedLen', dest='seedLen', help='Seed length') + parser.add_option('', '--rounding', dest='rounding', help='Whether or not to round to the nearest 10 and saturating at 30') + parser.add_option('', '--maqSoapAlign', dest='maqSoapAlign', help='Choose MAQ- or SOAP-like alignment policy') + parser.add_option('', '--tryHard', dest='tryHard', help='Whether or not to try as hard as possible to find valid alignments when they exist') + parser.add_option('', '--valAlign', dest='valAlign', help='Report up to n valid arguments per read') + parser.add_option('', '--allValAligns', dest='allValAligns', help='Whether or not to report all valid alignments per read') + parser.add_option('', '--suppressAlign', dest='suppressAlign', help='Suppress all alignments for a read if more than n reportable alignments exist') + parser.add_option('', '--offbase', dest='offbase', help='Number the first base of a reference sequence as n when outputting alignments') + parser.add_option('', '--best', dest='best', help="Whether or not to make Bowtie guarantee that reported singleton alignments are 'best' in terms of stratum and in terms of the quality values at the mismatched positions") + parser.add_option('', '--maxBacktracks', dest='maxBacktracks', help='Maximum number of backtracks permitted when aligning a read') + parser.add_option('', '--threadMem', dest='threadMem', help='Number of megabytes of memory a given thread is given to store path descriptors in best mode') + parser.add_option('', '--strata', dest='strata', help='Whether or not to report only those alignments that fall in the best stratum if many valid alignments exist and are 
reportable') + parser.add_option('', '--minInsert', dest='minInsert', help='Minimum insert size for valid paired-end alignments') + parser.add_option('', '--maxInsert', dest='maxInsert', help='Maximum insert size for valid paired-end alignments') + parser.add_option('', '--mateOrient', dest='mateOrient', help='The upstream/downstream mate orientation for valid paired-end alignment against the forward reference strand') + parser.add_option('', '--maxAlignAttempt', dest='maxAlignAttempt', help='Maximum number of attempts Bowtie will make to match an alignment for one mate with an alignment for the opposite mate') + parser.add_option('', '--forwardAlign', dest='forwardAlign', help='Whether or not to attempt to align the forward reference strand') + parser.add_option('', '--reverseAlign', dest='reverseAlign', help='Whether or not to attempt to align the reverse-complement reference strand') + parser.add_option('', '--phased', dest='phased', help='Whether or not it should alternate between using the forward and mirror indexes in a series of phases so that only half of the index is resident in memory at one time') + parser.add_option('', '--offrate', dest='offrate', help='Override the offrate of the index to n') + parser.add_option('', '--mm', dest='mm', help='Whether or not to use memory-mapped I/O to load the index') + parser.add_option('', '--seed', dest='seed', help='Seed for pseudo-random number generator') + parser.add_option('', '--dbkey', dest='dbkey', help='') + parser.add_option('', '--params', dest='params', help='Whether to use default or specified parameters') + parser.add_option('', '--iauto_b', dest='iauto_b', help='Automatic or specified behavior') + parser.add_option('', '--ipacked', dest='ipacked', help='Whether or not to use a packed representation for DNA strings') + parser.add_option('', '--ibmax', dest='ibmax', help='Maximum number of suffixes allowed in a block') + parser.add_option('', '--ibmaxdivn', dest='ibmaxdivn', help='Maximum number of 
suffixes allowed in a block as a fraction of the length of the reference') + parser.add_option('', '--idcv', dest='idcv', help='The period for the difference-cover sample') + parser.add_option('', '--inodc', dest='inodc', help='Whether or not to disable the use of the difference-cover sample') + parser.add_option('', '--inoref', dest='inoref', help='Whether or not to build the part of the reference index used only in paried-end alignment') + parser.add_option('', '--ioffrate', dest='ioffrate', help='How many rows get marked during annotation of some or all of the Burrows-Wheeler rows') + parser.add_option('', '--iftab', dest='iftab', help='The size of the lookup table used to calculate an initial Burrows-Wheeler range with respect to the first n characters of the query') + parser.add_option('', '--intoa', dest='intoa', help='Whether or not to convert Ns in the reference sequence to As') + parser.add_option('', '--iendian', dest='iendian', help='Endianness to use when serializing integers to the index file') + parser.add_option('', '--iseed', dest='iseed', help='Seed for the pseudorandom number generator') + parser.add_option('', '--icutoff', dest='icutoff', help='Number of first bases of the reference sequence to index') + parser.add_option('', '--ioldpmap', dest='ioldpmap', help='Use the scheme for mapping joined reference locations to original reference locations used in versions of Bowtie prior to 0.9.8') + parser.add_option('', '--indexSettings', dest='index_settings', help='Whether or not indexing options are to be set') + (options, args) = parser.parse_args() + + # index if necessary + if options.genomeSource == 'history': + # set up commands + if options.index_settings =='index_pre_set': + indexing_cmds = '' + else: + try: + indexing_cmds = '%s %s %s %s %s %s %s --offrate %s %s %s %s %s %s %s' % \ + (('','--noauto')[options.iauto_b=='set'], + ('','--packed')[options.ipacked=='packed'], + ('','--bmax %s'%options.ibmax)[options.ibmax!='None' and 
options.ibmax>=1], + ('','--bmaxdivn %s'%options.ibmaxdivn)[options.ibmaxdivn!='None'], + ('','--dcv %s'%options.idcv)[options.idcv!='None'], + ('','--nodc')[options.inodc=='nodc'], + ('','--noref')[options.inoref=='noref'], options.ioffrate, + ('','--ftabchars %s'%options.iftab)[int(options.iftab)>=0], + ('','--ntoa')[options.intoa=='yes'], + ('--little','--big')[options.iendian=='big'], + ('','--seed %s'%options.iseed)[int(options.iseed)>0], + ('','--cutoff %s'%options.icutoff)[int(options.icutoff)>0], + ('','--oldpmap')[options.ioldpmap=='yes']) + except ValueError: + indexing_cmds = '' + + # make temp directory for placement of indices and copy reference file there + tmp_dir = tempfile.gettempdir() + try: + os.system('cp %s %s' % (options.ref, tmp_dir)) + except Exception, erf: + stop_err('Error creating temp directory for indexing purposes\n' + str(erf)) + options.ref = os.path.join(tmp_dir,os.path.split(options.ref)[1]) + cmd1 = 'cd %s; bowtie-build %s -f %s %s > /dev/null' % (tmp_dir, indexing_cmds, options.ref, options.ref) + try: + os.system(cmd1) + except Exception, erf: + stop_err('Error indexing reference sequence\n' + str(erf)) + + # set up aligning and generate aligning command options + # automatically set threads to 8 in both cases + if options.params == 'pre_set': + aligning_cmds = '-p 8' + else: + try: + aligning_cmds = '%s %s %s %s %s %s %s %s %s %s %s %s %s %s ' \ + '%s %s %s %s %s %s %s %s %s %s %s %s %s %s -p 8' % \ + (('','-s %s'%options.skip)[options.skip!='None'], + ('','-u %s'%options.alignLimit)[int(options.alignLimit)>0], + ('','-5 %s'%options.trimH)[int(options.trimH)>=0], + ('','-3 %s'%options.trimL)[int(options.trimL)>=0], + ('','-n %s'%options.mismatchSeed)[options.mismatchSeed=='0' or options.mismatchSeed=='1' or options.mismatchSeed=='2' or options.mismatchSeed=='3'], + ('','-e %s'%options.mismatchQual)[int(options.mismatchQual)>=0], + ('','-l %s'%options.seedLen)[int(options.seedLen)>=5], + 
('','--nomaqround')[options.rounding=='noRound'], + ('','-v %s'%options.maqSoapAlign)[options.maqSoapAlign!='-1'], + ('','-I %s'%options.minInsert)[options.minInsert!='None'], + ('','-X %s'%options.maxInsert)[options.maxInsert!='None'], + ('','--%s'%options.mateOrient)[options.mateOrient!='None'], + ('','--pairtries %s'%options.maxAlignAttempt)[int(options.maxAlignAttempt)>=0], + ('','--nofw')[options.forwardAlign=='noForward'], + ('','--norc')[options.reverseAlign=='noReverse'], + ('','--maxbts %s'%options.maxBacktracks)[options.maxBacktracks!='None' and (options.mismatchSeed=='2' or options.mismatchSeed=='3')], + ('','-y')[options.tryHard=='doTryHard'], + ('','--chunkmbs %s'%options.threadMem)[options.threadMem!='None' and int(options.threadMem)>=0], + ('','-k %s'%options.valAlign)[options.valAlign!='None' and int(options.valAlign)>=0], + ('','-a')[options.allValAligns=='doAllValAligns' and int(options.allValAligns)>=0], + ('','-m %s'%options.suppressAlign)[int(options.suppressAlign)>=0], + ('','--best')[options.best=='doBest'], + ('','--strata')[options.strata=='doStrata'], + ('','-B %s'%options.offbase)[int(options.offbase)>=0], + ('','-z %s'%options.phased)[options.phased!='None'], + ('','-o %s'%options.offrate)[int(options.offrate)>=0], + ('','--mm')[options.mm=='doMm'], + ('','--seed %s'%options.seed)[int(options.seed)>=0]) + except ValueError: + aligning_cmds = '-p 8' + + tmp_out = tempfile.NamedTemporaryFile() + + # prepare actual aligning commands + if options.paired == 'paired': + cmd2 = 'bowtie %s %s -1 %s -2 %s > %s 2> /dev/null' % (aligning_cmds, options.ref, options.input1, options.input2, tmp_out.name) + else: + cmd2 = 'bowtie %s %s %s > %s 2> /dev/null' % (aligning_cmds, options.ref, options.input1, tmp_out.name) + # prepare command to convert bowtie output to sam and alternative + cmd3 = 'bowtie2sam.pl %s > %s' % (tmp_out.name, options.output) + cmd4 = 'cp %s %s' % (tmp_out.name, options.output) + + # align + try: + os.system(cmd2) + except 
Exception, erf: + stop_err("Error aligning sequence\n" + str(erf)) + if len(file(tmp_out.name,'r').read()) > 0: + #convert + try: + os.system(cmd3) + except Exception, erf: + stop_err('Error converting output to sam format\n' + str(erf)) + else: + try: + os.system(cmd4) + sys.stdout.write('Alignment file contained no data') + except Exception, erf: + stop_err('Error producing alignment file. File contained no data.\n' + str(erf)) + +if __name__=="__main__": __main__() diff -r e7b899fb4462 -r a7b1304e736f tools/sr_mapping/bowtie_wrapper.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/sr_mapping/bowtie_wrapper.xml Fri Sep 11 14:38:05 2009 -0400 @@ -0,0 +1,556 @@ +<tool id="bowtie_wrapper" name="Bowtie" version="1.0.0"> + <description> fast alignment of reads against reference sequence </description> + <command interpreter="python"> + bowtie_wrapper.py + --input1=$singlePaired.input1 + #if $singlePaired.sPaired == "paired": + --input2=$singlePaired.input2 + #else: + --input2="None" + #end if + --output=$output + --paired=$singlePaired.sPaired + --genomeSource=$refGenomeSource.genomeSource + #if $refGenomeSource.genomeSource == "history": + --ref=$refGenomeSource.ownFile + #else: + --ref=$refGenomeSource.indices.value + #end if + --params=$singlePaired.params.settings_type + #if $singlePaired.params.settings_type == "full": + --skip=$singlePaired.params.skip + --alignLimit=$singlePaired.params.alignLimit + --trimH=$singlePaired.params.trimH + --trimL=$singlePaired.params.trimL + --mismatchSeed=$singlePaired.params.mismatchSeed + --mismatchQual=$singlePaired.params.mismatchQual + --seedLen=$singlePaired.params.seedLen + --rounding=$singlePaired.params.rounding + --maqSoapAlign=$singlePaired.params.maqSoapAlign + --tryHard=$singlePaired.params.tryHard + --valAlign=$singlePaired.params.valAlign + --allValAligns=$singlePaired.params.allValAligns + --suppressAlign=$singlePaired.params.suppressAlign + --offbase=$singlePaired.params.offbase + 
--offrate=$singlePaired.params.offrate + --mm=$singlePaired.params.mm + --seed=$singlePaired.params.seed + --best=$singlePaired.params.bestOption.best + #if $singlePaired.params.bestOption.best == "doBest": + --maxBacktracks=$singlePaired.params.bestOption.maxBacktracks + --threadMem=$singlePaired.params.bestOption.threadMem + --strata=$singlePaired.params.bestOption.strata + --phased="None" + #else: + --maxBacktracks="None" + --threadMem="None" + --strata="None" + #if $singlePaired.sPaired =="single": + --phased=$singlePaired.params.bestOption.phased + #else: + --phased="None" + #end if + #end if + #if $singlePaired.sPaired == "single": + --minInsert="None" + --maxInsert="None" + --mateOrient="None" + --maxAlignAttempt="None" + --forwardAlign="None" + --reverseAlign="None" + #else: + --minInsert=$singlePaired.params.minInsert + --maxInsert=$singlePaired.params.maxInsert + --mateOrient=$singlePaired.params.mateOrient + --maxAlignAttempt=$singlePaired.params.maxAlignAttempt + --forwardAlign=$singlePaired.params.forwardAlign + --reverseAlign=$singlePaired.params.reverseAlign + #end if + #else + --skip="None" + --alignLimit="None" + --trimH="None" + --trimL="None" + --mismatchSeed="None" + --mismatchQual="None" + --seedLen="None" + --rounding="None" + --maqSoapAlign="None" + --tryHard="None" + --valAlign="None" + --allValAligns="None" + --suppressAlign="None" + --offbase="None" + --best="None" + --maxBacktracks="None" + --threadMem="None" + --strata="None" + --minInsert="None" + --maxInsert="None" + --mateOrient="None" + --maxAlignAttempt="None" + --forwardAlign="None" + --reverseAlign="None" + --phased="None" + --offrate="None" + --mm="None" + --seed="None" + #end if + #if $refGenomeSource.genomeSource == "history": + --dbkey=$dbkey + #else: + --dbkey="None" + #end if + #if $refGenomeSource.genomeSource == "history": + --indexSettings=$refGenomeSource.indexParams.index_settings + #else: + --indexSettings="None" + #end if + #if $refGenomeSource.genomeSource == 
"history" and $refGenomeSource.indexParams.index_settings == "index_full": + --iauto_b=$refGenomeSource.indexParams.auto_behavior.auto_b + #if $refGenomeSource.indexParams.auto_behavior.auto_b == "set": + --ipacked=$refGenomeSource.indexParams.auto_behavior.packed + --ibmax=$refGenomeSource.indexParams.auto_behavior.bmax + --ibmaxdivn=$refGenomeSource.indexParams.auto_behavior.bmaxdivn + --idcv=$refGenomeSource.indexParams.auto_behavior.dcv + #else: + --ipacked="None" + --ibmax="None" + --ibmaxdivn="None" + --idcv="None" + #end if + --inodc=$refGenomeSource.indexParams.nodc + --inoref=$refGenomeSource.indexParams.noref + --ioffrate=$refGenomeSource.indexParams.offrate + --iftab=$refGenomeSource.indexParams.ftab + --intoa=$refGenomeSource.indexParams.ntoa + --iendian=$refGenomeSource.indexParams.endian + --iseed=$refGenomeSource.indexParams.seed + --icutoff=$refGenomeSource.indexParams.cutoff + --ioldpmap=$refGenomeSource.indexParams.oldpmap + #else: + --iauto_b="None" + --ipacked="None" + --ibmax="None" + --ibmaxdivn="None" + --idcv="None" + --inodc="None" + --inoref="None" + --ioffrate="None" + --iftab="None" + --intoa="None" + --iendian="None" + --iseed="None" + --icutoff="None" + --ioldpmap="None" + #end if + </command> + <inputs> + <conditional name="refGenomeSource"> + <param name="genomeSource" type="select" label="Will you select a reference genome from your history or use a built-in index?" 
help="Built-ins were indexed using default options"> + <option value="indexed">Use a built-in index</option> + <option value="history">Use one from the history</option> + </param> + <when value="indexed"> + <param name="indices" type="select" label="Select a reference genome"> + <options from_file="bowtie_indices.loc"> + <column name="value" index="1" /> + <column name="name" index="0" /> + <filter type="sort_by" column="0" /> + </options> + </param> + </when> + <when value="history"> + <param name="ownFile" type="data" format="fasta" metadata_name="dbkey" label="Select a reference genome" /> + <conditional name="indexParams"> + <param name="index_settings" type="select" label="Choose whether to use default options or to set your own"> + <option value="index_pre_set">Commonly Used</option> + <option value="index_full">Full Parameter List</option> + </param> + <when value="index_pre_set" /> + <when value="index_full"> + <conditional name="auto_behavior"> + <param name="auto_b" type="select" label="Choose to use automatic or specified behavior for some parameters (-a)" help="Allows you to set --packed, --bmax, --bmaxdivn, and --dcv"> + <option value="auto">Automatic behavior</option> + <option value="set">Set values (sets --noauto and allows others to be set)</option> + </param> + <when value="auto" /> + <when value="set"> + <param name="packed" type="select" label="Whether or not to use a packed representation for DNA strings (-p)"> + <option value="unpacked">Use regular representation</option> + <option value="packed">Use packed representation</option> + </param> + <param name="bmax" type="integer" value="-1" label="Maximum number of suffixes allowed in a block (--bmax)" help="-1 for not specified. 
Must be at least 1" /> + <param name="bmaxdivn" type="integer" value="4" label="Maximum number of suffixes allowed in a block as a fraction of the length of the reference (--bmaxdivn)" /> + <param name="dcv" type="integer" value="1024" label="The period for the difference-cover sample (--dcv)" /> + </when> + </conditional> + <param name="nodc" type="select" label="Whether or not to disable the use of the difference-cover sample (--nodc)" help="Suffix sorting becomes quadratic-time in the worst case (a very repetetive reference)"> + <option value="dc">Use difference-cover sample</option> + <option value="nodc">Disable difference-cover sample</option> + </param> + <param name="noref" type="select" label="Whether or not to build the part of the reference index used only in paired-end alignment (-r)"> + <option value="ref">Build all index files</option> + <option value="noref">Do not build paired-end alignment index files</option> + </param> + <param name="offrate" type="integer" value="5" label="How many rows get marked during annotation of some or all of the Burrows-Wheeler rows (-o)" /> + <param name="ftab" type="integer" value="10" label="The size of the lookup table used to calculate an initial Burrows-Wheeler range with respect to the first n characters of the query (-t)" help="ftab is 4^(n+1) bytes" /> + <param name="ntoa" type="select" label="Whether or not to convert Ns in the reference sequence to As (--ntoa)"> + <option value="no">Do not convert Ns</option> + <option value="yes">Convert Ns to As</option> + </param> + <param name="endian" type="select" label="Endianness to use when serializing integers to the index file (--big/--little)" help="Little is most appropriate for Intel- and AMD-based architecture"> + <option value="little">Little</option> + <option value="big">Big</option> + </param> + <param name="seed" type="integer" value="-1" label="Seed for the pseudorandom number generator (--seed)" help="Use -1 to use default" /> + <param name="cutoff" 
type="integer" value="-1" label="Number of first bases of the reference sequence to index (--cutoff)" help="Use -1 to use default" /> + <param name="oldpmap" type="select" label="Use the scheme for mapping joined reference locations to original reference locations used in versions of Bowtie prior to 0.9.8 (--oldpmap)" help="The old scheme uses padding and the new one doesn't"> + <option value="no">Use the new scheme</option> + <option value="yes">Use the old scheme</option> + </param> + </when> <!-- index_full --> + </conditional> + </when> + </conditional> <!-- refGenomeSource --> + <conditional name="singlePaired"> + <param name="sPaired" type="select" label="Is this library mate-paired?"> + <option value="single">Single-end</option> + <option value="paired">Paired-end</option> + </param> + <when value="single"> + <param name="input1" type="data" format="fastqsanger" label="FASTQ file" /> + <conditional name="params"> + <param name="settings_type" type="select" label="Bowtie settings to use" help="For most mapping needs use Commonly used settings. 
If you want full control use Full parameter list"> + <option value="pre_set">Commonly used</option> + <option value="full">Full parameter list</option> + </param> + <when value="pre_set" /> + <when value="full"> + <param name="skip" type="integer" value="0" label="Skip the first n reads (-s)" /> + <param name="alignLimit" type="integer" value="-1" label="Only align the first n reads (-u)" help="-1 for off" /> + <param name="trimH" type="integer" value="0" label="Trim n bases from high-quality (left) end of each read before alignment (-5)" /> + <param name="trimL" type="integer" value="0" label="Trim n bases from low-quality (right) end of each read before alignment (-3)" /> + <param name="mismatchSeed" type="integer" value="2" label="Maximum number of mismatches permitted in the seed (-n)" help="May be 0, 1, 2, or 3" /> + <param name="mismatchQual" type="integer" value="70" label="Maximum permitted total of quality values at mismatched read positions (-e)" /> + <param name="seedLen" type="integer" value="28" label="Seed length (-l)" help="Minimum value is 5" /> + <param name="rounding" type="select" label="Whether or not to round to the nearest 10 and saturating at 30 (--nomaqround)"> + <option value="round">Round to nearest 10</option> + <option value="noRound">Do not round to nearest 10</option> + </param> + <param name="maqSoapAlign" type="integer" value="-1" label="Number of mismatches for SOAP-like alignment policy (-v)" help="-1 for default MAQ-like alignment policy" /> + <param name="tryHard" type="select" label="Whether or not to try as hard as possible to find valid alignments when they exist (-y)" help="Tryhard mode is much slower than regular mode"> + <option value="noTryHard">Do not try hard</option> + <option value="doTryHard">Try hard</option> + </param> + <param name="valAlign" type="integer" value="1" label="Report up to n valid arguments per read (-k)" /> + <param name="allValAligns" type="select" label="Whether or not to report all valid 
alignments per read (-a)"> + <option value="noAllValAligns">Do not report all valid alignments</option> + <option value="doAllValAligns">Report all valid alignments</option> + </param> + <param name="suppressAlign" type="integer" value="-1" label="Suppress all alignments for a read if more than n reportable alignments exist (-m)" help="-1 for no limit" /> + <param name="offbase" type="integer" value="0" label="Number the first base of a reference sequence as n when outputting alignments (-B)" /> + <conditional name="bestOption"> + <param name="best" type="select" label="Whether or not to make Bowtie guarantee that reported singleton alignments are 'best' in terms of stratum and in terms of the quality values at the mismatched positions (--best)" help="Removes all strand bias. Only affects which alignments are reported by Bowtie. Runs slower with best option"> + <option value="noBest">Do not use best</option> + <option value="doBest">Use best</option> + </param> + <when value="noBest"> + <param name="maxBacktracks" type="integer" value="125" label="Maximum number of backtracks permitted when aligning a read (--maxbts)" /> + <param name="phased" type="select" label="Whether or not it should alternate between using the forward and mirror indexes in a series of phases so that only half of the index is resident in memory at one time (-z)"> + <option value="noPhased">Don't alternate</option> + <option value="doPhased">Do alternate</option> + </param> + </when> + <when value="doBest"> + <param name="maxBacktracks" type="integer" value="800" label="Maximum number of backtracks permitted when aligning a read (--maxbts)" /> + <param name="threadMem" type="integer" value="32" label="Number of megabytes of memory a given thread is given to store path descriptors in best mode (--chunkmbs)" help="If running in best mode, and you run out of memory, try adjusting this" /> + <param name="strata" type="select" label="Whether or not to report only those alignments that fall in the 
best stratum if many valid alignments exist and are reportable (--strata)"> + <option value="noStrata">Do not use strata option</option> + <option value="doStrata">Use strata option</option> + </param> + </when> + </conditional> <!-- bestOption --> + <param name="offrate" type="integer" value="-1" label="Override the offrate of the index to n (-o)" help="-1 for default" /> + <param name="mm" type="select" label="Whether or not to use memory-mapped I/O to load the index (--m)"> + <option value="noMm">Use POSIX/C file I/O</option> + <option value="doMm">Use memory-mapped I/O</option> + </param> + <param name="seed" type="integer" value="-1" label="Seed for pseudo-random number generator (--seed)" help="-1 for default" /> + </when> <!-- full --> + </conditional> <!-- params --> + </when> <!-- single --> + <when value="paired"> + <param name="input1" type="data" format="fastqsanger" label="Forward FASTQ file" /> + <param name="input2" type="data" format="fastqsanger" label="Reverse FASTQ file" /> + <conditional name="params"> + <param name="settings_type" type="select" label="BWA settings to use" help="For most mapping needs use Commonly used settings. 
If you want full control use Full parameter list"> + <option value="pre_set">Commonly used</option> + <option value="full">Full parameter list</option> + </param> + <when value="pre_set" /> + <when value="full"> + <param name="skip" type="integer" value="0" label="Skip the first n pairs (-s)" /> + <param name="alignLimit" type="integer" value="-1" label="Only align the first n pairs (-u)" help="-1 for off" /> + <param name="trimH" type="integer" value="0" label="Trim n bases from high-quality (left) end of each read before alignment (-5)" /> + <param name="trimL" type="integer" value="0" label="Trim n bases from low-quality (right) end of each read before alignment (-3)" /> + <param name="mismatchSeed" type="integer" value="2" label="Maximum number of mismatches permitted in the seed (-n)" help="May be 0, 1, 2, or 3" /> + <param name="mismatchQual" type="integer" value="70" label="Maximum permitted total of quality values at mismatched read positions (-e)" /> + <param name="seedLen" type="integer" value="28" label="Seed length (-l)" help="Minimum value is 5" /> + <param name="rounding" type="select" label="Whether or not to round to the nearest 10 and saturating at 30 (--nomaqround)"> + <option value="round">Round to nearest 10</option> + <option value="noRound">Do not round to nearest 10</option> + </param> + <param name="maqSoapAlign" type="integer" value="-1" label="Number of mismatches for SOAP-like alignment policy (-v)" help="-1 for default MAQ-like alignment policy" /> + <param name="minInsert" type="integer" value="0" label="Minimum insert size for valid paired-end alignments (-I)" /> + <param name="maxInsert" type="integer" value="250" label="Maximum insert size for valid paired-end alignments (-X)" /> + <param name="mateOrient" type="select" label="The upstream/downstream mate orientation for valid paired-end alignment against the forward reference strand (--fr/--rf/--ff)"> + <option value="fr">FR (for Illumina)</option> + <option value="rf">RF</option> + 
<option value="ff">FF</option> + </param> + <param name="maxAlignAttempt" type="integer" value="100" label="Maximum number of attempts Bowtie will make to match an alignment for one mate with an alignment for the opposite mate (--pairtries)" /> + <param name="forwardAlign" type="select" label="Choose whether or not to attempt to align against the forward reference strand (--nofw)"> + <option value="forward">Align against the forward reference strand</option> + <option value="noForward">Do not align against the forward reference strand</option> + </param> + <param name="reverseAlign" type="select" label="Choose whether or not to align against the reverse-complement reference strand (--norc)"> + <option value="reverse">Align against the reverse-complement reference strand</option> + <option value="noReverse">Do not align against the reverse-complement reference strand</option> + </param> + <param name="tryHard" type="select" label="Whether or not to try as hard as possible to find valid alignments when they exist (-y)" help="Tryhard mode is much slower than regular mode"> + <option value="noTryHard">Do not try hard</option> + <option value="doTryHard">Try hard</option> + </param> + <param name="valAlign" type="integer" value="1" label="Report up to n valid alignments per pair (-k)" /> + <param name="allValAligns" type="select" label="Whether or not to report all valid alignments per pair (-a)"> + <option value="noAllValAligns">Do not report all valid alignments</option> + <option value="doAllValAligns">Report all valid alignments</option> + </param> + <param name="suppressAlign" type="integer" value="-1" label="Suppress all alignments for a pair if more than n reportable alignments exist (-m)" help="-1 for no limit" /> + <param name="offbase" type="integer" value="0" label="Number the first base of a reference sequence as n when outputting alignments (-B)" /> + <conditional name="bestOption"> + <param name="best" type="select" label="Whether or not to make Bowtie guarantee 
that reported singleton alignments are 'best' in terms of stratum and in terms of the quality values at the mismatched positions (--best)" help="Removes all strand bias. Only affects which alignments are reported by Bowtie. Runs slower with best option"> + <option value="noBest">Do not use best</option> + <option value="doBest">Use best</option> + </param> + <when value="noBest"> + <param name="maxBacktracks" type="integer" value="125" label="Maximum number of backtracks permitted when aligning a read (--maxbts)" /> + </when> + <when value="doBest"> + <param name="maxBacktracks" type="integer" value="800" label="Maximum number of backtracks permitted when aligning a read (--maxbts)" /> + <param name="threadMem" type="integer" value="32" label="Number of megabytes of memory a given thread is given to store path descriptors in best mode (--chunkmbs)" help="If running in best mode, and you run out of memory, try adjusting this" /> + <param name="strata" type="select" label="Whether or not to report only those alignments that fall in the best stratum if many valid alignments exist and are reportable (--strata)"> + <option value="noStrata">Do not use strata option</option> + <option value="doStrata">Use strata option</option> + </param> + </when> + </conditional> + <param name="offrate" type="integer" value="-1" label="Override the offrate of the index to n (-o)" help="-1 for default" /> + <param name="mm" type="select" label="Whether or not to use memory-mapped I/O to load the index (--mm)"> + <option value="noMm">Use POSIX/C file I/O</option> + <option value="doMm">Use memory-mapped I/O</option> + </param> + <param name="seed" type="integer" value="-1" label="Seed for pseudo-random number generator (--seed)" help="-1 for default" /> + </when> <!-- full --> + </conditional> <!-- params --> + </when> <!-- paired --> + </conditional> <!-- singlePaired --> + </inputs> + <outputs> + <data format="sam" name="output" /> + </outputs> + <tests> + <test> + <param 
name="genomeSource" value="indexed" /> + <param name="indices" value="chrM" /> + <param name="sPaired" value="single" /> + <param name="input1" ftype="fastqsanger" value="bowtie_in1.fastq" /> + <param name="settings_type" value="pre_set" /> + <output name="output" ftype="sam" file="bowtie_out1.sam" /> + </test> + <test> + <param name="genomeSource" value="history" /> + <param name="ownFile" value="chrM.fa" /> + <param name="index_settings" value="index_pre_set" /> + <param name="sPaired" value="paired" /> + <param name="input1" ftype="fastqsanger" value="bowtie_in2.fastq" /> + <param name="input2" ftype="fastqsanger" value="bowtie_in3.fastq" /> + <param name="settings_type" value="pre_set" /> + <output name="output" ftype="sam" file="bowtie_out2.sam" /> + </test> + <test> + <param name="genomeSource" value="history" /> + <param name="ownFile" value="chrM.fa" /> + <param name="index_settings" value="index_full" /> + <param name="auto_b" value="set" /> + <param name="packed" value="unpacked" /> + <param name="bmax" value="-1" /> + <param name="bmaxdivn" value="4" /> + <param name="dcv" value="2048" /> + <param name="nodc" value="dc" /> + <param name="noref" value="noref" /> + <param name="offrate" value="6" /> + <param name="ftab" value="10" /> + <param name="ntoa" value="yes" /> + <param name="endian" value="little" /> + <param name="seed" value="-1" /> + <param name="cutoff" value="-1" /> + <param name="oldpmap" value="no" /> + <param name="sPaired" value="single" /> + <param name="input1" ftype="fastqsanger" value="bowtie_in1.fastq" /> + <param name="settings_type" value="pre_set" /> + <output name="output" ftype="sam" file="bowtie_out1.sam" /> + </test> + <test> + <param name="genomeSource" value="indexed" /> + <param name="indices" value="chrM" /> + <param name="sPaired" value="paired" /> + <param name="input1" ftype="fastqsanger" value="bowtie_in2.fastq" /> + <param name="input2" ftype="fastqsanger" value="bowtie_in3.fastq" /> + <param name="settings_type" 
value="full" /> + <param name="skip" value="0" /> + <param name="alignLimit" value="-1" /> + <param name="trimL" value="0" /> + <param name="trimH" value="0" /> + <param name="mismatchSeed" value="3" /> + <param name="mismatchQual" value="50" /> + <param name="seedLen" value="10" /> + <param name="rounding" value="round" /> + <param name="maqSoapAlign" value="-1" /> + <param name="minInsert" value="0" /> + <param name="maxInsert" value="250" /> + <param name="mateOrient" value="fr" /> + <param name="maxAlignAttempt" value="100" /> + <param name="forwardAlign" value="forward" /> + <param name="reverseAlign" value="reverse" /> + <param name="tryHard" value="doTryHard" /> + <param name="valAlign" value="1" /> + <param name="allValAligns" value="noAllValAligns" /> + <param name="suppressAlign" value="-1" /> + <param name="offbase" value="0" /> + <param name="best" value="doBest" /> + <param name="maxBacktracks" value="800" /> + <param name="threadMem" value="32" /> + <param name="strata" value="noStrata" /> + <param name="offrate" value="-1" /> + <param name="mm" value="noMm" /> + <param name="seed" value="403" /> + <output name="output" ftype="sam" file="bowtie_out2.sam" /> + </test> + </tests> + <help> + +**What it does** + +Bowtie_ is a short read aligner designed to be ultrafast and memory-efficient. Reads can be as long as 1024 base pairs, though shorter is better. Bowtie produces a specific output format which is converted to SAM by this tool. + +.. _Bowtie: http://bowtie-bio.sourceforge.net/index.shtml + +------ + +**Input formats** + +Bowtie accepts files in Sanger FASTQ format. 
+ +------ + +**Outputs** + +The output is in SAM format, and has the following columns:: + + 1 QNAME - Query (pair) NAME + 2 FLAG - bitwise FLAG + 3 RNAME - Reference sequence NAME + 4 POS - 1-based leftmost POSition/coordinate of clipped sequence + 5 MAPQ - MAPping Quality (Phred-scaled) + 6 CIGAR - extended CIGAR string + 7 MRNM - Mate Reference sequence NaMe ('=' if same as RNAME) + 8 MPOS - 1-based Mate POSition + 9 ISIZE - Inferred insert SIZE + 10 SEQ - query SEQuence on the same strand as the reference + 11 QUAL - query QUALity (ASCII-33 gives the Phred base quality) + 12 OPT - variable OPTional fields in the format TAG:VTYPE:VALUE + +The flags are as follows:: + + Flag - Description + 0x0001 - the read is paired in sequencing + 0x0002 - the read is mapped in a proper pair + 0x0004 - the query sequence itself is unmapped + 0x0008 - the mate is unmapped + 0x0010 - strand of the query (1 for reverse) + 0x0020 - strand of the mate + 0x0040 - the read is the first read in a pair + 0x0080 - the read is the second read in a pair + 0x0100 - the alignment is not primary + +It looks like this (scroll sideways to see the entire example):: + + QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL OPT + HWI-EAS91_1_30788AAXX:1:1:1761:343 4 * 0 0 * * 0 0 AAAAAAANNAAAAAAAAAAAAAAAAAAAAAAAAAAACNNANNGAGTNGNNNNNNNGCTTCCCACAGNNCTGG hhhhhhh;;hhhhhhhhhhh^hOhhhhghhhfhhhgh;;h;;hhhh;h;;;;;;;hhhhhhghhhh;;Phhh + HWI-EAS91_1_30788AAXX:1:1:1578:331 4 * 0 0 * * 0 0 GTATAGANNAATAAGAAAAAAAAAAATGAAGACTTTCNNANNTCTGNANNNNNNNTCTTTTTTCAGNNGTAG hhhhhhh;;hhhhhhhhhhhhhhhhhhhhhhhhhhhh;;h;;hhhh;h;;;;;;;hhhhhhhhhhh;;hhVh + +------- + +**Bowtie settings** + +All of the options have a default value. You can change any of them. Most of the options in Bowtie have been implemented here. + +------ + +**Bowtie parameter list** + +This is an exhaustive list of Bowtie options: + +For indexing (bowtie-build):: + -a No auto behavior. 
Disable the default behavior where Bowtie automatically selects values for --bmax/--dcv/--packed parameters according to the memory available. [off] + -p Packing. Use a packed representation for DNA strings. [auto] + --bmax <int> Suffix maximum. The maximum number of suffixes allowed in a block. [auto] + --bmaxdivn <int> Suffix maximum fraction. The maximum number of suffixes allowed in a block expressed as a fraction of the length of the reference. [4] + --dcv <int> Difference-cover sample. Use <int> as the period for the difference-cover sample. [1024] + --nodc No difference-cover sample. Disable the difference-cover sample. [off] + -r No reference indexes. Do not build the NAME.3.ebwt and NAME.4.ebwt portions of the index, used only for paired-end alignment. [off] + -o <int> Offrate. How many Burrows-Wheeler rows get marked by the indexer. The indexer will mark every 2^<int> rows. The marked rows correspond to rows on the genome. [5] + -t <int> Ftab. The lookup table used to calculate an initial Burrows-Wheeler range with respect to the first <int> characters of the query. Ftab is 4^<int>+1 bytes. [10] + --ntoa N conversion. Convert Ns to As before building the index. Otherwise, Ns are simply excluded from the index and Bowtie will not find alignments that overlap them. [off] + --big Endianness. Endianness to use when serializing integers to the index file. [off] + --little Endianness. [--little] + --seed <int> Random seed. Use <int> as the seed for the pseudo-random number generator. [off] + --cutoff <int> Cutoff. Index only the first <int> bases of the reference sequences (cumulative across sequences) and ignore the rest. [off] + --oldpmap Use old mapping scheme. Use the padding-based scheme from Bowtie versions before 0.9.8 instead of the current scheme. [off] + +For aligning (bowtie):: + -s <int> Skip. Do not align the first <int> reads or pairs in the input. [off] + -u <int> Align limit. Only align the first <int> reads/pairs from the input. 
[no limit] + -5 <int> High-quality trim. Trim <int> bases from the high-quality (left) end of each read before alignment. [0] + -3 <int> Low-quality trim. Trim <int> bases from the low-quality (right) end of each read before alignment. [0] + -n <int> Mismatch seed. Maximum number of mismatches permitted in the seed (defined with seed length option). Can be 0, 1, 2, or 3. [2] + -e <int> Mismatch quality. Maximum permitted total of quality values at mismatched read positions. Bowtie rounds quality values to the nearest 10 and saturates at 30. [70] + -l <int> Seed length. The number of bases on the high-quality end of the read to which the -n ceiling applies. Must be at least 5. [28] + --nomaqround Suppress MAQ rounding. Quality values are internally rounded to the nearest 10 and saturate at 30. This option turns off that rounding. [off] + -v <int> MAQ- or SOAP-like alignment policy. This option turns off the default MAQ-like alignment policy in favor of a SOAP-like one. End-to-end alignments with at most <int> mismatches. [off] + -I <int> Minimum insert. The minimum insert size for valid paired-end alignments. Does checking on untrimmed reads if -5 or -3 is used. [0] + --fr Mate orientation. The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. [--fr] + --rf Mate orientation. [off] + --ff Mate orientation. [off] + -X <int> Maximum insert. The maximum insert size for valid paired-end alignments. Does checking on untrimmed reads if -5 or -3 is used. [250] + --pairtries <int> Maximum alignment attempts for paired-end data. [100] + --nofw No forward aligning. Choosing this option means that Bowtie will not attempt to align against the forward reference strand. [off] + --norc No reverse-complement aligning. Setting this will mean that Bowtie will not attempt to align against the reverse-complement reference strand. [off] + --maxbts <int> Maximum backtracks. 
The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode. [125 without --best] [800 with --best] + -y Try hard. Try as hard as possible to find valid alignments when they exist, including paired-end alignments. [off] + --chunkmbs <int> Thread memory. The number of megabytes of memory a given thread is given to store path descriptors in --best mode. [32] + -k <int> Valid alignments. Report up to <int> valid alignments per read or pair. [1] + -a All valid alignments. Choosing this means that all valid alignments per read or pair will be reported. [off] + -m <int> Suppress alignments. Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. [no limit] + --best Best mode. Make Bowtie guarantee that reported singleton alignments are "best" in terms of stratum (the number of mismatches) and quality values at mismatched positions. [off] + --strata Best strata. When running in best mode, report only those alignments that fall into the best stratum when valid alignments fall into more than one stratum. [off] + -B <int> First base number. When outputting alignments, number the first base of a reference sequence as <int>. [0] + -z <int> Phased. Alternate between using the forward and mirror indexes in a series of phases such that only one half of the index is resident in memory at one time. Cannot be used with paired-end alignment. [off] + -o <int> Offrate override. Override the offrate of the index with <int>. Some row markings are discarded when the index is read into memory. <int> must be greater than the value used to build the index (default: 5). [off] + --mm I/O for index loading. Choosing this option means that memory-mapped I/O will be used to load the index instead of the normal POSIX/C file I/O. Allows memory-efficient parallelization where using -p is not desirable. [off] + --seed <int> Random seed. Use <int> as the seed for the pseudo-random number generator. [off] + + </help> +</tool>
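The FLAG column described in the tool's help is a bit field, so a single value such as the 16 in test-data/bowtie_out1.sam or the 4 in the help example can be read by testing each bit in turn. The following is a minimal Python sketch of that decoding, not part of this changeset: the names SAM_FLAGS, decode_flag and parse_sam_line are hypothetical, and the bit descriptions are copied from the help text above.

```python
# Illustrative sketch only; not code from bowtie_wrapper.py.
# Bit values and descriptions are taken from the tool's help text.
SAM_FLAGS = [
    (0x0001, "the read is paired in sequencing"),
    (0x0002, "the read is mapped in a proper pair"),
    (0x0004, "the query sequence itself is unmapped"),
    (0x0008, "the mate is unmapped"),
    (0x0010, "strand of the query (1 for reverse)"),
    (0x0020, "strand of the mate"),
    (0x0040, "the read is the first read in a pair"),
    (0x0080, "the read is the second read in a pair"),
    (0x0100, "the alignment is not primary"),
]

# The 11 mandatory SAM columns, in the order listed in the help text.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]


def decode_flag(flag):
    """Return the description of every bit that is set in a SAM FLAG value."""
    return [desc for bit, desc in SAM_FLAGS if flag & bit]


def parse_sam_line(line):
    """Split one tab-separated SAM record into named columns.

    Anything after the 11th column is collected under 'OPT' as a list of
    TAG:VTYPE:VALUE strings.
    """
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    record["FLAG"] = int(record["FLAG"])
    record["OPT"] = fields[11:]
    return record


if __name__ == "__main__":
    # The single record from test-data/bowtie_out1.sam, rebuilt with tabs.
    line = "\t".join([
        "HWI-EAS91_1_30788AAXX:1:1:1513:715", "16", "chrM", "9563", "25",
        "36M", "*", "0", "0",
        "CTGACTACCACAACTAAACATCTATGCNNAAAAAAC",
        'I+-II?IDIIIIIIIIIIIIIIIIIII""IIIIIII',
        "NM:i:1", "X1:i:1", "MD:Z:7N0N27",
    ])
    rec = parse_sam_line(line)
    print(rec["QNAME"], rec["FLAG"], decode_flag(rec["FLAG"]))
```

Read this way, flag 16 sets only the 0x0010 bit (the read aligned to the reverse strand), while the flag 4 rows in the help example set only 0x0004 (unmapped).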