April 2013 - galaxy-dev - lists.galaxyproject.org

Error running tophat2 in Galaxy
by Sachit Adhikari 02 Jul '13

I am getting this error:

Error in tophat:

[2013-02-13 20:46:41] Beginning TopHat run (v2.0.7)
-----------------------------------------------
[2013-02-13 20:46:41] Checking for Bowtie
    Bowtie version: 2.0.6.0
[2013-02-13 20:46:41] Checking for Samtools
    Samtools version: 0.1.18.0
[2013-02-13 20:46:41] Checking for Bowtie index files
[2013-02-13 20:46:41] Checking for reference FASTA file
    Warning: Could not find FASTA file /data/rathi/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome.fa
[2013-02-13 20:46:41] Reconstituting reference FASTA file from Bowtie index
    Executing: /usr/bin/bowtie2-inspect /data/rathi/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome > ./tophat_out/tmp/genome.fa
[2013-02-13 20:48:51] Generating SAM header for /data/rathi/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
    format: fastq
    quality scale: phred33 (default)
[2013-02-13 20:49:23] Preparing reads
    left reads: min. length=34, max. length=34, 2 kept reads (0 discarded)
Warning: you have only one segment per read. If the read length is greater than or equal to 45bp, we strongly recommend that you decrease --segment-length to about half the read length because TopHat will work better with multiple segments
[2013-02-13 20:49:23] Mapping left_kept_reads to genome genome with Bowtie2
[2013-02-13 20:49:56] Searching for junctions via segment mapping
    Coverage-search algorithm is turned on, making this step very slow
    Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.
    Warning: junction database is empty!
[2013-02-13 20:51:18] Reporting output tracks
    [FAILED]
Error running /usr/local/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir ./tophat_out/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p4 --no-closure-search --no-microexon-search --sam-header ./tophat_out/tmp/genome_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/bin/samtools --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 ./tophat_out/tmp/genome.fa ./tophat_out/junctions.bed ./tophat_out/insertions.bed ./tophat_out/deletions.bed ./tophat_out/fusions.out ./tophat_out/tmp/accepted_hits ./tophat_out/tmp/left_kept_reads.bam
Loading ...done

What's wrong?

Tool menu customization :: Card #727
by Björn Grüning 13 Jun '13

Hi,

thanks to the awesome work from John Chilton in pull request #160 [1], I hacked up a first version of a tool customization that can be controlled by the user [2]. It was requested as Trello card #727.

The user will have a new preference panel (see the attached screenshot) and can toggle several customization options. The options can be specified by each administrator as filter modules under lib/galaxy/tools/filters/. More details are in John's original pull request. For example, one use case would be to offer different module sets for different studies.

You can specify system customizations (John's work) with

    tool_filters = module:function, module:function2
    tool_label_filters = ...
    tool_section_filters = ...

and offer user customizations with

    user_tool_filters = examples:restrict_upload_to_admins
    user_tool_section_filters = examples:restrict_text
    user_tool_label_filters = ...

at the same time. Only user customizations will be shown in the preference panel. The description of each filter is parsed from the docstring and shown to the user.

The patch requires no modification to the database; all user preferences will be stored in user_preference with three special names.

Is there any plan from the core development team for how such a feature should be addressed? Is this approach flexible enough? It would be great to get some feedback on which direction such a feature should evolve in and whether it is worth putting more time into it.

Thanks John, I hope your code will be merged!

Thanks for your comments,
Bjoern

[1] https://bitbucket.org/galaxy/galaxy-central/pull-request/160/implement-dyna…
[2] https://bitbucket.org/BjoernGruening/galaxy-central-bgruening/commits/68e7b…
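
To make the idea of a filter module concrete, here is a minimal sketch of what a function such as examples:restrict_upload_to_admins might look like. The (context, tool) signature, the boolean return value, and the tool name "Upload File" are assumptions for illustration; FakeTrans, FakeContext, FakeTool, visible_tools and describe are hypothetical stand-ins so the sketch runs outside Galaxy, and none of them is Galaxy's actual API.

```python
"""Sketch of a user-selectable tool filter (hypothetical stand-ins, not Galaxy code)."""


class FakeTrans:
    """Hypothetical stand-in for the per-request transaction object."""

    def __init__(self, is_admin):
        self._is_admin = is_admin

    def user_is_admin(self):
        return self._is_admin


class FakeContext:
    """Hypothetical filter context that just carries the transaction."""

    def __init__(self, trans):
        self.trans = trans


class FakeTool:
    """Hypothetical tool with only the attribute the filter inspects."""

    def __init__(self, name):
        self.name = name


def restrict_upload_to_admins(context, tool):
    """Hide the upload tool from everyone except admin users."""
    # Assumed convention: return True to keep the tool, False to hide it.
    if tool.name == "Upload File":
        return context.trans.user_is_admin()
    return True


def visible_tools(context, tools, filters):
    """Keep only the tools that every active filter accepts."""
    return [t for t in tools if all(f(context, t) for f in filters)]


def describe(filter_function):
    """Return the docstring, as it could be shown in a preference panel."""
    return (filter_function.__doc__ or "").strip()
```

With these stand-ins, a non-admin user who enables the filter would no longer see the upload tool, while every other tool stays visible, and the text shown next to the toggle would be the function's docstring.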

Galaxy on Cluster - how to set -a flag with username
by greg 12 Jun '13

In our local Galaxy install we want the cluster jobs to be run as the galaxy user, but we want to include a -a [account name] flag so that our grid software bills properly.

Here's what I currently have in universe.wsgi:

    default_cluster_job_runner = drmaa://-V -pe batch 8/

What I want is something like this:

    default_cluster_job_runner = drmaa://-V -pe batch 8 -a [logged in user name]/

Is this possible?

Thanks,
Greg

Issue with set_user_disk_usage.py and Postgres 8.x
by Lance Parsons 06 Jun '13

The recent updates to set_user_disk_usage.py for Postgres users have an issue with Postgres 8.x. The SQL in the pgcalc method (line 51) leads to the following error:

    sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "d.total_size" must appear in the GROUP BY clause or be used in an aggregate function
    LINE 4: FROM ( SELECT d.total_siz...
                          ^

The problem is that versions of Postgres before 9.x were a bit more restrictive in the use of GROUP BY. This can be fixed by using DISTINCT ON instead. See this StackOverflow post for more info:
http://stackoverflow.com/questions/1769361/postgresql-group-by-different-fr…

I've included a patch below. Let me know if a pull request would be preferred.

--- a/scripts/set_user_disk_usage.py
+++ b/scripts/set_user_disk_usage.py
@@ -52,7 +52,7 @@
     sql = """
         UPDATE galaxy_user
            SET disk_usage = (SELECT COALESCE(SUM(total_size), 0)
-                             FROM ( SELECT d.total_size
+                             FROM ( SELECT DISTINCT ON (d.id) d.total_size, d.id
                                     FROM history_dataset_association hda
                                     JOIN history h ON h.id = hda.history_id
                                     JOIN dataset d ON hda.dataset_id = d.id
@@ -62,7 +62,7 @@
                                     AND d.purged = false
                                     AND d.id NOT IN (SELECT dataset_id
                                                      FROM library_dataset_dataset_association)
-                                    GROUP BY d.id) sizes)
+                                    ) sizes)
          WHERE id = :id
      RETURNING disk_usage;
     """

--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University
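
The reason the inner query must yield one row per dataset can be illustrated with a small, self-contained sketch. It uses Python's sqlite3 module rather than Postgres, and SQLite has no DISTINCT ON, so the sketch substitutes SELECT DISTINCT over (id, total_size), which also gives one row per dataset because id is the primary key. The two-table schema and the sample rows are simplified assumptions, not the real Galaxy schema.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, total_size INTEGER);
        CREATE TABLE history_dataset_association (
            id INTEGER PRIMARY KEY,
            dataset_id INTEGER
        );
        -- Dataset 1 appears in two histories, dataset 2 in one.
        INSERT INTO dataset VALUES (1, 100), (2, 50);
        INSERT INTO history_dataset_association VALUES (1, 1), (2, 1), (3, 2);
        """
    )
    return conn


def naive_usage(conn):
    """Sum sizes straight off the join: shared datasets are counted twice."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(d.total_size), 0)
          FROM history_dataset_association hda
          JOIN dataset d ON hda.dataset_id = d.id
        """
    ).fetchone()
    return row[0]


def deduplicated_usage(conn):
    """Sum sizes over one row per dataset, as the patched subquery intends."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(total_size), 0)
          FROM (SELECT DISTINCT d.id, d.total_size
                  FROM history_dataset_association hda
                  JOIN dataset d ON hda.dataset_id = d.id) sizes
        """
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    db = make_demo_db()
    print(naive_usage(db), deduplicated_usage(db))
```

In this toy data the naive sum counts the shared dataset twice, while the deduplicated subquery counts each dataset once; the patch keeps that one-row-per-dataset property while avoiding the GROUP BY form that Postgres 8.x rejects.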

Re: [galaxy-dev] [galaxy-user] List of genomes
by Jennifer Jackson 01 Jun '13

Hello Yongde,

The list of genomes is gathered from many sources and is comprehensive in order to facilitate external display functionality (at UCSC (main and microbial), Ensembl, Wormbase, etc.). When a dataset in a standard format is assigned to one of these sources, the available displays will appear as links within the dataset's box.

Trackster (Galaxy's native visualization tool) is available for most common data formats, even in the absence of an assigned database, through the use of the Custom Reference Genome function (aka "Custom Build"). We think this is a great advantage, in particular for cases such as yours, since you don't have to restrict yourself to external applications that happen to host your genome. Click on the link here and select "Trackster" to give it a test run:

The Custom Reference Genome function is also intended to be used for smaller genomes such as this one when performing alignments and most other jobs; no pre-indexing of the genome is necessary. Simply load the genome in FASTA format as a dataset and use it with tools, choosing a "reference genome from the history". The rationale is that these genomes are numerous, small, and easily indexed during the course of job processing, and this approach provides immediate access to genomes that are newly published, not widely used, or simply too numerous as a whole class for us to practically process in full and keep current.

We have detailed help about how to use the Custom Reference Genome method, including troubleshooting help should you need it, although in practice you will likely find this to be fairly simple, with 2-3 preparatory steps depending on the source. Most if not all of these can be done within Galaxy.
http://wiki.galaxyproject.org/Support#Custom_reference_genome

Hopefully this helps. If you do need more guidance, please let us know.

Best,
Jen
Galaxy team

On 4/29/13 9:01 AM, YBao wrote:
> Hi All,
>
> I was trying to map a set of data to a genome, Klebsiella pneumoniae
> subsp. Pneumoniae MGH 78578(31). While uploading the reads, I was able
> to find the reference genome as listed above. However, when I tried to
> map the data using either bowtie or BWA, the pull down list did not
> include this genome. Can someone help or enlighten me as to why it did
> not make it into the list?
>
> Thanks
>
> Yongde
>
> --
> Yongde Bao
> DNA Sciences Core
> Dept. of Microbiology, Immunology,
> and Cancer Biology
> UVA
>
>
> ___________________________________________________________
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org. Please keep all replies on the list by
> using "reply all" in your mail client. For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
>
> http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
> http://lists.bx.psu.edu/
>
> To search Galaxy mailing lists use the unified search at:
>
> http://galaxyproject.org/search/mailinglists/

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Displaying genomic sequences in Trackster
by Naharajan Lakshmanaperumal 24 May '13

Dear all,

We have our own Galaxy instance, and the idea is to have Trackster enabled so that users can visualize NGS mappings. We were able to configure Trackster in our instance and the visualization works fine. We have two questions regarding Trackster:

1) We can't display genomic sequences in Trackster. As per the tutorial, we set the location of the .2bit file in the twobit.loc file so that Trackster can display the genomic sequence, but for some reason it doesn't display it. The build name is the same in all places, i.e., in ucsc/chrom/builds.txt and in the .loc files. Any ideas on what else should be done?

2) While saving the visualization, there is always an error message saying "could not save visualization", and it doesn't seem to be a web browser issue. How do we then save the visualization?

Thanks in advance,
Naharajan

tool_dependencies inside tool_dependencies
by Björn Grüning 23 May '13

Hi,

is there a general rule for handling dependencies inside tool_dependencies.xml? Let's assume I write a matplotlib orphan tool_dependencies.xml file. matplotlib depends on numpy, and numpy already has an orphan definition. Is there a way to include numpy as a dependency inside the matplotlib definition, so that I do not need to fetch and compile numpy inside of matplotlib? I tried to specify it beforehand, but that did not work.

Thanks!
Bjoern

Function for deleting specific datasets from the history
by Aristos Aristodimou 13 May '13

Dear dev,

I have two questions:

1) Is there a function that I can use to delete a dataset from the history? I have a tool that uses as input a hidden dataset and I want to delete the hidden dataset once the tool is executed. I have the filename of the dataset and I was wondering if there is a function that I can use for deleting it.

2) When a user deletes a dataset, is there a way to get the filename of the deleted dataset?

Thanks,
Aris

routing to a cluster or not on a per-tool basis
by Dan Tenenbaum 08 May '13

Hi, I have some tools that run really quickly without using any kind of cluster. I would prefer not to run these tools on a cluster, as the overhead of submitting these jobs makes them take much longer than they otherwise would. I have other tools that are computationally intensive and need to be run on a cluster. I would like to expose all these tools in the same Galaxy instance, but have some tools run on the cluster and others not. Is this possible? Thanks, Dan

Difficulties using <repeat> tagset with min attribute
by Cory Spencer 08 May '13

Hi all - I've been trying to get the <repeat>...</repeat> tag working with a min attribute for some time now, though without any success. It works in other tools distributed with Galaxy, but when I attempt to use it in one of our custom tools, it dies with an "AttributeError: 'ExpressionContext' object has no attribute 'keys'" exception.

Can anybody offer any insight? The full traceback is:

AttributeError: 'ExpressionContext' object has no attribute 'keys'
URL: http://localhost:8080/tool_runner?tool_id=scde-list-compare
Module weberror.evalexception.middleware:364 in respond
>>  app_iter = self.application(environ, detect_start_response)
Module paste.debug.prints:98 in __call__
>>  environ, self.app)
Module paste.wsgilib:539 in intercept_output
>>  app_iter = application(environ, replacement_start_response)
Module paste.recursive:80 in __call__
>>  return self.application(environ, start_response)
Module paste.httpexceptions:632 in __call__
>>  return self.application(environ, start_response)
Module galaxy.web.framework.base:160 in __call__
>>  body = method( trans, **kwargs )
Module galaxy.web.controllers.tool_runner:68 in index
>>  template, vars = tool.handle_input( trans, params.__dict__ )
Module galaxy.tools:1320 in handle_input
>>  state = self.new_state( trans )
Module galaxy.tools:1248 in new_state
>>  self.fill_in_new_state( trans, inputs, state.inputs )
Module galaxy.tools:1257 in fill_in_new_state
>>  state[ input.name ] = input.get_initial_value( trans, context )
Module galaxy.tools.parameters.grouping:100 in get_initial_value
>>  rval_dict[ input.name ] = input.get_initial_value( trans, context )
Module galaxy.tools.parameters.basic:1016 in get_initial_value
>>  return SelectToolParameter.get_initial_value( self, trans, context )
Module galaxy.tools.parameters.basic:785 in get_initial_value
>>  if self.need_late_validation( trans, context ):
Module galaxy.tools.parameters.basic:1022 in need_late_validation
>>  if super( ColumnListParameter, self ).need_late_validation( trans, context ):
Module galaxy.tools.parameters.basic:766 in need_late_validation
>>  for layer in context.itervalues():
Module UserDict:116 in itervalues
>>  for _, v in self.iteritems():
Module UserDict:109 in iteritems
>>  for k in self:
Module UserDict:96 in __iter__
>>  for k in self.keys():
AttributeError: 'ExpressionContext' object has no attribute 'keys'
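
The last four frames of the traceback show a mapping helper whose itervalues() goes through iteritems(), then __iter__(), then self.keys(). The sketch below reproduces only that call pattern, under the assumption that the failing object inherits such helpers without defining keys() itself. IterationHelpers, BrokenContext and WorkingContext are hypothetical stand-ins; this is not Galaxy's ExpressionContext and it does not claim to diagnose the actual bug.

```python
class IterationHelpers:
    """Mimics the helper chain from the traceback: everything funnels into keys()."""

    def __iter__(self):
        for k in self.keys():
            yield k

    def iteritems(self):
        for k in self:
            yield (k, self[k])

    def itervalues(self):
        for _, v in self.iteritems():
            yield v


class BrokenContext(IterationHelpers):
    """Hypothetical mapping that supports item lookup but never defines keys()."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]


class WorkingContext(BrokenContext):
    """Same mapping with keys() supplied, so the inherited helpers work."""

    def keys(self):
        return list(self._data)


def collect_values(context):
    """Return the values, or the error message if iteration fails."""
    try:
        return list(context.itervalues())
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(collect_values(BrokenContext({"a": 1})))
    print(collect_values(WorkingContext({"a": 1})))
```

If the real cause follows this pattern, the error would appear only on code paths that iterate over the context, which may be why other tools using <repeat> are unaffected.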
