how is metadata generated
by KOH Jia Yu Jayce
Hi
Just wondering: how are the metadata files in galaxy-dist/database/files/_metadata_files generated? Are there any configurations in the XML files that specify that these should be generated? Thanks
9 years, 11 months
debugging jobs in 'new' state
by Shantanu Pavgi
We experienced an issue where some Galaxy jobs were sitting in the 'new' state for quite a long time. They were not waiting for cluster resources to become available; they hadn't even been queued up through DRMAA. We are currently running in non-debug mode, and these were my observations:
* No indication of new jobs in paster.log file
* the database/pbs directory didn't contain any associated job scripts
* in the backend database, the job table contained their Galaxy job IDs, but no command_line value was recorded
Also, not all jobs are stuck in the 'new' state. Many jobs submitted after the above waiting jobs completed successfully on the cluster. Is there any job submission logic within Galaxy that is used for submitting jobs? Any clues on how to debug this issue would be really helpful.
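For anyone else triaging this, the observations above can be turned into a quick check against the backend database. This is only a sketch: it assumes direct read access to the database and uses the table and column names mentioned above (shown against SQLite for illustration; adjust the connection for a Postgres/MySQL backend).

```python
import sqlite3

def find_stuck_jobs(conn):
    """List jobs sitting in the 'new' state; jobs with no recorded
    command_line never made it to the job runner."""
    cur = conn.execute(
        "SELECT id, command_line FROM job WHERE state = 'new'"
    )
    return [(job_id, cmd) for job_id, cmd in cur.fetchall()]
```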
--
Thanks,
Shantanu.
10 years
Defining Job Runners Dynamically
by John Chilton
Hello All,
I just issued a pull request that augments Galaxy to allow defining
job runners dynamically at runtime
(https://bitbucket.org/galaxy/galaxy-central/pull-request/12/dynamic-job-r...).
Whether it makes the cut or not, I thought I would describe the enhancements
here in case anyone else finds them useful.
There are a couple of use cases we hope this will help us address at our
institution. One is dynamically switching queues based on the user (we have
a very nice shared-memory resource that can only be used by researchers
with NIH funding); the other is inspecting input sizes to give more
accurate max walltimes to PBS (a small number of cufflinks jobs, for
instance, take over three days on our cluster, but defining max walltimes
in excess of that for all jobs could leave our queue sitting idle
around our monthly downtimes). You might also imagine using this to
switch queues entirely based on input sizes or parameters,
or to alter queue priorities based on the submitting user or input
sizes/parameters.
There are two steps to use this: you must add a line in universe.ini
and define a function that computes the actual job runner string in the new
file lib/galaxy/jobs/rules.py.
The first step is similar to what you would do to statically assign
a tool to a particular job runner. If you would like to dynamically
assign a job runner for cufflinks, you would start by adding a line like
one of the following to universe.ini:
cufflinks = dynamic:///python
-or-
cufflinks = dynamic:///python/compute_runner
If you use the first form, a function called cufflinks must be defined
in rules.py; adding the extra argument after python/ lets you specify a
particular function by name (compute_runner in this example). The
second form lets you assign job runners with the same function
for multiple tools.
The only other step is to define a python function in rules.py that
produces a string corresponding to a valid job runner such as
"local:///" or "pbs:///queue/-l walltime=48:00:00/".
If the functions defined in this file take arguments, those arguments
should have names from the following list: job_wrapper, user_email, app,
job, tool, tool_id, job_id, user. The plumbing will map these arguments
to the implied Galaxy objects. For instance, job_wrapper is the
JobWrapper instance for the job that gets passed to the job runner;
user_email is the user's email address or None; app is the main
application configuration object used throughout the code base, which
can be used, for instance, to get values defined in universe.ini; job, tool,
and user are model objects; and job_id and tool_id are the relevant IDs.
If you are writing a function that routes a certain list of users to a
particular queue or increases their priority, you will probably only
need to take one argument: user_email. However, if you are going to
look at input file sizes, you may want to take an argument called job
and use the following piece of code to find the size of the input
named "input1" in the tool XML.
import os

inp_data = dict( [ ( da.name, da.dataset ) for da in job.input_datasets ] )
inp_data.update( [ ( da.name, da.dataset ) for da in job.input_library_datasets ] )
input1_file = inp_data[ "input1" ].file_name
input1_size = os.path.getsize( input1_file )
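Putting the pieces together, a hypothetical compute_runner that picks a walltime from the size of "input1" might look like this (the 1 GB threshold and queue strings are invented for illustration):

```python
import os

def compute_runner(job):
    """Sketch: choose a PBS walltime based on the size of the dataset
    bound to the tool input named "input1"."""
    inp_data = dict((da.name, da.dataset) for da in job.input_datasets)
    inp_data.update((da.name, da.dataset) for da in job.input_library_datasets)
    input1_size = os.path.getsize(inp_data["input1"].file_name)
    if input1_size > 1024 ** 3:  # larger than 1 GB: ask for more time
        return "pbs:///queue/-l walltime=96:00:00/"
    return "pbs:///queue/-l walltime=48:00:00/"
```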
This whole concept works for a couple of small tests on my local
machine, but there are certain aspects of the job runner code that make
me feel there may be corner cases I am not seeing where this approach
may not work - so your mileage may vary.
-John
------------------------------------------------
John Chilton
Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
E-Mail: chilton(a)msi.umn.edu
10 years, 2 months
bug: unsorted bam files and import into data library
by Florian Wagner
Hi,
when a tool outputs an unsorted BAM file, the indexing fails (quietly)
and its metadata variable "bam_index" points to a nonexistent file. This
causes a nasty bug when trying to import the dataset into a data library
and actually makes the library unusable unless you delete the broken
entry from the database (in my case PostgreSQL) by hand. Are you
working on this?
Thanks, Florian
10 years, 3 months
How to allow anonymous users to run workflows?
by Tim te Beek
Hi all,
I was wondering how I can allow anonymous users to run workflows in my
local Galaxy instance, as currently users need to be logged in to run
workflows. I'd like to drop this requirement in light of the intended
publication of a workflow in a journal which demands that "Web
services must not require mandatory registration by the user." Could
any of you tell me how I can accomplish this?
I've seen the option to use an external authentication method, which
could be employed to artificially 'log in' anonymous users for a single
session, but it appears this would also disable the normal user
administration mechanisms in Galaxy, so I'm not sure it would be a
good fit. Any hints on how to proceed, either via this route or
otherwise, would be much appreciated.
Best regards,
Tim
10 years, 3 months
MergeSamFiles.jar and TMPDIR
by Glen Beane
We recently updated to the latest galaxy-dist and learned that the sam_merge.xml tool now uses Picard's MergeSamFiles.jar to merge the files instead of the samtools merge wrapper sam_merge.py.
This is a problem for us because MergeSamFiles.jar does not honor $TMPDIR when creating temporary files (the JVM developers inexplicably hard-code the value of java.io.tmpdir to /tmp on Unix/Linux rather than doing the Right Thing). On our cluster, TMPDIR is set to something like /scratch/batch_job_id/. That location has plenty of free space, but /tmp does not, and now we can't successfully merge largish BAM files.
In case anyone else is bitten by this, I think there are two options.
The Picard tools take an optional TMP_DIR= argument that lets you specify the location to use for a temporary directory. Initially we modified the .xml to add TMP_DIR=\$TMPDIR to the arguments to MergeSamFiles.jar. This works, but we could potentially need to do this with multiple Picard tools, not just MergeSamFiles. So now I am probably going to go with the following solution instead:
add something like "export _JAVA_OPTIONS=-Djava.io.tmpdir=$TMPDIR" to the .bashrc file for my Galaxy user.
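The same idea can also be applied per-job from a wrapper script. This is a sketch, not Galaxy code: it builds a subprocess environment in which the JVM will honor $TMPDIR (the /tmp fallback is just an example default):

```python
import os

def java_env():
    """Return a copy of the environment with _JAVA_OPTIONS set so the
    JVM's java.io.tmpdir points at $TMPDIR instead of /tmp."""
    env = dict(os.environ)
    tmpdir = env.get("TMPDIR", "/tmp")  # fallback path is an example
    env["_JAVA_OPTIONS"] = "-Djava.io.tmpdir=%s" % tmpdir
    return env
```

The resulting dict could then be passed as the env argument to a subprocess call that launches the Picard jar.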
--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
10 years, 3 months
Galaxy installation with mysql database
by Alex R Bigelow
Hi,
I am trying to install a local instance of Galaxy with an Infobright MySQL server. I created a database called "galaxy" (the user is also "galaxy", and it has all the privileges it should need), and the database_connection line is as follows:
database_connection = mysql://galaxy:***********@localhost/galaxy?unix_socket=/tmp/mysql-ib.sock
When I do this, I get the following error:
Traceback (most recent call last):
  File "/gen21/alex/Apps/galaxy-dist/lib/galaxy/web/buildapp.py", line 82, in app_factory
    app = UniverseApplication( global_conf = global_conf, **kwargs )
  File "/gen21/alex/Apps/galaxy-dist/lib/galaxy/app.py", line 32, in __init__
    create_or_verify_database( db_url, kwargs.get( 'global_conf', {} ).get( '__file__', None ), self.config.database_engine_options )
  File "/gen21/alex/Apps/galaxy-dist/lib/galaxy/model/migrate/check.py", line 65, in create_or_verify_database
    db_schema = schema.ControlledSchema( engine, migrate_repository )
  File "/gen21/alex/Apps/galaxy-dist/eggs/sqlalchemy_migrate-0.5.4-py2.7.egg/migrate/versioning/schema.py", line 24, in __init__
    self._load()
  File "/gen21/alex/Apps/galaxy-dist/eggs/sqlalchemy_migrate-0.5.4-py2.7.egg/migrate/versioning/schema.py", line 36, in _load
    self.table = Table(tname, self.meta, autoload=True)
  File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/schema.py", line 108, in __call__
    return type.__call__(self, name, metadata, *args, **kwargs)
  File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/schema.py", line 236, in __init__
    _bind_or_error(metadata).reflecttable(self, include_columns=include_columns)
  File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 1265, in reflecttable
    self.dialect.reflecttable(conn, table, include_columns)
  File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/databases/mysql.py", line 1664, in reflecttable
    sql = self._show_create_table(connection, table, charset)
  File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/databases/mysql.py", line 1835, in _show_create_table
    raise exc.NoSuchTableError(full_name)
NoSuchTableError: migrate_version
I found this question in the archives:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-March/002216.html
As per both replies, I tried a virtualenv, which didn't work, and I also deleted the "galaxy" database so that a fresh one would be created, but, of course, now Galaxy can't connect to the database because it doesn't exist. How do I tell Galaxy to create the database it needs?
Thanks again for all the support,
Alex Bigelow
10 years, 3 months
Re: [galaxy-dev] Tool shed and datatypes
by Jim Johnson
Greg,
It would be great if there were a way to expand upon the core datatypes using the ToolShed.
Would it be possible to have a separate datatype repository within the ToolShed?
Datatype
name=""
description=""
datatype_dependencies=[]
definition=<python code>
The tool config could be expanded to have a requirement for datatypes.
<requirement type="datatype">ssmap</requirement>
Table datatype
Column | Type | Modifiers
-------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('datatype_id_seq'::regclass)
name | character varying(255) |
version | character varying(40) |
description | text |
definition | text |
UNIQUE (name)
Table datatype_datatype_association
Column | Type | Modifiers
-------------+-----------------------------+---------------------------------------------------
id | integer | not null default nextval('datatype_id_seq'::regclass)
datatype_id | integer |
requires_id | integer |
FOREIGN KEY (datatype_id) REFERENCES datatype(id)
FOREIGN KEY (requires_id) REFERENCES datatype(id)
Then for my mothur metagenomics tools I could define:
name="ssmap" description="Secondary Structure Map" version="1.0" datatype_dependencies=[tabular]
definition=
from galaxy.datatypes.tabular import Tabular

class SecondaryStructureMap(Tabular):
    file_ext = 'ssmap'

    def __init__(self, **kwd):
        """Initialize secondary structure map datatype"""
        Tabular.__init__( self, **kwd )
        self.column_names = ['Map']

    def sniff( self, filename ):
        """
        Determines whether the file is in secondary structure map format:
        a single column with an integer value which indicates the row that
        this row maps to. The check should verify that if structMap[10] = 380
        then structMap[380] = 10.
        """
        ...
Then the align.check.xml tool_config could require the 'ssmap' datatype:
<tool id="mothur_align_check" name="Align.check" version="1.19.0">
<description>Calculate the number of potentially misaligned bases</description>
<requirements>
<requirement type="binary">mothur</requirement>
<requirement type="datatype">ssmap</requirement>
</requirements>
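For what it's worth, here is a sketch of what the elided sniff() body might look like, written against plain lines rather than Galaxy's file helpers. The treatment of 0 as "unpaired" (and therefore skipped by the symmetry check) is an assumption on my part:

```python
def sniff_ssmap(lines):
    """Sketch of the ssmap check: one integer per row, and the mapping
    must be symmetric -- if structMap[10] = 380 then structMap[380] = 10.
    Assumes 0 means 'unpaired' and is exempt from the symmetry check."""
    struct_map = {}
    for row, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 1 or not fields[0].lstrip("-").isdigit():
            return False
        struct_map[row] = int(fields[0])
    return all(v == 0 or struct_map.get(v) == k
               for k, v in struct_map.items())
```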
> John,
>
> I've been following this message thread, and it seems it's gone in a direction that differs from your initial question about the possibility of Galaxy automatically editing the datatypes_conf.xml file when certain Galaxy tool shed tools are installed. There are some complexities to consider in attempting this. One issue is that the work of adding support for a new datatype to Galaxy lies outside the intended function of the tool shed. If new support is added to the Galaxy code base, an entry for that new datatype should be manually added to the table at the same time. There may be benefits to enabling automatic changes to datatype entries that already exist in the file (e.g., adding a new converter for an existing datatype entry), but adding a completely new datatype to the file may not be appropriate. I'll continue to think about this - please send additional thoughts and feedback, as doing so is always helpful.
>
> Thanks!
>
> Greg
>
>
> On Oct 5, 2011, at 11:48 PM, Duddy, John wrote:
>
>> One of the things we’re facing is the sheer size of a whole human genome at 30x coverage. An effective way to deal with that is compressing the FASTQ files. That works for BWA and our ELAND, which can directly read a compressed FASTQ, but other tools crash when reading compressed FASTQ files. One way to address that would be to introduce a new type, for example “CompressedFastQ”, with a conversion to FASTQ defined. BWA could take both types as input. This would allow the best of both worlds – efficient storage and use by all existing tools.
>>
>> Another example would be adding the CASAVA tools to Galaxy. Some of the statistics generation tools use custom file formats. To be able to make the use of those tools optional and configurable, they should be separate from the aligner, but that would require that Galaxy be made aware of the custom file formats – we’d have to add a datatype.
>>
>> John Duddy
>> Sr. Staff Software Engineer
>> Illumina, Inc.
>> 9885 Towne Centre Drive
>> San Diego, CA 92121
>> Tel: 858-736-3584
>> E-mail: jduddy at illumina.com
>>
>> From: Greg Von Kuster [mailto:greg at bx.psu.edu]
>> Sent: Wednesday, October 05, 2011 6:25 PM
>> To: Duddy, John
>> Cc: galaxy-dev at lists.bx.psu.edu
>> Subject: Re: [galaxy-dev] Tool shed and datatypes
>>
>> Hello John,
>>
>> The Galaxy tool shed currently is not enabled to automatically edit the datatypes_conf.xml file, although I could add this feature if the need exists. Can you elaborate on what you are looking to do regarding this?
>>
>> Thanks!
>>
>>
>> On Oct 5, 2011, at 1:52 PM, Duddy, John wrote:
>>
>>
>> Can we introduce new file types via tools in the tool shed? It seems Galaxy can load them if they are in the datatypes configuration file. Does tool installation automate the editing of that file?
>>
>>
>> John Duddy
>> Sr. Staff Software Engineer
>> Illumina, Inc.
>> 9885 Towne Centre Drive
>> San Diego, CA 92121
>> Tel: 858-736-3584
>> E-mail: jduddy at illumina.com
>>
>> ___________________________________________________________
>> Please keep all replies on the list by using "reply all"
>> in your mail client. To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>
>> http://lists.bx.psu.edu/
>>
>> Greg Von Kuster
>> Galaxy Development Team
>> greg at bx.psu.edu
>>
10 years, 4 months
Version String
by SHAUN WEBB
Hi. I am using the tag:
<version_string>path to tool --version</version_string>
in my XML file. I would expect the tool version to be printed in the
information window for each dataset, but the version field is blank. I
am running the latest Galaxy dist; is there anything else I need to do?
I have tried using <version_command> as well.
Thanks
Shaun
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
10 years, 5 months