how to purge histories/datasets not accessed/updated for a certain time
by Chaolin Zhang
Hi,
We have a local mirror of the Galaxy system, and the disk fills up really quickly. Is there a way to purge histories/datasets that have not been accessed/updated for a certain period of time, regardless of whether the user has deleted them? It looks like the current cleanup scripts only purge deleted histories/datasets.
Thanks!
Chaolin
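A minimal sketch of one way to find candidates for this, assuming direct access to Galaxy's database (the history table and its update_time/deleted columns used below should be verified against your schema); flagged histories could then be marked deleted so the existing cleanup scripts purge them:

import datetime
import sqlalchemy as sa

N_DAYS = 180
# Hypothetical connection URL; point it at your Galaxy database.
engine = sa.create_engine("postgresql://galaxy@localhost/galaxy")
cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=N_DAYS)

with engine.connect() as conn:
    rows = conn.execute(
        sa.text(
            "SELECT id, name, update_time FROM history "
            "WHERE deleted = false AND update_time < :cutoff"
        ),
        {"cutoff": cutoff},
    )
    for history_id, name, update_time in rows:
        # Report only; actually marking these deleted (so the cleanup
        # scripts purge them) should be a separate, deliberate UPDATE.
        print(history_id, name, update_time)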
10 years, 10 months
plotting/statistics tools suggestion
by Marshall Hampton
I've just started seriously looking at Galaxy, and I already have a
suggestion (everyone's a critic...): switch from using rpy or even
rpy2 to scipy/numpy/matplotlib for basic statistics and plots. I
wrestled a bit with getting rpy to work on my local setup and decided
it would be quicker to write my own plotting tool extensions, which
indeed turned out to be the case. Since you already require Python, it
seems much more natural to use Python-native tools.
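For illustration, a minimal sketch of what a Python-native plotting tool script could look like (the file names, column handling, and command line here are hypothetical, not an existing Galaxy tool):

# Usage from a tool wrapper might be: python histogram.py $input $output $column
import sys
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering, no X server needed
import matplotlib.pyplot as plt

infile, outfile, column = sys.argv[1], sys.argv[2], int(sys.argv[3])

values = np.loadtxt(infile, usecols=[column])
print("n=%d mean=%.4g sd=%.4g" % (values.size, values.mean(), values.std()))

plt.hist(values, bins=50)
plt.xlabel("column %d" % column)
plt.ylabel("count")
plt.savefig(outfile, format="png")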
To give some positive feedback, learning to write an extension was
surprisingly easy and encourages me to work on more.
I currently use Sage (http://www.sagemath.org/) to both analyze
next-generation sequence data (454 and Illumina) and create
interactive tools for the biologists I collaborate with. The Sage
project involves many of the same issues and challenges facing Galaxy.
Sage is based on python, but includes R. I realize that there are
many things you would want to do with R that aren't included in
scipy/biopython, so it might be worthwhile to look at how Sage wraps
R. It's far from perfect, but I prefer it to rpy2. In the Sage source
tree the interface is at: $SAGE_ROOT/devel/sage/sage/interfaces/r.py.
(Ugly online copy at:
http://hg.sagemath.org/sage-main/file/361a4ad7d52c/sage/interfaces/r.py).
-Marshall Hampton
Department of Mathematics and Statistics and the Integrated Biosciences Program
University of Minnesota Duluth
10 years, 10 months
suggestions for the SAM-to-BAM tool
by Assaf Gordon
Hi,
A couple of things could be slightly improved in the SAM-to-BAM tool:
1. "Reference list" is not informative (it's the technical way to say: "list of chromosomes and their sizes based on a FASTA file"). Users do not generally know what "reference list" is.
2. The "Locally Cached" option is not informative (I had to look in the source code to understand what it means).
What it should say is something like: "Get list of chromosomes/sizes based on the dataset's organism/database" (could be shorter, but should be friendly enough).
3. There's no option to take the chromosome list from the SAM file header. Some SAM files will contain the header (it can even be added in the standard bowtie tool wrapper), which saves the need to specify where to get the "reference list" from.
4. Autodetection in the "set-metadata" step would go a long way here: if the SAM file already has a header, there's no need to even ask about it.
If it doesn't have a header but has a DBKEY, then we're still OK.
If there's no DBKEY and no header, then complain or ask for a FASTA file from the current history.
(I realize that implementing this feature is hard and annoying; I don't mean to imply that it's easy to do, just that it's needed.)
5. Inside the Python script (sam_to_bam.py) there's a comment that says: "for some reason the samtools view command gzips the resulting bam file without warning".
Not sure why that matters, but "samtools view -u" will output an uncompressed BAM file.
6. samtools supports piping, so a lot of I/O (and some time) can be spared by piping the two commands together:
samtools view -u -b -S "INPUT.SAM" | samtools sort - OUTPUT
instead of running two separate commands and generating a temporary unsorted BAM file.
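Inside the wrapper script this could look roughly like the following sketch (the paths are illustrative; the old samtools sort syntax writes OUTPUT.bam from the given prefix):

import subprocess

sam_path = "INPUT.SAM"   # hypothetical input path
out_prefix = "OUTPUT"    # samtools sort appends .bam to this prefix

view = subprocess.Popen(
    ["samtools", "view", "-u", "-b", "-S", sam_path],
    stdout=subprocess.PIPE,
)
sort = subprocess.Popen(
    ["samtools", "sort", "-", out_prefix],
    stdin=view.stdout,
)
view.stdout.close()  # so view gets SIGPIPE if sort exits early
sort.communicate()
if view.wait() != 0 or sort.returncode != 0:
    raise RuntimeError("samtools view | sort pipeline failed")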
-gordon
10 years, 10 months
importing annotation with dataset
by Davide Cittaro
Hi, is it just me, or is the annotation not retained when I import a dataset from a library into a history (whereas "info" is)?
d
/*
Davide Cittaro, PhD
Cogentech - Consortium for Genomic Technologies
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303007
e-mail: davide.cittaro(a)ifom-ieo-campus.it
*/
10 years, 10 months
importing into history from library
by Glen Beane
We're running the latest galaxy-dist and noticed a problem importing files from libraries into our history.
If we click the down arrow to the right of a dataset and choose "Import this dataset into selected histories", it works fine. However, if our datasets are in folders within the library and we select multiple datasets and then do "Import selected datasets into histories", nothing happens. If I go to a library whose datasets are not in folders, then selecting multiple datasets works.
Has anyone else seen this?
--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
10 years, 10 months
bwa failure preparing job
by Branden Timm
Hi All,
I'm having issues running BWA for Illumina with the latest version of
Galaxy (5433:c1aeb2f33b4a).
It seems that the error is a Python IndexError raised while preparing the job:
Traceback (most recent call last):
File "/home/galaxy/galaxy-central/lib/galaxy/jobs/runners/local.py", line 58, in run_job
job_wrapper.prepare()
File "/home/galaxy/galaxy-central/lib/galaxy/jobs/__init__.py", line 371, in prepare
self.command_line = self.tool.build_command_line( param_dict )
File "/home/galaxy/galaxy-central/lib/galaxy/tools/__init__.py", line 1575, in build_command_line
command_line = fill_template( self.command, context=param_dict )
File "/home/galaxy/galaxy-central/lib/galaxy/util/template.py", line 9, in fill_template
return str( Template( source=template_text, searchList=[context] ) )
File "/home/galaxy/galaxy-central/eggs/Cheetah-2.2.2-py2.6-linux-x86_64-ucs4.egg/Cheetah/Template.py", line 1004, in __str__
return getattr(self, mainMethName)()
File "DynamicallyCompiledCheetahTemplate.py", line 106, in respond
IndexError: list index out of range
I checked the bwa_index.loc file for errors; the line for the reference genome I'm trying to map against seems correct (all whitespace consists of tab characters):
synpcc7002	synpcc7002	Synechococcus	/home/galaxy/galaxy-central/bwa_indices/SYNPCC7002
I'm not sure what the next troubleshooting step is, any ideas?
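One quick check that would rule out a malformed .loc line: make sure every non-comment line in bwa_index.loc splits into the expected number of tab-separated fields, since a short or space-separated line is a common cause of this kind of IndexError while the tool's Cheetah template is being filled. A minimal sketch (the four-column layout assumed here should be compared with the bwa_index.loc.sample shipped with Galaxy):

EXPECTED_FIELDS = 4

# Adjust the path to wherever your bwa_index.loc actually lives.
with open("tool-data/bwa_index.loc") as handle:
    for lineno, line in enumerate(handle, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != EXPECTED_FIELDS:
            print("line %d has %d tab-separated fields: %r"
                  % (lineno, len(fields), line))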
--
Branden Timm
btimm(a)glbrc.wisc.edu
10 years, 10 months
Error when adding datasets
by Louise-Amélie Schmitt
Hello everyone
I have an issue when trying to import new datasets or when putting a
dataset into a history. I saw Edward Kirton had the same problem but he
got no answer:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-May/002732.html
Here is the error message I get when clicking the "Add datasets" button
in a library, in the admin's "Manage data libraries" panel:
UnmappedInstanceError: Class '__builtin__.NoneType' is not mapped
URL:
http://manni/galaxy/library_common/upload_library_dataset?library_id=f2db...
File
'/g/funcgen/galaxy/eggs/WebError-0.8a-py2.6.egg/weberror/evalexception/middleware.py', line 364 in respond
app_iter = self.application(environ, detect_start_response)
File '/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/debug/prints.py',
line 98 in __call__
environ, self.app)
File '/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/wsgilib.py', line
539 in intercept_output
app_iter = application(environ, replacement_start_response)
File '/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/recursive.py',
line 80 in __call__
return self.application(environ, start_response)
File
'/g/funcgen/galaxy/lib/galaxy/web/framework/middleware/remoteuser.py',
line 109 in __call__
return self.app( environ, start_response )
File
'/g/funcgen/galaxy/eggs/Paste-1.6-py2.6.egg/paste/httpexceptions.py',
line 632 in __call__
return self.application(environ, start_response)
File '/g/funcgen/galaxy/lib/galaxy/web/framework/base.py', line 145 in
__call__
body = method( trans, **kwargs )
File '/g/funcgen/galaxy/lib/galaxy/web/controllers/library_common.py',
line 907 in upload_library_dataset
trans.sa_session.refresh( history )
File
'/g/funcgen/galaxy/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.6.egg/sqlalchemy/orm/scoping.py', line 127 in do
return getattr(self.registry(), name)(*args, **kwargs)
File
'/g/funcgen/galaxy/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.6.egg/sqlalchemy/orm/session.py', line 925 in refresh
raise exc.UnmappedInstanceError(instance)
UnmappedInstanceError: Class '__builtin__.NoneType' is not mapped
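What the exception itself implies (not a confirmed fix, just a reading of the traceback): SQLAlchemy raises "Class '__builtin__.NoneType' is not mapped" when refresh() is handed None, so the history object in upload_library_dataset() is apparently None at that point. A minimal sketch of a defensive guard, assuming trans.get_history() is how that object is normally obtained:

# Hypothetical guard around the failing call in library_common.py
history = trans.get_history()        # may be None for a fresh session
if history is not None:
    trans.sa_session.refresh( history )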
Now, when does it occur?
I have two databases. One is a test database I created a month ago, and it
works fine even now. The other one, created recently, is supposed
to be the final database, but it keeps triggering the above
message, even when I drop it and create it all over again. I even tried
creating a third one, completely clean and new, but the problem remains. I
also tried trashing all the eggs so that Galaxy fetches fresh ones, with no
effect at all. The error's still there.
If you have any clue, I'll be forever grateful.
Cheers,
L-A
10 years, 11 months
Migration error: fields in MySQL
by John Eppley
I had an error upgrading my Galaxy instance. I got the following exception while migrating the db (during step 64->65):
sqlalchemy.exc.ProgrammingError: (ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'fields FROM form_definition' at line 1") u'SELECT id, fields FROM form_definition' []
It seems my version of MySQL (4.1.22-log) did not like 'fields' as a column name. If I alias the form_definition table as f and use f.fields, the error goes away. I also had to modify migration 76 for the same reason.
Here is my diff of the migrations dir:
diff -r 50e249442c5a lib/galaxy/model/migrate/versions/0065_add_name_to_form_fields_and_values.py
--- a/lib/galaxy/model/migrate/versions/0065_add_name_to_form_fields_and_values.py Thu Apr 07 08:39:07 2011 -0400
+++ b/lib/galaxy/model/migrate/versions/0065_add_name_to_form_fields_and_values.py Fri Apr 15 11:09:26 2011 -0400
@@ -39,7 +39,7 @@
return ''
# Go through the entire table and add a 'name' attribute for each field
# in the list of fields for each form definition
- cmd = "SELECT id, fields FROM form_definition"
+ cmd = "SELECT f.id, f.fields FROM form_definition f"
result = db_session.execute( cmd )
for row in result:
form_definition_id = row[0]
@@ -53,7 +53,7 @@
field[ 'helptext' ] = field[ 'helptext' ].replace("'", "''").replace('"', "")
field[ 'label' ] = field[ 'label' ].replace("'", "''")
fields_json = to_json_string( fields_list )
- cmd = "UPDATE form_definition SET fields='%s' WHERE id=%i" %( fields_json, form_definition_id )
+ cmd = "UPDATE form_definition f SET f.fields='%s' WHERE f.id=%i" %( fields_json, form_definition_id )
db_session.execute( cmd )
# replace the values list in the content field of the form_values table with a name:value dict
cmd = "SELECT form_values.id, form_values.content, form_definition.fields" \
@@ -112,7 +112,7 @@
cmd = "UPDATE form_values SET content='%s' WHERE id=%i" %( to_json_string( values_list ), form_values_id )
db_session.execute( cmd )
# remove name attribute from the field column of the form_definition table
- cmd = "SELECT id, fields FROM form_definition"
+ cmd = "SELECT f.id, f.fields FROM form_definition f"
result = db_session.execute( cmd )
for row in result:
form_definition_id = row[0]
@@ -124,5 +124,5 @@
for index, field in enumerate( fields_list ):
if field.has_key( 'name' ):
del field[ 'name' ]
- cmd = "UPDATE form_definition SET fields='%s' WHERE id=%i" %( to_json_string( fields_list ), form_definition_id )
+ cmd = "UPDATE form_definition f SET f.fields='%s' WHERE id=%i" %( to_json_string( fields_list ), form_definition_id )
db_session.execute( cmd )
diff -r 50e249442c5a lib/galaxy/model/migrate/versions/0076_fix_form_values_data_corruption.py
--- a/lib/galaxy/model/migrate/versions/0076_fix_form_values_data_corruption.py Thu Apr 07 08:39:07 2011 -0400
+++ b/lib/galaxy/model/migrate/versions/0076_fix_form_values_data_corruption.py Fri Apr 15 11:09:26 2011 -0400
@@ -32,7 +32,7 @@
def upgrade():
print __doc__
metadata.reflect()
- cmd = "SELECT form_values.id as id, form_values.content as field_values, form_definition.fields as fields " \
+ cmd = "SELECT form_values.id as id, form_values.content as field_values, form_definition.fields as fdfields " \
+ " FROM form_definition, form_values " \
+ " WHERE form_values.form_definition_id=form_definition.id " \
+ " ORDER BY form_values.id"
@@ -46,7 +46,7 @@
except Exception, e:
corrupted_rows = corrupted_rows + 1
# content field is corrupted
- fields_list = from_json_string( _sniffnfix_pg9_hex( str( row['fields'] ) ) )
+ fields_list = from_json_string( _sniffnfix_pg9_hex( str( row['fdfields'] ) ) )
field_values_str = _sniffnfix_pg9_hex( str( row['field_values'] ) )
try:
#Encoding errors? Just to be safe.
-j
10 years, 11 months
Support for subdirs in dataset extra_files_path
by Jim Johnson
Request is issue#494 https://bitbucket.org/galaxy/galaxy-central/issue/494/support-sub-dirs-in...
I'm finding that some qiime metagenomics applications build HTML results with an inherent directory structure. For some other applications, e.g. FastQC, I've been able to flatten the hierarchy and edit the HTML, but that appears problematic for qiime.
Galaxy hasn't supported a dataset extra_files_path hierarchy, though the developers don't seem opposed to the idea: http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-October/003605.html
I added a route in lib/galaxy/web/buildapp.py and modified the dataset download code in lib/galaxy/web/controllers/dataset.py to traverse a hierarchy.
I don't think these changes add any security vulnerabilities (I tried the obvious ../../).
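For the security question, the usual guard is to resolve the requested filename and confirm it stays under the dataset's extra_files_path; a minimal sketch of that check (illustrative only, not part of the diffs below):

import os

def resolve_extra_file( extra_files_path, filename ):
    base = os.path.realpath( extra_files_path )
    target = os.path.realpath( os.path.join( base, filename ) )
    # Reject anything that escapes the base directory, e.g. "../../etc/passwd".
    if not target.startswith( base + os.sep ):
        raise ValueError( "Invalid path: %s" % filename )
    return target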
$ hg diff lib/galaxy/web/buildapp.py
diff -r 6ae06d89fec7 lib/galaxy/web/buildapp.py
--- a/lib/galaxy/web/buildapp.py Wed Mar 16 09:01:57 2011 -0400
+++ b/lib/galaxy/web/buildapp.py Wed Mar 16 10:24:13 2011 -0500
@@ -94,6 +94,8 @@
webapp.add_route( '/async/:tool_id/:data_id/:data_secret', controller='async', action='index', tool_id=None, data_id=None, data_secret=None )
webapp.add_route( '/:controller/:action', action='index' )
webapp.add_route( '/:action', controller='root', action='index' )
+ # allow for subdirectories in extra_files_path
+ webapp.add_route( '/datasets/:dataset_id/display/{filename:.+?}', controller='dataset', action='display', dataset_id=None, filename=None)
webapp.add_route( '/datasets/:dataset_id/:action/:filename', controller='dataset', action='index', dataset_id=None, filename=None)
webapp.add_route( '/display_application/:dataset_id/:app_name/:link_name/:user_id/:app_action/:action_param', controller='dataset', action='display_application', dataset_id=None, user_id=None, app_name = None, link_name = None, app_action = None, action_param = None )
webapp.add_route( '/u/:username/d/:slug', controller='dataset', action='display_by_username_and_slug' )
$
$ hg diff lib/galaxy/web/controllers/dataset.py
diff -r 6ae06d89fec7 lib/galaxy/web/controllers/dataset.py
--- a/lib/galaxy/web/controllers/dataset.py Wed Mar 16 09:01:57 2011 -0400
+++ b/lib/galaxy/web/controllers/dataset.py Wed Mar 16 10:24:29 2011 -0500
@@ -266,17 +266,18 @@
log.exception( "Unable to add composite parent %s to temporary library download archive" % data.file_name)
msg = "Unable to create archive for download, please report this error"
messagetype = 'error'
- flist = glob.glob(os.path.join(efp,'*.*')) # glob returns full paths
- for fpath in flist:
- efp,fname = os.path.split(fpath)
- try:
- archive.add( fpath,fname )
- except IOError:
- error = True
- log.exception( "Unable to add %s to temporary library download archive" % fname)
- msg = "Unable to create archive for download, please report this error"
- messagetype = 'error'
- continue
+ for root, dirs, files in os.walk(efp):
+ for fname in files:
+ fpath = os.path.join(root,fname)
+ rpath = os.path.relpath(fpath,efp)
+ try:
+ archive.add( fpath,rpath )
+ except IOError:
+ error = True
+ log.exception( "Unable to add %s to temporary library download archive" % rpath)
+ msg = "Unable to create archive for download, please report this error"
+ messagetype = 'error'
+ continue
if not error:
if params.do_action == 'zip':
archive.close()
10 years, 11 months
Sharing tool definitions between XML files
by Peter
Hi all,
I'm wondering if there is any established way to share parameter
definitions between tool wrapper XML files?
For example, I am currently working on NCBI BLAST+ wrappers, and these
tools have a lot in common: the output format option will
be a select parameter, and it will be the same for blastn, blastp,
tblastn, etc. I can just cut and paste the definition, but this seems
inelegant and will be a long-term maintenance burden if it ever needs
updating (lots of places would need to be updated the same way). I'd
like to have a shared XML snippet defining this parameter which I
could then reference/include from each tool wrapper that needs it.
I'm thinking something like XML's <!ENTITY ...> mechanism might work. Is this possible?
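At the plain-XML level, external entities do allow this kind of include. A minimal sketch (the entity and file names are invented; whether Galaxy's tool parser actually expands external entities would still need checking, since ElementTree-based parsers typically do not by default):

<!DOCTYPE tool [
  <!ENTITY blast_outfmt SYSTEM "blast_output_format.xml">
]>
<tool id="ncbi_blastn_wrapper" name="NCBI BLAST+ blastn">
  <inputs>
    <param name="query" type="data" format="fasta" label="Query sequences"/>
    <!-- shared <param> definition pulled in from blast_output_format.xml -->
    &blast_outfmt;
  </inputs>
</tool>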
Peter
10 years, 11 months