Problems with DataImport
by Ted Goldstein
Hi there,
Here are three interrelated issues.
I am trying to use Galaxy with some large cancer genomics datasets here at UCSC to do some systems biology. I have petabyte-size data libraries which will constantly be in flux at the edges. I would prefer to have Galaxy read the metadata for large datasets from the file system without using the database. Is there a convenient API boundary at which to write an adapter to the dataset object interface?
In the meantime, I am going to try to just import data using the link option. It's great that this feature is already in. When I import a couple of modest megabyte-size datasets using the "Link to files without copying to Galaxy" option, the status never changes from "queued". Is this a bug? Is there a known workaround? I have many large datasets.
Also, it takes a long time to expand the dataset name link. (My import experiment is a data tree of about a thousand files.) Is this a known bug?
Thanks!
Ted
8 years, 11 months
Re: [galaxy-dev] Error viewing BAM files in IGV
by Jim Johnson
Hi,
I'm seeing the same behavior. Galaxy is returning a web page rather than the requested .bai index file for the BAM file.
In class WebApplication ( lib/galaxy/web/framework/base.py ) in __call__( self, environ, start_response ) line 133
# Setup the transaction
trans = self.transaction_factory( environ )
The request gets routed to the root controller, which returns the Galaxy server main page HTML.
JJ
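One way to confirm this from the client side is to look at the first bytes of whatever the server returns. The sketch below is only an illustration and is not part of Galaxy or IGV; the function name classify_payload is hypothetical. It assumes that a BAM file starts with the gzip magic bytes (BAM is BGZF-compressed), that a .bai index starts with the letters BAI followed by a byte of value 1, and that an HTML page starts with a doctype or html tag.

```python
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip/BGZF stream
BAI_MAGIC = b"BAI\x01"     # first four bytes of a BAM index file


def classify_payload(payload):
    """Guess what kind of content a response body holds from its first bytes."""
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    if payload.startswith(BAI_MAGIC):
        return "bai"
    # Ignore leading whitespace and case when looking for HTML markers.
    head = payload[:64].lstrip().lower()
    if head.startswith(b"<!doctype") or head.startswith(b"<html"):
        return "html"
    return "other"
```

If the body saved from the .bam or .bam.bai URL comes back as "html", the client was handed a web page instead of the data file, which would be consistent with the "Invalid GZIP header" message.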
> On Dec 14, 2011, at 4:22 PM, Alexander Graf wrote:
>
>> Hi Nate,
>> I have tried it with several BAM files, resulting in the same error.
>> If I download the BAM and bai files from Galaxy and load them into IGV manually, everything works like a charm.
>> Up to now I could not figure out why it is not working.
>> Could I have better success switching to the nginx-server?
> Hi Alex,
>
> It should work with Apache as well. I don't have an environment set up here to test, but could you take a look at the Apache access and error logs to determine whether the file is being found and read properly? It's possible that the request is returning something other than a 200 code and the file data.
>
> --nate
>
>> Alex
>>
>> Am 12.12.2011 um 16:11 schrieb Nate Coraor:
>>
>>> On Dec 9, 2011, at 6:45 AM, Alexander Graf wrote:
>>>
>>>> Hello,
>>>> I have recently updated our Galaxy dist and I'm running into problems viewing BAM files in IGV (v2.0.22), saying: Invalid GZIP header.
>>> Hi Alex,
>>>
>>> Your config below looks okay at first glance. Can you verify that the file in question is a valid BAM? Or is this happening with all BAMs?
>>>
>>> --nate
>>>
>>>> I have configured Apache as explained in the wiki using this http.conf:
>>>>
>>>>
>>>> ------------------------------------------------------------------------------------------------------------------------------------
>>>> <VirtualHost *:80>
>>>> ServerName 127.0.0.1
>>>> RewriteEngine on
>>>>
>>>> RewriteRule ^/galaxy$ /galaxy/ [R]
>>>> RewriteRule ^/galaxy/static/style/(.*) /opt/galaxy/static/june_2007_style/blue/$1 [L]
>>>> RewriteRule ^/galaxy/static/scripts/(.*) /opt/galaxy/static/scripts/packed/$1 [L]
>>>> RewriteRule ^/galaxy/static/(.*) /opt/galaxy/static/$1 [L]
>>>> RewriteRule ^/galaxy/favicon.ico /opt/galaxy/static/favicon.ico [L]
>>>> RewriteRule ^/galaxy/robots.txt /opt/galaxy/static/robots.txt [L]
>>>> RewriteRule ^/galaxy(.*) http://localhost:8081$1 [P]
>>>>
>>>> <Proxy http://localhost:8081>
>>>> Order deny,allow
>>>> Allow from all
>>>> </Proxy>
>>>>
>>>> <Location "/galaxy">
>>>> # Define the authentication method
>>>> XSendFile on
>>>> XSendFilePath /
>>>> # Compress all uncompressed content.
>>>> SetOutputFilter DEFLATE
>>>> SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
>>>> SetEnvIfNoCase Request_URI \.(?:t?gz|zip|bz2)$ no-gzip dont-vary
>>>> </Location>
>>>> <Directory "/galaxy/static">
>>>> ExpiresActive On
>>>> ExpiresDefault "access plus 6 hours"
>>>> </Directory>
>>>> </VirtualHost>
>>>> ----------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> with these parts changed in universe_wsgi.ini:
>>>> ----------------------------------------------------------------------------------------------------------------------------------------
>>>> [server:main]
>>>> use = egg:Paste#http
>>>> port = 8081
>>>> host = 0.0.0.0
>>>> use_threadpool = True
>>>>
>>>> [filter:gzip]
>>>> use = egg:Paste#gzip
>>>>
>>>> [filter:proxy-prefix]
>>>> use = egg:PasteDeploy#prefix
>>>> prefix = /galaxy
>>>>
>>>> [app:main]
>>>> paste.app_factory = galaxy.web.buildapp:app_factory
>>>> filter-with = proxy-prefix
>>>> cookie_path = /galaxy
>>>> apache_xsendfile = True
>>>> upstream_gzip = False
>>>> ----------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> The resulting Galaxy error log is:
>>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> galaxy.web.framework DEBUG 2011-12-09 12:32:08,825 Error: this request returned None from get_history(): http://10.153.182.203/galaxy/root
>>>> 10.163.241.110 - - [09/Dec/2011:12:32:08 +0200] "GET /galaxy/root?app_action=data&user_id=c9a3f3a19e75965d&app_name=igv_bam&link_name=local_default&action_param=galaxy_9b0f702d0207cd78.bam.bai&dataset_id=9b0f702d0207cd78 HTTP/1.1" 200 - "-" "IGV Version 2.0.22 (1360)11/29/2011 02:24 PM Java/1.6.0_22"
>>>> 10.163.241.110 - - [09/Dec/2011:12:32:08 +0200] "HEAD /galaxy/display_application/9b0f702d0207cd78/igv_bam/local_default/c9a3f3a19e75965d/data/galaxy_9b0f702d0207cd78.bam HTTP/1.1" 302 - "-" "IGV Version 2.0.22 (1360)11/29/2011 02:24 PM Java/1.6.0_22"
>>>> galaxy.web.framework DEBUG 2011-12-09 12:32:08,915 Error: this request returned None from get_history(): http://10.153.182.203/galaxy/root
>>>> 10.163.241.110 - - [09/Dec/2011:12:32:08 +0200] "HEAD /galaxy/root?app_action=data&user_id=c9a3f3a19e75965d&app_name=igv_bam&link_name=local_default&action_param=galaxy_9b0f702d0207cd78.bam&dataset_id=9b0f702d0207cd78 HTTP/1.1" 200 - "-" "IGV Version 2.0.22 (1360)11/29/2011 02:24 PM Java/1.6.0_22"
>>>> ----------------------------------------
>>>> Exception happened during processing of request from ('127.0.0.1', 52683)
>>>> Traceback (most recent call last):
>>>> File "/opt/galaxy/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
>>>> self.finish_request(request, client_address)
>>>> File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
>>>> self.RequestHandlerClass(request, client_address, self)
>>>> File "/usr/lib/python2.6/SocketServer.py", line 618, in __init__
>>>> self.finish()
>>>> File "/usr/lib/python2.6/SocketServer.py", line 661, in finish
>>>> self.wfile.flush()
>>>> File "/usr/lib/python2.6/socket.py", line 297, in flush
>>>> self._sock.sendall(buffer(data, write_offset, buffer_size))
>>>> error: [Errno 32] Broken pipe
>>>> ----------------------------------------
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> Thanks in advance for your help
>>>>
>>>> Alex
>>>>
>>>> ___________________________________________________________
>>>> Please keep all replies on the list by using "reply all"
>>>> in your mail client. To manage your subscriptions to this
>>>> and other Galaxy lists, please use the interface at:
>>>>
>>>> http://lists.bx.psu.edu/
8 years, 11 months
Customizing the history panel
by Lukasse, Pieter
Hi,
I'm looking for ways to customize the history panel that gets added to the workflow history after a step has finished.
Below is the standard set of buttons/icons that we normally get:
[attached image image001.png not preserved in the archive]
What I would like to do is to either add some extra buttons to it OR extend the functionality of such a button. How can this be done?
I only found the page below mentioning there are some extra optional buttons, but it does not tell me how to enable/disable the buttons:
http://wiki.g2.bx.psu.edu/Learn/Managing%20Datasets
Thanks and regards,
Pieter Lukasse.
8 years, 11 months
Dataset (file) visualization
by Alfonso Núñez Salgado
Hi,
I'm new to Galaxy, but I've made several installations on different
distributions. At the moment I'm using RHEL 6.1 and I'm not able to
visualize certain files. After uploading a SAM file I click on the file
title and a preview of a few lines appears with the rest of the available
options. When I click on the "eye" button nothing happens. If I try to
download it, I receive an empty file.
This behaviour occurs with other file types (BAM, HTML), but not
with others (FASTA, FASTQ).
Using wget:
wget
"http://10.0.0.2:8090/galaxy/datasets/5969b1f7201f12ae/display/?preview=True"
--2011-12-01 18:18:56--
http://10.0.0.2:8090/galaxy/datasets/5969b1f7201f12ae/display/?preview=True
Connecting to 10.0.0.2:8090... connected.
HTTP request sent, awaiting response... 10.0.0.2 - -
[01/Dec/2011:18:18:56 +0000] "GET
/galaxy/datasets/5969b1f7201f12ae/display/?preview=True HTTP/1.0" 200 -
"-" "Wget/1.12 (linux-gnu)"
200 OK
Length: unspecified [text/plain]
Saving to: `index.html?preview=True.3'
Finally, index.html?preview=True.3 is an empty file, and no
errors are displayed with any of the debugging options.
I've done the same tests on other distributions and apparently everything
works correctly.
Can any of you shed some light on this unexplainable black hole?
--
=====================================
Alfonso Núñez Salgado
Unidad de Bioinformática
Centro de Biologia Molecular Severo Ochoa
C/Nicolás Cabrera 1
Universidad Autónoma de Madrid
Cantoblanco, 28049 Madrid (Spain)
Phone: (34) 91-196-4633
Fax: (34) 91-196-4420
web: http://ub.cbm.uam.es/
=====================================
8 years, 11 months
How to allow anonymous users to run workflows?
by Tim te Beek
Hi all,
Was wondering how I can allow anonymous users to run workflows in my
local Galaxy instance, as currently users need to be logged in to run
workflows. I'd like to drop this requirement in light of the intended
publication of a workflow in a journal which demands that "Web
services must not require mandatory registration by the user." Could
any of you tell me how I can accomplish this?
I've seen the option to use an external authentication method which
could be employed to artificially 'login' anonymous users for a single
session, but it appears this would also disable the normal user
administration mechanisms in Galaxy, so I'm not sure this would be a
good fit. Any hints on how to proceed, either via this route or
otherwise, would be much appreciated.
Best regards,
Tim
8 years, 11 months
MergeSamFiles.jar and TMPDIR
by Glen Beane
We recently updated to the latest galaxy-dist, and learned that the sam_merge.xml tool now uses Picard MergeSamFiles.jar to merge the files instead of the samtools merge wrapper sam_merge.py.
This is a problem for us because MergeSamFiles.jar does not honor $TMPDIR when creating temporary file names (the JVM developers inexplicably hard-code the value of java.io.tmpdir to /tmp on Unix/Linux rather than doing the Right Thing). On our cluster, TMPDIR is set to something like /scratch/batch_job_id/. This location has plenty of free space, but /tmp does not, and now we can't successfully merge largish BAM files.
In case anyone else is bitten by this, I think there are two options:
The Picard tools take an optional TMP_DIR= argument that lets us specify the location we want to use for a temporary directory. Initially we ended up modifying the .xml to add TMP_DIR=\$TMPDIR to the arguments to MergeSamFiles.jar. This works, but we could potentially need to do this with multiple Picard tools and not just MergeSamFiles. Now I am probably going to go with the following solution:
Add something like "export _JAVA_OPTIONS=-Djava.io.tmpdir=$TMPDIR" to the .bashrc file for my Galaxy user.
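For anyone launching Java tools from a Python wrapper rather than relying on .bashrc, the same idea could be sketched as follows. This is only an illustration under stated assumptions: the helper name java_env_with_tmpdir is hypothetical, /tmp is assumed as the fallback when TMPDIR is unset, and any options already present in _JAVA_OPTIONS are kept.

```python
import os


def java_env_with_tmpdir(base_env=None, fallback="/tmp"):
    """Return a copy of the environment with java.io.tmpdir pointed at $TMPDIR."""
    # Copy so the caller's environment (or os.environ) is never modified.
    env = dict(os.environ if base_env is None else base_env)
    tmpdir = env.get("TMPDIR", fallback)
    option = "-Djava.io.tmpdir=" + tmpdir
    # Append to any options that are already set instead of replacing them.
    existing = env.get("_JAVA_OPTIONS", "")
    env["_JAVA_OPTIONS"] = (existing + " " + option).strip()
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the JVM, so the setting applies to every Picard tool without editing each tool's .xml file.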
--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153
8 years, 12 months
Galaxy installation with mysql database
by Alex R Bigelow
Hi,
I am trying to install a local instance of Galaxy with an Infobright MySQL server; I created a database called "galaxy" (the user is also "galaxy", and it has all the privileges it should need), and the database_connection line is as follows:
database_connection = mysql://galaxy:***********@localhost/galaxy?unix_socket=/tmp/mysql-ib.sock
When I do this, I get the following error:
Traceback (most recent call last):
File "/gen21/alex/Apps/galaxy-dist/lib/galaxy/web/buildapp.py", line 82, in app_factory
app = UniverseApplication( global_conf = global_conf, **kwargs )
File "/gen21/alex/Apps/galaxy-dist/lib/galaxy/app.py", line 32, in __init__
create_or_verify_database( db_url, kwargs.get( 'global_conf', {} ).get( '__file__', None ), self.config.database_engine_options )
File "/gen21/alex/Apps/galaxy-dist/lib/galaxy/model/migrate/check.py", line 65, in create_or_verify_database
db_schema = schema.ControlledSchema( engine, migrate_repository )
File "/gen21/alex/Apps/galaxy-dist/eggs/sqlalchemy_migrate-0.5.4-py2.7.egg/migrate/versioning/schema.py", line 24, in __init__
self._load()
File "/gen21/alex/Apps/galaxy-dist/eggs/sqlalchemy_migrate-0.5.4-py2.7.egg/migrate/versioning/schema.py", line 36, in _load
self.table = Table(tname, self.meta, autoload=True)
File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/schema.py", line 108, in __call__
return type.__call__(self, name, metadata, *args, **kwargs)
File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/schema.py", line 236, in __init__
_bind_or_error(metadata).reflecttable(self, include_columns=include_columns)
File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 1265, in reflecttable
self.dialect.reflecttable(conn, table, include_columns)
File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/databases/mysql.py", line 1664, in reflecttable
sql = self._show_create_table(connection, table, charset)
File "/gen21/alex/Apps/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/databases/mysql.py", line 1835, in _show_create_table
raise exc.NoSuchTableError(full_name)
NoSuchTableError: migrate_version
I found this question in the archives:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-March/002216.html
As per both replies, I tried a virtual_env, which didn't work, and I also deleted the "galaxy" database so that it would create a fresh one, but, of course, now it can't connect to the database because it doesn't exist. How do I tell Galaxy to create the database it needs?
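Since the connection fails once the database itself is missing, one possible route is to recreate an empty database by hand before starting Galaxy. Whether Galaxy then builds its tables in that empty database is an assumption here, not something confirmed in this thread. The sketch below only pulls the relevant pieces out of the database_connection value using the Python standard library; the helper names describe_connection and create_database_sql are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def describe_connection(url):
    """Split a database_connection URL into its individual parts."""
    parts = urlsplit(url)
    options = parse_qs(parts.query)
    return {
        "dialect": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
        # parse_qs returns lists; take the first value if the option is present.
        "unix_socket": options.get("unix_socket", [None])[0],
    }


def create_database_sql(url):
    """Build the statement that would create the (empty) database by hand."""
    return "CREATE DATABASE `%s`;" % describe_connection(url)["database"]
```

Printing the result of describe_connection for the line above is also a quick check that the socket path and database name are the ones the MySQL server actually uses.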
Thanks again for all the support,
Alex Bigelow
8 years, 12 months
Quota will not decrease with permanent delete
by Mary Anne Alliegro
Hi Galaxy Users,
I have permanently deleted numerous files.
My usage % has decreased, but this is NOT reflected in my Gb report
(upper right); it remains the same.
Am I missing some phantom trash bin?
If not, will Galaxy recalculate my Gb usage so that I may proceed with
my project?
Thank you,
Mary Anne
9 years
rgenetics & galaxy
by Ivan Merelli
Hi,
I'm trying to use rgenetics in Galaxy, but I have
some problems installing rpy-1.0.3 with R-2.13.1.
I found this guide, which is about the same problem as mine:
http://gmod.827538.n3.nabble.com/Rpy-1-0-3-and-R-2-13-1-td3359590.html
but it didn't work for me (although the part about the
modulefile is quite obscure to me). The problem seems
to be related to a header file (Rdevices.h) which
is no longer present in the latest R distribution.
Any hints on how to solve this
problem? May I use rpy2?
Best,
Ivan
9 years
Re: [galaxy-dev] Tool shed and datatypes
by Jim Johnson
Greg,
It would be great if there were a way to expand upon the core datatypes using the ToolShed.
Would it be possible to have a separate datatype repository within the ToolShed?
Datatype
name=""
description=""
datatype_dependencies=[]
definition=<python code>
The tool config could be expanded to have a requirement for datatypes.
<requirement type="datatype">ssmap</requirement>
Table datatype
   Column    |          Type          | Modifiers
-------------+------------------------+-------------------------------------------------------
 id          | integer                | not null default nextval('datatype_id_seq'::regclass)
 name        | character varying(255) |
 version     | character varying(40)  |
 description | text                   |
 definition  | text                   |
UNIQUE (name)
Table datatype_datatype_association
   Column    |  Type   | Modifiers
-------------+---------+-------------------------------------------------------
 id          | integer | not null default nextval('datatype_id_seq'::regclass)
 datatype_id | integer |
 requires_id | integer |
FOREIGN KEY (datatype_id) REFERENCES datatype(id)
FOREIGN KEY (requires_id) REFERENCES datatype(id)
Then for my mothur metagenomics tools I could define:
name="ssmap" description="Secondary Structure Map" version="1.0" datatype_dependencies=[tabular]
definition=
from galaxy.datatypes.tabular import Tabular

class SecondaryStructureMap(Tabular):
    file_ext = 'ssmap'

    def __init__(self, **kwd):
        """Initialize secondary structure map datatype"""
        Tabular.__init__( self, **kwd )
        self.column_names = ['Map']

    def sniff( self, filename ):
        """
        Determines whether the file is a secondary structure map format.
        A single column with an integer value which indicates the row that this row maps to.
        Check to make sure that if structMap[10] = 380 then structMap[380] = 10.
        """
        ...
Then the align.check.xml tool_config could require the 'ssmap' datatype:
<tool id="mothur_align_check" name="Align.check" version="1.19.0">
    <description>Calculate the number of potentially misaligned bases</description>
    <requirements>
        <requirement type="binary">mothur</requirement>
        <requirement type="datatype">ssmap</requirement>
    </requirements>
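The reciprocity check described in the sniff docstring above could be sketched roughly as follows. This is a standalone illustration, not mothur's or Galaxy's actual code: the function name is hypothetical, rows are assumed to be numbered from 1, and a value of 0 is assumed to mean that the row maps to nothing.

```python
def is_secondary_structure_map(lines):
    """Return True if every row's partner points back at that row."""
    # Index 0 is a placeholder so that list indices match 1-based row numbers.
    struct_map = [0]
    for line in lines:
        try:
            value = int(line.strip())
        except ValueError:
            return False  # not a single integer column
        if value < 0:
            return False
        struct_map.append(value)

    size = len(struct_map) - 1
    if size == 0:
        return False  # an empty file is not a map

    for row in range(1, size + 1):
        partner = struct_map[row]
        if partner == 0:
            continue  # assumed to mean "unpaired"
        # The partner must exist and must map back to this row.
        if partner > size or struct_map[partner] != row:
            return False
    return True
```

A real sniff method would read the lines from the filename it is given and would probably inspect only the beginning of large files; that part is left out here.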
> John,
>
> I've been following this message thread, and it seems it's gone in a direction that differs from your initial question about the possibility for Galaxy to handle automatic editing of the datatypes_conf.xml file when certain Galaxy tool shed tools are automatically installed. There are some complexities to consider in attempting this. One of the issues to consider is that the work for adding support for a new datatype to Galaxy lies outside of the intended function of the tool shed. If new support is added to the Galaxy code base, an entry for that new datatype should be manually added to the table at the same time. There may be benefits to enabling automatic changes to datatype entries that already exist in the file (e.g., adding a new converter for an existing datatype entry), but perhaps adding a completely new datatype to the file may not be appropriate. I'll continue to think about this; please send additional thoughts and feedback, as doing so is always helpful.
>
> Thanks!
>
> Greg
>
>
> On Oct 5, 2011, at 11:48 PM, Duddy, John wrote:
>
>> One of the things we’re facing is the sheer size of a whole human genome at 30x coverage. An effective way to deal with that is by compressing the FASTQ files. That works for BWA and our ELAND, which can directly read a compressed FASTQ, but other tools crash when reading compressed FASTQ files. One way to address that would be to introduce a new type, for example “CompressedFastQ”, with a conversion to FASTQ defined. BWA could take both types as input. This would allow the best of both worlds – efficient storage and use by all existing tools.
>>
>> Another example would be adding the CASAVA tools to Galaxy. Some of the statistics generation tools use custom file formats. To be able to make the use of those tools optional and configurable, they should be separate from the aligner, but that would require that Galaxy be made aware of the custom file formats – we’d have to add a datatype.
>>
>> John Duddy
>> Sr. Staff Software Engineer
>> Illumina, Inc.
>> 9885 Towne Centre Drive
>> San Diego, CA 92121
>> Tel: 858-736-3584
>> E-mail: jduddy at illumina.com
>>
>> From: Greg Von Kuster [mailto:greg at bx.psu.edu]
>> Sent: Wednesday, October 05, 2011 6:25 PM
>> To: Duddy, John
>> Cc: galaxy-dev at lists.bx.psu.edu
>> Subject: Re: [galaxy-dev] Tool shed and datatypes
>>
>> Hello John,
>>
>> The Galaxy tool shed currently is not enabled to automatically edit the datatypes_conf.xml file, although I could add this feature if the need exists. Can you elaborate on what you are looking to do regarding this?
>>
>> Thanks!
>>
>>
>> On Oct 5, 2011, at 1:52 PM, Duddy, John wrote:
>>
>>
>> Can we introduce new file types via tools in the tool shed? It seems Galaxy can load them if they are in the datatypes configuration file. Does tool installation automate the editing of that file?
>>
>>
>> John Duddy
>> Sr. Staff Software Engineer
>> Illumina, Inc.
>> 9885 Towne Centre Drive
>> San Diego, CA 92121
>> Tel: 858-736-3584
>> E-mail: jduddy at illumina.com
>>
>> Greg Von Kuster
>> Galaxy Development Team
>> greg at bx.psu.edu
>>
9 years