I can confirm that the proxy settings are the reason for the failing export. When I go to localhost:8080 directly, I can export large files from the Data Library. When going via the proxy using the URL, downloading large files does not work. Here is a hint at what the solution might be: http://serverfault.com/questions/185894/proxy-error-502-reason-error-reading...

*** The error in the browser:

Proxy Error

The proxy server received an invalid response from an upstream server. The proxy server could not handle the request POST /library_common/act_on_multiple_datasets <http://galaxy.bits.vib.be/library_common/act_on_multiple_datasets>.

Reason: Error reading from remote server

*** The error in the http logs:

[Fri Mar 29 10:22:03 2013] [error] [client 157.193.10.20] (70007)The timeout specified has expired: proxy: error reading status line from remote server localhost, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
[Fri Mar 29 10:22:03 2013] [error] [client 157.193.10.20] proxy: Error reading from remote server returned by /library_common/act_on_multiple_datasets, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1

*** Our proxy settings

I would really appreciate it if somebody could have a look at our current Apache proxy settings. Since I suspect the problem is a timeout, I have tried modifying related parameters, with no luck.
=======
[root@galaxy conf.d]# cat galaxy_web.conf

NameVirtualHost 157.193.230.103:80
<VirtualHost 157.193.230.103:80>
    ServerName galaxy.bits.vib.be

    SetEnv force-proxy-request-1.0 1   # tried this, does not help
    SetEnv proxy-nokeepalive 1         # tried this, does not help
    KeepAliveTimeout 600               # tried this, does not help

    # tried this, does not help:
    ProxyPass /library_common/act_on_multiple_datasets http://galaxy.bits.vib.be/library_common/act_on_multiple_datasets max=6 keepalive=On timeout=600 retry=10

    <Proxy balancer://galaxy>
        BalancerMember http://localhost:8080
        BalancerMember http://localhost:8081
        BalancerMember http://localhost:8082
        BalancerMember http://localhost:8083
        BalancerMember http://localhost:8084
        BalancerMember http://localhost:8085
        BalancerMember http://localhost:8086
        BalancerMember http://localhost:8087
        BalancerMember http://localhost:8088
        BalancerMember http://localhost:8089
        BalancerMember http://localhost:8090
        BalancerMember http://localhost:8091
        BalancerMember http://localhost:8092
    </Proxy>

    RewriteEngine on
    RewriteLog "/tmp/apacheGalaxy.log"

    # <Location "/">
    #     AuthType Basic
    #     AuthBasicProvider ldap
    #     AuthLDAPURL "ldap://smeagol.vib.be:389/DC=vib,DC=local?sAMAccountName"
    #     AuthLDAPBindDN vib\administrator
    #     AuthLDAPBindPassword <tofillin>
    #     AuthzLDAPAuthoritative off
    #     Require valid-user
    #     # Set the REMOTE_USER header to the contents of the LDAP query response's "uid" attribute
    #     RequestHeader set REMOTE_USER %{AUTHENTICATE_sAMAccountName}
    # </Location>

    RewriteRule ^/static/style/(.*) /home/galaxy/galaxy-dist/static/june_2007_style/blue/$1 [L]
    RewriteRule ^/static/scripts/(.*) /home/galaxy/galaxy-dist/static/scripts/packed/$1 [L]
    RewriteRule ^/static/(.*) /home/galaxy/galaxy-dist/static/$1 [L]
    RewriteRule ^/favicon.ico /home/galaxy/galaxy-dist/static/favicon.ico [L]
    RewriteRule ^/robots.txt /home/galaxy/galaxy-dist/static/robots.txt [L]
    RewriteRule ^(.*) balancer://galaxy$1 [P]
</VirtualHost>
======

Thanks,
Joachim

Joachim Jacob
Rijvisschestraat 120, 9052
Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/28/2013 03:21 PM, Joachim Jacob | VIB | wrote:
OK, it seems to be a proxy error.
When the proxy does not receive data from the server, it times out and closes the connection. I think the process that packs the datasets takes too long, so the connection is closed before the packaging is finished. Just a guess...
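One way to sanity-check that guess: compress a small random sample and extrapolate linearly to the dataset size. This is only a rough sketch (written for Python 3, where gzip.compress exists; on the Python 2 that Galaxy ran at the time, gzip.GzipFile over a StringIO would be the equivalent), and random data is a pessimistic stand-in for real datasets, but it gives an order of magnitude to compare against the proxy timeout.

```python
import gzip
import os
import time


def estimate_pack_seconds(dataset_gb, sample_mb=8):
    """Roughly estimate how long gzip-packing a dataset of the given
    size would take on this machine, by compressing a small random
    sample and extrapolating linearly."""
    sample = os.urandom(sample_mb * 1024 * 1024)
    start = time.time()
    gzip.compress(sample)
    elapsed = time.time() - start
    return elapsed * (dataset_gb * 1024.0 / sample_mb)


# If this exceeds the proxy timeout (Apache's default is 300 s), a 502
# "error reading from remote server" is exactly what you would expect.
print("estimated pack time for 3 GB: %.0f s" % estimate_pack_seconds(3.0))
```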
From the httpd logs:

=====
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] (70007)The timeout specified has expired: proxy: error reading status line from remote server localhost, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] proxy: Error reading from remote server returned by /library_common/act_on_multiple_datasets, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
=====
See if changing time out settings fixes this issue.
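For reference, the directives that usually govern this timeout in Apache mod_proxy, as a sketch (the values here are illustrative, not tested on this setup):

```apache
# Global default for all proxied requests (mod_proxy):
ProxyTimeout 1800

# Or per mapping, as a ProxyPass parameter:
ProxyPass /library_common balancer://galaxy/library_common timeout=1800

# mod_proxy falls back to the core Timeout directive when ProxyTimeout
# is unset, so that one may need raising as well:
Timeout 1800
```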
Cheers, Joachim
On 03/28/2013 02:58 PM, Joachim Jacob | VIB | wrote:
Hi Assaf,
After all, the problem appears not to be the total size of the history, but the size of the individual datasets.
Now, histories that contain big datasets (>1GB) imported from Data Libraries cause the exporting process to crash. Can somebody confirm whether this is a bug? I uploaded the datasets to a directory, from which they are then imported into a Data Library.
Downloading data sets >1GB from a data library directly (as tar.gz) also crashes.
Note: I have re-enabled abrt, but am waiting for some jobs to finish before restarting.
Cheers, Joachim.
On Tue 26 Mar 2013 03:45:43 PM CET, Assaf Gordon wrote:
Hello Joachim,
Joachim Jacob | VIB | wrote, On 03/26/2013 10:01 AM:
abrt was filling the root directory indeed. So disabled it.
I have done some exporting tests, and the behaviour is not consistent.
1. *Size*: in general, it worked for smaller datasets and usually crashed on bigger ones (starting from 3 GB). So size is key?
2. But now I have found several histories of 4.5 GB that I was able to export... So much for the size hypothesis.
Another observation: when the export crashes, the corresponding webhandler process dies.
A crashing python process crosses the fine boundary between the Galaxy code and Python internals... perhaps the Galaxy developers can help with this problem.
It would be helpful to find a reproducible case with a specific history or a specific sequence of events, then someone can help you with the debugging.
Once you find a history that causes a crash (every time, or sometimes but in a reproducible way), try to pinpoint when exactly it happens: is it when you start preparing the export (while "export_history.py" is running as a job), or when you start downloading the exported file? (I'm a bit behind on the export mechanism, so perhaps there are other steps involved.)
Couple of things to try:
1. set "cleanup_job=never" in your universe_wsgi.ini - this will keep the temporary files, and will help you re-produce jobs later.
2. Enable "abrt" again - it is not the problem (just the symptom). You can clean up the "/var/spool/abrt/XXX" directory from previous crash logs, then reproduce a new crash and look at the collected files (assuming you have enough space to store at least one crash). In particular, look at the file called "coredump" - it will tell you which script crashed. Try running:

$ file /var/spool/abrt/XXXX/coredump
coredump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python XXXXXX.py'
Instead of "XXXX.py" it would show the python script that crashed (hopefully with full command-line parameters).
It won't show which python statement caused the crash, but it will point in the right direction.
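For reference, the setting from step 1 lives in the [app:main] section of universe_wsgi.ini; a minimal fragment:

```ini
[app:main]
# Keep job working directories and temporary files around
# (default is "always"; "onsuccess" is the usual middle ground):
cleanup_job = never
```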
So now I suspect something is wrong with the datasets, but I am not able to trace anything meaningful in the logs. I am not confident about turning on logging in Python yet, but apparently this happens with the "logging" module, initiated like logging.getLogger( __name__ ).
It could be a bad dataset (file on disk), or a problem in the database, or something completely different (a bug in the python archive module). No point guessing until there are more details.
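One cheap way to test the bad-dataset theory, assuming you can reach the dataset files on disk: run the suspect file through Python's tarfile machinery outside of Galaxy. This is not Galaxy's actual export code (and the function name and paths are made up for illustration), just the same underlying module exercised in isolation.

```python
import os
import tarfile
import tempfile


def check_dataset_archivable(path):
    """Pack a single file into a gzipped tar and read it back, to see
    whether the tarfile module can handle this particular file."""
    with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
        archive = tmp.name
    try:
        # Write phase: same module the export step relies on.
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(path, arcname=os.path.basename(path))
        # Read phase: verify the member round-tripped with the right size.
        with tarfile.open(archive, "r:gz") as tar:
            member = tar.getmember(os.path.basename(path))
            return member.size == os.path.getsize(path)
    finally:
        os.unlink(archive)
```

If this crashes or returns False for a dataset that Galaxy also fails to export, that points at the file (or the archive module) rather than at Galaxy's own code.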
-gordon