I can confirm that the proxy settings are the reason for the failing export. When I go to localhost:8080 directly, I can export large files from the Data Library. When going via the proxy using the URL, downloading large files does not work. Here is a hint at what the solution might be: http://serverfault.com/questions/185894/proxy-error-502-reason-error-reading...

*** The error in the browser:

Proxy Error

The proxy server received an invalid response from an upstream server. The proxy server could not handle the request POST /library_common/act_on_multiple_datasets <http://galaxy.bits.vib.be/library_common/act_on_multiple_datasets>.

Reason: Error reading from remote server

*** The error in the http logs:

[Fri Mar 29 10:22:03 2013] [error] [client 157.193.10.20] (70007)The timeout specified has expired: proxy: error reading status line from remote server localhost, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
[Fri Mar 29 10:22:03 2013] [error] [client 157.193.10.20] proxy: Error reading from remote server returned by /library_common/act_on_multiple_datasets, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1

*** Our proxy settings

I would really appreciate it if somebody could have a look at our current Apache proxy settings. Since I suspect the problem is a timeout, I have tried modifying related parameters, with no luck.
=======
[root@galaxy conf.d]# cat galaxy_web.conf

NameVirtualHost 157.193.230.103:80
<VirtualHost 157.193.230.103:80>
    ServerName galaxy.bits.vib.be

    SetEnv force-proxy-request-1.0 1   # tried this, does not help
    SetEnv proxy-nokeepalive 1         # tried this, does not help
    KeepAliveTimeout 600               # tried this, does not help

    # tried this, does not help:
    ProxyPass /library_common/act_on_multiple_datasets http://galaxy.bits.vib.be/library_common/act_on_multiple_datasets max=6 keepalive=On timeout=600 retry=10

    <Proxy balancer://galaxy>
        BalancerMember http://localhost:8080
        BalancerMember http://localhost:8081
        BalancerMember http://localhost:8082
        BalancerMember http://localhost:8083
        BalancerMember http://localhost:8084
        BalancerMember http://localhost:8085
        BalancerMember http://localhost:8086
        BalancerMember http://localhost:8087
        BalancerMember http://localhost:8088
        BalancerMember http://localhost:8089
        BalancerMember http://localhost:8090
        BalancerMember http://localhost:8091
        BalancerMember http://localhost:8092
    </Proxy>

    RewriteEngine on
    RewriteLog "/tmp/apacheGalaxy.log"

    # <Location "/">
    #     AuthType Basic
    #     AuthBasicProvider ldap
    #     AuthLDAPURL "ldap://smeagol.vib.be:389/DC=vib,DC=local?sAMAccountName"
    #     AuthLDAPBindDN vib\administrator
    #     AuthLDAPBindPassword <tofillin>
    #     AuthzLDAPAuthoritative off
    #     Require valid-user
    #     # Set the REMOTE_USER header to the contents of the LDAP query response's "uid" attribute
    #     RequestHeader set REMOTE_USER %{AUTHENTICATE_sAMAccountName}
    # </Location>

    RewriteRule ^/static/style/(.*) /home/galaxy/galaxy-dist/static/june_2007_style/blue/$1 [L]
    RewriteRule ^/static/scripts/(.*) /home/galaxy/galaxy-dist/static/scripts/packed/$1 [L]
    RewriteRule ^/static/(.*) /home/galaxy/galaxy-dist/static/$1 [L]
    RewriteRule ^/favicon.ico /home/galaxy/galaxy-dist/static/favicon.ico [L]
    RewriteRule ^/robots.txt /home/galaxy/galaxy-dist/static/robots.txt [L]
    RewriteRule ^(.*) balancer://galaxy$1 [P]
</VirtualHost>
======

Thanks,
Joachim

Joachim Jacob
Rijvisschestraat 120, 9052
Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 03/28/2013 03:21 PM, Joachim Jacob | VIB | wrote:
OK, it seems to be a proxy error.
When the proxy does not receive data from the server, it times out and closes the connection. I think the process that packs the datasets takes too long, so the connection is closed before the packaging is finished. Just a guess...
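One way to sanity-check that guess: compress a small random sample and extrapolate linearly to the dataset size. This is only a rough sketch (written for Python 3, where gzip.compress exists; on the Python 2 that Galaxy ran at the time, gzip.GzipFile over a StringIO would be the equivalent), and random data is a pessimistic stand-in for real datasets, but it gives an order of magnitude to compare against the proxy timeout.

```python
import gzip
import os
import time


def estimate_pack_seconds(dataset_gb, sample_mb=8):
    """Roughly estimate how long gzip-packing a dataset of the given
    size would take on this machine, by compressing a small random
    sample and extrapolating linearly."""
    sample = os.urandom(sample_mb * 1024 * 1024)
    start = time.time()
    gzip.compress(sample)
    elapsed = time.time() - start
    return elapsed * (dataset_gb * 1024.0 / sample_mb)


# If this exceeds the proxy timeout (Apache's default is 300 s), a 502
# "error reading from remote server" is exactly what you would expect.
print("estimated pack time for 3 GB: %.0f s" % estimate_pack_seconds(3.0))
```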
From the httpd logs:

=====
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] (70007)The timeout specified has expired: proxy: error reading status line from remote server localhost, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
[Thu Mar 28 15:14:46 2013] [error] [client 157.193.10.52] proxy: Error reading from remote server returned by /library_common/act_on_multiple_datasets, referer: http://galaxy.bits.vib.be/library_common/browse_library?sort=name&f-description=All&f-name=All&id=142184b92db50a63&cntrller=library&async=false&show_item_checkboxes=false&operation=browse&page=1
=====
See if changing time out settings fixes this issue.
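For reference, the directives that usually govern this timeout in Apache mod_proxy, as a sketch (the values here are illustrative, not tested on this setup):

```apache
# Global default for all proxied requests (mod_proxy):
ProxyTimeout 1800

# Or per mapping, as a ProxyPass parameter:
ProxyPass /library_common balancer://galaxy/library_common timeout=1800

# mod_proxy falls back to the core Timeout directive when ProxyTimeout
# is unset, so that one may need raising as well:
Timeout 1800
```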
Cheers, Joachim
On 03/28/2013 02:58 PM, Joachim Jacob | VIB | wrote:
Hi Assaf,
After all, the problem appears not to be the total size of the history, but the size of the individual datasets.
Now, histories that contain big datasets (>1GB) imported from Data Libraries cause the exporting process to crash. Can somebody confirm whether this is a bug? I uploaded the datasets to a directory, from which they are then imported into a Data Library.
Downloading data sets >1GB from a data library directly (as tar.gz) also crashes.
Note: I have re-enabled abrt, but am waiting for some jobs to finish before restarting.
Cheers, Joachim.
On Tue 26 Mar 2013 03:45:43 PM CET, Assaf Gordon wrote:
Hello Joachim,
Joachim Jacob | VIB | wrote, On 03/26/2013 10:01 AM:
abrt was filling the root directory indeed. So disabled it.
I have done some exporting tests, and the behaviour is not consistent.
1. *Size*: in general, it worked for smaller datasets and usually crashed on bigger ones (starting from 3 GB). So size is key?
2. But now I have found several histories of 4.5 GB that I was able to export... So much for the size hypothesis.
Another observation: when the export crashes, the corresponding webhandler process dies.
A crashing python process crosses the fine boundary between the Galaxy code and Python internals... perhaps the Galaxy developers can help with this problem.
It would be helpful to find a reproducible case with a specific history or a specific sequence of events, then someone can help you with the debugging.
Once you find a history that causes a crash (every time, or sometimes but in a reproducible way), try to pinpoint when exactly it happens: is it when you start preparing the export (while "export_history.py" is running as a job), or when you start downloading the exported file? (I'm a bit behind on the export mechanism, so perhaps there are other steps involved.)
Couple of things to try:
1. set "cleanup_job=never" in your universe_wsgi.ini - this will keep the temporary files, and will help you re-produce jobs later.
2. Enable "abrt" again - it is not the problem (just the symptom). You can clean up the "/var/spool/abrt/XXX" directory from previous crash logs, then reproduce a new crash and look at the collected files (assuming you have enough space to store at least one crash). In particular, look at the file called "coredump" - it will tell you which script crashed. Try running:

$ file /var/spool/abrt/XXXX/coredump
coredump: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'python XXXXXX.py'
Instead of "XXXX.py" it would show the python script that crashed (hopefully with full command-line parameters).
It won't show which python statement caused the crash, but it will point in the right direction.
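For reference, the setting from step 1 lives in the [app:main] section of universe_wsgi.ini; a minimal fragment:

```ini
[app:main]
# Keep job working directories and temporary files around
# (default is "always"; "onsuccess" is the usual middle ground):
cleanup_job = never
```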
So now I suspect something is wrong with the datasets, but I am not able to trace anything meaningful in the logs. I am not confident about turning on logging in Python yet, but apparently this happens with the "logging" module, initiated like logging.getLogger( __name__ ).
It could be a bad dataset (file on disk), or a problem in the database, or something completely different (a bug in the python archive module). No point guessing until there are more details.
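One cheap way to test the bad-dataset theory, assuming you can reach the dataset files on disk: run the suspect file through Python's tarfile machinery outside of Galaxy. This is not Galaxy's actual export code (and the function name and paths are made up for illustration), just the same underlying module exercised in isolation.

```python
import os
import tarfile
import tempfile


def check_dataset_archivable(path):
    """Pack a single file into a gzipped tar and read it back, to see
    whether the tarfile module can handle this particular file."""
    with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
        archive = tmp.name
    try:
        # Write phase: same module the export step relies on.
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(path, arcname=os.path.basename(path))
        # Read phase: verify the member round-tripped with the right size.
        with tarfile.open(archive, "r:gz") as tar:
            member = tar.getmember(os.path.basename(path))
            return member.size == os.path.getsize(path)
    finally:
        os.unlink(archive)
```

If this crashes or returns False for a dataset that Galaxy also fails to export, that points at the file (or the archive module) rather than at Galaxy's own code.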
-gordon