Adding data libraries from filesystem path creating duplicates
Hi All,

I've noticed an issue a couple of times now where I've added a directory of fastq files from an NFS-mounted filesystem to a data library (as references only, rather than copying them into Galaxy) and Galaxy then times out. The load average gets really high, all the RAM is consumed, and sometimes Galaxy crashes. These are the same symptoms I had before with this issue, which was never resolved:

http://dev.list.galaxyproject.org/run-sh-segfault-td4667549.html#a4667553

What I've noticed is that the dataset folder I'm uploading to Galaxy suddenly contains many duplicates. In the example that has just happened, there are 288 fastq.gz files in the physical folder, but Galaxy has created 6 references to each file, resulting in 1728 datasets in the folder (see attached images).

When this happened before and crashed the Galaxy application, whenever it restarted it would try to resume what it was doing, which created an endless loop of retrying and crashing until the job was removed.

Does anyone know what may be causing this?

Cheers,
Martin

--
Dr. Martin Vickers
Data Manager/HPC Systems Administrator
Institute of Biological, Environmental and Rural Sciences
IBERS New Building
Aberystwyth University

w: http://www.martin-vickers.co.uk/
e: mjv08@aber.ac.uk
t: 01970 62 2807
If I had to guess, I would guess this is caused by a misconfigured proxy (nginx or Apache) that is resubmitting a POST request that is taking Galaxy too long to respond to. The order of events would be something like:

- User clicks to upload library items.
- Proxy gets the request and passes it to Galaxy.
- Galaxy takes a long time to process the request and doesn't respond within a timeout.
- Proxy resends the POST request to Galaxy.
- Galaxy takes a long time to process the request and doesn't respond within a timeout.
- ...

As far as I can imagine, proxies should never resend POST requests to Galaxy, but we have seen this before, for instance when submitting workflows: some people have had their proxy retry that request repeatedly. I don't really know if this is a problem with the default proxy configurations we list on the wiki or whether it comes down to customizations or special loaded extensions at the sites that have encountered it.

Is this enough to help debug the problem? I'm not really an expert on specific proxies, etc., and you have the setup there and seem to be able to reproduce the problem. If you do want further help, please post the proxy you are using, the extensions, the configuration, and the Galaxy logs corresponding to this incident, so we can look for the repeated POSTs and the route being posted to.

If you are not using a proxy, then I am stumped :(.

-John
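P.S. One mitigation that may be worth trying in the meantime is raising the proxy read timeout on the location that fronts Galaxy, so a slow library-import POST has time to finish before nginx gives up on it. This is only a sketch, not something I have tested against your setup; it assumes nginx's default 60-second proxy_read_timeout is in effect, and the 600s value is purely an illustrative choice:

    location / {
        # proxy_pass to your Galaxy upstream, as in the wiki configuration
        proxy_pass         http://localhost:8080;
        # proxy_read_timeout defaults to 60s; raise it so a long-running
        # library import is not declared timed out by the proxy
        proxy_read_timeout 600s;
    }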
Hi John,

Thanks for taking the time to reply. I never thought to look at the proxy settings, but I think you're right; the behaviour matches what you've described.

Like you, I'm not really an expert on proxies and have no idea what misconfiguration would cause this. I'm using nginx and the configuration is as described in the wiki; I've not loaded any special extensions.

nginx is configured like this:

    upstream galaxy_app {
        server localhost:8090;
        server localhost:8091;
        server localhost:8092;
        server localhost:8093;
        server localhost:8094;
        server localhost:8095;
    }

    server {
        # pass to uWSGI by default
        location / {
            proxy_pass http://galaxy_app;
            proxy_set_header X-Forwarded-Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-URL-SCHEME https;
        }

        ... static content ...
    }

and in galaxy.ini I have a bunch of handlers, e.g.

    [server:handler0]
    use = egg:Paste#http
    port = 8090
    host = 127.0.0.1
    use_threadpool = true
    threadpool_workers = 5

I thought the issue might be the 'job admin complication' mentioned very briefly here:

https://production-galaxy-instances-with-cloudman-and-cloudbiolinux.readthed...

so I added this to my nginx conf:

    location /admin/jobs {
        proxy_pass http://localhost:8090;
    }

so that particular complication shouldn't be the one I'm hitting here.

Are any of the people John mentioned as having this issue here on the dev board?

Cheers,
Martin
If you can consistently cause the problem, I wonder if it is worth trying this advice (http://serverfault.com/questions/528653/how-can-i-stop-nginx-from-retrying-p...); it would be good to know if it helps. There is a gist here: https://gist.github.com/wojons/6154645. I don't think any POST in Galaxy should be retried instead of errored on, so I don't see any downside to adding this to the nginx configuration.

-John
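P.S. A sketch of what I have in mind for your config (untested, and only my reading of the linked advice rather than a verified fix): by default nginx passes a request that errors or times out on to the next server in the upstream group, and with your six handler backends that could mean the same library-import POST being submitted up to six times, which would also line up with each file appearing six times. Disabling that retry behaviour for the Galaxy location would look something like:

    location / {
        proxy_pass http://galaxy_app;
        # do not retry a failed or timed-out request against the other
        # servers in the upstream group; return the error to the client
        proxy_next_upstream off;
    }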