Hi All,

I have a weird issue that's just cropped up. After a new install of Galaxy (checked out on Monday from GitHub) on an Ubuntu VM, using Postgres rather than SQLite as well as a few other production recommendations, I started playing around with the Data Libraries functionality. I linked a bunch of fastq.gz files into Galaxy (around 150 in total) and everything was working fine. I went home and the next day, it was down.

I tried to start it up as usual (using an init.d script), it worked for less than a minute and then disappeared again. So I tried running it as the galaxy user using ./run.sh and I get a seg fault;

Starting server in PID 23173.
serving on http://144.124.110.39:8080
Segmentation fault

Tried again with strace

Starting server in PID 23552.
serving on http://144.124.110.39:8080
[{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 23552
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=23552, si_status=SIGKILL, si_utime=1590, si_stime=1930} ---
rt_sigreturn() = 23552
write(2, "Killed\n", 7Killed
) = 7
read(10, "", 8192) = 0
exit_group(137) = ?
+++ exited with 137 +++

I can't see anything odd in the log file and I've turned debugging on in galaxy.ini. I'm at a bit of a loss. Does anyone know what might be causing it?

Cheers,

--
Dr. Martin Vickers

Data Manager/HPC Systems Administrator
Institute of Biological, Environmental and Rural Sciences
IBERS New Building
Aberystwyth University

w: http://www.martin-vickers.co.uk/
e: mjv08@aber.ac.uk
t: 01970 62 2807
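A side note on the strace output above: the final status 137 is consistent with the common shell convention of reporting 128 plus the signal number when a child is killed by a signal, and 137 - 128 = 9, which is SIGKILL on Linux. That would suggest the process was killed from outside rather than crashing on its own. The helper below is only an illustrative sketch; `describe_exit_status` is a hypothetical name, not part of Galaxy or run.sh.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status in the range 0-255.

    Assumes the common shell convention that a status above 128
    means "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"


print(describe_exit_status(137))
```

By the same convention a genuine segmentation fault (SIGSEGV, signal 11) would normally show up as status 139, not 137.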
Hi Martin,

Is there anything in the syslog?

--nate

On Thu, Jul 16, 2015 at 11:26 AM, Martin Vickers <mjv08@aber.ac.uk> wrote:
> I tried to start it up as usual (using an init.d script), it worked for less than a minute and then disappeared again. So I tried running it as the galaxy user using ./run.sh and I get a seg fault;

___________________________________________________________
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
Hi Nate,

Thanks for the reply. In syslog I'm getting;

Jul 16 20:36:41 galaxy kernel: [ 117.123921] Out of memory: Kill process 1390 (python) score 986 or sacrifice child
Jul 16 20:36:41 galaxy kernel: [ 117.124087] Killed process 1390 (python) total-vm:43496348kB, anon-rss:32611892kB, file-rss:1800kB
(END)

It's a 32GB VM. I could increase it but I wouldn't expect 32GB to be too little. I've attached the full syslog.

Dr. Martin Vickers

Data Manager/HPC Systems Administrator
Institute of Biological, Environmental and Rural Sciences
IBERS New Building
Aberystwyth University

w: http://www.martin-vickers.co.uk/
e: mjv08@aber.ac.uk
t: 01970 62 2807

________________________________
From: Nate Coraor <nate@bx.psu.edu>
Sent: 16 July 2015 04:36 PM
To: Martin Vickers [mjv08]
Cc: galaxy-dev@lists.galaxyproject.org
Subject: Re: [galaxy-dev] ./run.sh segfault

Hi Martin,

Is there anything in the syslog?

--nate
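To put the OOM-killer line above into more familiar units, here is a small sketch that pulls the numbers out of a "Killed process" line and converts them to GiB. It assumes the kernel's kB figures are multiples of 1024 bytes and that the line has exactly the layout shown in this thread; `parse_oom_kill` and its regular expression are illustrative, not an official log format.

```python
import re

# The "Killed process" line quoted from syslog in this thread.
OOM_LINE = (
    "Killed process 1390 (python) total-vm:43496348kB, "
    "anon-rss:32611892kB, file-rss:1800kB"
)

_OOM_PATTERN = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>.+?)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024  # assumption: "kB" in the kernel log means KiB


def parse_oom_kill(line: str) -> dict:
    """Extract pid, process name and memory figures (in GiB) from an OOM kill line."""
    match = _OOM_PATTERN.search(line)
    if match is None:
        raise ValueError("not a recognised OOM kill line")
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_gib": int(match.group("total_vm")) / KIB_PER_GIB,
        "anon_rss_gib": int(match.group("anon_rss")) / KIB_PER_GIB,
        "file_rss_gib": int(match.group("file_rss")) / KIB_PER_GIB,
    }


info = parse_oom_kill(OOM_LINE)
print(f"{info['name']} (pid {info['pid']}): "
      f"{info['anon_rss_gib']:.1f} GiB resident, "
      f"{info['total_vm_gib']:.1f} GiB virtual")
```

For the line quoted above this gives roughly 31.1 GiB resident and 41.5 GiB virtual, so the python process alone had filled essentially all of the 32GB VM when it was killed.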
Hi,

I was just looking at what the galaxy program was doing using top and it is using all the memory! I've rebooted the machine with 90GB RAM and it remained up and running using 57GB until I logged into Galaxy and looked to see if there were any running jobs in the admin panel. The memory usage nudged up past 90GB and then it died. So it's definitely a memory problem.

Data Manager/HPC Systems Administrator
Institute of Biological, Environmental and Rural Sciences
IBERS New Building
Aberystwyth University

w: http://www.martin-vickers.co.uk/
e: mjv08@aber.ac.uk
t: 01970 62 2807
Hi All,

I'm still having this issue despite several attempts to try to resolve it. I've booted it on an 80GB VM, there are no users on it and only 1 or 2 tools installed from the tool shed. I have loaded around 150 fasta.gz files into a couple of data libraries which are on an NFS share.

When Galaxy starts it has a 57GB RAM footprint. If I leave it and do nothing, around 5 mins after I start Galaxy something kicks in and starts consuming all the RAM and then it segfaults.

root@galaxy:~# top
top - 10:01:34 up 20 min,  2 users,  load average: 0.84, 0.49, 0.43
Tasks: 180 total,   1 running, 179 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.4 us,  0.2 sy,  0.0 ni, 87.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem:  81295232 total, 58937820 used, 22357408 free,    13508 buffers
KiB Swap:  8640508 total,    69940 used,  8570568 free.    86132 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2867 galaxy    20   0 57.856g 0.054t  11460 S 101.6 71.4   1:38.15 python

This is what I get in syslog when it crashes like this.

Jul 20 09:50:25 galaxy kernel: [ 569.351158] show_signal_msg: 18 callbacks suppressed
Jul 20 09:50:25 galaxy kernel: [ 569.351168] python[1883]: segfault at 24 ip 0000000000558077 sp 00007fc5cb9e6400 error 6 in python2.7[400000+2bc000]
Jul 20 09:50:25 galaxy kernel: [ 569.444890] Core dump to |/usr/share/apport/apport 1409 11 0 1409 pipe failed

If there isn't sufficient memory in the first place (i.e. less than 57GB), I get something more like this;

Jul 16 20:36:41 galaxy kernel: [ 117.123921] Out of memory: Kill process 1390 (python) score 986 or sacrifice child
Jul 16 20:36:41 galaxy kernel: [ 117.124087] Killed process 1390 (python) total-vm:43496348kB, anon-rss:32611892kB, file-rss:1800kB
(END)

I can't see anything in the paster.log. I'm at a bit of a loss where to look for what is causing it. Any help would be greatly appreciated.

Many thanks,

Martin

On 07/16/2015 08:48 PM, Martin Vickers [mjv08] wrote:
> It's a 32GB VM. I could increase it but I wouldn't expect 32GB to be too little. I've attached the full syslog.
--
Dr. Martin Vickers

Data Manager/HPC Systems Administrator
Institute of Biological, Environmental and Rural Sciences
IBERS New Building
Aberystwyth University

w: http://www.martin-vickers.co.uk/
e: mjv08@aber.ac.uk
t: 01970 62 2807
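The segfault line in the syslog excerpt of the last message can be unpacked a little further. The sketch below assumes an x86-64 kernel, where the `error` field is the hardware page-fault error code printed in hex (bit 0: protection violation rather than a missing page, bit 1: write access, bit 2: user mode). `decode_segfault` is a hypothetical helper written for this one log layout only.

```python
import re

# The segfault line quoted from syslog in this thread.
SEGFAULT_LINE = (
    "python[1883]: segfault at 24 ip 0000000000558077 "
    "sp 00007fc5cb9e6400 error 6 in python2.7[400000+2bc000]"
)

_SEGFAULT_PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Decode the fields of a kernel 'segfault at ...' log line (x86 assumed)."""
    match = _SEGFAULT_PATTERN.search(line)
    if match is None:
        raise ValueError("not a recognised segfault line")
    err = int(match.group("err"), 16)
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return {
        "image": match.group("image"),
        "fault_address": int(match.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped binary.
        "offset_in_image": ip - base,
        "cause": "protection violation" if err & 1 else "page not mapped",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


info = decode_segfault(SEGFAULT_LINE)
print(info["image"], hex(info["fault_address"]), hex(info["offset_in_image"]),
      info["cause"], info["access"], info["mode"])
```

For the line above this reports a user-mode write to the unmapped address 0x24, at offset 0x158077 into the python2.7 binary, which looks like a write through a near-NULL pointer. One possible explanation is an allocation that failed under memory pressure, but the log alone does not show that.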
participants (3)
- Martin Vickers
- Martin Vickers [mjv08]
- Nate Coraor