Hi All,

I'm still having this issue despite several attempts to resolve it. I've booted it on an 80GB VM; there are no users on it and only one or two tools installed from the Tool Shed. I have loaded around 150 fasta.gz files into a couple of data libraries, which are on an NFS share. When Galaxy starts, it has a 57GB RAM footprint. If I leave it and do nothing, around 5 minutes after I start Galaxy something kicks in, starts consuming all the RAM, and then it segfaults.

root@galaxy:~# top
top - 10:01:34 up 20 min,  2 users,  load average: 0.84, 0.49, 0.43
Tasks: 180 total,   1 running, 179 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.4 us,  0.2 sy,  0.0 ni, 87.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem:  81295232 total, 58937820 used, 22357408 free,    13508 buffers
KiB Swap:  8640508 total,    69940 used,  8570568 free.    86132 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2867 galaxy    20   0 57.856g 0.054t  11460 S 101.6 71.4   1:38.15 python       
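To pin down exactly when the growth starts (the "around 5 minutes after start" window), one option is to log the process's resident memory over time and line the timestamps up against paster.log. The sketch below assumes Linux's /proc/<pid>/status and Python 3.6+; `parse_vmrss_kib` and `sample_rss` are hypothetical helpers made up for illustration, not part of Galaxy. With the defaults (60 samples, 10 s apart) it covers roughly the first ten minutes.

```python
import time
from pathlib import Path


# Hypothetical helpers (not part of Galaxy): sample a process's resident memory
# from Linux's /proc/<pid>/status so the start of the growth can be timestamped.
def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value, in KiB, from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:\t57012344 kB"
    raise ValueError("no VmRSS line (process may be a zombie)")


def sample_rss(pid: int, interval_s: float = 10.0, samples: int = 60) -> None:
    """Print a timestamped RSS reading for `pid` every `interval_s` seconds."""
    status = Path(f"/proc/{pid}/status")
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        try:
            kib = parse_vmrss_kib(status.read_text())
        except FileNotFoundError:
            # /proc/<pid> disappears once the process has exited or been killed.
            print(f"{stamp} pid {pid} has exited")
            return
        print(f"{stamp} RSS = {kib / 1024 ** 2:.2f} GiB")
        time.sleep(interval_s)
```

For example, `sample_rss(2867, interval_s=5)` would follow the python process shown in the top output; the timestamp of the jump may then point at whatever Galaxy logged around the same moment.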

This is what I get in syslog when it crashes:

Jul 20 09:50:25 galaxy kernel: [  569.351158] show_signal_msg: 18 callbacks suppressed
Jul 20 09:50:25 galaxy kernel: [  569.351168] python[1883]: segfault at 24 ip 0000000000558077 sp 00007fc5cb9e6400 error 6 in python2.7[400000+2bc000]
Jul 20 09:50:25 galaxy kernel: [  569.444890] Core dump to |/usr/share/apport/apport 1409 11 0 1409 pipe failed
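The segfault line itself carries some information. On x86 the kernel prints the fault address, instruction pointer and stack pointer in hex, followed by the page-fault error code, whose low three bits say whether the page was present, whether the access was a write, and whether it came from user mode. A small decoder, as a sketch in Python 3 (`decode_segfault` is a hypothetical name made up here, and only those three bits are decoded):

```python
import re
from typing import Dict, List, Union

# Hypothetical helper for this thread: decodes a kernel "segfault at ..." line.
# Low three bits of the x86 page-fault error code: (meaning if set, if clear).
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write access", "read access"),
    2: ("user mode", "kernel mode"),
}

# The kernel prints the address, ip, sp and error code in hexadecimal.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line: str) -> Dict[str, Union[int, List[str]]]:
    """Parse the address fields and decode the low error-code bits."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "flags": [on if err & (1 << bit) else off
                  for bit, (on, off) in sorted(PF_BITS.items())],
    }
```

Applied to the python[1883] line above, this gives a fault address of 0x24 with the flags "page not present", "write access" and "user mode": a user-space write to an address just past zero, which typically indicates a NULL pointer being dereferenced at a small offset. The instruction pointer 0x558077 also falls inside the python2.7 mapping shown in brackets (0x400000 up to 0x6bc000), so the fault appears to be inside the interpreter binary itself rather than in a separately loaded library.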

If there isn't sufficient memory in the first place (i.e. less than 57GB), I get something more like this:

Jul 16 20:36:41 galaxy kernel: [  117.123921] Out of memory: Kill process 1390 (python) score 986 or sacrifice child
Jul 16 20:36:41 galaxy kernel: [  117.124087] Killed process 1390 (python) total-vm:43496348kB, anon-rss:32611892kB, file-rss:1800kB
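The OOM killer reports sizes in kB (the kernel's kB is 1024 bytes), which makes them easy to misread at a glance. A hypothetical Python 3 helper, made up for illustration, that converts the fields of a "Killed process" line to GiB:

```python
import re
from typing import Dict

# Hypothetical helper: pull the "name:NNNkB" fields out of an OOM-killer line.
OOM_RE = re.compile(r"(total-vm|anon-rss|file-rss):(\d+)kB")


def oom_sizes_gib(line: str) -> Dict[str, float]:
    """Return each kB field of a 'Killed process' line converted to GiB."""
    return {name: int(kb) / 1024 ** 2 for name, kb in OOM_RE.findall(line)}
```

For the line above this gives roughly 41.5 GiB of virtual memory and 31.1 GiB of anonymous resident memory, i.e. essentially all of a 32GB VM, which is consistent with the process being killed for exhausting RAM rather than crashing on its own.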

I can't see anything in the paster.log.

I'm at a bit of a loss as to where to look for the cause. Any help would be greatly appreciated.

Many thanks,

Martin

On 07/16/2015 08:48 PM, Martin Vickers [mjv08] wrote:
>
> Hi Nate,
>
>
> Thanks for the reply. In syslog I'm getting:
>
>
> Jul 16 20:36:41 galaxy kernel: [  117.123921] Out of memory: Kill process 1390 (python) score 986 or sacrifice child
> Jul 16 20:36:41 galaxy kernel: [  117.124087] Killed process 1390 (python) total-vm:43496348kB, anon-rss:32611892kB, file-rss:1800kB
>
> It's a 32GB VM. I could increase it, but I wouldn't expect 32GB to be too little. I've attached the full syslog.
>
>
> Dr. Martin Vickers
>
> Data Manager/HPC Systems Administrator
> Institute of Biological, Environmental and Rural Sciences
> IBERS New Building
> Aberystwyth University
>
> w: http://www.martin-vickers.co.uk/
> e: mjv08@aber.ac.uk
> t: 01970 62 2807
>
>
> -------------------------
> *From:* Nate Coraor <nate@bx.psu.edu>
> *Sent:* 16 July 2015 04:36 PM
> *To:* Martin Vickers [mjv08]
> *Cc:* galaxy-dev@lists.galaxyproject.org
> *Subject:* Re: [galaxy-dev] ./run.sh segfault

> Hi Martin,
>
> Is there anything in the syslog?
>
> --nate
>
> On Thu, Jul 16, 2015 at 11:26 AM, Martin Vickers <mjv08@aber.ac.uk> wrote:
>

Hi All,

I have a weird issue that's just cropped up. After a new install of
Galaxy (checked out on Monday from GitHub) on an Ubuntu VM, using
Postgres rather than SQLite and following a few other production
recommendations, I started playing around with the Data Libraries
functionality. I linked a bunch of fastq.gz files into Galaxy (around
150 in total) and everything was working fine. I went home, and the
next day it was down.

I tried to start it up as usual (using an init.d script); it worked
for less than a minute and then disappeared again. So I tried running
it as the galaxy user using ./run.sh and got a segfault:

Starting server in PID 23173.
serving on http://144.124.110.39:8080
Segmentation fault

Tried again with strace:

Starting server in PID 23552.
serving on http://144.124.110.39:8080
[{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 23552
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=23552,
si_status=SIGKILL, si_utime=1590, si_stime=1930} ---
rt_sigreturn()                          = 23552
write(2, "Killed\n", 7Killed
)                 = 7
read(10, "", 8192)                      = 0
exit_group(137)                         = ?
+++ exited with 137 +++
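In this run the parent process (apparently the shell running run.sh) reports that its child was terminated with SIGKILL, not SIGSEGV. Shells report a child killed by signal N as exit status 128 + N; a minimal Python 3 sketch of that convention (`describe_exit_status` is a hypothetical name used only for illustration):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status, where 128 + N means 'killed by signal N'."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:  # no named signal with this number on this platform
            name = "unknown signal"
        return f"killed by {name} (signal {signum})"
    return f"exited with code {status}"
```

Here 137 decodes to SIGKILL (signal 9), which is the signal the kernel OOM killer sends, whereas a genuine segfault would normally surface as 139 (SIGSEGV, signal 11).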

I can't see anything odd in the log file and I've turned debugging on
in galaxy.ini. I'm at a bit of a loss. Does anyone know what might be
causing it?

Cheers,

>     ___________________________________________________________
>     Please keep all replies on the list by using "reply all"
>     in your mail client.  To manage your subscriptions to this
>     and other Galaxy lists, please use the interface at:
>       https://lists.galaxyproject.org/
>
>     To search Galaxy mailing lists use the unified search at:
>       http://galaxyproject.org/search/mailinglists/