Hello Devs,

Now that the Galaxy process has crashed, here is more information from the logs.

galaxy.jobs.runners.local DEBUG 2015-08-04 17:13:20,303 (2337) executing job script: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2337/galaxy_2337.sh
galaxy.jobs DEBUG 2015-08-04 17:13:20,487 (2337) Persisting job destination (destination id: multicore6)
galaxy.jobs DEBUG 2015-08-04 17:13:25,181 (2468) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2468
galaxy.jobs.handler DEBUG 2015-08-04 17:13:25,190 (2468) Dispatching to local runner
galaxy.jobs DEBUG 2015-08-04 17:13:25,466 (2468) Persisting job destination (destination id: local)
galaxy.jobs.runners DEBUG 2015-08-04 17:13:25,485 Job [2468] queued (295.181 ms)
galaxy.jobs.handler INFO 2015-08-04 17:13:25,490 (2468) Job dispatched
galaxy.jobs DEBUG 2015-08-04 17:13:25,538 (2469) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2469
galaxy.jobs.handler DEBUG 2015-08-04 17:13:25,545 (2469) Dispatching to local runner
galaxy.jobs DEBUG 2015-08-04 17:13:25,777 (2469) Persisting job destination (destination id: local)
galaxy.jobs.runners DEBUG 2015-08-04 17:13:25,798 Job [2469] queued (252.823 ms)
galaxy.jobs.handler INFO 2015-08-04 17:13:25,803 (2469) Job dispatched
galaxy.jobs.runners.local DEBUG 2015-08-04 17:14:09,922 execution finished: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2337/galaxy_2337.sh
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:10,476 loading metadata from file for: HistoryDatasetAssociation 5117
galaxy.jobs.runners.local DEBUG 2015-08-04 17:14:10,589 execution finished: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2318/galaxy_2318.sh
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:10,919 loading metadata from file for: HistoryDatasetAssociation 5116
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:11,500 loading metadata from file for: HistoryDatasetAssociation 5115
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:11,519 loading metadata from file for: HistoryDatasetAssociation 5084
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:12,270 loading metadata from file for: HistoryDatasetAssociation 5083
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:12,804 loading metadata from file for: HistoryDatasetAssociation 5082
galaxy.jobs INFO 2015-08-04 17:14:13,406 Collecting job metrics for <galaxy.model.Job object at 0x117e5b4d0>
galaxy.jobs DEBUG 2015-08-04 17:14:13,537 job 2337 ended (finish() executed in (3614.250 ms))
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:13,580 Cleaning up external metadata files
galaxy.jobs INFO 2015-08-04 17:14:14,001 Collecting job metrics for <galaxy.model.Job object at 0x11b03b410>
galaxy.jobs DEBUG 2015-08-04 17:14:14,068 job 2318 ended (finish() executed in (3449.205 ms))
galaxy.datatypes.metadata DEBUG 2015-08-04 17:14:14,112 Cleaning up external metadata files
galaxy.jobs DEBUG 2015-08-04 17:14:19,625 (2319) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2319
galaxy.jobs.handler DEBUG 2015-08-04 17:14:19,630 (2319) Dispatching to local runner
galaxy.jobs DEBUG 2015-08-04 17:14:20,000 (2319) Persisting job destination (destination id: local)
galaxy.jobs.runners DEBUG 2015-08-04 17:14:20,018 Job [2319] queued (387.377 ms)
galaxy.jobs.handler INFO 2015-08-04 17:14:20,021 (2319) Job dispatched
galaxy.jobs DEBUG 2015-08-04 17:14:20,061 (2320) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2320
galaxy.jobs.handler DEBUG 2015-08-04 17:14:20,067 (2320) Dispatching to local runner
galaxy.jobs DEBUG 2015-08-04 17:14:20,254 (2320) Persisting job destination (destination id: local)
galaxy.jobs.runners DEBUG 2015-08-04 17:14:20,268 Job [2320] queued (201.349 ms)
galaxy.jobs.handler INFO 2015-08-04 17:14:20,272 (2320) Job dispatched
galaxy.jobs DEBUG 2015-08-04 17:14:20,696 (2338) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2338
galaxy.jobs.handler DEBUG 2015-08-04 17:14:20,701 (2338) Dispatching to local runner
galaxy.jobs DEBUG 2015-08-04 17:14:20,891 (2338) Persisting job destination (destination id: local)
galaxy.jobs.runners DEBUG 2015-08-04 17:14:20,941 Job [2338] queued (239.991 ms)
galaxy.jobs.handler INFO 2015-08-04 17:14:20,945 (2338) Job dispatched
galaxy.jobs DEBUG 2015-08-04 17:14:20,982 (2339) Working directory for job is: /Users/galaxy/galaxy-dist/database/job_working_directory/002/2339
galaxy.jobs.handler DEBUG 2015-08-04 17:14:20,989 (2339) Dispatching to local runner
galaxy.jobs DEBUG 2015-08-04 17:14:21,456 (2339) Persisting job destination (destination id: local)
galaxy.jobs.runners DEBUG 2015-08-04 17:14:21,468 Job [2339] queued (478.726 ms)
galaxy.jobs.handler INFO 2015-08-04 17:14:21,472 (2339) Job dispatched
ERROR LINE: run.sh: line 81: 21118 Abort trap: 6           python ./scripts/paster.py serve $GALAXY_CONFIG_FILE $@




When we try to restart run.sh, this is what we see in the log:
Starting server in PID 57249.
Traceback (most recent call last):
  File "./scripts/paster.py", line 37, in <module>
    serve.run()
  File "/Users/galaxy/galaxy-dist/lib/galaxy/util/pastescript/serve.py", line 1049, in run
    invoke(command, command_name, options, args[1:])
  File "/Users/galaxy/galaxy-dist/lib/galaxy/util/pastescript/serve.py", line 1055, in invoke
    exit_code = runner.run(args)
  File "/Users/galaxy/galaxy-dist/lib/galaxy/util/pastescript/serve.py", line 220, in run
    result = self.command()
  File "/Users/galaxy/galaxy-dist/lib/galaxy/util/pastescript/serve.py", line 670, in command
    serve()
  File "/Users/galaxy/galaxy-dist/lib/galaxy/util/pastescript/serve.py", line 654, in serve
    server(app)
  File "/Users/galaxy/galaxy-dist/lib/galaxy/util/pastescript/loadwsgi.py", line 292, in server_wrapper
    **context.local_conf)
  File "/Users/galaxy/galaxy-dist/lib/galaxy/util/pastescript/loadwsgi.py", line 97, in fix_call
    val = callable(*args, **kw)
  File "/Users/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1342, in server_runner
    serve(wsgi_app, **kwargs)
  File "/Users/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1291, in serve
    request_queue_size=request_queue_size)
  File "/Users/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1134, in __init__
    request_queue_size=request_queue_size)
  File "/Users/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 1113, in __init__
    request_queue_size=request_queue_size)
  File "/Users/galaxy/galaxy-dist/eggs/Paste-1.7.5.1-py2.7.egg/paste/httpserver.py", line 360, in __init__
    HTTPServer.__init__(self, server_address, RequestHandlerClass)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 408, in __init__
    self.server_bind()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/BaseHTTPServer.py", line 108, in server_bind
    SocketServer.TCPServer.server_bind(self)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 419, in server_bind
    self.socket.bind(self.server_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 48] Address already in use
galaxy.jobs.handler INFO 2015-08-04 17:19:32,844 sending stop signal to worker thread
galaxy.jobs.handler INFO 2015-08-04 17:19:32,845 job handler queue stopped
galaxy.jobs.runners INFO 2015-08-04 17:19:32,845 LocalRunner: Sending stop signal to 2 worker threads
galaxy.jobs.runners INFO 2015-08-04 17:19:32,857 LocalRunner: Sending stop signal to 8 worker threads
galaxy.jobs.handler INFO 2015-08-04 17:19:32,899 sending stop signal to worker thread
galaxy.jobs.handler INFO 2015-08-04 17:19:32,912 job handler stop queue stopped

After we kill the Galaxy tool processes that are still running, we are able to start run.sh again successfully. We migrated from a previous version, so could this be caused by outdated eggs or .xml files?
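
Since the restart only succeeds once the leftover tool processes are gone, it looks as if something still holds the server port at that point. In case it is useful, here is a minimal sketch of a check we could run before calling run.sh. The function name port_is_free is made up for illustration, and port 8080 on 127.0.0.1 is only an assumption (the Galaxy default); the real values are whatever universe_wsgi.ini sets.

```python
import errno
import socket


def port_is_free(host, port):
    """Return True if host:port can be bound, False if it is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ignore sockets lingering in TIME_WAIT; only a live listener should count.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except socket.error as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # any other error (e.g. permission denied) is not our case
    finally:
        sock.close()
    return True


# Assumed values, not taken from our config: Galaxy's default host and port.
GALAXY_HOST = "127.0.0.1"
GALAXY_PORT = 8080
```

If port_is_free(GALAXY_HOST, GALAXY_PORT) returns False, a restart would presumably fail with the same "[Errno 48] Address already in use" as in the traceback above.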

Thank you,
-Hans


On Tue, Aug 4, 2015 at 2:03 PM, Hans Vasquez-Gross <havasquezgross@ucdavis.edu> wrote:
Hi Bjorn and Hans,

We are running Galaxy on our local webserver, so there is no job scheduler. Instead, we are using the LocalJobRunner configuration in job_conf.xml:

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="8"/>
        <plugin id="multilocal" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="2"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
        <destination id="multicore6" runner="multilocal">
          <param id="local_slots">6</param>
        </destination>
    </destinations>
    <tools>
        <tool id="bowtie2" destination="multicore6"/>
        <tool id="spades" destination="multicore6"/>
        <tool id="bbmap_1" destination="multicore6"/>
        <tool id="iuc_pear" destination="multicore6"/>
        <tool id="abyss-pe" destination="multicore6"/>
        <tool id="fastq_groomer_parallel" destination="multicore6"/>
    </tools>

    <handlers>
        <handler id="main"/>
    </handlers>
</job_conf>
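
To put a number on what this configuration allows, here is a rough sketch that multiplies each runner's workers by the local_slots of the destinations using it. It assumes a destination without local_slots uses one slot per job and that the tools actually honor the slot count; the function name max_slot_demand is ours, and the XML is a shortened copy of the config above. We do not know whether this load is related to the crash.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the job_conf.xml above (tools and handlers omitted).
JOB_CONF = """
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="8"/>
        <plugin id="multilocal" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="2"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
        <destination id="multicore6" runner="multilocal">
          <param id="local_slots">6</param>
        </destination>
    </destinations>
</job_conf>
"""


def max_slot_demand(job_conf_xml, default_slots=1):
    """Map each destination id to workers * slots, a worst-case core demand."""
    root = ET.fromstring(job_conf_xml.strip())
    # Number of worker threads configured for each runner plugin.
    workers = {}
    for plugin in root.findall("./plugins/plugin"):
        workers[plugin.get("id")] = int(plugin.get("workers"))
    demand = {}
    for dest in root.findall("./destinations/destination"):
        slots = default_slots  # assumption: one slot when local_slots is absent
        for param in dest.findall("param"):
            if param.get("id") == "local_slots":
                slots = int(param.text)
        demand[dest.get("id")] = workers[dest.get("runner")] * slots
    return demand


demand = max_slot_demand(JOB_CONF)
total = sum(demand.values())
```

Under those assumptions, demand is 8 for the local destination and 12 for multicore6, so up to 20 slots could be requested at once if every worker is busy.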

Also, we normally run Galaxy in daemon mode, but recently, to help debug this issue, we have been running it in interactive mode in a screen session.

@Hans - The processes run as the galaxy user and everything seems to run fine. I am trying to get a concrete example, but run.sh usually crashes during our trimming/assembly steps for NGS data. Sometimes these workflows run to completion, but sometimes they crash the run.sh process. When run.sh crashes, the individual tool processes keep running as the galaxy user, and we cannot restart Galaxy until we manually kill them. Here is what the running processes look like on the system:

galaxy         21162  99.4  0.0  2432784    616 s014  R     2:08PM 1425:10.57 seqtk seq -q 0 -X 255 -l 0 -Q 33 -s 11 -f 1.0 -L 0 -1 /Users/galaxy/data_galaxy/test_BACs/all-merged.interleaved.fq
galaxy         21159  99.4  0.0  2432784    616 s014  R     2:08PM 1425:16.72 seqtk seq -q 0 -X 255 -l 0 -Q 33 -s 11 -f 1.0 -L 0 -2 /Users/galaxy/data_galaxy/test_BACs/all-merged.interleaved.fq
galaxy         21118  92.5  0.5  3015188 367852 s014  R+    2:07PM 791:21.54 python ./scripts/paster.py serve universe_wsgi.ini
galaxy         21160   0.0  0.0  2433640   1044 s014  S     2:08PM   0:00.01 /bin/sh /Users/galaxy/galaxy-dist/database/job_working_directory/002/2180/galaxy_2180.sh
galaxy         21157   0.0  0.0  2433640   1044 s014  S     2:08PM   0:00.01 /bin/sh /Users/galaxy/galaxy-dist/database/job_working_directory/002/2179/galaxy_2179.sh
galaxy         21113   0.0  0.0  2433640   1000 s014  S+    2:07PM   0:00.00 sh run.sh
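
For reference, a small sketch of how the leftover job wrapper scripts could be picked out of a listing like the one above. It assumes `ps aux`-style columns with the command in the eleventh field, and it only matches commands that mention job_working_directory; the function name find_job_scripts is made up for illustration. The seqtk processes would not match, since their command lines only reference our data directory, so finding them would need parent PIDs as well.

```python
SAMPLE_PS = """\
USER             PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
galaxy         21118  92.5  0.5  3015188 367852 s014  R+    2:07PM 791:21.54 python ./scripts/paster.py serve universe_wsgi.ini
galaxy         21160   0.0  0.0  2433640   1044 s014  S     2:08PM   0:00.01 /bin/sh /Users/galaxy/galaxy-dist/database/job_working_directory/002/2180/galaxy_2180.sh
galaxy         21157   0.0  0.0  2433640   1044 s014  S     2:08PM   0:00.01 /bin/sh /Users/galaxy/galaxy-dist/database/job_working_directory/002/2179/galaxy_2179.sh
galaxy         21113   0.0  0.0  2433640   1000 s014  S+    2:07PM   0:00.00 sh run.sh
"""


def find_job_scripts(ps_lines, user="galaxy", marker="/job_working_directory/"):
    """Return (pid, command) for the user's processes whose command has marker."""
    matches = []
    for line in ps_lines:
        # Ten whitespace-separated columns, then the command with its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] != user:
            continue  # skips the header row and other users' processes
        command = fields[10].strip()
        if marker in command:
            matches.append((int(fields[1]), command))
    return matches


leftover = find_job_scripts(SAMPLE_PS.splitlines())
leftover_pids = [pid for pid, _ in leftover]
```

On the sample above this picks out PIDs 21160 and 21157, the two galaxy_*.sh wrapper scripts. On the live system the same function could be fed the lines of real `ps aux` output instead of SAMPLE_PS.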


Thank you for the help,
-Hans