handler processes suddenly consuming all RAM
I recently upgraded Galaxy to the 10/06 unannounced galaxy-dist. I typically run with 2 web processes and 2 job handlers on a single machine. It seems that now the job handlers consume all RAM at startup and are killed by the kernel.

Has anybody seen something like this? Bad job? Bug? I'm considering a downgrade to the previous stable release.

Brad
--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist
In case this is useful to someone else: it looks like some really huge upload jobs caused a DoS. Maybe the public instance has some config in place to avoid blowing up in this situation.

While watching the logs during startup, I see the memory usage of handler0 go from about 200M to about 4G right after the simplejson module conflict warning. It seems to stop increasing right after it prints the cloudlaunch line. Soon thereafter it jumps to 7G of RAM, at which point it starts swapping.

Then I see a huge number of lines like this (it fills my terminal buffer):

galaxy.jobs DEBUG 2014-11-24 21:16:02,205 (159978) Persisting job destination (destination id: gridengine)
galaxy.jobs.handler INFO 2014-11-24 21:16:02,315 (159978) Job dispatched
galaxy.jobs DEBUG 2014-11-24 21:16:02,337 (159979) Working directory for job is: /mnt/galaxy/data/galaxy/job_working_directory/000/159/159979
galaxy.jobs.handler DEBUG 2014-11-24 21:16:02,352 (159979) Dispatching to gridengine runner
galaxy.tools.deps DEBUG 2014-11-24 21:17:05,076 Building dependency shell command for dependency 'samtools'
galaxy.tools.deps WARNING 2014-11-24 21:17:05,088 Failed to resolve dependency on 'samtools', ignoring
galaxy.jobs DEBUG 2014-11-24 21:17:29,869 (159979) Persisting job destination (destination id: gridengine)
galaxy.jobs.handler INFO 2014-11-24 21:17:30,177 (159979) Job dispatched
galaxy.jobs.runners DEBUG 2014-11-24 21:22:24,137 (159978) command is: python /mnt/galaxy/data/galaxy/galaxy-dist/tools/data_source/upload.py /mnt/galaxy/data/galaxy/galaxy-dist /mnt/galaxy/data/galaxy/tmp/tmpVdupFJ /mnt/galaxy/data/galaxy/tmp/tmpLHXyqa 219868:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219868_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219868.dat 219869:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219869_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219869.dat
219870:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219870_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219870.dat 219871:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219871_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219871.dat 219872:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219872_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219872.dat 219873:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219873_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219873.dat 219874:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219874_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219874.dat 219875:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219875_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219875.dat 219876:/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/dataset_219876_files:/mnt/galaxy/data/galaxy/user-data/000/219/dataset_219876.dat

… thousands and thousands of lines later ...

978/metadata_results_LibraryDatasetDatasetAssociation_27280_rGPnso,,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_override_LibraryDatasetDatasetAssociation_27280_xFpkLn /mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_in_LibraryDatasetDatasetAssociation_27282_EjpcbG,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_kwds_LibraryDatasetDatasetAssociation_27282_wVaFjB,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_out_LibraryDatasetDatasetAssociation_27282_OeHNUy,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_results_LibraryDatasetDatasetAssociation_27282_G1Q9PY,,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_override_LibraryDatasetDatasetAssociation_27282_17V4t6 /mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_in_LibraryDatasetDatasetAssociation_27290_Y5X8Ih,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_kwds_LibraryDatasetDatasetAssociation_27290_8E55GA,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_out_LibraryDatasetDatasetAssociation_27290_W11KQm,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_results_LibraryDatasetDatasetAssociation_27290_65u8Ja,,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_override_LibraryDatasetDatasetAssociation_27290_3jbsMb 
/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_in_LibraryDatasetDatasetAssociation_27296_DmijyX,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_kwds_LibraryDatasetDatasetAssociation_27296_u1ihbO,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_out_LibraryDatasetDatasetAssociation_27296_MYIvvx,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_results_LibraryDatasetDatasetAssociation_27296_eLwrTR,,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_override_LibraryDatasetDatasetAssociation_27296_UpVqz7 /mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_in_LibraryDatasetDatasetAssociation_27321_gSfC2s,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_kwds_LibraryDatasetDatasetAssociation_27321_ienD2I,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_out_LibraryDatasetDatasetAssociation_27321_hUnFKv,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_results_LibraryDatasetDatasetAssociation_27321_zWIzcm,,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_override_LibraryDatasetDatasetAssociation_27321_4KvsAs /mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_in_LibraryDatasetDatasetAssociation_27270_5oWQ0p,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_kwds_LibraryDatasetDatasetAssociation_27270_Rv3rwJ,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_out_LibraryDatasetDatasetAssociation_27270_VyM6CA,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_results_LibraryDatasetDatasetAssociation_27270_3nmQ_S,,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_override_LibraryDatasetDatasetAssociation_27270_ZCTSSJ 
/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_in_LibraryDatasetDatasetAssociation_27317_YBjGjl,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_kwds_LibraryDatasetDatasetAssociation_27317_xC8TCo,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_out_LibraryDatasetDatasetAssociation_27317_ygYy0B,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_results_LibraryDatasetDatasetAssociation_27317_WJO3MW,,/mnt/galaxy/data/galaxy/job_working_directory/000/159/159978/metadata_override_LibraryDatasetDatasetAssociation_27317_8MY55s; sh -c "exit $return_code"

Whaaaa…

Brad

galaxy.web.framework.base DEBUG 2014-11-24 21:15:17,288 Enabling 'workflow' controller, class: WorkflowController
/mnt/galaxy/data/galaxy/galaxy-dist/lib/galaxy/__init__.py:63: UserWarning: Module simplejson was already imported from /mnt/galaxy/data/galaxy/sw/lib/python2.7/site-packages/simplejson/__init__.pyc, but /mnt/galaxy/data/galaxy/galaxy-dist/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs2.egg is being added to sys.path
  self.check_version_conflict()
galaxy.web.framework.base DEBUG 2014-11-24 21:15:29,670 Enabling 'cloudlaunch' controller, class: CloudController
galaxy.web.framework.base DEBUG 2014-11-24 21:15:30,620 Enabling 'data_manager' controller, class: DataManager

--
Brad Langhorst, Ph.D.
Applications and Product Development Scientist

On Nov 24, 2014, at 8:50 PM, Langhorst, Brad <Langhorst@neb.com> wrote:

> I recently upgraded galaxy to the 10/06 unannounced galaxy-dist. I typically run with 2 web processes and 2 job handlers on a single machine. It seems that now the job handlers consume all RAM at startup and are killed by the kernel. Anybody seen something like this? Bad job? bug? I’m considering a downgrade to the previous stable release.
>
> Brad
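For anyone who wants to find which jobs are responsible, here is a minimal sketch (not part of Galaxy) that scans a handler log for job commands of unusual length. It assumes each log record sits on a single line and that the "command is:" records look like the ones in the excerpt above. The function name, the `JobCommand` tuple, and the 100000-character threshold are my own illustrative choices.

```python
import re
from collections import namedtuple

# Matches records shaped like the excerpt above, e.g.:
#   galaxy.jobs.runners DEBUG 2014-11-24 21:22:24,137 (159978) command is: python ...
# Group 1 is the job id in parentheses, group 2 is the full command text.
COMMAND_LINE = re.compile(
    r"^galaxy\.jobs\.runners DEBUG \S+ \S+ \((\d+)\) command is: (.*)$"
)

# Hypothetical result type: job id and the length of its command in characters.
JobCommand = namedtuple("JobCommand", ["job_id", "length"])


def oversized_commands(lines, limit=100000):
    """Yield a JobCommand for each logged command longer than `limit` characters."""
    for line in lines:
        match = COMMAND_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a "command is:" record
        command = match.group(2)
        if len(command) > limit:
            yield JobCommand(job_id=int(match.group(1)), length=len(command))
```

To use it, pass an open log file (any iterable of lines works) to `oversized_commands` and print the job ids it yields. The threshold will need tuning; the point is only to surface upload jobs whose argument lists run to thousands of datasets.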