Hi all,
I've noticed that when I try to visualize a BAM file on UCSC (with display_application), the Galaxy web process drains RAM and never releases it... I easily run into an OutOfMemory error. Is it trying to load the BAM file (or any other custom file defined to be visualized in the same way) into RAM? Can anybody explain how display_application works and how to debug it?
d
/* Davide Cittaro
Cogentech - Consortium for Genomic Technologies via adamello, 16 20139 Milano Italy
tel.: +39(02)574303007 e-mail: davide.cittaro@ifom-ieo-campus.it */
On May 28, 2010, at 10:22 AM, Davide Cittaro wrote:
Hi all, I've noticed that when I try to visualize a BAM file on UCSC (with display_application) the galaxy web process drains RAM and never releases it... I easily go in OutOfMemory error. Is it trying to load BAM (or any other custom file defined to be visualized in the same way) in RAM? Can anybody explain how display_application works and how to debug it?
Mmm... apparently the Paste egg loads the file to be shown to UCSC into memory (and there's no close()). I see this egg is not developed by the Galaxy team, so I don't know if this should be filed as a Galaxy bug. BTW, I've tried this:

$ diff -u Paste-1.6-py2.6.egg/paste/wsgilib.py.tmp Paste-1.6-py2.6.egg/paste/wsgilib.py
--- Paste-1.6-py2.6.egg/paste/wsgilib.py.tmp 2010-05-28 11:06:00.174278394 +0200
+++ Paste-1.6-py2.6.egg/paste/wsgilib.py 2010-05-20 10:44:49.354765626 +0200
@@ -15,7 +15,6 @@
 from traceback import print_exception
 import urllib
 from cStringIO import StringIO
-import tempfile
 import sys
 from urlparse import urlsplit
 import warnings
@@ -527,8 +526,7 @@
             "If you provide conditional you must also provide "
             "start_response")
     data = []
-    #output = StringIO()
-    output = tempfile.NamedTemporaryFile(dir='/data/galaxy_dist/database/tmp')
+    output = StringIO()
     def replacement_start_response(status, headers, exc_info=None):
         if conditional is not None and not conditional(status, headers):
             data.append(None)
@@ -551,9 +549,7 @@
         data.append(None)
     if len(data) < 2:
         data.append(None)
-    #data.append(output.getvalue())
-    output.seek(0)
-    data.append(output.read())
+    data.append(output.getvalue())
     return data
 
 ## Deprecation warning wrapper:

Essentially substituting the cStringIO handler with a temporary file. A temporary file is created, in multiple copies, for each Galaxy history item I would like to see on UCSC... Still, the output is read and never released, so memory easily drains...
d
_______________________________________________ galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev
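To illustrate why buffering a whole response hurts with BAM-sized files, here is a minimal sketch. It is not the Paste code and not a proposed patch; it only contrasts a buffer that accumulates the entire body with an iterator that hands out fixed-size chunks. The names buffer_whole_body and iter_chunks and the 64 KiB chunk size are assumptions made for illustration.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen only for illustration


def buffer_whole_body(fileobj):
    """Accumulate the full body in memory, the way an in-memory buffer does."""
    output = io.BytesIO()
    for chunk in iter(lambda: fileobj.read(CHUNK_SIZE), b""):
        output.write(chunk)
    # At this point the entire file lives in one bytes object.
    return output.getvalue()


def iter_chunks(fileobj, chunk_size=CHUNK_SIZE):
    """Yield the body piece by piece; only one chunk is held at a time."""
    try:
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Release the handle once the consumer has finished (or given up).
        fileobj.close()


if __name__ == "__main__":
    fake_bam = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 10))
    print([len(c) for c in iter_chunks(fake_bam)])
```

With the first function, memory use grows with the file size; with the second, it stays around one chunk regardless of how large the dataset is.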
On May 28, 2010, at 11:14 AM, Davide Cittaro wrote:
Mmm... apparently the Paste egg loads the file to be shown to UCSC in memory (and there's no close()). I see this egg is not developed by GalaxyTeam, so I don't know if this should be issued as a galaxy bug. BTW, I've tried this
[cut]
Needless to say, it's pretty useless... the object is kept in memory for as long as Galaxy is alive...
d
Hi guys, sorry for this mail flooding, seriously...
On May 28, 2010, at 1:59 PM, Davide Cittaro wrote:
[cut]
Needless to say, it's pretty useless... the object is kept in memory until galaxy is alive....
use_printdebug = False
solves the memory issue (at least it no longer calls paste.PrintDebugMiddleware, which calls paste.intercept_output...).
BTW, I'm still not able to see BAM files... well, actually I can see the reads at the beginning of chromosome 10, which are the reads at the beginning of my BAM file :-(
d
Davide Cittaro wrote:
use_printdebug = False
Ah, that was going to be my first question. I suggest just use_debug = False.
solves the memory issue (at least doesn't call paste.PrintDebugMiddleware which calls paste.intercept_output....) BTW, still not able to see BAM files... well, actually I can see the reads at the beginning of chromosome 10, which are the reads at the beginning of my BAM file :-(
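As a quick way to check which of the two debug options mentioned above are switched on, here is a small sketch. It assumes a Paste Deploy style INI file with an [app:main] section; the section name, the sample text and the helper name are assumptions, not taken from this thread. Options that are absent are reported as not enabled, which may not match Galaxy's real defaults.

```python
import configparser

# Hypothetical sample; a real config file would be read from disk instead.
SAMPLE_INI = """
[app:main]
use_debug = True
use_printdebug = True
"""

DEBUG_OPTIONS = ("use_debug", "use_printdebug")


def enabled_debug_options(ini_text, section="app:main"):
    """Return the names of the debug options that are set to a true value."""
    # interpolation=None so that literal '%' characters in values are accepted
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    enabled = []
    for option in DEBUG_OPTIONS:
        # fallback=False: a missing option or section counts as "not enabled"
        if parser.getboolean(section, option, fallback=False):
            enabled.append(option)
    return enabled


if __name__ == "__main__":
    print(enabled_debug_options(SAMPLE_INI))
```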
On May 28, 2010, at 2:58 PM, Nate Coraor wrote:
Davide Cittaro wrote:
use_printdebug = False
Ah, that was going to be my first question. I suggest just use_debug = False.
solves the memory issue (at least doesn't call paste.PrintDebugMiddleware which calls paste.intercept_output....) BTW, still not able to see BAM files... well, actually I can see the reads at the beginning of chromosome 10, which are the reads at the beginning of my BAM file :-(
I've asked to open the galaxy test server to the UCSC in California... Still get truncated BAM files, at the beginning of chr10... What is nice is that files are truncated in a different manner on our mirror, like it can read some more information before end of communication... Unfortunately there's no log about this "truncation" error... :-( d
Found something weird
On May 28, 2010, at 3:05 PM, Davide Cittaro wrote:
On May 28, 2010, at 2:58 PM, Nate Coraor wrote:
Davide Cittaro wrote:
use_printdebug = False
Ah, that was going to be my first question. I suggest just use_debug = False.
solves the memory issue (at least doesn't call paste.PrintDebugMiddleware which calls paste.intercept_output....) BTW, still not able to see BAM files... well, actually I can see the reads at the beginning of chromosome 10, which are the reads at the beginning of my BAM file :-(
I've asked to open the galaxy test server to the UCSC in California... Still get truncated BAM files, at the beginning of chr10... What is nice is that files are truncated in a different manner on our mirror, like it can read some more information before end of communication... Unfortunately there's no log about this "truncation" error... :-(
It looks like I'm only able to load the portion of the BAM file that falls within the UCSC range previously selected. Suppose I have a clean session in UCSC, spanning chr10:1-50,000,000. When I link a BAM file from the history to UCSC, I can see reads for the same span and nothing more (and, obviously, no reads on other chromosomes). What is strange is that the same doesn't apply to other chromosomes: it seems that Galaxy gives UCSC the content of the BAM file from the beginning (chr10 in my sorted case) up to the maximum span available (which is at most the end of chr10)... It acts as if there were some kind of Galaxy cache that is never emptied... Does this make sense to you?
Besides, have you ever tried visualization of BAM files when using remoteuser?
d
Hi Davide,
Galaxy doesn't do anything 'special' here. It provides UCSC with access to 3 files: a BAM file, the BAI index file, and a track definition 'file'. The track definition contains 4 pieces of information: the type of track, the URL of the BAM file, the dbkey, and a name for the track. Check that the track file has valid information (especially the URL) and that the BAM, BAI and track files are accessible via HTTP.
IIRC the UCSC browser does some caching on its end for BAM files by 'filename', so you will likely need to use different history items than the ones that were failing (due to having debug options on) or clear this cache. Can you confirm the odd behavior occurs on history items that did not experience the memory errors? Cloning the history or copying the history items (under edit attributes) should be sufficient.
Thanks,
Dan
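For reference when checking the track file, here is a sketch of what a track definition carrying those four pieces of information might look like. The layout follows the UCSC custom-track convention for BAM (type, name, bigDataUrl, db) as I understand it; Galaxy's actual template is not shown in this thread, and the function names and the example URL are hypothetical.

```python
def build_track_line(name, bam_url, dbkey):
    """Assemble a one-line BAM track definition from its four components."""
    if '"' in name:
        # Keep the sketch simple: no escaping of quotes inside the track name.
        raise ValueError("track name must not contain double quotes")
    return 'track type=bam name="%s" bigDataUrl=%s db=%s' % (name, bam_url, dbkey)


def index_url(bam_url):
    """UCSC looks for the BAI index next to the BAM, at the same URL plus '.bai'."""
    return bam_url + ".bai"


if __name__ == "__main__":
    url = "http://galaxy.example.org/data/reads.bam"  # hypothetical URL
    print(build_track_line("My reads", url, "hg18"))
    print(index_url(url))
```

If the URL in the real track file is not reachable from the UCSC side, or the index is not served at the expected location, the browser has no way to fetch reads for arbitrary regions.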
On May 28, 2010, at 4:38 PM, Daniel Blankenberg wrote:
Hi Davide,
Galaxy doesn't do anything 'special' here. It provides access to 3 files to UCSC, a BAM file, the BAI index file, and a track definition 'file'. The track definition contains 4 pieces of information, the type of track, the url of the bam file, the dbkey and a name for the track. Check that the track file has valid information (especially the URL) and that the bam, bai and track files are accessible via HTTP.
They are accessible. Indeed, I can read the whole BAM file with samtools from a remote machine, i.e.:
samtools view http://host001.instruments.ifom-ieo-campus.it:8080/display_application/019b2...
works perfectly.
IIRC the UCSC browser does some caching on its end for BAM files by 'filename', so you will likely need to use different history items than the ones that were failing (due to having debug options on) or clear this cache. Can you confirm the odd behavior occurs on history items that did not experience the memory errors? Cloning the history or copying the history items (under edit attributes) should be sufficient.
I'm going to test on a new history and new BAM files, just to be sure.... wait for it... Nope... doesn't work. I only get reads for chr10 up to 103 Mb (hg18)... d
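One thing that may be worth ruling out is whether the server honors HTTP Range requests for the BAM URL. This is only a hypothesis and has not been established in this thread: a client that asks for a slice of the file but always receives it from the beginning would see reads only from the start of the file. The sketch below sends one ranged request and checks the reply; the function names and the byte offsets are assumptions.

```python
import urllib.request


def range_honored(status, headers, start, end):
    """True if a reply to 'Range: bytes=start-end' is a matching partial response."""
    if status != 206:
        # 200 means the server ignored the Range header and sent the whole file.
        return False
    content_range = headers.get("Content-Range", "")
    return content_range.startswith("bytes %d-%d/" % (start, end))


def probe(url, start=1000, end=1999):
    """Send a single ranged GET to url and report whether it was honored.

    Assumes the file is longer than `end` bytes; a shorter file would
    legitimately come back with a different Content-Range.
    """
    request = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (start, end)}
    )
    with urllib.request.urlopen(request) as response:
        return range_honored(response.status, response.headers, start, end)
```

Calling probe() with the same display_application URL that UCSC receives, from a machine outside the local network, would show whether partial reads behave as expected.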
Hi,
I created a metadata ("config") and want to run my tool using that. I am getting this error when I try to use the uploaded file with this new extension. I have specified the command line parameters as:

<command interpreter="bash">kelvin_wrapper $confile ${os.path.join( confile.extra_files_path , 'pedfile.txt' )} ${os.path.join( confile.extra_files_path , 'mapfile.txt')} ${os.path.join( confile.extra_files_path , 'frequencyfile.txt' )} ${os.path.join( confile.extra_files_path , 'datafile.txt')} $brfile $pplfile $modfile</command>

Error:
Traceback (most recent call last):
  File "/export/home/galaxy/galaxy-dist/lib/galaxy/jobs/runners/sge.py", line 120, in queue_job
    job_wrapper.prepare()
  File "/export/home/galaxy/galaxy-dist/lib/galaxy/jobs/__init__.py", line 390, in prepare
    self.command_line = self.tool.build_command_line( param_dict )
  File "/export/home/galaxy/galaxy-dist/lib/galaxy/tools/__init__.py", line 1343, in build_command_line
    command_line = fill_template( self.command, context=param_dict )
  File "/export/home/galaxy/galaxy-dist/lib/galaxy/util/template.py", line 9, in fill_template
    return str( Template( source=template_text, searchList=[context] ) )
  File "/export/home/galaxy/galaxy-dist/eggs/Cheetah-2.2.2-py2.4-linux-x86_64-ucs4.egg/Cheetah/Template.py", line 1004, in __str__
    return getattr(self, mainMethName)()
  File "cheetah_DynamicallyCompiledCheetahTemplate_1275498667_76_83196.py", line 86, in respond
NameError: global name 'confile' is not defined

Tool execution generated the following error message: failure preparing job

Can anybody please help?
Regards,
Amit Modi
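One plausible reading of the NameError (an assumption, since the rest of the tool XML is not shown): inside a ${...} expression, Cheetah treats a name written without a leading $ as a plain Python name rather than a template variable, so confile.extra_files_path would need to be $confile.extra_files_path. The sketch below is a rough checker for that pattern; it does not use Cheetah, and the function name and the list of parameter names are hypothetical.

```python
import re

# Matches the body of each ${...} placeholder (no nested braces expected).
EXPRESSION = re.compile(r"\$\{(.*?)\}", re.DOTALL)


def bare_references(command, param_names):
    """Return (name, expression) pairs where a parameter name is used inside
    a ${...} expression without a leading '$'.

    Limitation: a parameter name appearing inside a quoted string literal
    would be reported as well.
    """
    hits = []
    for expression in EXPRESSION.findall(command):
        for name in param_names:
            # The name must not be preceded by '$', '.', or a word character.
            pattern = r"(?<![\w$.])" + re.escape(name) + r"\b"
            if re.search(pattern, expression):
                hits.append((name, expression.strip()))
    return hits


if __name__ == "__main__":
    command = (
        "kelvin_wrapper $confile "
        "${os.path.join( confile.extra_files_path , 'pedfile.txt' )} $brfile"
    )
    for name, expression in bare_references(command, ["confile", "brfile"]):
        print("bare use of %r in: %s" % (name, expression))
```

If the checker flags an expression, adding the $ prefix inside it (for example $confile.extra_files_path) is the first thing to try.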
participants (4)
- Daniel Blankenberg
- Davide Cittaro
- Modi, Amit
- Nate Coraor