Just another example why python's misleadingly simple idioms are quite dangerous in production code (couldn't help myself from teasing about python... sorry about that). Seems like line 150 in "sam_to_bam.py" tries to read the entire BAM file into memory just to find out if it's empty or not... As a stop gap solution with minimal changes, change line 150 from: if len( open( tmp_aligns_file_name ).read() ) == 0: to if len( open( tmp_aligns_file_name ).read(10) ) == 0: Which will read up to the first 10 bytes (instead of the entire file). A slightly better (but still wrong) solution is to simply check the file size, with: if os.path.getsize(tmp_aligns_file_name) == 0: But it's still wrong because even an invalid sam file will create a non-empty BAM file (when using "samtools view -bt") - the BAM file will still contain the chromosome names and sizes. Example: ======== $ cat mm9.fa.fai chr1 197195432 6 50 51 chr10 129993255 201139354 50 51 chr11 121843856 333732482 50 51 chr12 121257530 458013223 50 51 chr13 120284312 581695911 50 51 chr13_random 400311 704385924 50 51 chr14 125194864 704794249 50 51 chr15 103494974 832493018 50 51 ... ... $ cat 1.sam Hello World This is not a SAM file $ samtools view -bt mm9.fa.fai -o 1.bam 1.sam [sam_header_read2] 35 sequences loaded. [sam_read1] reference 'This is not a SAM file' is recognized as '*'. [main_samview] truncated file. $ ls -l 1.* -rw-r--r-- 1 gordon hannon 348 Apr 7 00:57 1.bam -rw-r--r-- 1 gordon hannon 35 Apr 7 00:57 1.sam ======== So in short, this whole sam-to-bam wrapper tool is not suitable for large SAM files (if they don't fit entirely in memory), and not for error checking of invalid SAM files. -gordon On 04/07/2011 12:30 AM, Ryan Golhar wrote:
Here's what I get:
(galaxy_env)[galaxy@vail pbs]$ sh ./big.sh Samtools Version: 0.1.14 (r933:170) Traceback (most recent call last): File "/home/galaxy/galaxy-dist/tools/samtools/sam_to_bam.py", line 150, in __main__ if len( open( tmp_aligns_file_name ).read() ) == 0: MemoryError Error extracting alignments from (/home/galaxy/galaxy-dist/database/files/000/dataset_785.dat), (galaxy_env)[galaxy@vail pbs]$
On 4/6/11 7:29 PM, Assaf Gordon wrote:
Ryan,
Since we're shooting in the dark here, best to try and understand what's the exception.
Add the following line to the beginning of "sam_to_bam.py": import traceback
and add the following line to "sam_to_bam.py" line 156 (before the call to "stop_err"): traceback.print_exc()
Hopefully this will print out which exception you're getting, and where is it thrown from.
-gordon