suggestions for the SAM-to-BAM tool
Hi,

A couple of things that could be improved in the SAM-to-BAM tool:

1. "Reference list" is not informative (it's the technical way of saying "list of chromosomes and their sizes based on a FASTA file"). Users generally do not know what a "reference list" is.

2. The "Locally Cached" option is not informative (I had to look in the source code to understand what it means). It should say something like: "Get list of chromosomes/sizes based on the dataset's organism/database" (could be shorter, but should be friendly enough).

3. There's no option for taking the chromosome list from the SAM file header. Some SAM files already contain the header (the standard bowtie tool wrapper can even produce it), which removes the need to specify where to get the "reference list" from.

4. Autodetection in the "set-metadata" step would go a long way here: if the SAM file already has a header, there is no need to even ask about it. If it doesn't have a header but has a DBKEY, we're still OK. If there is no DBKEY and no header, then complain or ask for a FASTA file from the current history. (I realize that implementing this feature is hard and annoying; I'm not implying that it's easy to do, just that it's needed.)

5. Inside the Python script (sam_to_bam.py) there's a comment that says: "for some reason the samtools view command gzips the resulting bam file without warning". I'm not sure why one cares about that, but "samtools view -u" will output an uncompressed BAM file.

6. samtools supports piping, so a lot of I/O (and some time) can be saved by piping the two commands together:

    samtools view -u -b -S "INPUT.SAM" | samtools sort - OUTPUT

instead of running two commands and generating a temporary unsorted BAM file.

-gordon
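For illustration, points 4 and 6 might be sketched roughly as follows. This is only a sketch under stated assumptions, not the code in sam_to_bam.py: the function names (sam_has_sq_header, pick_reference_source, build_commands, sam_to_sorted_bam) and the ref_list parameter are hypothetical, and nothing here touches Galaxy's actual set-metadata machinery. The sort call uses the "samtools sort - OUTPUT" form quoted in the message; newer samtools releases take the output file through an -o option instead, so adjust for the installed version.

```python
import subprocess


def sam_has_sq_header(sam_path):
    """Return True if the SAM header lists reference sequences (@SQ lines)."""
    with open(sam_path) as handle:
        for line in handle:
            if not line.startswith("@"):
                break  # header lines come first; stop at the first alignment
            if line.startswith("@SQ"):
                return True
    return False


def pick_reference_source(sam_path, dbkey=None, history_fasta=None):
    """Decide where the chromosome list comes from, in the order from point 4."""
    if sam_has_sq_header(sam_path):
        return "header"          # nothing to ask the user
    if dbkey:
        return "dbkey"           # use the list cached for that build
    if history_fasta:
        return "history_fasta"   # derive the list from a FASTA in the history
    raise ValueError("no @SQ header, no DBKEY and no FASTA file: cannot convert")


def build_commands(sam_path, out_prefix, ref_list=None):
    """Build the two argv lists for the piped conversion from point 6.

    ref_list (hypothetical parameter) is a tab-delimited file of reference
    names and lengths, passed to `samtools view -t` when the SAM file has
    no header of its own.
    """
    view_cmd = ["samtools", "view", "-u", "-b", "-S"]
    if ref_list:
        view_cmd += ["-t", ref_list]
    view_cmd.append(sam_path)
    sort_cmd = ["samtools", "sort", "-", out_prefix]
    return view_cmd, sort_cmd


def sam_to_sorted_bam(sam_path, out_prefix, ref_list=None):
    """Run view | sort without writing a temporary unsorted BAM file."""
    view_cmd, sort_cmd = build_commands(sam_path, out_prefix, ref_list)
    view = subprocess.Popen(view_cmd, stdout=subprocess.PIPE)
    sort = subprocess.Popen(sort_cmd, stdin=view.stdout)
    view.stdout.close()  # let `view` see a broken pipe if `sort` exits early
    sort_rc = sort.wait()
    view_rc = view.wait()
    if view_rc != 0 or sort_rc != 0:
        raise RuntimeError("samtools failed: view=%s sort=%s" % (view_rc, sort_rc))
```

A wrapper could call pick_reference_source() first and pass ref_list only when the answer is not "header", so that users are asked about the reference list only when the SAM file and its metadata cannot answer the question.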
Hello -

We haven't forgotten about these suggestions and still plan on working on at least a few of them (Kelly would know more). I decided to put them into a bitbucket ticket for easier tracking: http://bitbucket.org/galaxy/galaxy-central/issue/601

Many thanks, as always, for the feedback!

Best,
Jen
Galaxy team

On 4/1/11 11:39 AM, Assaf Gordon wrote:
--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/