Lance and Peter,
Peter, thanks for noticing the problem and the duplicate tools. Lance, I'm happy to merge these so there are not two different versions out there. I prefer your use of genomeCoverageBed over my custom hacks. That's a nice approach I totally missed.
I avoid the need for the sam indexes by creating the chromosome sizes file directly from the information in the BAM header. I don't think there is any way around creating it since it's required by the UCSC tools as well, but everything you need is in the BAM header. There might be a sneaky way to do this with samtools view -H and awk, but I'm not nearly skilled enough to pull that off.
Let me know what you think. I can also update my Python wrapper script to use the genomeCoverageBed approach instead if you think that's easier.
Brad
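For readers following along, here is a minimal sketch of the idea Brad describes: building a two-column chromosome sizes listing from the @SQ lines of a BAM header. It assumes the header text has already been obtained, for example from samtools view -H, and it is not taken from either tool's actual code. The function names and the example header are hypothetical.

```python
def chrom_sizes_from_header(header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM/BAM header.

    Hypothetical helper for illustration; header_text is plain SAM header
    text, one record per line, with tab-separated TAG:VALUE fields.
    """
    sizes = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # only @SQ records describe reference sequences
        fields = dict(
            f.split(":", 1) for f in line.split("\t")[1:] if ":" in f
        )
        sizes.append((fields["SN"], int(fields["LN"])))
    return sizes


def write_chrom_sizes(sizes, path):
    """Write one 'name<TAB>length' line per reference sequence."""
    with open(path, "w") as out:
        for name, length in sizes:
            out.write(f"{name}\t{length}\n")


# Made-up header used only to exercise the parser.
example_header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:500\n"
    "@PG\tID:bwa\n"
)
example_sizes = chrom_sizes_from_header(example_header)
```

Because the sizes come straight from the header, this works for BAM files mapped to an in-house assembly with no registered genome build.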
Hi Peter,
Thanks for the thoughtful comments. I believe the requirement for the genome was imposed by the use of an underlying BedTools utility. I also think that in a newer version of that tool, the requirement was removed, since you correctly point out it is not really necessary.
I will see if I can update the tool to remove that requirement and also see about changing the tool id. Sorry for the conflict, that was an oversight on my part, though it would be nice if the Tool Shed could check and warn when someone tries to create a new tool whose id is already in use. I would suggest flagging the new repo as invalid until the id is updated rather than rejecting it outright.
As for the author info, you're right, I should really add that as well. That tool was put together very quickly to meet the need of a customer and I didn't properly clean things up before I uploaded. I'll let you know once I get an update out. Of course, any patches etc. are welcome. ;-)
Lance
Peter Cock wrote:
Hi Brad & Lance,
I've been using Brad's bam_to_bigwig tool in Galaxy but realized today (with a new dataset using a splice-aware mapper) that it doesn't seem to be ignoring CIGAR N operators where a read is split over an intron. Looking over Brad's Python script, which calculates the coverage to write an intermediate wiggle file, this is done with samtools via pysam. It is not obvious to me if this can be easily modified to ignore introns. Is this possible, Brad?
I wasn't aware of Lance's rival bam_to_bigwig tool in the ToolShed till now, and that does talk about this issue. It has a boolean option to ignore gaps when computing coverage, recommended for RNA-Seq where reads are mapped across long splice junctions.
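To make the distinction concrete, here is a minimal sketch of per-base coverage that skips CIGAR N (reference skip) operations. It is not the code of either tool. It assumes CIGARs are given as (operation, length) pairs using the numeric operation codes from the SAM/BAM specification, and it assumes deletions count as covered, which is a choice a real tool may make differently. The function name and the example read are hypothetical.

```python
from collections import Counter

# Numeric CIGAR operation codes from the SAM/BAM specification.
MATCH_OPS = {0, 7, 8}  # M, =, X: aligned bases
DELETION = 2           # D: consumes reference, no read bases
REF_SKIP = 3           # N: skipped reference region, e.g. an intron


def add_read_coverage(coverage, start, cigar, ignore_gaps=True):
    """Add one read's coverage to a Counter keyed by 0-based position.

    Hypothetical helper for illustration. With ignore_gaps=True, N
    operations advance the position without adding coverage. Counting
    deletions as covered is an assumption of this sketch.
    """
    pos = start
    for op, length in cigar:
        counted = op in MATCH_OPS or op == DELETION
        if op == REF_SKIP and not ignore_gaps:
            counted = True
        if counted:
            for p in range(pos, pos + length):
                coverage[p] += 1
            pos += length
        elif op == REF_SKIP:
            pos += length  # jump over the intron without counting it
        # I, S, H and P do not consume reference positions
    return coverage


# A read starting at position 10: 3 matches, a 100 bp intron, 2 matches.
spliced = [(0, 3), (3, 100), (0, 2)]
cov = add_read_coverage(Counter(), 10, spliced)
```

With ignore_gaps=False the same read would add coverage to every position between its two exons, which is the behaviour that inflates intronic coverage in RNA-Seq tracks.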
Lance, from your tool's help it sounds like it needs a genome database build filled in. I don't understand this requirement - Brad's tool works just fine for standalone BAM files (for example reads mapped to an in house assembly). Is that not supported in your tool?
Galaxy team - why does the ToolShed allow duplicate repository names (here bam_to_bigwig) AND duplicate tool IDs (again, here bam_to_bigwig)? Won't this cause chaos when sharing workflows? I would suggest checking this when a tool is uploaded and rejecting repository name or tool ID clashes.
Regards,
Peter
P.S.
Brad, your tool is missing an explicit <requirements> tag listing the UCSC binary wigToBigWig, and the Python library pysam.
Lance, your tool doesn't seem to include any author information like your name or email address. I'm inferring it is yours from the Galaxy tool shed user id, lparsons.
--
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
Princeton University