I want to intersect a few hundred genomic intervals (predicted translocation sites from SVDetect) with low-mappability regions of the genome (we are working with mm9 right now).
UCSC has an excellent mappability track that exactly matches our sequencing data (50 bp k-mers), but it seems very difficult to get that data into Galaxy. I want a BED file that summarizes intervals of low mappability (i.e., less than 0.5 on the scale used by UCSC). The UCSC Table Browser has a limit of 10M lines, which seems to cover only part of chromosome 1. It would be very messy to fetch the whole genome bit by bit this way and then stitch the pieces back together by concatenation.
UCSC Help suggests downloading the mappability data for the whole genome as a bigWig file and then converting it to BED. I gave this a try, but we get a 4 GB file with intervals of just one or two base pairs. Again, that is a lot of work to get back to the nicer BED that I could make with the UCSC tools over smaller genomic regions. It is also super-painful to upload this huge file to Galaxy, and I am not keen on writing my own parsers to filter and smooth it.
Any other suggestions? Maybe someone else knows where to find a mappability file for mm9 with nice intervals in a Galaxy-compatible format.
—Stuart Brown
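The filtering and smoothing Stuart wants to avoid writing by hand can be a short streaming script rather than a full parser. Below is a minimal sketch, assuming the bigWig was converted with UCSC's `bigWigToBedGraph` into a 4-column bedGraph (chrom, start, end, score) sorted by chromosome and start. It keeps rows scoring below 0.5 and merges rows that touch or lie within a small gap of each other. The function names and the `max_gap` smoothing parameter are hypothetical; they are not part of any UCSC or Galaxy tool.

```python
import sys
from typing import Iterable, Iterator, TextIO, Tuple

Interval = Tuple[str, int, int]


def low_mappability_intervals(lines: Iterable[str], threshold: float = 0.5,
                              max_gap: int = 0) -> Iterator[Interval]:
    """Yield merged BED intervals whose bedGraph score is below `threshold`.

    Assumes (hypothetically) a 4-column bedGraph sorted by chrom, then start.
    Low-scoring rows that overlap, touch, or are at most `max_gap` bp apart
    on the same chromosome are merged into a single interval.
    """
    cur = None  # currently open interval as [chrom, start, end]
    for line in lines:
        # Skip blank lines and bedGraph header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        if float(value) >= threshold:
            continue  # mappable enough; not part of a low-mappability run
        start, end = int(start), int(end)
        if cur and cur[0] == chrom and start <= cur[2] + max_gap:
            cur[2] = max(cur[2], end)  # extend the open interval
        else:
            if cur:
                yield tuple(cur)
            cur = [chrom, start, end]
    if cur:
        yield tuple(cur)


def main(inp: TextIO = sys.stdin, out: TextIO = sys.stdout) -> None:
    # Write 3-column BED (chrom, start, end) to the output stream.
    for chrom, start, end in low_mappability_intervals(inp):
        out.write(f"{chrom}\t{start}\t{end}\n")


if __name__ == "__main__":
    main()
```

Because it reads one line at a time, memory use stays small regardless of the input size, e.g. `bigWigToBedGraph in.bigWig mm9.bedGraph` followed by `python lowmap.py < mm9.bedGraph > mm9_lowmap.bed` (file names hypothetical). The resulting BED should be far smaller than the 4 GB conversion and easier to upload.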
Hi Stuart,
If you are able to rsync the Mapability bigWig file from the UCSC downloads server and convert it to BED using their compiled tools (also available on the same server), then the rest should be fairly straightforward:
1 - Load the data into Galaxy using FTP: http://wiki.g2.bx.psu.edu/FTPUpload
2 - Merge the fragmented intervals into ranges that better suit your needs with the Galaxy tools in the group "Operate on Genomic Intervals"; in particular, see the "Merge" and "Cluster" tools.
This dataset is large, and the only way to find out whether it is too large to run on the public Main instance is to try. If you end up with a memory error, then moving to a local or cloud instance would be the recommendation. Full instructions are here: http://usegalaxy.org
Hopefully this simplifies the process for you!
Best,
Jen
Galaxy team
On 5/1/12 9:02 AM, Brown, Stuart wrote:
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://galaxyproject.org
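The final step Stuart describes, intersecting a few hundred SVDetect sites with the low-mappability intervals, can be done in Galaxy once the intervals are merged. For comparison, here is a minimal Python sketch of the same overlap test outside Galaxy. It is not Galaxy's implementation. It assumes half-open BED coordinates and that the low-mappability intervals are already merged (non-overlapping per chromosome), as Galaxy's "Merge" tool would produce; `LowMapIndex` and `flag_sites` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Interval = Tuple[str, int, int]


class LowMapIndex:
    """Per-chromosome index of merged, non-overlapping BED intervals."""

    def __init__(self, intervals: Iterable[Interval]) -> None:
        by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))
        self.starts: Dict[str, List[int]] = {}
        self.ends: Dict[str, List[int]] = {}
        for chrom, ivs in by_chrom.items():
            ivs.sort()
            # With non-overlapping intervals, both starts and ends are sorted.
            self.starts[chrom] = [s for s, _ in ivs]
            self.ends[chrom] = [e for _, e in ivs]

    def overlaps(self, chrom: str, start: int, end: int) -> List[Tuple[int, int]]:
        """Return indexed intervals overlapping the half-open query [start, end)."""
        if chrom not in self.starts:
            return []
        starts, ends = self.starts[chrom], self.ends[chrom]
        lo = bisect_right(ends, start)  # first interval ending after `start`
        hi = bisect_left(starts, end)   # first interval starting at/after `end`
        return list(zip(starts[lo:hi], ends[lo:hi]))


def flag_sites(sites: Iterable[Interval],
               index: LowMapIndex) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (chrom, start, end, n_overlapping_low_map_intervals) per site."""
    for chrom, start, end in sites:
        yield chrom, start, end, len(index.overlaps(chrom, start, end))
```

Each query is two binary searches, so checking a few hundred sites against a genome-wide interval set is cheap once the index is built. The sorted-ends trick relies on the merge step; with nested or overlapping intervals it would miss hits.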