Dear Galaxy Help,
I was wondering whether it would be possible to get the coordinates that caused the merge as the output of "Tools: Operate on Genomic Intervals: Merge the overlapping intervals of a query", rather than the entire merged interval. This would be similar to the output of the "Intersect: Overlapping Pieces of intervals" option, which returns the exact base-pair overlap between two queries. In some cases it would be helpful to see only the coordinates that caused the merge. From my limited Galaxy knowledge, if I use the "Intersect" option to compare a file to itself, the output would also include the complete overlap of interval_1 in file1 with its own copy, interval_1 in file2. If there is already a way to get just the coordinates that caused the merge, I would be interested to learn more.
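To illustrate the self-comparison problem, here is a minimal Python sketch (not Galaxy's Intersect implementation; the function name and the single-chromosome, half-open [start, end) interval format are assumptions for illustration). It reports the overlapping piece of every pair of distinct intervals in one file and skips each interval's match with its own copy by only comparing index pairs i < j.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def self_overlap_pieces(intervals: List[Interval]) -> List[Interval]:
    """Return the overlapping piece of every pair of distinct intervals.

    Intersecting a file with a copy of itself would also pair each interval
    with its own copy; looping over index pairs i < j skips those matches.
    Brute force O(n^2), for illustration only.
    """
    pieces: List[Interval] = []
    for i, (start_i, end_i) in enumerate(intervals):
        for start_j, end_j in intervals[i + 1:]:
            start, end = max(start_i, start_j), min(end_i, end_j)
            if start < end:  # keep only overlaps of at least 1 bp
                pieces.append((start, end))
    return pieces


if __name__ == "__main__":
    print(self_overlap_pieces([(100, 200), (150, 250), (250, 300)]))
```

With the sample input, only the pair (100, 200) and (150, 250) yields a piece, (150, 200); the touching pair at 250 has no positive-length overlap.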
Thanks again for your help! - Erika
E.M. Kvikstad
Academic Computing Fellow
IGDP Genetics
Center for Comparative Genomics and Bioinformatics
The Pennsylvania State University
208 Mueller Lab
University Park, PA 16802
(814) 863-2185
kvik@bx.psu.edu
Erika,
Cluster, with a distance of 0, does exactly the same thing as Merge. However, it also lets you specify a minimum number of intervals per cluster (a minimum of 2 ensures you only get intervals that were actually merged). The maximum distance can be set to a negative number, which then forces overlap (-1 forces at least 1 bp of overlap). You can also choose the output format: merge, group (clustered intervals are grouped together), or preserve the original ordering of the file.
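The clustering behaviour described above can be sketched roughly as follows. This is a hypothetical Python illustration, not Galaxy's Cluster code: the function name, the single-chromosome half-open intervals, and the convention that the gap is measured as next.start minus the running cluster end are all assumptions. Under that convention, max_distance=0 joins touching intervals (like Merge), and max_distance=-1 requires at least 1 bp of overlap.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def cluster(intervals: List[Interval], max_distance: int = 0,
            min_intervals: int = 1) -> List[List[Interval]]:
    """Group sorted intervals whose gap to the running cluster is <= max_distance.

    Assumed gap convention: next.start - cluster_end. Clusters with fewer
    than min_intervals members are dropped.
    """
    clusters: List[List[Interval]] = []
    current: List[Interval] = []
    cluster_end = 0
    for start, end in sorted(intervals):
        if current and start - cluster_end <= max_distance:
            current.append((start, end))          # close enough: join cluster
            cluster_end = max(cluster_end, end)
        else:
            if current:
                clusters.append(current)          # close the previous cluster
            current = [(start, end)]
            cluster_end = end
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_intervals]


if __name__ == "__main__":
    data = [(100, 200), (150, 250), (250, 300), (400, 450)]
    print(cluster(data, max_distance=0, min_intervals=2))
    print(cluster(data, max_distance=-1, min_intervals=2))
```

With distance 0 the touching interval (250, 300) joins the first cluster; with -1 it does not, and the singleton clusters are filtered out by min_intervals=2.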
I think that is what you are trying to do.
The other possibility is that you want to capture the overlapping regions of intervals within the same file. Note that two merged intervals do not necessarily overlap; they only need to touch. For example, [a,b) and [b,c) would be merged into [a,c), but their overlap would be [b,b), which doesn't really make sense (its length is 0).
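A small Python sketch can make the zero-length case concrete. It is a hypothetical illustration (not Galaxy's Merge tool), assuming half-open [start, end) intervals on one chromosome: it merges overlapping or touching intervals and, for each join, records the overlap between the incoming interval and the running merged span, which is one possible reading of "the coordinates that caused the merge".

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge_with_causes(intervals: List[Interval]):
    """Merge overlapping or touching intervals and record each join's overlap.

    Each cause is (overlap_start, overlap_end, length), measured against the
    running merged span. A touching pair such as [b-50, b) and [b, c) is
    merged, but its recorded overlap [b, b) has length 0.
    """
    merged: List[Interval] = []
    causes: List[Tuple[int, int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:     # overlapping or touching
            m_start, m_end = merged[-1]
            ov_end = min(end, m_end)              # sorted, so overlap begins at start
            causes.append((start, ov_end, ov_end - start))
            merged[-1] = (m_start, max(m_end, end))
        else:
            merged.append((start, end))
    return merged, causes


if __name__ == "__main__":
    print(merge_with_causes([(100, 200), (150, 250), (250, 300)]))
```

Here all three intervals merge into (100, 300), but the join at 250 is recorded as (250, 250, 0): the intervals merged without sharing a single base.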
I can easily write a tool to find regions that are referenced more than once (i.e., regions that overlap other intervals in the same file). However, it will not cover the case where two intervals are merged only because they are adjacent.
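Such a tool could be built with a sweep line over interval endpoints. The sketch below is a hypothetical Python illustration, not an existing Galaxy tool; the function name, the min_depth parameter, and the single-chromosome half-open interval format are assumptions. It returns maximal regions covered by at least two intervals, and because all events at the same position are applied together, adjacent intervals never reach depth 2 and are excluded, as described above.

```python
from itertools import groupby
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def multiply_covered(intervals: List[Interval], min_depth: int = 2) -> List[Interval]:
    """Return maximal regions covered by at least min_depth intervals."""
    # +1 at each start, -1 at each end, sorted by position.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    regions: List[Interval] = []
    depth = 0
    region_start = 0
    # Apply all events at one position together, so a touching pair
    # [a, b), [b, c) nets to no change in depth at b.
    for pos, group in groupby(events, key=lambda ev: ev[0]):
        before = depth
        depth += sum(delta for _, delta in group)
        if before < min_depth <= depth:
            region_start = pos                    # coverage rises to threshold
        elif depth < min_depth <= before:
            regions.append((region_start, pos))   # coverage drops below it
    return regions


if __name__ == "__main__":
    print(multiply_covered([(100, 200), (150, 250), (250, 300)]))
```

For the sample input only (150, 200) is reported; the adjacent intervals meeting at 250 contribute nothing, which is exactly the merge case this approach misses.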
I hope this helps,
Ian
Galaxy-user mailing list Galaxy-user@bx.psu.edu http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
galaxy-user@lists.galaxyproject.org