Cluster, using a maximum distance of 0, does exactly the same thing as
merge. However, you can also specify a minimum number of intervals per
cluster (2 ensures you only get intervals that were actually merged with
something). The maximum distance can be set to a negative number, which
forces overlap (-1 forces 1 bp of overlap). You can also choose the
output: merge each cluster, group (clustered intervals will be grouped
together), or preserve the original ordering of the file.
I think that is what you are trying to do.
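To make the parameters concrete, here is a rough Python sketch of the clustering rule described above. It is not the Galaxy tool's code: the function name `cluster`, its arguments, and the assumption of half-open (start, end) intervals on a single chromosome are all mine, for illustration only.

```python
def cluster(intervals, max_distance=0, min_intervals=1):
    """Group half-open (start, end) intervals into clusters.

    Illustrative sketch only: an interval joins the current cluster when
    the gap between its start and the cluster's end is <= max_distance.
    A gap of 0 means "touching"; a negative gap means real overlap.
    """
    clusters = []
    current = []
    current_end = None
    for start, end in sorted(intervals):
        if current and start - current_end <= max_distance:
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            # Close the previous cluster, keeping it only if it is big enough.
            if current and len(current) >= min_intervals:
                clusters.append(current)
            current = [(start, end)]
            current_end = end
    if current and len(current) >= min_intervals:
        clusters.append(current)
    return clusters


example = [(10, 20), (20, 30), (25, 40), (100, 110)]
touching_ok = cluster(example, max_distance=0, min_intervals=2)
overlap_only = cluster(example, max_distance=-1, min_intervals=2)
```

With a distance of 0 the first three intervals form one cluster and the lone interval at 100 is dropped by the minimum of 2. With -1, (10, 20) no longer joins, because it only touches (20, 30) without sharing a base.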
The other possibility is that you want to capture the overlapping
regions of intervals within the same file. When two intervals are
merged, they might not actually have any overlap. They only need to be
touching: for example, [a,b) and [b,c) would be merged to [a,c). The
"overlapping" interval there is [b,b), which doesn't really make sense
(the length of that interval is 0).
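The touching case can be seen in a small Python sketch. The helper names `merge` and `overlap` are hypothetical and this is not how the Galaxy tools are implemented; it only assumes half-open (start, end) intervals.

```python
def merge(intervals):
    """Merge half-open intervals that overlap or merely touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the last merged interval (touching counts as mergeable).
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a, b):
    """Return the shared region of two half-open intervals, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


adjacent = merge([(10, 20), (20, 30)])   # one merged interval
shared = overlap((10, 20), (20, 30))     # None: zero bases in common
```

So the two intervals are merged into a single one even though the intersection between them is empty.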
I can easily write a tool to find regions that are referenced more than
once (i.e. overlap with other intervals in the same file). However,
this will not include that one case where two intervals are merged
because they are next to each other.
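As an idea of what such a tool could look like, here is a minimal sweep-line sketch in Python. It is only an assumption about one possible approach, not an existing Galaxy tool; the name `multiply_covered` and the `min_depth` parameter are made up, and it again assumes half-open (start, end) intervals from a single chromosome.

```python
def multiply_covered(intervals, min_depth=2):
    """Return regions covered by at least min_depth intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # coverage goes up at a start
        events.append((end, -1))    # coverage goes down at an end
    # At equal positions, ends (-1) sort before starts (+1), so intervals
    # that only touch never count as covering the same base.
    events.sort()

    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_depth <= depth:
            region_start = pos
        elif depth < min_depth <= previous:
            # Guard against emitting an empty region.
            if pos > region_start:
                regions.append((region_start, pos))
            region_start = None
    return regions


touching = multiply_covered([(10, 20), (20, 30)])
overlapping = multiply_covered([(10, 25), (20, 30), (100, 120), (110, 115)])
```

Here `touching` comes back empty, which is the one case mentioned above, while `overlapping` contains only the pieces referenced more than once.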
I hope this helps,
Dear Galaxy Help,
I was wondering if it would be possible to get the coordinates that
caused the merge as the output from "Tools: Operate on Genomic
Intervals: Merge the overlapping intervals of a query", rather than
the entire merged interval as the output. Kind of like the output
from "Intersect: Overlapping Pieces of intervals" option, which
returns the exact base pair overlap between two queries. It might be
helpful in some cases to see only the coordinates that caused the
merge. From my limited Galaxy knowledge, by using the "Intersect"
option and comparing a file to itself, the output would also include
those complete overlaps of interval_1 in file1 to its copy interval_1
in file2. If there is already a way to get just the coordinates that
caused the merge, I would be interested to learn more.
Thanks again for your help!
Academic Computing Fellow
Center for Comparative Genomics and Bioinformatics
The Pennsylvania State University
208 Mueller Lab
University Park, PA 16802
Galaxy-user mailing list