Hello,
let me start by saying that I am very impressed by the service the
Galaxy web server provides to the community; it has proven very useful
for my work.
Today I came across a situation that puzzles me. I am trying to merge
exons that belong to the same gene (but possibly to different splice
variants).
At the bottom of this email I list, as an example, the 153 exons of
the different splice variants of FlyBase gene CG32491. I obtained them
by pattern matching (tool "Select lines that match an expression" with
pattern .+CG32491-.) applied to the FlyBaseGene exon data set (110,472
exons, genome assembly dm3). I am using BED format and the public
Galaxy web server.
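For anyone who wants to reproduce the selection step outside Galaxy, here is a minimal Python sketch. It assumes that "Select lines that match an expression" keeps every line in which the regular expression matches anywhere in the line (search semantics); the function name `select_matching` is hypothetical. Note that `.` in the pattern is a wildcard, so `.+CG32491-.` matches any line with at least one character before `CG32491-` and at least one after it.

```python
import re

# Pattern taken from the selection step; search semantics are an assumption.
PATTERN = re.compile(r".+CG32491-.")

def select_matching(lines, pattern=PATTERN):
    """Return the lines in which `pattern` matches somewhere (hypothetical helper)."""
    return [line for line in lines if pattern.search(line)]

sample = [
    "chr3R\t17177330\t17177608\tCG32491-RR_exon_0_0_chr3R_17177331_r\t0\t-",
    "chr3R\t17186111\t17186276\tCG32491-RT_exon_0_0_chr3R_17186112_f\t0\t.",
    "chr2L\t100\t200\tCG1234-RA_exon_0_0_chr2L_101_f\t0\t+",  # made-up non-matching line
]
selected = select_matching(sample)
print(len(selected))  # 2
```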
If I now apply the "Merge" tool to these intervals, I obtain 26
intervals (listed further below). Subtracting these merged intervals
from the original 153 exons with the "Subtract" tool then yields 8
"leftover" regions that I did not expect; they seem to be missing from
the merge result.
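To make the expected merge behavior concrete, here is a minimal sketch of a strand-ignoring merge: sort the intervals, then sweep through them and extend the current interval while the next one overlaps it. This is not Galaxy's implementation, and whether intervals that merely touch (end equal to the next start in BED's half-open coordinates) get merged is an assumption. The first two example exons are CG32491-RU exon 0 and CG32491-RS exon 0 from the listing further below.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open

def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals per chromosome, ignoring strand.

    Assumption: touching intervals (end == next start) are merged as well.
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the interval being built: extend its end if needed.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

exons = [
    ("chr3R", 17195183, 17195967),  # CG32491-RU exon 0
    ("chr3R", 17195821, 17196364),  # CG32491-RS exon 0
    ("chr3R", 17200781, 17201634),  # shared exon, listed once per variant
    ("chr3R", 17200781, 17201634),
]
print(merge_intervals(exons))
# [('chr3R', 17195183, 17196364), ('chr3R', 17200781, 17201634)]
```

With these inputs the sketch yields 17195183-17196364, which is one of the 34 regions obtained without strand information; the 26-region strand-aware result contains only 17195821-17196364 at that locus.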
I then unset the strand column of the dataset of 153 exons. Applying
the Merge tool now yields 34 intervals (again listed below). Checking
this result with the Subtract tool (subtracting the merge result from
the original data set of 153 exons) yields, as expected, zero
intervals.
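The subtraction check can be sketched the same way. This assumes the Subtract tool was run in the mode that returns whole intervals of the first dataset having no overlap with the second (the 8 leftovers are complete exons, which seems consistent with that). The helper names are hypothetical, and the quadratic scan is only meant for small inputs such as these 153 exons. The inputs are three exons from the listing and the corresponding regions from the two merge results.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def subtract_whole(first: List[Interval], second: List[Interval]) -> List[Interval]:
    """Keep the intervals of `first` that overlap nothing in `second`."""
    return [a for a in first if not any(overlaps(a, b) for b in second)]

exons = [
    ("chr3R", 17186111, 17186276),  # CG32491-RT exon 0
    ("chr3R", 17195183, 17195967),  # CG32491-RU exon 0
    ("chr3R", 17195821, 17196364),  # CG32491-RS exon 0
]
merged_with_strand = [("chr3R", 17195821, 17196364)]
merged_without_strand = [("chr3R", 17186111, 17186276), ("chr3R", 17195183, 17196364)]

print(subtract_whole(exons, merged_with_strand))     # [('chr3R', 17186111, 17186276)]
print(subtract_whole(exons, merged_without_strand))  # []
```

In this whole-interval mode, an exon that only partly overlaps a merged region (here RU exon 0) is not reported at all.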
So my questions are:
- Is this the intended behavior of the tools? If so, it might be worth
mentioning in the tool documentation.
- Why does the outcome of the merge operation depend on whether the
"strand" column is set or not? The original set of intervals all had
the same negative strand orientation, so I would expect the merge
operation to give the same result in both cases.
- Subtracting the merged intervals (which carry no strand information)
from the set of 153 intervals yields 8 intervals that now have
positive strand orientation (they originally had negative strand
orientation). Why does subtracting a set of intervals without strand
information from a set of intervals with strand information change
the strand orientation of the first set?
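One way to test the assumption that all intervals share the same strand is to tally the values in column 6 and to merge each strand value separately. In BED, column 6 holds "+", "-", or "." (no strand information). The minimal sketch below groups intervals by that column before merging, so an interval marked "." never merges with a "-" interval even when they overlap. It illustrates one possible grouping rule under that assumption; it does not describe how Galaxy's Merge tool actually treats "." values. The four lines are taken from the exon listing.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open

def merge_sorted(intervals: List[Interval]) -> List[Interval]:
    """Sort-and-sweep merge of overlapping intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def merge_by_strand(bed_lines: List[str]) -> Dict[str, List[Interval]]:
    """Merge BED6 intervals separately for each value of column 6 (strand)."""
    groups: Dict[str, List[Interval]] = defaultdict(list)
    for line in bed_lines:
        chrom, start, end, _name, _score, strand = line.split()[:6]
        groups[strand].append((chrom, int(start), int(end)))
    return {strand: merge_sorted(ivs) for strand, ivs in groups.items()}

bed = [
    "chr3R 17195183 17195967 CG32491-RU_exon_0_0_chr3R_17195184_f 0 .",
    "chr3R 17195821 17196364 CG32491-RS_exon_0_0_chr3R_17195822_r 0 -",
    "chr3R 17200781 17201634 CG32491-RU_exon_1_0_chr3R_17200782_f 0 .",
    "chr3R 17200781 17201634 CG32491-RS_exon_1_0_chr3R_17200782_r 0 -",
]
print(Counter(line.split()[5] for line in bed))  # tally of strand values
for strand, ivs in sorted(merge_by_strand(bed).items()):
    print(strand, ivs)
```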
Any comments are highly appreciated!
Thanks,
Eckart
Dr. Eckart Bindewald (Contractor)
SAIC-Frederick, Inc.
Center for Cancer Research Nanobiology Program
National Cancer Institute
P.O. Box B
Frederick, MD 21702 USA
Phone: 301-846-5538
Fax: 301-846-5598
E-mail: eckart(a)mail.nih.gov
Here is the result (34 regions) of the merge operation (ignoring
strand orientation) applied to the 153 exons listed further below:
chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17186111 17186276
chr3R 17186349 17187009
chr3R 17187119 17187332
chr3R 17187391 17187860
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17191725 17192060
chr3R 17192171 17192466
chr3R 17193631 17193960
chr3R 17194101 17194784
chr3R 17195183 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121
Here is the result (26 regions) of the merge operation (using strand
orientation) applied to the 153 exons listed further below:
chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17195821 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121
Here are the 8 "leftover" regions: the exons among the original 153
that do not overlap any of the 26 merged regions (output of the
Subtract tool when subtracting the 26 merged regions from the 153
exons; note the changed strand orientation):
chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 +
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 +
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 +
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 +
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 +
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 +
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 +
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 +
Here are the 153 exons of FlyBase gene CG32491, obtained by pattern
matching (tool "Select lines that match an expression" with pattern
.+CG32491-.) applied to the FlyBaseGene exon data set (110,472 exons):
chr3R 17177330 17177608 CG32491-RR_exon_0_0_chr3R_17177331_r 0 -
chr3R 17200781 17201634 CG32491-RR_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RR_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RR_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RR_exon_4_0_chr3R_17203010_r 0 -
chr3R 17177760 17178358 CG32491-RA_exon_0_0_chr3R_17177761_r 0 -
chr3R 17200781 17201634 CG32491-RA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17178092 17178959 CG32491-RF_exon_0_0_chr3R_17178093_r 0 -
chr3R 17200781 17201634 CG32491-RF_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RF_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RF_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RF_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179070 17179456 CG32491-RD_exon_0_0_chr3R_17179071_r 0 -
chr3R 17200781 17201634 CG32491-RD_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RD_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RD_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RD_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179617 17180053 CG32491-RAC_exon_0_0_chr3R_17179618_r 0 -
chr3R 17200781 17201634 CG32491-RAC_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAC_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAC_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAC_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RG_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17180811 CG32491-RG_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RG_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RG_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RG_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RG_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RH_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17181279 CG32491-RH_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RH_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RH_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RH_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RH_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RQ_exon_0_0_chr3R_17180160_r 0 -
chr3R 17200781 17201634 CG32491-RQ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RQ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RQ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RQ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180941 17181279 CG32491-RB_exon_0_0_chr3R_17180942_r 0 -
chr3R 17181479 17181973 CG32491-RB_exon_1_0_chr3R_17181480_r 0 -
chr3R 17200781 17201634 CG32491-RB_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RB_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RB_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RB_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182071 17182426 CG32491-RI_exon_0_0_chr3R_17182072_r 0 -
chr3R 17182532 17182690 CG32491-RI_exon_1_0_chr3R_17182533_r 0 -
chr3R 17200781 17201634 CG32491-RI_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RI_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RI_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RI_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182776 17183086 CG32491-RJ_exon_0_0_chr3R_17182777_r 0 -
chr3R 17200781 17201634 CG32491-RJ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RJ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RJ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RJ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17183242 17183480 CG32491-RP_exon_0_0_chr3R_17183243_r 0 -
chr3R 17183726 17183926 CG32491-RP_exon_1_0_chr3R_17183727_r 0 -
chr3R 17200781 17201634 CG32491-RP_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RP_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RP_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RP_exon_5_0_chr3R_17203010_r 0 -
chr3R 17184011 17184791 CG32491-RK_exon_0_0_chr3R_17184012_r 0 -
chr3R 17200781 17201634 CG32491-RK_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RK_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RK_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RK_exon_4_0_chr3R_17203010_r 0 -
chr3R 17184021 17184318 CG32491-RL_exon_0_0_chr3R_17184022_r 0 -
chr3R 17200781 17201634 CG32491-RL_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RL_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RL_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RL_exon_4_0_chr3R_17203010_r 0 -
chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 .
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 .
chr3R 17200781 17201634 CG32491-RT_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RT_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RT_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RT_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 .
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 .
chr3R 17200781 17201634 CG32491-RZ_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RZ_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RZ_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RZ_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187909 17188590 CG32491-RM_exon_0_0_chr3R_17187910_r 0 -
chr3R 17200781 17201634 CG32491-RM_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RM_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RM_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RM_exon_4_0_chr3R_17203010_r 0 -
chr3R 17188688 17189606 CG32491-RE_exon_0_0_chr3R_17188689_r 0 -
chr3R 17200781 17201634 CG32491-RE_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RE_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RE_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RE_exon_4_0_chr3R_17203010_r 0 -
chr3R 17189739 17190097 CG32491-RAB_exon_0_0_chr3R_17189740_r 0 -
chr3R 17200781 17201634 CG32491-RAB_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAB_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAB_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAB_exon_4_0_chr3R_17203010_r 0 -
chr3R 17190173 17190367 CG32491-RC_exon_0_0_chr3R_17190174_r 0 -
chr3R 17190435 17190714 CG32491-RC_exon_1_0_chr3R_17190436_r 0 -
chr3R 17200781 17201634 CG32491-RC_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RC_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RC_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RC_exon_5_0_chr3R_17203010_r 0 -
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 .
chr3R 17200781 17201634 CG32491-RY_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RY_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RY_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RY_exon_4_0_chr3R_17203010_f 0 .
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 .
chr3R 17200781 17201634 CG32491-RX_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RX_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RX_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RX_exon_4_0_chr3R_17203010_f 0 .
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 .
chr3R 17200781 17201634 CG32491-RW_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RW_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RW_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RW_exon_4_0_chr3R_17203010_f 0 .
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 .
chr3R 17200781 17201634 CG32491-RV_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RV_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RV_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RV_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195183 17195967 CG32491-RU_exon_0_0_chr3R_17195184_f 0 .
chr3R 17200781 17201634 CG32491-RU_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RU_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RU_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RU_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195821 17196364 CG32491-RS_exon_0_0_chr3R_17195822_r 0 -
chr3R 17200781 17201634 CG32491-RS_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RS_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RS_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RS_exon_4_0_chr3R_17203010_r 0 -
chr3R 17196654 17196949 CG32491-RAA_exon_0_0_chr3R_17196655_r 0 -
chr3R 17200781 17201634 CG32491-RAA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197044 17197789 CG32491-RO_exon_0_0_chr3R_17197045_r 0 -
chr3R 17200781 17201634 CG32491-RO_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RO_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RO_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RO_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197884 17198802 CG32491-RN_exon_0_0_chr3R_17197885_r 0 -
chr3R 17200781 17201634 CG32491-RN_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RN_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RN_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RN_exon_4_0_chr3R_17203010_r 0 -