Merge tool and strand orientation
Hello:

let me start by saying that I am very impressed by the service the Galaxy web server provides to the community; it has proven very useful for my work. Today I came across a situation that puzzles me. I am trying to merge exons corresponding to the same gene (but possibly from different splice variants). At the bottom of this email I am listing, as an example, the 153 exons that are related to the different splice variants of FlyBase gene CG32491 (obtained by pattern matching with the tool "Select lines that match an expression" and pattern .+CG32491-. applied to the data set of FlyBaseGene exons; 110,472 exons, genome assembly dm3). I am using BED format and the general Galaxy web server.

If I now apply the "Merge" tool to the intervals, I obtain 26 intervals (listed further below). Now applying the "subtract" tool to the original 153 exons results in 8 "leftover" regions that I did not expect. Somehow they seem to be missing in the merge result.

I then deactivated the strand information in the interval set of 153 exons. Applying the merge tool now results in 34 intervals (again listed below). Checking the result via the subtract tool (subtracting the merge result from the original data set of 153 exons) results, as expected, in zero intervals.

So my questions are:

- is this the intended functionality of the tools? Maybe one can add statements regarding these issues in the tool documentation.
- why does the outcome of the merge operation depend on whether the "strand" column is set or not? The original set of intervals all had the same negative strand orientation, so it appears to me that the merge operation should give the same result in both cases.
- subtracting the merged intervals (that do not have strand information) from the set of 153 intervals results in 8 intervals that now have positive strand orientation (they originally had negative strand orientation). Why does subtracting a set of intervals without strand information from a set of intervals with strand information change the strand orientation of the first set?

Any comments are highly appreciated!

Thanks,

Eckart

Dr. Eckart Bindewald (Contractor)
SAIC-Frederick, Inc.
Center for Cancer Research Nanobiology Program
National Cancer Institute
P.O. Box B
Frederick, MD 21702 USA
Phone: 301-846-5538
Fax: 301-846-5598
E-mail: eckart@mail.nih.gov

Here is the result (34 regions) of the merge operation (not using strand orientation) applied to the 153 exon regions listed further below:

chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17186111 17186276
chr3R 17186349 17187009
chr3R 17187119 17187332
chr3R 17187391 17187860
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17191725 17192060
chr3R 17192171 17192466
chr3R 17193631 17193960
chr3R 17194101 17194784
chr3R 17195183 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121

Here is the result (26 regions) of the merge operation (using strand orientation) applied to the 153 exon regions listed further below:

chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17195821 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121

Here are the 8 "leftover" regions from the original 153 exons that do not intersect with the result of the 26 merged regions (result of the subtract tool: those of the 153 exons that do not overlap with the 26 merged regions; note the changed strand orientation):

chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 +
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 +
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 +
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 +
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 +
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 +
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 +
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 +

Here are the 153 exons related to FlyBase gene CG32491 obtained by pattern matching (tool "Select lines that match an expression" and pattern .+CG32491-. ) applied to the data set of FlyBaseGene exons (110,472 exons):

chr3R 17177330 17177608 CG32491-RR_exon_0_0_chr3R_17177331_r 0 -
chr3R 17200781 17201634 CG32491-RR_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RR_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RR_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RR_exon_4_0_chr3R_17203010_r 0 -
chr3R 17177760 17178358 CG32491-RA_exon_0_0_chr3R_17177761_r 0 -
chr3R 17200781 17201634 CG32491-RA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17178092 17178959 CG32491-RF_exon_0_0_chr3R_17178093_r 0 -
chr3R 17200781 17201634 CG32491-RF_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RF_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RF_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RF_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179070 17179456 CG32491-RD_exon_0_0_chr3R_17179071_r 0 -
chr3R 17200781 17201634 CG32491-RD_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RD_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RD_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RD_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179617 17180053 CG32491-RAC_exon_0_0_chr3R_17179618_r 0 -
chr3R 17200781 17201634 CG32491-RAC_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAC_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAC_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAC_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RG_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17180811 CG32491-RG_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RG_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RG_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RG_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RG_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RH_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17181279 CG32491-RH_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RH_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RH_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RH_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RH_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RQ_exon_0_0_chr3R_17180160_r 0 -
chr3R 17200781 17201634 CG32491-RQ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RQ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RQ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RQ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180941 17181279 CG32491-RB_exon_0_0_chr3R_17180942_r 0 -
chr3R 17181479 17181973 CG32491-RB_exon_1_0_chr3R_17181480_r 0 -
chr3R 17200781 17201634 CG32491-RB_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RB_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RB_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RB_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182071 17182426 CG32491-RI_exon_0_0_chr3R_17182072_r 0 -
chr3R 17182532 17182690 CG32491-RI_exon_1_0_chr3R_17182533_r 0 -
chr3R 17200781 17201634 CG32491-RI_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RI_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RI_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RI_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182776 17183086 CG32491-RJ_exon_0_0_chr3R_17182777_r 0 -
chr3R 17200781 17201634 CG32491-RJ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RJ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RJ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RJ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17183242 17183480 CG32491-RP_exon_0_0_chr3R_17183243_r 0 -
chr3R 17183726 17183926 CG32491-RP_exon_1_0_chr3R_17183727_r 0 -
chr3R 17200781 17201634 CG32491-RP_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RP_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RP_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RP_exon_5_0_chr3R_17203010_r 0 -
chr3R 17184011 17184791 CG32491-RK_exon_0_0_chr3R_17184012_r 0 -
chr3R 17200781 17201634 CG32491-RK_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RK_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RK_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RK_exon_4_0_chr3R_17203010_r 0 -
chr3R 17184021 17184318 CG32491-RL_exon_0_0_chr3R_17184022_r 0 -
chr3R 17200781 17201634 CG32491-RL_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RL_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RL_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RL_exon_4_0_chr3R_17203010_r 0 -
chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 .
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 .
chr3R 17200781 17201634 CG32491-RT_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RT_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RT_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RT_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 .
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 .
chr3R 17200781 17201634 CG32491-RZ_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RZ_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RZ_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RZ_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187909 17188590 CG32491-RM_exon_0_0_chr3R_17187910_r 0 -
chr3R 17200781 17201634 CG32491-RM_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RM_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RM_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RM_exon_4_0_chr3R_17203010_r 0 -
chr3R 17188688 17189606 CG32491-RE_exon_0_0_chr3R_17188689_r 0 -
chr3R 17200781 17201634 CG32491-RE_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RE_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RE_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RE_exon_4_0_chr3R_17203010_r 0 -
chr3R 17189739 17190097 CG32491-RAB_exon_0_0_chr3R_17189740_r 0 -
chr3R 17200781 17201634 CG32491-RAB_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAB_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAB_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAB_exon_4_0_chr3R_17203010_r 0 -
chr3R 17190173 17190367 CG32491-RC_exon_0_0_chr3R_17190174_r 0 -
chr3R 17190435 17190714 CG32491-RC_exon_1_0_chr3R_17190436_r 0 -
chr3R 17200781 17201634 CG32491-RC_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RC_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RC_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RC_exon_5_0_chr3R_17203010_r 0 -
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 .
chr3R 17200781 17201634 CG32491-RY_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RY_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RY_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RY_exon_4_0_chr3R_17203010_f 0 .
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 .
chr3R 17200781 17201634 CG32491-RX_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RX_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RX_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RX_exon_4_0_chr3R_17203010_f 0 .
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 .
chr3R 17200781 17201634 CG32491-RW_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RW_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RW_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RW_exon_4_0_chr3R_17203010_f 0 .
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 .
chr3R 17200781 17201634 CG32491-RV_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RV_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RV_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RV_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195183 17195967 CG32491-RU_exon_0_0_chr3R_17195184_f 0 .
chr3R 17200781 17201634 CG32491-RU_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RU_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RU_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RU_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195821 17196364 CG32491-RS_exon_0_0_chr3R_17195822_r 0 -
chr3R 17200781 17201634 CG32491-RS_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RS_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RS_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RS_exon_4_0_chr3R_17203010_r 0 -
chr3R 17196654 17196949 CG32491-RAA_exon_0_0_chr3R_17196655_r 0 -
chr3R 17200781 17201634 CG32491-RAA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197044 17197789 CG32491-RO_exon_0_0_chr3R_17197045_r 0 -
chr3R 17200781 17201634 CG32491-RO_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RO_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RO_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RO_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197884 17198802 CG32491-RN_exon_0_0_chr3R_17197885_r 0 -
chr3R 17200781 17201634 CG32491-RN_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RN_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RN_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RN_exon_4_0_chr3R_17203010_r 0 -
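The consistency check described in this message (subtract the merged regions from the original exons and expect nothing to be left) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `overlaps` and `leftover_intervals` are hypothetical helpers, coordinates are treated as half-open as in BED, and the function only keeps original intervals with no overlap at all, so it does not reproduce Galaxy's Subtract tool.

```python
from typing import List, Tuple

# (chrom, start, end); half-open coordinates as in BED (assumption of this sketch)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def leftover_intervals(original: List[Interval], merged: List[Interval]) -> List[Interval]:
    """Return the original intervals that overlap none of the merged intervals."""
    return [iv for iv in original if not any(overlaps(iv, m) for m in merged)]


# Two regions taken from the 26-region (strand-aware) merge result above.
merged_with_strand = [("chr3R", 17184011, 17184791), ("chr3R", 17187909, 17188590)]

# Three exons taken from the 153-exon list above.
original = [
    ("chr3R", 17184021, 17184318),
    ("chr3R", 17186111, 17186276),
    ("chr3R", 17187909, 17188590),
]

print(leftover_intervals(original, merged_with_strand))
```

With these inputs, the only exon reported is chr3R 17186111 17186276, which is one of the 8 "leftover" regions listed above. A merge result that covers every input interval would give an empty list.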
Hello Eckart,

It may be helpful to review the help for the Interval tools (includes Merge): http://bitbucket.org/galaxy/galaxy-central/wiki/GopsDesc

Quote from the wiki/GopsDesc help: "Merge reads a dataset, and combines all overlapping intervals into single intervals. When merging intervals, all columns besides chromosome, start, and end are lost. When two intervals are combined into one, it is ambiguous what the other columns represent or which field should be carried over to the resulting interval. For this reason, all columns except for chromosome, start and end are omitted from the output."

The output coordinates are based on the positive strand as the default. This is the common format for BED, Interval and many other datatypes (but not all!).

Apologies for the late reply. Please see the inline comments below to your specific questions.

Best,

Jen
Galaxy team

On 11/17/10 9:28 AM, Eckart Bindewald wrote:

> So my questions are:
> - is this the intended functionality of the tools? Maybe one can add statements regarding these issues in the tool documentation.

Please see the wiki help link above.

> - why does the outcome of the merge operation depend on whether the "strand" column is set or not? The original set of intervals all had the same negative strand orientation, so it appears to me that the merge operation should give the same result in both cases.

If strand is not set, then (+) strand is assumed. If strand is set, then that will be used (in your case: (-)). In either case, the result is transformed into (+) coordinates. This is why you are getting different results.

> - subtracting the merged intervals (that do not have strand information) from the set of 153 intervals results in 8 strands that now have positive strand orientation (they originally had negative strand orientation). Why does subtracting a set of intervals without strand information from a set of intervals with strand information change the strand orientation of the first set?

It is best to have the file types be the same or unexpected results can be produced. Hopefully the wiki can help you create a query that will produce the desired result.
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
-- Jennifer Jackson http://usegalaxy.org
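The Merge behaviour quoted from the wiki in this reply (overlapping intervals are combined, and only chromosome, start and end survive) can be sketched as follows. This is a minimal illustration, not Galaxy's implementation: `merge_intervals` is a hypothetical name, strand is ignored entirely, coordinates are assumed half-open as in BED, and intervals that merely touch are assumed not to be merged.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end); all other BED columns are dropped, as the wiki quote describes
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Combine overlapping intervals on the same chromosome into single intervals."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous interval: extend its end if needed.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# First exons of transcripts RA, RF, RU, RS and RAA from the original post.
exons = [
    ("chr3R", 17177760, 17178358),
    ("chr3R", 17178092, 17178959),
    ("chr3R", 17195183, 17195967),
    ("chr3R", 17195821, 17196364),
    ("chr3R", 17196654, 17196949),
]

for region in merge_intervals(exons):
    print(*region)
```

For these five exons the sketch yields chr3R 17177760 17178959, chr3R 17195183 17196364 and chr3R 17196654 17196949, which agree with the corresponding rows of the 34-region result reported for the merge without strand information.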
Hello Jen:

thank you for the detailed reply. I am, however, not yet convinced that everything is working as it should be. To make it easier for you and others to reproduce the discrepancy, I prepared two shared Galaxy histories:

http://main.g2.bx.psu.edu/u/Eckart/h/merging-with-and-without-strand-info-dm...

and

http://main.g2.bx.psu.edu/u/Eckart/h/merging-with-and-without-strand-info-hg...

In the first case, I obtain 110,472 Drosophila (dm3) exons annotated by FlyBase from the UCSC site (UCSC table flyBaseGene). Running merge on that data set, one obtains 59,228 regions (data set 2 in that history). Data set 3 in that history is a copy of data set 2, with the strand column deactivated. Running merge now on data set 3, one obtains 59,236 regions. There is a discrepancy of 8 regions that, for some reason, do not appear in the result of the merge operation if the strand information is used. The discrepancy in the form of 8 regions is shown in data set 7 in that history, obtained using the subtract tool.

I re-ran that procedure using exons of human genes in the second history mentioned above (UCSC table UCSC Genes - knownGene). In that case both versions of the merge operation give the exact same result, as it should be.

According to the tool documentation, the strand information is being ignored in the merge tool. So there should not be a difference as shown in the case of the Drosophila exons. Please try to reproduce these steps. I found similar differences for the coverage tool and the subtract tool, depending on whether strand information is activated or not. The fact that I only find these differences in Drosophila exons, and not in other data sets that I have tested, makes it seem likely to me that there is an issue with the way that particular data set is internally represented.

Thanks,

Eckart

On Dec 1, 2010, at 1:04 PM, Jennifer Jackson wrote:
Hello Eckart,
It may be helpful to review the help for the Interval tools (includes Merge): http://bitbucket.org/galaxy/galaxy-central/wiki/GopsDesc
quote from wiki/GopsDesc help: "Merge reads a dataset, and combines all overlapping intervals into single intervals. When merging intervals, all columns besides chromosome, start, and end are lost. When two intervals are combined into one, it is ambiguous what the other columns represent or which field should be carried over to the resulting interval. For this reason, all columns except for chromosome, start and end are omitted from the output."
The output coordinates are based on the positive strand as the default. This is the common format for BED, Interval and many other datatypes (but not all!).
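The merge behaviour quoted from the help text can be illustrated with a minimal Python sketch. This is not Galaxy's implementation; the function name `merge_rows` and the strict-overlap rule (`start < previous end`, treating BED intervals as half-open) are assumptions made for illustration. The input rows are copied from the exon list in this thread.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED

# Six-column BED rows copied from the exon list in this thread.
ROWS = """\
chr3R 17177760 17178358 CG32491-RA_exon_0_0_chr3R_17177761_r 0 -
chr3R 17178092 17178959 CG32491-RF_exon_0_0_chr3R_17178093_r 0 -
chr3R 17180695 17180811 CG32491-RG_exon_1_0_chr3R_17180696_r 0 -
chr3R 17180695 17181279 CG32491-RH_exon_1_0_chr3R_17180696_r 0 -
chr3R 17180941 17181279 CG32491-RB_exon_0_0_chr3R_17180942_r 0 -
"""


def merge_rows(text: str) -> List[Interval]:
    """Merge overlapping intervals; only chrom, start and end survive."""
    fields = (line.split() for line in text.splitlines())
    intervals = sorted((f[0], int(f[1]), int(f[2])) for f in fields if len(f) >= 3)
    merged: List[Interval] = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


for chrom, start, end in merge_rows(ROWS):
    print(chrom, start, end)
```

The five input rows collapse to two intervals, and the name, score and strand columns are dropped because it would be ambiguous which row's values to keep.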
Apologies for the late reply. Please see the inline comments below to your specific questions.
Best,
Jen Galaxy team
On 11/17/10 9:28 AM, Eckart Bindewald wrote:

Hello:

let me start by saying that I am very impressed by the service the Galaxy web server provides to the community; it has proven very useful for my work. Today I came across a situation that puzzles me. I am trying to merge exons corresponding to the same gene (but possibly from different splice variants). At the bottom of this email I am listing, as an example, the 153 exons that are related to the different splice variants of FlyBase gene CG32491 (obtained by the pattern matching (tool "Select lines that match an expression" and pattern .+CG32491-. ) applied to the data set of FlyBaseGene exons (110,472 exons, genome assembly dm3)). I am using bed format and the general Galaxy web server. If I now apply the "Merge" tool to the intervals, I obtain 26 intervals (listed further below). Now applying the "subtract" tool to the original 153 exons results in 8 "leftover" regions that I did not expect. Somehow they seem to be missing in the merge result. I then deactivated the strand information in the interval set of 153 exons. Applying the merge tool now results in 34 intervals (again listed below). Checking the result via the subtract tool (subtracting the merge result from the original data set of 153 exons) results, as expected, in zero intervals.

So my questions are:

- is this the intended functionality of the tools? Maybe one can add statements regarding these issues in the tool documentation.

Please see the wiki help link above.

- why does the outcome of the merge operation depend on whether the "strand" column is set or not? The original set of intervals all had the same negative strand orientation, so it appears to me that the merge operation should give the same result in both cases.

If strand is not set, then (+) strand is assumed. If strand is set, then that will be used (in your case: (-)). In either case, the result is transformed into (+) coordinates. This is why you are getting different results.

- subtracting the merged intervals (that do not have strand information) from the set of 153 intervals results in 8 strands that now have positive strand orientation (they originally had negative strand orientation). Why does subtracting a set of intervals without strand information from a set of intervals with strand information change the strand orientation of the first set?

It is best to have the file types be the same or unexpected results can be produced. Hopefully the wiki can help you create a query that will produce the desired result.
Any comments are highly appreciated!
Thanks,
Eckart
Dr. Eckart Bindewald (Contractor) SAIC-Frederick, Inc. Center for Cancer Research Nanobiology Program National Cancer Institute P.O. Box B Frederick, MD 21702 USA Phone: 301-846-5538 Fax: 301-846-5598 E-mail: eckart@mail.nih.gov
Here is the result (34 regions) of the merge operation (not using strand orientation) applied to the 153 exon regions listed further below:

chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17186111 17186276
chr3R 17186349 17187009
chr3R 17187119 17187332
chr3R 17187391 17187860
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17191725 17192060
chr3R 17192171 17192466
chr3R 17193631 17193960
chr3R 17194101 17194784
chr3R 17195183 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121
Here is the result (26 regions) of the merge operation (using strand orientation) applied to the 153 exon regions listed further below:

chr3R 17177330 17177608
chr3R 17177760 17178959
chr3R 17179070 17179456
chr3R 17179617 17180053
chr3R 17180159 17180416
chr3R 17180695 17181279
chr3R 17181479 17181973
chr3R 17182071 17182426
chr3R 17182532 17182690
chr3R 17182776 17183086
chr3R 17183242 17183480
chr3R 17183726 17183926
chr3R 17184011 17184791
chr3R 17187909 17188590
chr3R 17188688 17189606
chr3R 17189739 17190097
chr3R 17190173 17190367
chr3R 17190435 17190714
chr3R 17195821 17196364
chr3R 17196654 17196949
chr3R 17197044 17197789
chr3R 17197884 17198802
chr3R 17200781 17201634
chr3R 17202323 17202463
chr3R 17202540 17202798
chr3R 17203009 17203121
Here are the 8 "leftover" regions from the original 153 exons that do not intersect with the 26 merged regions (result of the subtract tool: those of the 153 exons that do not overlap with the 26 merged exons; note the changed strand orientation):

chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 +
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 +
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 +
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 +
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 +
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 +
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 +
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 +
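The subtract check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `overlaps` and `leftovers` are hypothetical, and the sketch assumes the mode in which whole intervals of the first set are kept only when they overlap nothing in the second set. Coordinates are taken from the lists in this thread.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def leftovers(first: List[Interval], second: List[Interval]) -> List[Interval]:
    """Intervals of `first` that overlap no interval of `second`."""
    return [a for a in first if not any(overlaps(a, b) for b in second)]


# Three of the 153 exons (RT exon 0, RM exon 0, RU exon 0).
EXONS = [
    ("chr3R", 17186111, 17186276),
    ("chr3R", 17187909, 17188590),
    ("chr3R", 17195183, 17195967),
]

# Two of the 26 regions from the strand-aware merge.
MERGED_WITH_STRAND = [
    ("chr3R", 17187909, 17188590),
    ("chr3R", 17195821, 17196364),
]

for chrom, start, end in leftovers(EXONS, MERGED_WITH_STRAND):
    print(chrom, start, end)
```

Only the RT exon is reported. The RU exon (17195183 to 17195967) still overlaps the merged region 17195821 to 17196364, which is consistent with it being absent from the list of 8 leftover regions.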
Here are the 153 exons related to FlyBase gene CG32491 obtained by the pattern matching (tool "Select lines that match an expression" and pattern .+CG32491-. ) applied to the data set of FlyBaseGene exons (110,472 exons):

chr3R 17177330 17177608 CG32491-RR_exon_0_0_chr3R_17177331_r 0 -
chr3R 17200781 17201634 CG32491-RR_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RR_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RR_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RR_exon_4_0_chr3R_17203010_r 0 -
chr3R 17177760 17178358 CG32491-RA_exon_0_0_chr3R_17177761_r 0 -
chr3R 17200781 17201634 CG32491-RA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17178092 17178959 CG32491-RF_exon_0_0_chr3R_17178093_r 0 -
chr3R 17200781 17201634 CG32491-RF_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RF_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RF_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RF_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179070 17179456 CG32491-RD_exon_0_0_chr3R_17179071_r 0 -
chr3R 17200781 17201634 CG32491-RD_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RD_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RD_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RD_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179617 17180053 CG32491-RAC_exon_0_0_chr3R_17179618_r 0 -
chr3R 17200781 17201634 CG32491-RAC_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAC_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAC_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAC_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RG_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17180811 CG32491-RG_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RG_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RG_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RG_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RG_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RH_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17181279 CG32491-RH_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RH_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RH_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RH_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RH_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RQ_exon_0_0_chr3R_17180160_r 0 -
chr3R 17200781 17201634 CG32491-RQ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RQ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RQ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RQ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180941 17181279 CG32491-RB_exon_0_0_chr3R_17180942_r 0 -
chr3R 17181479 17181973 CG32491-RB_exon_1_0_chr3R_17181480_r 0 -
chr3R 17200781 17201634 CG32491-RB_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RB_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RB_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RB_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182071 17182426 CG32491-RI_exon_0_0_chr3R_17182072_r 0 -
chr3R 17182532 17182690 CG32491-RI_exon_1_0_chr3R_17182533_r 0 -
chr3R 17200781 17201634 CG32491-RI_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RI_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RI_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RI_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182776 17183086 CG32491-RJ_exon_0_0_chr3R_17182777_r 0 -
chr3R 17200781 17201634 CG32491-RJ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RJ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RJ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RJ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17183242 17183480 CG32491-RP_exon_0_0_chr3R_17183243_r 0 -
chr3R 17183726 17183926 CG32491-RP_exon_1_0_chr3R_17183727_r 0 -
chr3R 17200781 17201634 CG32491-RP_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RP_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RP_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RP_exon_5_0_chr3R_17203010_r 0 -
chr3R 17184011 17184791 CG32491-RK_exon_0_0_chr3R_17184012_r 0 -
chr3R 17200781 17201634 CG32491-RK_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RK_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RK_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RK_exon_4_0_chr3R_17203010_r 0 -
chr3R 17184021 17184318 CG32491-RL_exon_0_0_chr3R_17184022_r 0 -
chr3R 17200781 17201634 CG32491-RL_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RL_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RL_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RL_exon_4_0_chr3R_17203010_r 0 -
chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 .
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 .
chr3R 17200781 17201634 CG32491-RT_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RT_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RT_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RT_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 .
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 .
chr3R 17200781 17201634 CG32491-RZ_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RZ_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RZ_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RZ_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187909 17188590 CG32491-RM_exon_0_0_chr3R_17187910_r 0 -
chr3R 17200781 17201634 CG32491-RM_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RM_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RM_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RM_exon_4_0_chr3R_17203010_r 0 -
chr3R 17188688 17189606 CG32491-RE_exon_0_0_chr3R_17188689_r 0 -
chr3R 17200781 17201634 CG32491-RE_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RE_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RE_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RE_exon_4_0_chr3R_17203010_r 0 -
chr3R 17189739 17190097 CG32491-RAB_exon_0_0_chr3R_17189740_r 0 -
chr3R 17200781 17201634 CG32491-RAB_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAB_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAB_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAB_exon_4_0_chr3R_17203010_r 0 -
chr3R 17190173 17190367 CG32491-RC_exon_0_0_chr3R_17190174_r 0 -
chr3R 17190435 17190714 CG32491-RC_exon_1_0_chr3R_17190436_r 0 -
chr3R 17200781 17201634 CG32491-RC_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RC_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RC_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RC_exon_5_0_chr3R_17203010_r 0 -
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 .
chr3R 17200781 17201634 CG32491-RY_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RY_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RY_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RY_exon_4_0_chr3R_17203010_f 0 .
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 .
chr3R 17200781 17201634 CG32491-RX_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RX_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RX_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RX_exon_4_0_chr3R_17203010_f 0 .
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 .
chr3R 17200781 17201634 CG32491-RW_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RW_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RW_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RW_exon_4_0_chr3R_17203010_f 0 .
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 .
chr3R 17200781 17201634 CG32491-RV_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RV_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RV_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RV_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195183 17195967 CG32491-RU_exon_0_0_chr3R_17195184_f 0 .
chr3R 17200781 17201634 CG32491-RU_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RU_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RU_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RU_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195821 17196364 CG32491-RS_exon_0_0_chr3R_17195822_r 0 -
chr3R 17200781 17201634 CG32491-RS_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RS_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RS_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RS_exon_4_0_chr3R_17203010_r 0 -
chr3R 17196654 17196949 CG32491-RAA_exon_0_0_chr3R_17196655_r 0 -
chr3R 17200781 17201634 CG32491-RAA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197044 17197789 CG32491-RO_exon_0_0_chr3R_17197045_r 0 -
chr3R 17200781 17201634 CG32491-RO_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RO_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RO_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RO_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197884 17198802 CG32491-RN_exon_0_0_chr3R_17197885_r 0 -
chr3R 17200781 17201634 CG32491-RN_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RN_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RN_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RN_exon_4_0_chr3R_17203010_r 0 -
On 12/01/2010 08:56 PM, Eckart Bindewald wrote:
Hello Jen:
thank you for the detailed reply. I am, however, not yet convinced that everything is working as it should be. To make it easier for you and others to reproduce the discrepancy, I prepared two shared Galaxy histories: http://main.g2.bx.psu.edu/u/Eckart/h/merging-with-and-without-strand-info-dm...
and http://main.g2.bx.psu.edu/u/Eckart/h/merging-with-and-without-strand-info-hg...
In the first case, I obtain 110,472 Drosophila (dm3) exons annotated by FlyBase from the UCSC site (UCSC table flyBaseGene). Running merge on that data set, one obtains 59,228 regions (data set 2 in that history). Data set 3 in that history is a copy of data set 2, with the strand column deactivated. Running merge now on data set 3, one obtains 59,236 regions. There is a discrepancy of 8 regions that, for some reason, do not appear in the result of the merge operation if the strand information is used. The discrepancy in the form of 8 regions is shown in data set 7 in that history, obtained using the subtract tool.
I re-ran that procedure using exons of human genes in the second history mentioned above (UCSC table UCSC Genes - knownGene). In that case both versions of the merge operation give the exact same result, as it should be. According to the tool documentation, the strand information is being ignored in the merge tool. So there should not be a difference as shown in the case of the Drosophila exons.
Please try to reproduce these steps. I found similar differences for the coverage tool and the subtract tool, depending on whether strand information is activated or not. The fact that I only find these differences in Drosophila exons, and not in the other data sets that I have tested, makes it seem likely to me that there is an issue with the way that particular data set is internally represented.
Hi Eckart,

The cause is the Drosophila annotation, which contains a few exons with strand "." on chromosome 3R. If strand information is activated, those lines are ignored completely. See: http://main.g2.bx.psu.edu/u/hrh/h/merge-problem-when-strand-

Regards,
Hans
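The explanation above can be checked offline with a small Python sketch. It is a sketch under stated assumptions, not Galaxy's code: the names `parse_bed` and `merge` are hypothetical, and the strand-aware reader is assumed to skip any row whose strand field is neither "+" nor "-". The rows are copied from the exon list in this thread.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED

BED = """\
chr3R 17184011 17184791 CG32491-RK_exon_0_0_chr3R_17184012_r 0 -
chr3R 17184021 17184318 CG32491-RL_exon_0_0_chr3R_17184022_r 0 -
chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 .
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 .
chr3R 17187909 17188590 CG32491-RM_exon_0_0_chr3R_17187910_r 0 -
chr3R 17195183 17195967 CG32491-RU_exon_0_0_chr3R_17195184_f 0 .
chr3R 17195821 17196364 CG32491-RS_exon_0_0_chr3R_17195822_r 0 -
"""


def parse_bed(text: str, use_strand: bool) -> List[Interval]:
    """Read BED rows; with use_strand=True, skip rows without a +/- strand (assumed behaviour)."""
    intervals: List[Interval] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if use_strand and (len(fields) < 6 or fields[5] not in {"+", "-"}):
            continue  # a row with strand "." is silently dropped here
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def merge(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


without_strand = merge(parse_bed(BED, use_strand=False))
with_strand = merge(parse_bed(BED, use_strand=True))
print(len(without_strand), "regions when the strand column is ignored")
print(len(with_strand), "regions when the strand column is used")
```

With the strand column ignored, the RU and RS exons merge into 17195183 to 17196364, as in the 34-region list. With the strand column used, the RU row is dropped and only 17195821 to 17196364 remains, as in the 26-region list. Replacing "." with an explicit strand in the input, or deactivating the strand column as already done in the thread, avoids the silent loss of rows.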
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
-- Jennifer Jackson http://usegalaxy.org
Hans:
Ah, that explains it! Thank you! In other words, the strand information is missing for a few of the regions in this particular data set.
Two comments:
1. I propose creating a section of the Galaxy wiki dedicated to "Known issues related to data sets". What I stumbled upon is arguably not a Galaxy bug, but a potential problem related to this data set. If data-set related issues were collected in a wiki (or another curated form), so that one does not have to sift through archived forum mails, it would save people working with such data sets a lot of time and prevent a lot of confusion.
2. Since the strand information is ultimately not used by the merge tool (and other tools), I do not find it obvious at all that regions for which strand information is missing are ignored. Granted, it is defensible. But issues like that should be mentioned in the tool documentation.
Anyways, thanks for the comments,
Eckart
On Dec 2, 2010, at 4:35 AM, Hans-Rudolf Hotz wrote:
On 12/01/2010 08:56 PM, Eckart Bindewald wrote:
Hello Jen:
thank you for the detailed reply. I am, however, not yet convinced that everything is working as it should be. To make it easier for you and others to reproduce the discrepancy, I prepared two shared Galaxy histories: http://main.g2.bx.psu.edu/u/Eckart/h/merging-with-and-without-strand-info-dm...
and http://main.g2.bx.psu.edu/u/Eckart/h/merging-with-and-without-strand-info-hg...
In the first case, I obtain 110,472 drosophila (dm3) exons annotated by Flybase from the UCSC site (UCSC table flyBaseGene). Running merge on that data set, one obtains 59,228 regions (data set 2 in that history). Data set 3 in that history is a copy of data set 2, with the strand column deactivated. Running merge now on data set 3, one obtains 59,236 regions. There is a discrepancy of 8 regions, that, for some reason, are not appearing in the result of the merge operation if the strand information is used. The discrepancy in the form of 8 regions is shown in data set 7 in that history, obtained using the subtract tool.
I re-ran that procedure using exons of human genes in the second history mentioned above (UCSC table UCSC Genes - knownGene). In that case both versions of the merge operation give the exact same result, as it should be. According to the tool documentation, the strand information is being ignored in the merge tool. So there should not be a difference as shown in the case of the drosophila exons.
Please try to reproduce these steps. I found similar differences for the coverage tool and the subtract tool, depending on whether strand information is activated or not. The fact that I only find these differences in the Drosophila exons, and not in the other data sets that I have tested, makes it seem likely to me that there is an issue with the way that particular data set is internally represented.
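The subtract-based check described above can be sketched roughly as follows. This is only an illustration, not the Galaxy subtract tool: the function name `uncovered` is hypothetical, coordinates are assumed to be half-open as in BED, and an original interval is assumed to count as "leftover" only if it overlaps no merged interval at all. The coordinates are taken from the CG32491 lists in this thread.

```python
def uncovered(originals, merged):
    """Return the original (chrom, start, end) intervals that overlap no merged interval."""
    leftovers = []
    for chrom, start, end in originals:
        # Half-open overlap test: two intervals overlap if each starts before the other ends.
        hit = any(c == chrom and s < end and start < e for c, s, e in merged)
        if not hit:
            leftovers.append((chrom, start, end))
    return leftovers


originals = [
    ("chr3R", 17184021, 17184318),  # CG32491-RL exon 0 (strand "-")
    ("chr3R", 17186111, 17186276),  # CG32491-RT exon 0 (strand ".")
    ("chr3R", 17186349, 17187009),  # CG32491-RT exon 1 (strand ".")
]
# Two of the 26 regions reported when the strand column is active.
merged_with_strand = [
    ("chr3R", 17184011, 17184791),
    ("chr3R", 17187909, 17188590),
]

leftovers = uncovered(originals, merged_with_strand)
for chrom, start, end in leftovers:
    print(chrom, start, end)
```

In this small sample only the two RT exons are left over, which mirrors the pattern seen in the full data set, where the count difference is 59,236 - 59,228 = 8 regions.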
Hi Eckart
It is the Drosophila annotation, which contains a few exons with strand "." on chromosome 3R.
Obviously, if strand information is activated, then those lines are ignored completely
see: http://main.g2.bx.psu.edu/u/hrh/h/merge-problem-when-strand-
Regards, Hans
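A quick way to spot such records before running an interval tool is to tally the values in the strand column. The following is a minimal sketch, assuming whitespace-separated BED lines with the strand in the sixth column; the function name `strand_counts` is hypothetical, and the sample lines are copied from the exon list in this thread.

```python
from collections import Counter


def strand_counts(bed_lines, strand_col=5):
    """Count the values found in the strand column (0-based index) of BED lines."""
    counts = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) > strand_col:
            counts[fields[strand_col]] += 1
        else:
            # Line is too short to carry a strand column at all.
            counts["<missing>"] += 1
    return counts


sample = [
    "chr3R 17184021 17184318 CG32491-RL_exon_0_0_chr3R_17184022_r 0 -",
    "chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 .",
    "chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 .",
    "chr3R 17187909 17188590 CG32491-RM_exon_0_0_chr3R_17187910_r 0 -",
]

counts = strand_counts(sample)
for value, n in sorted(counts.items()):
    print(value, n)
```

Any value other than "+" or "-" in the tally is a hint that strand-aware operations may treat those lines differently.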
Thanks,
Eckart
On Dec 1, 2010, at 1:04 PM, Jennifer Jackson wrote:
Hello Eckart,
It may be helpful to review the help for the Interval tools (includes Merge): http://bitbucket.org/galaxy/galaxy-central/wiki/GopsDesc
quote from wiki/GopsDesc help: "Merge reads a dataset, and combines all overlapping intervals into single intervals. When merging intervals, all columns besides chromosome, start, and end are lost. When two intervals are combined into one, it is ambiguous what the other columns represent or which field should be carried over to the resulting interval. For this reason, all columns except for chromosome, start and end are omitted from the output."
The output coordinates are based on the positive strand as the default. This is the common format for BED, Interval and many other datatypes (but not all!).
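To make the merge description concrete, here is a minimal sketch of merging overlapping intervals. It is not the Galaxy implementation: the function `merge_intervals` and its `skip_undefined_strand` flag are hypothetical, only strictly overlapping intervals are assumed to be combined, and the flag merely imitates the behaviour reported in this thread, where lines with strand "." are dropped when the strand column is active.

```python
def merge_intervals(records, skip_undefined_strand=False):
    """Merge overlapping (chrom, start, end, strand) records into (chrom, start, end) tuples.

    With skip_undefined_strand=True, records whose strand is neither '+' nor '-'
    are dropped first (an assumption modelled on the behaviour seen in this thread).
    """
    if skip_undefined_strand:
        records = [r for r in records if r[3] in ("+", "-")]
    merged = []
    for chrom, start, end, _strand in sorted(records, key=lambda r: (r[0], r[1], r[2])):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous interval: extend it. All other columns are lost.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


exons = [
    ("chr3R", 17177760, 17178358, "-"),  # CG32491-RA exon 0
    ("chr3R", 17178092, 17178959, "-"),  # CG32491-RF exon 0, overlaps RA
    ("chr3R", 17186111, 17186276, "."),  # CG32491-RT exon 0, undefined strand
    ("chr3R", 17195183, 17195967, "."),  # CG32491-RU exon 0, undefined strand
    ("chr3R", 17195821, 17196364, "-"),  # CG32491-RS exon 0, overlaps RU
]

all_merged = merge_intervals(exons)
strand_only = merge_intervals(exons, skip_undefined_strand=True)
print(len(all_merged), len(strand_only))
```

On this sample the two runs differ in the same way as the 34-region and 26-region results: without the filter the RU and RS exons merge into 17195183-17196364, while with the filter only 17195821-17196364 remains and the RT exon disappears.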
Apologies for the late reply. Please see the inline comments below to your specific questions.
Best,
Jen Galaxy team
On 11/17/10 9:28 AM, Eckart Bindewald wrote:
Hello:
let me start by saying that I am very impressed by the service the Galaxy web server provides to the community; it has proven very useful for my work. Today I came across a situation that puzzles me. I am trying to merge exons corresponding to the same gene (but possibly from different splice variants). At the bottom of this email I am listing, as an example, the 153 exons that are related to the different splice variants of FlyBase gene CG32491 (obtained by pattern matching with the tool "Select lines that match an expression" and the pattern .+CG32491-. applied to the data set of FlyBaseGene exons; 110,472 exons, genome assembly dm3). I am using bed format and the general Galaxy web server. If I now apply the "Merge" tool to the intervals, I obtain 26 intervals (listed further below). Now applying the "subtract" tool to the original 153 exons results in 8 "leftover" regions that I did not expect. Somehow they seem to be missing in the merge result. I then deactivated the strand information in the interval set of 153 exons. Applying the merge tool now results in 34 intervals (again listed below). Checking the result via the subtract tool (subtracting the merge result from the original data set of 153 exons) results, as expected, in zero intervals.
So my questions are:
- is this the intended functionality of the tools? Maybe one can add statements regarding these issues in the tool documentation.
Jen's reply: please see the wiki help link above
- why does the outcome of the merge operation depend on whether the "strand" column is set or not? The original set of intervals all had the same negative strand orientation, so it appears to me that the merge operation should give the same result in both cases.
Jen's reply: if strand is not set, then (+) strand is assumed. If strand is set, then that will be used (in your case: (-)). In either case, the result is transformed into (+) coordinates. This is why you are getting different results.
- subtracting the merged intervals (that do not have strand information) from the set of 153 intervals results in 8 strands that now have positive strand orientation (they originally had negative strand orientation). Why does subtracting a set of intervals without strand information from a set of intervals with strand information change the strand orientation of the first set?
Jen's reply: It is best to have the file types be the same or unexpected results can be produced. Hopefully the wiki can help you create a query that will produce the desired result.
Any comments are highly appreciated!
Thanks,
Eckart
Thanks Eckart for helping to identify the root cause of Hans's data/tool usage discrepancy! And thanks Hans for the suggestions - we completely agree that the tool documentation could be improved and are going to work on that.
For the data from UCSC itself, the use of a dot "." in the strand field is a legal, although undocumented, placeholder value to be interpreted as "undefined". The meaning of this value and how Galaxy's tools use data with it will be clarified.
Warm regards,
Jen Galaxy team
On 12/2/10 9:00 AM, Eckart Bindewald wrote:
Here are the 153 exons related to FlyBase gene CG32491, obtained by pattern matching (tool "Select lines that match an expression" and pattern .+CG32491-. ) applied to the data set of FlyBaseGene exons (110,472 exons):

chr3R 17177330 17177608 CG32491-RR_exon_0_0_chr3R_17177331_r 0 -
chr3R 17200781 17201634 CG32491-RR_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RR_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RR_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RR_exon_4_0_chr3R_17203010_r 0 -
chr3R 17177760 17178358 CG32491-RA_exon_0_0_chr3R_17177761_r 0 -
chr3R 17200781 17201634 CG32491-RA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17178092 17178959 CG32491-RF_exon_0_0_chr3R_17178093_r 0 -
chr3R 17200781 17201634 CG32491-RF_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RF_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RF_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RF_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179070 17179456 CG32491-RD_exon_0_0_chr3R_17179071_r 0 -
chr3R 17200781 17201634 CG32491-RD_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RD_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RD_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RD_exon_4_0_chr3R_17203010_r 0 -
chr3R 17179617 17180053 CG32491-RAC_exon_0_0_chr3R_17179618_r 0 -
chr3R 17200781 17201634 CG32491-RAC_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAC_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAC_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAC_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RG_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17180811 CG32491-RG_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RG_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RG_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RG_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RG_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RH_exon_0_0_chr3R_17180160_r 0 -
chr3R 17180695 17181279 CG32491-RH_exon_1_0_chr3R_17180696_r 0 -
chr3R 17200781 17201634 CG32491-RH_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RH_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RH_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RH_exon_5_0_chr3R_17203010_r 0 -
chr3R 17180159 17180416 CG32491-RQ_exon_0_0_chr3R_17180160_r 0 -
chr3R 17200781 17201634 CG32491-RQ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RQ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RQ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RQ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17180941 17181279 CG32491-RB_exon_0_0_chr3R_17180942_r 0 -
chr3R 17181479 17181973 CG32491-RB_exon_1_0_chr3R_17181480_r 0 -
chr3R 17200781 17201634 CG32491-RB_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RB_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RB_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RB_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182071 17182426 CG32491-RI_exon_0_0_chr3R_17182072_r 0 -
chr3R 17182532 17182690 CG32491-RI_exon_1_0_chr3R_17182533_r 0 -
chr3R 17200781 17201634 CG32491-RI_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RI_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RI_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RI_exon_5_0_chr3R_17203010_r 0 -
chr3R 17182776 17183086 CG32491-RJ_exon_0_0_chr3R_17182777_r 0 -
chr3R 17200781 17201634 CG32491-RJ_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RJ_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RJ_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RJ_exon_4_0_chr3R_17203010_r 0 -
chr3R 17183242 17183480 CG32491-RP_exon_0_0_chr3R_17183243_r 0 -
chr3R 17183726 17183926 CG32491-RP_exon_1_0_chr3R_17183727_r 0 -
chr3R 17200781 17201634 CG32491-RP_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RP_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RP_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RP_exon_5_0_chr3R_17203010_r 0 -
chr3R 17184011 17184791 CG32491-RK_exon_0_0_chr3R_17184012_r 0 -
chr3R 17200781 17201634 CG32491-RK_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RK_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RK_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RK_exon_4_0_chr3R_17203010_r 0 -
chr3R 17184021 17184318 CG32491-RL_exon_0_0_chr3R_17184022_r 0 -
chr3R 17200781 17201634 CG32491-RL_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RL_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RL_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RL_exon_4_0_chr3R_17203010_r 0 -
chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 .
chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 .
chr3R 17200781 17201634 CG32491-RT_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RT_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RT_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RT_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187119 17187332 CG32491-RZ_exon_0_0_chr3R_17187120_f 0 .
chr3R 17187391 17187860 CG32491-RZ_exon_1_0_chr3R_17187392_f 0 .
chr3R 17200781 17201634 CG32491-RZ_exon_2_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RZ_exon_3_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RZ_exon_4_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RZ_exon_5_0_chr3R_17203010_f 0 .
chr3R 17187909 17188590 CG32491-RM_exon_0_0_chr3R_17187910_r 0 -
chr3R 17200781 17201634 CG32491-RM_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RM_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RM_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RM_exon_4_0_chr3R_17203010_r 0 -
chr3R 17188688 17189606 CG32491-RE_exon_0_0_chr3R_17188689_r 0 -
chr3R 17200781 17201634 CG32491-RE_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RE_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RE_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RE_exon_4_0_chr3R_17203010_r 0 -
chr3R 17189739 17190097 CG32491-RAB_exon_0_0_chr3R_17189740_r 0 -
chr3R 17200781 17201634 CG32491-RAB_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAB_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAB_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAB_exon_4_0_chr3R_17203010_r 0 -
chr3R 17190173 17190367 CG32491-RC_exon_0_0_chr3R_17190174_r 0 -
chr3R 17190435 17190714 CG32491-RC_exon_1_0_chr3R_17190436_r 0 -
chr3R 17200781 17201634 CG32491-RC_exon_2_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RC_exon_3_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RC_exon_4_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RC_exon_5_0_chr3R_17203010_r 0 -
chr3R 17191725 17192060 CG32491-RY_exon_0_0_chr3R_17191726_f 0 .
chr3R 17200781 17201634 CG32491-RY_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RY_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RY_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RY_exon_4_0_chr3R_17203010_f 0 .
chr3R 17192171 17192466 CG32491-RX_exon_0_0_chr3R_17192172_f 0 .
chr3R 17200781 17201634 CG32491-RX_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RX_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RX_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RX_exon_4_0_chr3R_17203010_f 0 .
chr3R 17193631 17193960 CG32491-RW_exon_0_0_chr3R_17193632_f 0 .
chr3R 17200781 17201634 CG32491-RW_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RW_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RW_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RW_exon_4_0_chr3R_17203010_f 0 .
chr3R 17194101 17194784 CG32491-RV_exon_0_0_chr3R_17194102_f 0 .
chr3R 17200781 17201634 CG32491-RV_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RV_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RV_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RV_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195183 17195967 CG32491-RU_exon_0_0_chr3R_17195184_f 0 .
chr3R 17200781 17201634 CG32491-RU_exon_1_0_chr3R_17200782_f 0 .
chr3R 17202323 17202463 CG32491-RU_exon_2_0_chr3R_17202324_f 0 .
chr3R 17202540 17202798 CG32491-RU_exon_3_0_chr3R_17202541_f 0 .
chr3R 17203009 17203121 CG32491-RU_exon_4_0_chr3R_17203010_f 0 .
chr3R 17195821 17196364 CG32491-RS_exon_0_0_chr3R_17195822_r 0 -
chr3R 17200781 17201634 CG32491-RS_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RS_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RS_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RS_exon_4_0_chr3R_17203010_r 0 -
chr3R 17196654 17196949 CG32491-RAA_exon_0_0_chr3R_17196655_r 0 -
chr3R 17200781 17201634 CG32491-RAA_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RAA_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RAA_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RAA_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197044 17197789 CG32491-RO_exon_0_0_chr3R_17197045_r 0 -
chr3R 17200781 17201634 CG32491-RO_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RO_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RO_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RO_exon_4_0_chr3R_17203010_r 0 -
chr3R 17197884 17198802 CG32491-RN_exon_0_0_chr3R_17197885_r 0 -
chr3R 17200781 17201634 CG32491-RN_exon_1_0_chr3R_17200782_r 0 -
chr3R 17202323 17202463 CG32491-RN_exon_2_0_chr3R_17202324_r 0 -
chr3R 17202540 17202798 CG32491-RN_exon_3_0_chr3R_17202541_r 0 -
chr3R 17203009 17203121 CG32491-RN_exon_4_0_chr3R_17203010_r 0 -
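As reproduced in this message, the exon listing carries a "." in the strand column for the records whose names end in _f, and a "-" for those ending in _r. A quick tally of the sixth BED column can confirm whether a data set really has a single strand value throughout. This is only a sketch: the function name strand_counts is made up, and the four sample lines are copied from the listing above.

```python
from collections import Counter


def strand_counts(bed_lines):
    """Count the values found in the strand column (6th field) of BED lines."""
    counts = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 6:  # skip lines without a strand column
            counts[fields[5]] += 1
    return counts


sample = [
    "chr3R 17184021 17184318 CG32491-RL_exon_0_0_chr3R_17184022_r 0 -",
    "chr3R 17186111 17186276 CG32491-RT_exon_0_0_chr3R_17186112_f 0 .",
    "chr3R 17186349 17187009 CG32491-RT_exon_1_0_chr3R_17186350_f 0 .",
    "chr3R 17187909 17188590 CG32491-RM_exon_0_0_chr3R_17187910_r 0 -",
]
counts = strand_counts(sample)
print(dict(counts))
```

Running the same tally over the full 153-line data set would show how many records carry each strand value before the merge is applied.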
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
-- Jennifer Jackson http://usegalaxy.org
participants (3): Eckart Bindewald, Hans-Rudolf Hotz, Jennifer Jackson