I'm trying to use the GMAJ tool on Galaxy, but get this error:
Error loading alignments from bundled file "input.maf": edu.psu.bx.gmaj.BadInputExceptions: Sequence length contradiction" s baboon.1 26891 63 + 1248010 TCAG... Current = 1248010. Previous = 550206
The maf files were extracted using the "Extract MAF blocks" tool, for TBA ENCODE alignments.
The maf data are at http://main.g2.bx.psu.edu/display?id=3054
Ross,
The problem here is that there are multiple sequences with the name 'baboon.1'. In fact, in the ENCODE TBA alignments there is a different 'baboon.1' for every ENCODE region, so when you concatenate MAFs across multiple regions you get a result that GMAJ (rightly) cannot understand.
To correct this I think the only thing we can do is change the MAFs provided by the encode group so that every orthologous sequence has a unique name across encode regions. So baboon.1 -> baboon.ENm001_1, and so on.
All, would such a change break any existing tools / analysis?
-- jt
On Aug 24, 2006, at 5:20 PM, Ross Hardison wrote:
I'm trying to use the GMAJ tool on Galaxy, but get this error:
Error loading alignments from bundled file "input.maf": edu.psu.bx.gmaj.BadInputExceptions: Sequence length contradiction" s baboon.1 26891 63 + 1248010 TCAG... Current = 1248010. Previous = 550206
The maf files were extracted using the "Extract MAF blocks" tool, for TBA ENCODE alignments.
The maf data are at http://main.g2.bx.psu.edu/display?id=3054
_______________________________________________ Galaxy-user mailing list Galaxy-user@bx.psu.edu http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
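For readers who want to check a concatenated MAF for the condition described above before loading it into GMAJ, here is a minimal sketch. It relies only on the standard layout of MAF sequence lines (`s src start size strand srcSize text`); the function name and the sample lines are hypothetical, with the two source sizes taken from the error message in this thread and everything else made up for illustration.

```python
from collections import defaultdict


def find_length_conflicts(maf_lines):
    """Return {source name: sorted srcSize values} for names seen with more than one srcSize."""
    sizes = defaultdict(set)
    for line in maf_lines:
        fields = line.split()
        # MAF sequence lines have the form: s src start size strand srcSize text
        if len(fields) >= 7 and fields[0] == "s":
            sizes[fields[1]].add(int(fields[5]))
    return {src: sorted(seen) for src, seen in sizes.items() if len(seen) > 1}


# Illustrative input only: two blocks from different regions that both use
# the name 'baboon.1'. Starts, sizes, and bases are invented for the example.
example = [
    "##maf version=1",
    "a score=0",
    "s baboon.1 100 4 + 550206 ACGT",
    "",
    "a score=0",
    "s baboon.1 26891 4 + 1248010 TCAG",
]

print(find_length_conflicts(example))
```

Any name reported here appears with two different total sequence lengths, which is the situation GMAJ rejects as a "Sequence length contradiction".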
No conflicts on my end. Actually, it would be a rather convenient (and accurate) annotation to refer to the aligned species by region.
-David
On Aug 24, 2006, at 5:39 PM, James Taylor wrote:
Ross,
The problem here is that there are multiple sequences with the name 'baboon.1'. In fact, in the ENCODE TBA alignments there is a different 'baboon.1' for every ENCODE region, so when you concatenate MAFs across multiple regions you get a result that GMAJ (rightly) cannot understand.
To correct this I think the only thing we can do is change the MAFs provided by the encode group so that every orthologous sequence has a unique name across encode regions. So baboon.1 -> baboon.ENm001_1, and so on.
All, would such a change break any existing tools / analysis?
-- jt
This sounds like the correct solution, and I cannot think of any tools which would be broken by doing this.
Dan
On Thu, 24 Aug 2006, James Taylor wrote:
Ross,
The problem here is that there are multiple sequences with the name 'baboon.1'. In fact, in the ENCODE TBA alignments there is a different 'baboon.1' for every ENCODE region, so when you concatenate MAFs across multiple regions you get a result that GMAJ (rightly) cannot understand.
To correct this I think the only thing we can do is change the MAFs provided by the encode group so that every orthologous sequence has a unique name across encode regions. So baboon.1 -> baboon.ENm001_1, and so on.
All, would such a change break any existing tools / analysis?
-- jt
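The renaming proposed in this thread (baboon.1 -> baboon.ENm001_1) could be sketched roughly as follows. This is not the change actually made to the ENCODE MAFs, only an illustration under stated assumptions: each input file covers a single ENCODE region, the caller supplies that region name, only `s` lines are rewritten, and names without a '.' are left alone. Both function names are hypothetical.

```python
def rename_src(src, region):
    """Turn 'baboon.1' into 'baboon.ENm001_1' for region 'ENm001'."""
    species, dot, rest = src.partition(".")
    if not dot:
        # Assumption: names without a '.' are left unchanged.
        return src
    return f"{species}.{region}_{rest}"


def rename_maf_lines(maf_lines, region):
    """Rewrite the src field of every MAF 's' line; all other lines pass through untouched."""
    out = []
    for line in maf_lines:
        # Split off the first two fields, keeping the remainder of the line intact.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "s":
            out.append(f"s {rename_src(parts[1], region)} {parts[2]}")
        else:
            out.append(line)
    return out


# Illustrative line only; the start, size, and bases are invented.
block = ["a score=0", "s baboon.1 26891 4 + 1248010 TCAG"]
for line in rename_maf_lines(block, "ENm001"):
    print(line)
```

Applying this per region before concatenating would give every sequence a name that is unique across regions, so the same name never appears with two different lengths.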
participants (4)
- D C King
- Dan Blankenberg
- James Taylor
- Ross Hardison