problems accessing the sequences through genome browser
Dear Galaxy,

I have some genomic intervals for the human genome and want to extract the corresponding intervals for the mouse, so I used the MAF pairwise alignment tool. However, I cannot find the associated sequences in the genome browser for the file that Galaxy generated for me. Here are the first few alignments from the sample:

##maf version=1
a score=5002
s hg18.chr1 1820410 11 + 247249719 GGATCCAG-------ATG
s mm9.chr4 769022 18 - 155630120 GAAACAAGGTGTTCCATG

a score=20688
s hg18.chr1 2077163 10 + 247249719 TTTTCTTTTC
s mm9.chr4 969004 10 - 155630120 CTTTCTTACC

a score=15289
s hg18.chr1 2316453 90 + 247249719 CTCAGGTGAATTCCTCATGGCATCACAGCAGTGTTGAAA-----TAGGAGCAGATACG-TTACCTCCGC--TTGCCAGATAAGAAACTGGGACGCAGA
s mm9.chr4 1172933 98 - 155630120 TTACAGTGAATTCTGCCTGGGATCCGTGCAGCATTGGAAATGGCTAGGGGCAGATAGGGTCACCTTCACAGTTGCTAGATAAGAAACAGGGTCGCGGA

a score=20716
s hg18.chr1 2526225 31 + 247249719 CTTCCT-CTGGGCTTGGTCATCCTTCAAAGTC
s mm9.chr4 1369046 32 - 155630120 CTTCCTCCTGGTCCCAACCATCTGTCAGATCC

I just want the sequences from the mouse genome, extended up to 120 base pairs from the coordinates given in the MAF file. It only generates NNNNNNNNNNNNN.... I was wondering how Galaxy could retrieve the sequence when I can't see it in the genome browser.

Kindly help.

Amit.
Hi Amit,

The issue you are seeing is likely due to the coordinate system that the MAF format uses for - strand sequences, which differs from BED-style coordinates. Positions on the - strand are relative to the reverse complement of the source sequence. For example, the BED-style coordinate for mm9 in the first block would be "chr4 154861080 154861098 -", where the start is calculated as 155630120 - 769022 - 18.

You can use the MAF to interval (or BED) tool (under the convert formats tool section) to convert each MAF block to a valid set of genomic intervals. For more information on the MAF format, see: http://genome.ucsc.edu/FAQ/FAQformat#format5

Also note that the UCSC Genome Browser starts at position 1 (unlike BED, which starts at 0), so you will need to take this into account when manually examining the sequence track in the Genome Browser. (This difference between coordinate systems is handled automatically when uploading, e.g., BED custom tracks to UCSC.)

Thanks for using Galaxy,

Dan

On Oct 25, 2010, at 8:47 AM, pande wrote:
Dear Galaxy,

I have some genomic intervals for the human genome and want to extract the corresponding intervals for the mouse, so I used the MAF pairwise alignment tool. However, I cannot find the associated sequences in the genome browser for the file that Galaxy generated for me. Here are the first few alignments from the sample:
##maf version=1
a score=5002
s hg18.chr1 1820410 11 + 247249719 GGATCCAG-------ATG
s mm9.chr4 769022 18 - 155630120 GAAACAAGGTGTTCCATG

a score=20688
s hg18.chr1 2077163 10 + 247249719 TTTTCTTTTC
s mm9.chr4 969004 10 - 155630120 CTTTCTTACC

a score=15289
s hg18.chr1 2316453 90 + 247249719 CTCAGGTGAATTCCTCATGGCATCACAGCAGTGTTGAAA-----TAGGAGCAGATACG-TTACCTCCGC--TTGCCAGATAAGAAACTGGGACGCAGA
s mm9.chr4 1172933 98 - 155630120 TTACAGTGAATTCTGCCTGGGATCCGTGCAGCATTGGAAATGGCTAGGGGCAGATAGGGTCACCTTCACAGTTGCTAGATAAGAAACAGGGTCGCGGA

a score=20716
s hg18.chr1 2526225 31 + 247249719 CTTCCT-CTGGGCTTGGTCATCCTTCAAAGTC
s mm9.chr4 1369046 32 - 155630120 CTTCCTCCTGGTCCCAACCATCTGTCAGATCC
I just want the sequences from the mouse genome, extended up to 120 base pairs from the coordinates given in the MAF file. It only generates NNNNNNNNNNNNN.... I was wondering how Galaxy could retrieve the sequence when I can't see it in the genome browser.
Kindly help.
Amit.
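The - strand conversion described in the reply can be sketched in a few lines of Python. This is only a minimal illustration of the arithmetic, not the code behind Galaxy's MAF to interval tool; the function names (`parse_s_line`, `maf_to_bed`, `ucsc_position`) are hypothetical, and the sketch assumes a well-formed MAF "s" line whose source is written as "build.chrom".

```python
def parse_s_line(line):
    """Split a MAF 's' line into (src, start, size, strand, src_size)."""
    fields = line.split()
    # Field order: 's', src, start, size, strand, srcSize, aligned text
    return fields[1], int(fields[2]), int(fields[3]), fields[4], int(fields[5])


def maf_to_bed(src, start, size, strand, src_size):
    """Convert MAF coordinates to a 0-based, half-open BED-style interval."""
    # Assumption: src looks like 'mm9.chr4'; keep only the chromosome part.
    chrom = src.split(".", 1)[1] if "." in src else src
    if strand == "-":
        # On the - strand, MAF starts are counted on the reverse complement,
        # so the forward-strand start is srcSize - start - size.
        bed_start = src_size - start - size
    else:
        bed_start = start
    return chrom, bed_start, bed_start + size, strand


def ucsc_position(chrom, bed_start, bed_end):
    """Format a BED interval as a 1-based position string for the browser."""
    return f"{chrom}:{bed_start + 1}-{bed_end}"


line = "s mm9.chr4 769022 18 - 155630120 GAAACAAGGTGTTCCATG"
interval = maf_to_bed(*parse_s_line(line))
print(interval)                      # ('chr4', 154861080, 154861098, '-')
print(ucsc_position(*interval[:3]))  # chr4:154861081-154861098
```

Note that the text shown in a - strand MAF line is the reverse complement of the forward-strand sequence at the converted position, so the browser's forward-strand display will not match it letter for letter.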
participants (2)
- Daniel Blankenberg
- pande