Hi Pande,

Perhaps there is a problem interpreting the MAF format? MAF differs from BED or Interval in that each row gives only a start coordinate and a size, not an end. The start is zero-based, and for rows on the "-" strand it is counted on the reverse-complemented source sequence, i.e. from the end of the chromosome. Either detail may or may not be a problem, since it is not clear how you are viewing or extracting the sequences. Here is the MAF FAQ from UCSC: http://genome.ucsc.edu/FAQ/FAQformat.html#format5

To work with MAF data in Galaxy, use the tools in Convert Formats.

Another option is to use the LiftOver tool. It is based on the same source data from UCSC and is a more direct way to convert coordinates between genomes. To do this:

1) load the human coordinates
2) liftOver human -> mouse (minmatch can be lowered to 0.10 for cross-species lifts, but if you find that you are getting too many "multiple matches", raise it back up. Sometimes being strict first, then less strict over progressive cycles with the failed regions, yields the best overall results.)
3) use Text Manipulation: Compute an expression on every row to expand the mouse intervals, if you want to extend the ranges
4) use Fetch Sequences if you want the FASTA mouse genome sequence for the intervals

Please let us know if you continue to have problems. Sharing a history with the problem datasets/operations would be a great way to explain the issue.

Thanks!
Jen
Galaxy team

On 10/25/10 6:11 AM, pande wrote:
Dear Galaxy,

I have some genomic intervals for the human genome and want to extract the corresponding regions for mouse. So, I used the MAF pairwise alignment tool. But, interestingly, the associated sequences cannot be found at all in the Genome Browser for the file that Galaxy generated for me. Here are the first few alignments from the sample:
##maf version=1

a score=5002
s hg18.chr1 1820410 11 + 247249719 GGATCCAG-------ATG
s mm9.chr4 769022 18 - 155630120 GAAACAAGGTGTTCCATG

a score=20688
s hg18.chr1 2077163 10 + 247249719 TTTTCTTTTC
s mm9.chr4 969004 10 - 155630120 CTTTCTTACC

a score=15289
s hg18.chr1 2316453 90 + 247249719 CTCAGGTGAATTCCTCATGGCATCACAGCAGTGTTGAAA-----TAGGAGCAGATACG-TTACCTCCGC--TTGCCAGATAAGAAACTGGGACGCAGA
s mm9.chr4 1172933 98 - 155630120 TTACAGTGAATTCTGCCTGGGATCCGTGCAGCATTGGAAATGGCTAGGGGCAGATAGGGTCACCTTCACAGTTGCTAGATAAGAAACAGGGTCGCGGA

a score=20716
s hg18.chr1 2526225 31 + 247249719 CTTCCT-CTGGGCTTGGTCATCCTTCAAAGTC
s mm9.chr4 1369046 32 - 155630120 CTTCCTCCTGGTCCCAACCATCTGTCAGATCC
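A minimal Python sketch of the MAF coordinate conversion, for illustration only. It follows the rules in the UCSC MAF FAQ (the start is zero-based, and on "-" rows it is counted on the reverse-complemented source sequence) and assumes the usual "db.chrom" source names; `maf_s_line_to_bed` and `BedInterval` are hypothetical names, not part of Galaxy. Applied to the first mouse row above, the "-" strand start 769022 maps to forward-strand chr4:154861080-154861098 (155630120 - 769022 - 18 = 154861080), far from position 769022, which could explain why the sequences are not found at the raw coordinates.

```python
# Sketch only: convert a MAF "s" line to a forward-strand, zero-based,
# half-open (BED-style) interval, following the UCSC MAF FAQ.
# maf_s_line_to_bed and BedInterval are hypothetical names.
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int   # zero-based, inclusive
    end: int     # zero-based, exclusive
    strand: str  # strand of the MAF row, kept for reference


def maf_s_line_to_bed(line: str) -> BedInterval:
    """Parse 's src start size strand srcSize text' into a '+' strand interval."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError(f"not a MAF 's' line: {line!r}")
    src, start, size, strand, src_size = (
        fields[1], int(fields[2]), int(fields[3]), fields[4], int(fields[5])
    )
    # Assumes "db.chrom" source names, e.g. "mm9.chr4" -> "chr4".
    chrom = src.split(".", 1)[1] if "." in src else src
    if strand == "-":
        # '-' rows count start from the end of the chromosome
        # (reverse-complemented source), so flip it onto the '+' strand.
        fwd_start = src_size - start - size
    else:
        fwd_start = start
    return BedInterval(chrom, fwd_start, fwd_start + size, strand)


if __name__ == "__main__":
    mouse_row = "s mm9.chr4 769022 18 - 155630120 GAAACAAGGTGTTCCATG"
    print(maf_s_line_to_bed(mouse_row))
    # BedInterval(chrom='chr4', start=154861080, end=154861098, strand='-')
```

The MAF text on a "-" row is the reverse complement of the forward-strand sequence in that interval, so a sequence fetched for the converted interval would need to be reverse-complemented to match the alignment column by column.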
I just want the sequences from the mouse genome, extended by up to 120 base pairs from the coordinates given in the MAF file. It only generates NNNNNNNNNNNNN.... I was wondering how Galaxy could retrieve these sequences when I can't see them in the Genome Browser.
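Extending the mouse intervals, as requested here and as step 3 of the reply does with Compute an expression, is plain interval arithmetic. A minimal sketch, assuming "up to 120 base pairs" means a 120 bp flank on each side of a forward-strand, zero-based, half-open interval; `expand_interval` is a hypothetical helper, not a Galaxy tool. The example interval is the first mouse row converted to forward-strand coordinates (155630120 - 769022 - 18 = 154861080, size 18).

```python
# Sketch only: widen a zero-based, half-open interval by a fixed flank on
# both sides, clamped to the chromosome. expand_interval is a hypothetical
# helper; the 120 bp flank is an assumption based on the request above.
from typing import Tuple


def expand_interval(start: int, end: int, flank: int, chrom_size: int) -> Tuple[int, int]:
    """Return (start - flank, end + flank), clamped to [0, chrom_size]."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    if not 0 <= start <= end <= chrom_size:
        raise ValueError("interval must lie within the chromosome")
    return max(0, start - flank), min(chrom_size, end + flank)


if __name__ == "__main__":
    # Forward-strand mm9 chr4 interval for the first alignment's mouse row.
    print(expand_interval(154861080, 154861098, 120, 155630120))
    # (154860960, 154861218)
```

Adding the same flank on both sides avoids having to decide which side is upstream on "-" strand rows; for a one-sided or fixed-width window, only the return expression would change.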
Kindly help.
Amit.

_______________________________________________
galaxy-user mailing list
galaxy-user@lists.bx.psu.edu
http://lists.bx.psu.edu/listinfo/galaxy-user
--
Jennifer Jackson
http://usegalaxy.org