Re: [galaxy-user] [galaxy-bugs] Help with mRNA sequences
Lee-Ann:
I shared a history (using chr22 data for simplicity):
http://main.g2.bx.psu.edu/u/aun1/h/mrna-snps-example
and a workflow:
http://main.g2.bx.psu.edu/u/aun1/w/mrna-snps-example
with you that do the trick. Basically, you:
1. Download mRNA data as BED and as mRNA sequences (history items 1 & 2)
2. Collapse sequences to tab-delimited format (history item 3)
3. Remove the dots and version numbers from the accessions by replacing dots with tabs and cutting out the accession and sequence columns (history items 4 & 5)
4. Join sequences with the BED file (history item 6)
5. Download the SNPs (history item 7)
6. Join with the SNPs (history item 8)
You can use the workflow to run this analysis genome-wide. Let me know if you have issues.
Tx,
anton
galaxy team
On Jun 8, 2010, at 6:56 AM, Lee Wood wrote:
Hi Anton
Sorry to reply directly to you, but I'm a bit desperate :). The problem I'm having is not with joining the files. The problem is that I need to retrieve the mRNA co-ordinates and mRNA sequences from Galaxy. So basically what I want is a file that contains the mRNA co-ordinates and sequences; this file will then be joined to the file containing SNPs to identify which SNPs are within which mRNAs.
The problem is that I can't retrieve the mRNA co-ordinates. When I go through the mRNA group and track for RefSeq genes it gives me transcript and not mRNA co-ordinates. Then I thought that if I used the refSeqAli table instead of RefSeq genes I could get the co-ordinates, and join this file to the file containing the mRNA sequences (which I retrieved by choosing output format as sequence and selecting sequence type as mRNA). The problem here is that, because the sequence file only contains the mRNA accession number and sequence, I have to join the two files based on the NM numbers. But in the refSeqAli file (co-ordinates file) the NM number looks something like NM1234, whereas in the mRNA sequence file it looks something like NM1234.2, so they won't join.
Is there a way to retrieve the mRNA co-ordinates and sequences through Galaxy, or will I have to create a script to do it myself?
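If it does come down to a script, the core of it is small. Here is a rough Python sketch, not a Galaxy tool: it strips the version suffix from each accession and then joins the co-ordinates to the sequences on the NM number. The function names (strip_version, read_fasta, join_bed_with_sequences) and the sample records are made up for illustration, and it assumes the accession is the first whitespace-delimited token of each FASTA header and the fourth column of the BED file.

```python
import io


def strip_version(accession):
    """Drop a trailing '.N' version suffix, e.g. 'NM_1234.2' -> 'NM_1234'."""
    base, dot, version = accession.rpartition(".")
    return base if dot and version.isdigit() else accession


def read_fasta(handle):
    """Yield (accession, sequence) pairs from a FASTA file.

    Assumption: the accession is the first whitespace-delimited token
    of each '>' header line.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def join_bed_with_sequences(bed_handle, fasta_handle):
    """Join BED rows to sequences on the version-less accession."""
    seqs = {strip_version(acc): seq for acc, seq in read_fasta(fasta_handle)}
    rows = []
    for line in bed_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank lines and track/browser headers
        chrom, start, end, name = fields[:4]
        seq = seqs.get(strip_version(name))
        if seq is not None:
            rows.append((chrom, int(start), int(end), name, seq))
    return rows


# Made-up sample data: one accession matches, the others do not.
bed = io.StringIO("chr22\t100\t500\tNM_1234\nchr22\t900\t1500\tNM_5678\n")
fasta = io.StringIO(">NM_1234.2\nACGT\nTTGA\n>NM_9999.1\nGGCC\n")
for row in join_bed_with_sequences(bed, fasta):
    print(*row, sep="\t")
```

Stripping the suffix on both sides of the join means it does not matter which file carries the versioned form.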
Sorry for the super confusing email, but I'd really really appreciate any help.
Thank you,
Lee-Ann
On Tue, Jun 1, 2010 at 9:10 PM, Anton Nekrutenko <anton@bx.psu.edu> wrote:
Lee-Ann:
If you upload coordinates of mRNA mapping to the genome in BED format and join it with coordinates of SNPs as shown in this movie:
http://screencast.g2.bx.psu.edu/galaxy/quickie5_join/flow.html
you will be able to identify mRNA containing SNPs.
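Outside Galaxy, the same coordinate join comes down to an interval-containment test. A minimal Python sketch of that idea follows; it is not what the Galaxy join tool does internally. The helper name snps_in_mrnas and the sample records are invented, and it assumes both files use BED-style 0-based, half-open coordinates. It checks each SNP against the whole alignment span (chromStart to chromEnd), not against individual exon blocks.

```python
from collections import defaultdict


def snps_in_mrnas(mrnas, snps):
    """Return (snp_name, mrna_name) pairs where the SNP lies inside the mRNA span.

    Both arguments are iterables of (chrom, start, end, name) tuples
    with 0-based, half-open coordinates (an assumption of this sketch).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in mrnas:
        by_chrom[chrom].append((start, end, name))

    hits = []
    for chrom, s_start, s_end, s_name in snps:
        # Simple linear scan per chromosome; fine for a sketch,
        # slow for genome-wide data.
        for m_start, m_end, m_name in by_chrom.get(chrom, []):
            if m_start <= s_start and s_end <= m_end:
                hits.append((s_name, m_name))
    return hits


# Made-up sample records.
mrnas = [("chr22", 100, 500, "NM_1234"), ("chr22", 900, 1500, "NM_5678")]
snps = [
    ("chr22", 150, 151, "rs1"),    # inside NM_1234
    ("chr22", 600, 601, "rs2"),    # between the two mRNAs
    ("chr21", 150, 151, "rs3"),    # different chromosome
    ("chr22", 1499, 1500, "rs4"),  # last base of NM_5678
]
for snp, mrna in snps_in_mrnas(mrnas, snps):
    print(snp, mrna, sep="\t")
```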
Let me know if you still have issues.
anton
galaxy team
On May 31, 2010, at 7:16 AM, Lee Wood wrote:
Hi
Could you please help? I need to identify SNPs within mRNA sequences. I have managed to retrieve all mRNA sequences using the mRNA and EST group, Human mRNAs track and RefSeqGenes table, and outputting in sequence format. The problem is that, to identify which SNPs are within which mRNA sequences, I need the mRNA co-ordinates. If I leave everything the same in the table browser and output as BED, for example, it gives me gene and transcript information and not mRNA information.
Your help will be greatly appreciated
Lee-Ann
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org