Re: [galaxy-user] how to transfer gene id into protein id

27 Oct 2011

Hello,

If the reference genome is in UCSC and has a RefSeq track, then you can 
extract the table "refLink", which holds the transcript and protein 
identifiers, from the Table Browser and subset it to the rows that match 
your query RefSeq transcript identifiers.

If the RefSeq data is at BioMart or another source, a path similar to 
the one I outline below will work with some modifications. It all 
depends on the file format, but Galaxy's tools can manipulate data in 
just about every way you will need.

Using a transcript identifier query, subset protein identifiers in a 
UCSC RefSeq track:

A.
Load your list of NM* identifiers ("Get Data -> Upload").
- Set the file format to "tabular" (use the "pencil" icon to "Edit 
Attributes -> Change data type") if needed.

B.
Load RefSeq id mapping data with "Get Data -> UCSC Main" and set the 
form parameters as needed, choosing the track "RefSeq Genes" and the 
table "refLink". Make sure the region is the entire genome. Send to 
Galaxy formatted as-is (tabular).

Next, cut columns 3 and 4 (the transcript and protein identifiers) out 
of the table with the tool "Text Manipulation -> Cut" and the option 
"c3,c4".

C. OPTIONAL, if you want the full list of coding RefSeqs for another 
purpose... remove the non-coding RefSeqs with the tool "Filter and Sort
-> Select" and the options "that: NOT Matching" and "the pattern: 
^NR_.*$". Be sure to enter the regular expression '^NR_.*$' without any 
quotes.
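
If you want to check the pattern outside of Galaxy first, here is a 
minimal Python sketch of what a "NOT Matching" select with ^NR_.*$ 
does. The identifiers are made-up placeholders, and the function name 
drop_noncoding is only for illustration; this is not how Galaxy 
implements the tool.

```python
import re

# Same pattern as typed into the Select tool form (no quotes there).
NONCODING = re.compile(r"^NR_.*$")

def drop_noncoding(lines):
    """Keep only the lines that do NOT match the NR_ pattern."""
    return [line for line in lines if not NONCODING.search(line)]

# Placeholder rows in "transcript id <tab> protein id" layout.
rows = [
    "NM_000001\tNP_000001",
    "NR_000002\t",
    "NM_000003\tNP_000003",
]
kept = drop_noncoding(rows)
```

Only the two NM_ rows remain in kept; the NR_ row is dropped because the 
pattern is anchored at the start of the line.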

D. Perform a join using "Join, Subtract and Group -> Compare two 
Datasets" with the options:
     - "Compare: <file of trans and prot id, filtered or not>"
     - "Using column: c1" where c1 is the trans ids
     - "against: <file of trans ids>"
     - "and column: c1" where c1 is the trans ids
     - "To find: Matching rows of first dataset"

E.
Result dataset is a two column tabular file:
    transcript id <tab> protein id
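
For anyone who prefers to see the logic spelled out, the cut and compare 
steps above can be sketched in Python roughly as follows. This is only an 
illustration under assumptions: the rows are made-up placeholders laid 
out like a table with the transcript id in column 3 and the protein id in 
column 4, and the function names cut_columns and matching_rows are 
hypothetical, not part of Galaxy.

```python
def cut_columns(rows, columns=(3, 4)):
    """Mimic "Cut" with c3,c4: keep the given 1-based columns of each row."""
    return [[row[c - 1] for c in columns] for row in rows]

def matching_rows(mapping, query_ids):
    """Mimic "Matching rows of first dataset": keep the mapping rows
    whose first column appears in the list of query identifiers."""
    wanted = set(query_ids)
    return [row for row in mapping if row[0] in wanted]

# Placeholder table rows; only columns 3 and 4 matter for this sketch.
table = [
    ["geneA", "product A", "NM_000001", "NP_000001", "x", "y", "1", "0"],
    ["geneB", "product B", "NM_000002", "NP_000002", "x", "y", "2", "0"],
    ["geneC", "product C", "NR_000003", "", "x", "y", "3", "0"],
]
query = ["NM_000002"]  # your uploaded list of transcript ids

result = matching_rows(cut_columns(table), query)
for transcript_id, protein_id in result:
    print(f"{transcript_id}\t{protein_id}")
```

The comparison here is an exact match on the whole identifier, so a query 
list that carries version suffixes would need those handled before the 
lookup.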

Hopefully this helps you and others who are doing a similar task. If you 
think you will be doing this a lot, be sure to consider extracting the 
steps into a workflow.

Thanks for using Galaxy,

Jen
Galaxy team

On 10/27/11 1:34 PM, Li, Jilong (MU-Student) wrote:
...
Hi,
I have some refseq gene id, like NM_*****.
How can I transfer these gene id into protein id, like NP_****?
Thank you very much!
Victor
-- 
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support