How can I extract sequence information from cuffdiff files?
Hi. I have cuffdiff files with gene differential expression results. I don't have an annotation, so I need to extract the sequences from the genome coordinates and then BLAST them to identify the genes. What is the easiest way to do this? Thanks. Humberto Dr. Humberto Boncristiani National Research Council (NRC) Fellow Adjunct Research Associate Department of Biology Univ. North Carolina at Greensboro 312 Eberhart Bldg Greensboro, NC 27403, USA. Tel.:(1) 336-256-2591 Fax: (1) 336-334-5839 email: humbfb@gmail.com
Hello,

By "no annotation", do you mean that a species-specific annotation (GTF) was not used? And you want to compare the sequences to a protein database such as GenBank NR or RefSeq? If so, here are the instructions. Please let us know if you had something else in mind.

The sequence extraction can be done on Galaxy Main (if that is where you are working), but the BLAST step will need to be run on a local or cloud install. To get set up (instance and data), start here:
http://getgalaxy.org
http://usegalaxy.org/cloud

The BLAST+ wrapper recently moved from the distribution to the Tool Shed, but integrated installation tools can help you add it to your instance. See the latest News Brief (Sept 7, 2012) for details; the News Briefs are also worth following as you maintain your instance:
http://wiki.g2.bx.psu.edu/News
http://wiki.g2.bx.psu.edu/DevNewsBriefs/2012_09_07

Questions about local/cloud installs are best directed to the galaxy-dev@bx.psu.edu mailing list:
http://wiki.g2.bx.psu.edu/Mailing%20Lists

To extract the transcript sequences, use the tool 'Fetch Sequences -> Extract Genomic DNA'. If you have been using a custom reference genome, the tool will accept it from your history: set the option "Source for Genomic Data:" to "History".

Hopefully this helps,
Jen
Galaxy team

On 9/13/12 10:09 AM, Humberto Boncristiani wrote:
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://galaxyproject.org
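For readers who want to prepare the input outside Galaxy, the sketch below shows one way to turn cuffdiff results into something 'Extract Genomic DNA' (or a local BLAST run) can use. It is a minimal sketch under several assumptions, not the Galaxy team's method: it assumes a tab-separated gene_exp.diff with a header row containing `test_id`, `locus` and `significant` columns, `locus` values shaped like `chr1:11873-29961`, and a 0-based, end-exclusive start coordinate (BED-style). Check the coordinate convention against a genome browser for your own data. The function names are hypothetical, and strand is ignored because the locus string does not carry it.

```python
import csv
import re

# Assumed cuffdiff locus format: "<chrom>:<start>-<end>", e.g. "chr1:11873-29961".
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_locus(locus):
    """Split a locus string into (chrom, start, end) integers."""
    m = LOCUS_RE.match(locus)
    if m is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    return m["chrom"], int(m["start"]), int(m["end"])


def significant_loci(lines):
    """Yield (name, chrom, start, end) for rows flagged significant == 'yes'.

    Assumes a tab-separated file with a header row (column names are assumptions).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if row.get("significant") == "yes":
            chrom, start, end = parse_locus(row["locus"])
            yield row["test_id"], chrom, start, end


def to_bed(records):
    """Format records as 4-column BED lines (chrom, start, end, name)."""
    return [f"{chrom}\t{start}\t{end}\t{name}" for name, chrom, start, end in records]


def read_fasta(lines):
    """Load a FASTA genome into a dict keyed by the first word of each header."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract(records, genome):
    """Return FASTA entries for each locus, slicing with 0-based, end-exclusive coordinates."""
    return [
        f">{name} {chrom}:{start}-{end}\n{genome[chrom][start:end]}"
        for name, chrom, start, end in records
    ]
```

In use, the BED lines from `to_bed` could be saved to a file and uploaded to Galaxy as the input to 'Extract Genomic DNA', while `extract` produces a FASTA file directly when the genome FASTA is available locally. That FASTA could then be searched with a BLAST+ program such as blastx against a protein database, on a local or cloud install as described above.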