Re: [galaxy-user] Extract sequences from [gtf file] + [genome FASTA file]

27 Jan 2011

Hello Karen,

The following general workflow should help you to pull sequences from 
any source.

1) cut out the sequence IDs from the query (in this case, a GTF or BED 
file) and sort them.
Text Manipulation -> Cut columns from a table
Filter and Sort -> Sort
2) convert the target FASTA file to tabular format
Convert Formats -> FASTA-to-Tabular converter
3) join the two datasets based on the sequence ID
Join, Subtract and Group -> Join two Queries
4) convert the joined result back to FASTA
Convert Formats -> Tabular-to-FASTA
5) when starting with a GTF file, there will most likely be duplicates. 
To remove, use:
NGS: QC and manipulation -> Collapse sequences
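
For anyone who wants to see the logic of the five steps outside of Galaxy, 
here is a minimal Python sketch. It is not what the Galaxy tools run 
internally, only an illustration of the same cut / convert / join / convert 
/ collapse sequence. The function names are made up for this example, and 
it assumes the sequence ID is in the first column of the GTF or BED file 
and is the first word of each FASTA title line. Like the workflow above, 
it returns the whole FASTA records whose IDs appear in the query.

```python
def cut_ids(lines, column=0):
    """Step 1: cut one column of IDs from tab-separated lines and sort them.

    Assumption: the ID is in column 0 (the first column of GTF and BED).
    """
    ids = set()
    for line in lines:
        # Skip blank lines, comments, and BED track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        ids.add(line.rstrip("\n").split("\t")[column])
    return sorted(ids)


def fasta_to_tabular(lines):
    """Step 2: convert FASTA lines to (id, sequence) rows."""
    rows, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                rows.append((name, "".join(chunks)))
            # Assumption: the ID is the first word of the title line.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        rows.append((name, "".join(chunks)))
    return rows


def join_on_id(ids, rows):
    """Step 3: keep only the rows whose ID is in the query list."""
    wanted = set(ids)
    return [(name, seq) for name, seq in rows if name in wanted]


def collapse(rows):
    """Step 5: drop rows whose sequence was already seen (first ID wins)."""
    seen, unique = set(), []
    for name, seq in rows:
        if seq not in seen:
            seen.add(seq)
            unique.append((name, seq))
    return unique


def tabular_to_fasta(rows):
    """Step 4: convert (id, sequence) rows back to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in rows)


# Tiny made-up inputs, only to show the calls in order.
gtf_lines = [
    'chr1\tCufflinks\texon\t1\t4\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tCufflinks\texon\t5\t8\t.\t+\t.\tgene_id "G1";\n',
    'chr2\tCufflinks\texon\t1\t4\t.\t+\t.\tgene_id "G2";\n',
]
fasta_lines = [">chr1 example", "ACGT", "ACGT", ">chr2", "GGCC", ">chr3", "TTTT"]

ids = cut_ids(gtf_lines)
rows = collapse(join_on_id(ids, fasta_to_tabular(fasta_lines)))
print(tabular_to_fasta(rows), end="")
```

Note that this sketch collapses before writing FASTA, while the Galaxy 
"Collapse sequences" tool works on the FASTA output of step 4; the 
end result for the duplicates is the same idea, but the Galaxy tool may 
name the collapsed records differently.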

Once you create the actual workflow that performs the job, be sure to 
save it so that you can just re-use it whenever you need to perform the 
same task. To do this, from the history pane (far right) use Options -> 
Extract workflow and follow the instructions on the form to customize it.

Hopefully this helps,

Jen
Galaxy team

On 1/26/11 12:05 PM, Karen Tang wrote:
...
Hi Galaxy people,
I have transcripts predicted by Cufflinks that are in a gtf file. How
can I extract the sequences corresponding to those transcripts, using
Galaxy?
[Cufflinks transcript predictions in gtf file] + [Genome sequence in
FASTA file] ---> [FASTA file of transcript sequences]
My genome is a custom genome (not at UCSC).
---------
I'll also need to do the same thing, except my predicted transcripts are
in a Scripture bed file.
Thanks for your help!
Karen Tang :)
Plant Biology
University of Minnesota
-- 
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org