Hi Perumal,

There isn't a simple fasta extraction tool on the public Main Galaxy server, but the extraction is possible and could be grouped into a workflow for re-use once completed. This is simpler than it first looks, really just 4 steps:

1. Convert the fasta file to tabular.
   'FASTA manipulation' -> FASTA-to-Tabular
   Settings: For the option "How many columns to divide title string into?:" use "2" if there is both "identifier" and "description" text. See the next step for more details.

2. Load your list of identifiers as tabular.
   This means "tabular" text format. Adjust the datatype to "tabular" as needed, and fix any other formatting so that the "identifiers" are exactly the same in both files. I am not sure if this is what you meant by "fasta headers". To be clear, in the fasta file (#1) any characters after the leading ">" but before the first whitespace (tab, space, etc.) are considered the "identifier", and everything else on the line is considered the "description". This file (#2) should only contain the "identifier", not the "description". Here is a link to FASTA format in case you run into problems here (the IDs not being exact will almost certainly be the root cause of any issues):
   http://wiki.galaxyproject.org/Learn/Datatypes#Fasta

3. Compare the two files, subsetting out the entries in #1 that are present in #2.
   'Join, Subtract and Group' -> Compare two Datasets
   Settings: Compare file #1, column 1 (c1), against file #2, column 1 (c1), 'To find' = Matching rows of 1st dataset.

4. Transform the results back to fasta format.
   'FASTA manipulation' -> Tabular-to-FASTA
   Settings: Be sure to account for any description fields, if they are included in your data. At this point you can either put them into the final fasta output or omit that data altogether and just pull out the identifier and sequence.

Hopefully this helps - Jen
Galaxy team

On 11/30/12 7:55 AM, Perumal Vijayan wrote:
I have successfully uploaded a large fasta file (2.5 million genomic sequence contigs) onto Galaxy server. I wish to extract a subset of sequences from this file. I have a list of the fasta headers. Is there a way I can accomplish this on Galaxy? -- Perumal Vijayan Saskatoon Canada
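For checking the logic outside Galaxy, the four steps in the reply can be sketched in plain Python. This is only an illustration, not the code behind the Galaxy tools: the function names `fasta_to_rows` and `extract_subset` and the example contigs are made up here, and the sketch assumes the identifier is everything after ">" up to the first whitespace, as described in step 2.

```python
import io


def fasta_to_rows(handle):
    """Steps 1-2: yield (identifier, description, sequence) for each record.

    The identifier is the text after '>' up to the first whitespace;
    the rest of the header line is treated as the description.
    """
    ident, desc, seq = None, "", []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if ident is not None:
                yield ident, desc, "".join(seq)
            parts = line[1:].split(None, 1)
            ident = parts[0] if parts else ""
            desc = parts[1] if len(parts) > 1 else ""
            seq = []
        elif line.strip() and ident is not None:
            seq.append(line.strip())
    if ident is not None:
        yield ident, desc, "".join(seq)


def extract_subset(fasta_handle, wanted_ids):
    """Steps 3-4: keep records whose identifier is listed, return FASTA text."""
    # Strip whitespace so IDs read from a file (with newlines) still match.
    wanted = {i.strip() for i in wanted_ids if i.strip()}
    out = []
    for ident, desc, seq in fasta_to_rows(fasta_handle):
        if ident in wanted:  # exact match on the identifier only
            header = f">{ident} {desc}" if desc else f">{ident}"
            out.append(f"{header}\n{seq}\n")
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical example data, not taken from the original question.
    example = io.StringIO(
        ">contig1 first contig\nACGT\nACGT\n>contig2\nGGCC\n>contig3\nTTAA\n"
    )
    print(extract_subset(example, ["contig1", "contig3"]), end="")
```

The identifier list is held in a set, so matching stays fast even with millions of contigs, and matching is exact, which is why any stray characters in the ID list will cause records to be missed.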
-- Jennifer Jackson http://galaxyproject.org