Hi,
I uploaded a tab-delimited file (constructed within R using write.table) into Galaxy with the columns chr, start, end, and esembl_TSS_name. Whilst I am able to use the fetch sequence function, I am currently not able to include the Ensembl ID in the FASTA output. I am able to include the Ensembl name in the interval format but not in the FASTA format.
thanks, Lawrence
Hello Lawrence,
You are correct: by default, the fetch sequence tool produces only identifiers, not description text. Most Galaxy tools that use fasta files do not preserve description text anyway, so be sure to include key names in the identifier itself. If you want, the identifier can be modified after the fasta file is created, using tools from the text manipulation tool set.
Once you have the fasta file, start by converting the file type from fasta to tabular. From there, alter the sequence identifier to be any value you want by joining in new columns of data (other identifiers), merging together existing columns (e.g. converting spaces to underscores), adding new columns, and similar manipulations. You may want to cut out the sequence to save it, work on the identifier, then merge it all back together at the end. Any approach works as long as the end dataset is a two-column tabular file.
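For anyone who wants to see the same idea outside Galaxy, here is a minimal Python sketch of the fasta-to-tabular step followed by an identifier join. It is only an illustration, not what the Galaxy tools run internally: the function names, the example identifiers, and the `ENSG_EXAMPLE_1` name are all made up for this example.

```python
def fasta_to_tabular(fasta_text):
    """Return a list of (identifier, sequence) pairs, one per FASTA record."""
    rows = []
    identifier = None
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                rows.append((identifier, "".join(chunks)))
            # Keep only the first whitespace-delimited token; the rest of
            # the header line is description text and is dropped here.
            parts = line[1:].split()
            identifier = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if identifier is not None:
        rows.append((identifier, "".join(chunks)))
    return rows


def join_names(rows, names):
    """Append a name to each identifier with an underscore when one is known."""
    return [
        (f"{identifier}_{names[identifier]}" if identifier in names else identifier, seq)
        for identifier, seq in rows
    ]


# Hypothetical input: two records, the first with description text.
example_fasta = ">chr1_100_108 some description\nACGT\nACGT\n>chr2_5_9\nTTGA\n"
# Hypothetical lookup from the fetched identifier to an Ensembl-style name.
example_names = {"chr1_100_108": "ENSG_EXAMPLE_1"}

rows = join_names(fasta_to_tabular(example_fasta), example_names)
for identifier, seq in rows:
    print(identifier + "\t" + seq)
```

Records without a matching name keep their original identifier, which mirrors what a join that retains unmatched lines would give you.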
It is very important that there is no extra whitespace: only one tab between the two columns. The first column is the identifier, the second is the sequence. Next, convert this to fasta format as the final product.
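To make the whitespace requirement concrete, here is a small Python sketch of the tabular-to-fasta step that refuses any line that is not exactly identifier, one tab, sequence. It is an assumption-laden illustration rather than the Galaxy converter itself: `tabular_to_fasta`, the `width` parameter, and the example identifier are hypothetical.

```python
def tabular_to_fasta(tabular_text, width=60):
    """Convert two-column, tab-separated text to FASTA, wrapping sequences at `width`."""
    records = []
    for number, line in enumerate(tabular_text.splitlines(), start=1):
        if not line:
            continue
        fields = line.split("\t")
        # Exactly one tab means exactly two fields.
        if len(fields) != 2:
            raise ValueError(
                f"line {number}: expected 2 tab-separated columns, found {len(fields)}"
            )
        identifier, sequence = fields
        if not identifier or not sequence:
            raise ValueError(f"line {number}: empty identifier or sequence")
        # Reject stray spaces or other whitespace inside either column.
        if any(ch.isspace() for ch in identifier + sequence):
            raise ValueError(f"line {number}: extra whitespace in a column")
        wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(">" + identifier + "\n" + "\n".join(wrapped))
    return "\n".join(records) + "\n" if records else ""


# Hypothetical two-column input: identifier, one tab, sequence.
fasta_text = tabular_to_fasta("chr1_100_108_ENSG_EXAMPLE_1\tACGTACGT\n")
print(fasta_text, end="")
```

Failing loudly on a malformed line is a deliberate choice here: a doubled tab or a trailing space is easy to miss by eye and would otherwise end up inside the identifier or the sequence.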
This will take some experimentation, but these are very powerful tools that can do most of what can be done on the Unix command line or with simple scripting. Once you work out a process that you like, it can be saved in a workflow, so that next time you want to do the same thing, you can just run the workflow instead of repeating a batch of tedious steps. Or, at a minimum, a saved workflow will provide you with a starter set of functions customized to your type of project.
Best wishes and if you have questions about a particular tool, please let us know,
Take care,
Jen, Galaxy team
On 11/9/11 4:06 AM, Lawrence Mckechnie wrote: