Hi Lizex,

It sounds like you have been working on the command line and now want to import your data into Galaxy to work with it. If so, be careful about the reference genome when moving into Galaxy:
http://wiki.galaxyproject.org/Support#Rsync_data_and_moving_between_instances

To get the data into Galaxy, use FTP:
http://wiki.galaxyproject.org/FTPUpload

The XLOC IDs in the gene expression file are the same as those in the attribute field (9th field) of the GTF file used as input to Cuffdiff. To get the transcript sequence, match up those identifiers, then extract the sequence from the reference genome.

(Note that this will not include any base-level variation from your sequence data. This method creates transcripts from the genomic sequence, based on coordinates. This tool package does not assemble new consensus sequences.)
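To illustrate the note above, here is a minimal Python sketch of what coordinate-based extraction amounts to: the returned bases are simply a slice of the reference, so nothing from your reads is included. The function name `extract_locus` and the toy genome are hypothetical, and the sketch assumes the locus coordinates are 1-based and inclusive, which you should verify for your own files. It also ignores strand and exon structure.

```python
import re


def extract_locus(genome, locus):
    """Return the reference bases for a 'chrom:start-end' locus string.

    `genome` is a dict mapping chromosome name to sequence (hypothetical
    in-memory stand-in for a reference FASTA).
    """
    match = re.fullmatch(r"(\S+):(\d+)-(\d+)", locus)
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    # Assumption: coordinates are 1-based and inclusive, so convert the
    # start to a 0-based Python slice index.
    return genome[chrom][start - 1:end]


# Toy reference, not real data.
toy_genome = {"chr1": "ACGTACGTAC"}
print(extract_locus(toy_genome, "chr1:3-6"))
```

In Galaxy itself the "Extract Genomic DNA" tool does this work for you, using the GTF intervals rather than a locus string.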

The general path is:
1 - upload the "gene differential expression testing" file, the GTF file, and the reference genome if needed
2 - cut out the "XLOC" field from the "gene differential expression testing" file using the tool "Text Manipulation -> Cut"
3 - use the tool "Filter and Sort -> Filter GTF data by attribute values_list" to obtain only the records related to your XLOC list
4 - obtain the fasta sequence with the tool "Fetch Sequences -> Extract Genomic DNA", using the result from 3 as the query and your uploaded reference genome as a "Custom reference genome" if needed.
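If it helps to see the logic of steps 2 and 3 outside of Galaxy, here is a rough Python sketch. It is not what the Galaxy tools run internally. The function names are hypothetical, and it assumes a tab-delimited Cuffdiff file with 14 columns whose last column is "yes" or "no", and a GTF whose 9th field carries the XLOC ID as `gene_id "XLOC_..."`. Check both assumptions against your own files.

```python
import re

# Assumption: the XLOC ID appears in the GTF attribute field as gene_id "...".
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def significant_ids(diff_lines):
    """Collect IDs (first column) of rows whose last column is 'yes'."""
    ids = set()
    for line in diff_lines:
        fields = line.rstrip("\n").split("\t")
        # Testing the last column only avoids matching "yes" elsewhere
        # in the line; the header row is skipped for the same reason.
        if len(fields) >= 14 and fields[-1] == "yes":
            ids.add(fields[0])
    return ids


def filter_gtf(gtf_lines, wanted):
    """Keep GTF records whose gene_id attribute is in `wanted`."""
    kept = []
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # comment or malformed line
        match = GENE_ID.search(fields[8])
        if match and match.group(1) in wanted:
            kept.append(line)
    return kept


# Toy rows for illustration; the second and the GTF lines are made up.
diff = [
    "\t".join(["XLOC_000544", "XLOC_000544", "-", "chr1:12763969-12765675",
               "C0", "C4", "OK", "3.16487", "1628.25", "9.00696",
               "-4.57022", "4.8722e-06", "0.00905256", "yes"]),
    "\t".join(["XLOC_000001", "XLOC_000001", "-", "chr1:100-200",
               "C0", "C4", "OK", "1.0", "1.1", "0.1",
               "0.2", "0.8", "0.9", "no"]),
]
gtf = [
    'chr1\tCufflinks\texon\t12763970\t12765675\t.\t+\t.\t'
    'gene_id "XLOC_000544"; transcript_id "TCONS_00000900";',
    'chr1\tCufflinks\texon\t101\t200\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
]

wanted = significant_ids(diff)
selected = filter_gtf(gtf, wanted)
print(sorted(wanted), len(selected))
```

The filtered GTF records are what you would pass on to step 4 as the query intervals.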

More about custom reference genomes & RNA seq tools is in these links:
http://wiki.galaxyproject.org/Support#Interpreting_scientific_results
http://wiki.galaxyproject.org/Support#Custom_reference_genome

Hopefully this helps,

Jen
Galaxy team

On 4/8/13 2:40 PM, Lizex Husselmann wrote:

Dear Galaxy community

I'm new to galaxy and would like to ask the following:
I have trimmed and QC'ed my paired-end and single-end data received from an Illumina HiScan SQ, mapped it using TopHat, and run Cufflinks, Cuffmerge and Cuffdiff. I would like to analyze the gene_exp.diff file by extracting the significant transcripts. I've used grep "yes" to extract only the significant transcripts. From this I have the locus start and end coordinates of each transcript, for example "XLOC_000544 XLOC_000544 - chr1:12763969-12765675 C0 C4 OK 3.16487 1628.25 9.00696 -4.57022 4.8722e-06 0.00905256 yes".
How can I go about extracting this information or sequence from the reference genome?

Kind regards

Lizex

-- 
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org