Extracting sequences for transcripts from reference genome
Dear Galaxy community,

I'm new to Galaxy and would like to ask the following. I have trimmed and QC'ed paired-end and single-end data from an Illumina HiScan SQ, mapped the reads with TopHat, and run Cufflinks, Cuffmerge and Cuffdiff. I would like to analyze the gene_exp.diff file by extracting the significant transcripts, and I've used grep "yes" to keep only those. From this I have the locus start and end coordinates of each transcript, for example:

"XLOC_000544 XLOC_000544 - chr1:12763969-12765675 C0 C4 OK 3.16487 1628.25 9.00696 -4.57022 4.8722e-06 0.00905256 yes"

How can I extract this information, or the corresponding sequence, from the reference genome?

Kind regards,
Lizex

This message is confidential and may be covered by legal professional privilege. It must not be read, copied, disclosed or used in any other manner by any person other than the addressee(s). Unauthorised use, disclosure or copying is strictly prohibited and may be unlawful. The views expressed in this email are those of the sender, unless otherwise stated. If you have received this email in error, please contact ARC Service Desk immediately. (mailto:Servicedesk@arc.agric.za) To report incidents of fraud and / or corruption in the ARC use our Ethics Hotline by: Phone number : 0800 000 604 Fax number : 0800 00 7788 Email address : arc@tip-offs.com Please Call me : 32840 Website: www.tip-offs.com For more information on the ARC Ethics Hotline, please visit our website at www.arc.agric.za.
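For readers following along outside Galaxy, the filtering described in the question can be sketched in Python. This is only an illustration, assuming gene_exp.diff is tab-delimited with the locus in the 4th column and the significance flag in the last column, as in the example row; the function name `significant_loci` is hypothetical. Checking the last column explicitly avoids matching "yes" anywhere else on the line, which a plain grep could do.

```python
def significant_loci(lines):
    """Return (gene_id, chrom, start, end) for rows flagged 'yes' in the last column.

    Assumes tab-delimited Cuffdiff gene_exp.diff rows: the gene ID in column 2,
    the locus (chrom:start-end) in column 4, the significance flag last.
    """
    results = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 4 or fields[0] == "test_id":
            continue
        if fields[-1].strip() != "yes":
            continue
        # Split on the last ':' so chromosome names containing ':' still parse.
        chrom, span = fields[3].rsplit(":", 1)
        start, end = span.split("-")
        results.append((fields[1], chrom, int(start), int(end)))
    return results


# The example row from the message, rebuilt with tab separators.
example_row = "\t".join([
    "XLOC_000544", "XLOC_000544", "-", "chr1:12763969-12765675",
    "C0", "C4", "OK", "3.16487", "1628.25", "9.00696",
    "-4.57022", "4.8722e-06", "0.00905256", "yes",
])
print(significant_loci([example_row]))
```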
Hi Lizex,

It sounds like you are working on the command line and now want to import data into Galaxy to work with it? If so, be careful about the reference genome when moving into Galaxy: http://wiki.galaxyproject.org/Support#Rsync_data_and_moving_between_instance...

To get the data into Galaxy, use FTP: http://wiki.galaxyproject.org/FTPUpload

The XLOC IDs in the gene expression file are the same as those in the attribute field (9th field) of the GTF file used as input to Cuffdiff. To get the transcript sequence, you basically want to match up those identifiers, then extract the sequence from the reference genome. (Note that this will not include any base-level variation from your sequence data: this method creates transcripts from the genomic sequence, based on coordinates. This tool package does not assemble new consensus sequences.)

The general path is:

1 - upload the "gene differential expression testing" file, the GTF file, and the reference genome if needed
2 - cut out the "XLOC" field from the "gene differential expression testing" file using the tool "Text Manipulation -> Cut"
3 - use the tool "Filter and Sort -> Filter GTF data by attribute values_list" to obtain only the records related to your XLOC list
4 - obtain fasta sequence with the tool "Fetch Sequences -> Extract Genomic DNA", using the result from 3 as the query and your uploaded reference genome as a "Custom reference genome" if needed.

More about custom reference genomes & RNA-seq tools is in these links:
http://wiki.galaxyproject.org/Support#Interpreting_scientific_results
http://wiki.galaxyproject.org/Support#Custom_reference_genome

Hopefully this helps,

Jen
Galaxy team

On 4/8/13 2:40 PM, Lizex Husselmann wrote:
Dear Galaxy community
I'm new to Galaxy and would like to ask the following. I have trimmed and QC'ed paired-end and single-end data from an Illumina HiScan SQ, mapped the reads with TopHat, and run Cufflinks, Cuffmerge and Cuffdiff. I would like to analyze the gene_exp.diff file by extracting the significant transcripts, and I've used grep "yes" to keep only those. From this I have the locus start and end coordinates of each transcript, for example "XLOC_000544 XLOC_000544 - chr1:12763969-12765675 C0 C4 OK 3.16487 1628.25 9.00696 -4.57022 4.8722e-06 0.00905256 yes". How can I extract this information, or the corresponding sequence, from the reference genome?
Kind regards
Lizex
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
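The matching and extraction steps described in the reply (filter the GTF by XLOC ID, then pull the genomic sequence by coordinates) can be sketched offline in Python. This is an illustration under stated assumptions, not what the Galaxy tools do internally: the function names `filter_gtf` and `extract_sequence`, the toy genome, and the toy GTF records are all hypothetical, the XLOC ID is assumed to be stored as the gene_id attribute, and minus-strand features are assumed to be wanted as reverse complements.

```python
import re

# Assumption: the XLOC identifier is stored as gene_id in the 9th GTF field.
GENE_ID = re.compile(r'gene_id "([^"]+)"')
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def filter_gtf(gtf_lines, wanted_ids):
    """Keep GTF records (as field lists) whose gene_id is in wanted_ids."""
    kept = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID.search(fields[8])
        if match and match.group(1) in wanted_ids:
            kept.append(fields)
    return kept


def extract_sequence(genome, fields):
    """Slice one GTF record out of the genome (GTF is 1-based, end-inclusive)."""
    chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
    seq = genome[chrom][start - 1:end]
    if strand == "-":
        # Reverse-complement features on the minus strand.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Toy data, invented for illustration only.
genome = {"chr1": "AACCGGTTAC"}
gtf = [
    'chr1\tCufflinks\texon\t3\t6\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr1\tCufflinks\texon\t7\t10\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";',
]

for record in filter_gtf(gtf, {"XLOC_000001", "XLOC_000002"}):
    print(record[0], record[3], record[4], record[6], extract_sequence(genome, record))
```

If the GTF records are exon-level, each extracted sequence is a single exon; building a spliced transcript would additionally require concatenating the exons that share a transcript_id, in transcript order.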
Participants (2): Jennifer Jackson, Lizex Husselmann