How to calculate GC content of transcripts only including exons from a GTF file
Hi everyone,

I want to calculate the GC content of transcripts in a GTF file like this:

chr1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1";
chr1 Cufflinks exon 3 10 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "1";
chr1 Cufflinks exon 13 18 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "2";
chr1 Cufflinks exon 20 22 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "3";

and the genome sequence that the transcript comes from is:
chr1 GTAGCGTCTCCGACGCGGATATGACCGCACGCTGATGCTCCCAGGGATGAGAGGCGTGCG
I need the sequence of the transcript before I can calculate its GC content, so how can I get that sequence? In this case the exons would be AGCGTCTC + ACGCGG + TAT, meaning the transcript sequence would be AGCGTCTCACGCGGTAT. Is this possible in Galaxy?
Hello,

You can use the tool "Fetch Sequences -> Extract Genomic DNA" with a GTF file and a custom reference genome to get the FASTA sequence. General instructions for Custom Genomes are here, and the section "Tools on the Main Server" covers this tool:
http://wiki.galaxyproject.org/Support#Custom_reference_genome

The tool "EMBOSS -> geecee" can then be used to perform the calculation on the resulting FASTA sequences.

Best,
Jen
Galaxy team

On 9/10/13 11:14 PM, 师云 wrote:
> Is it possible in the Galaxy?
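For anyone who wants to check the result outside Galaxy, the spliced sequence from the question can be reproduced with a short script. This is only a minimal Python sketch, not how Extract Genomic DNA or geecee are implemented. It assumes the genome fits in memory as a dict, that attributes use the `transcript_id "..."` form shown in the question, and that all exons of a transcript share one strand. The helper names (`parse_exons`, `transcript_sequences`, `gc_fraction`) are hypothetical.

```python
import re
from collections import defaultdict

# Example data copied from the question (first 22 bases cover all exons).
GENOME = {"chr1": "GTAGCGTCTCCGACGCGGATATGACCGCACGCTGATGCTCCCAGGGATGAGAGGCGTGCG"}
GTF = '''chr1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1";
chr1 Cufflinks exon 3 10 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "1";
chr1 Cufflinks exon 13 18 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "2";
chr1 Cufflinks exon 20 22 1000 + . gene_id "CUFF.23955"; transcript_id "CUFF.23955.1"; exon_number "3";
'''

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_exons(gtf_text):
    """Group (start, end, strand) of every exon line by (chrom, transcript_id)."""
    exons = defaultdict(list)
    for line in gtf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on any whitespace so both tab-separated GTF and the
        # space-separated sample above work; field 9 keeps the attributes.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4]), fields[6]))
    return exons


def transcript_sequences(gtf_text, genome):
    """Join the exon sequences of each transcript in genomic order."""
    sequences = {}
    for (chrom, transcript_id), records in parse_exons(gtf_text).items():
        records.sort()
        # GTF coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for start, end, _ in records)
        if records[0][2] == "-":
            # Minus-strand transcripts are reverse-complemented.
            seq = seq.translate(COMPLEMENT)[::-1]
        sequences[transcript_id] = seq
    return sequences


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


for transcript_id, seq in transcript_sequences(GTF, GENOME).items():
    print(transcript_id, seq, f"{gc_fraction(seq):.3f}")
```

Run on the example from the question, this prints the transcript sequence AGCGTCTCACGCGGTAT with a GC fraction of 0.588 (10 of 17 bases).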
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/

--
Jennifer Hillman-Jackson
http://galaxyproject.org
Hello Jen,

Thank you for your reply. I thought it would return the GC content of each exon. I tried it and found that Galaxy will interpret features. Thank you.

John

From: Jennifer Jackson
Sent: Wednesday, September 18, 2013 6:38 AM
To: 师云
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] How to calculate GC content of transcripts only including exons from a GTF file

> You can use the tool "Fetch Sequences -> Extract Genomic DNA" with a GTF file and a custom reference genome to get the fasta sequence.
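The difference between per-exon and per-transcript output matters for the numbers: averaging the GC content of individual exons does not give the GC content of the joined transcript unless the average is weighted by exon length. A small Python sketch using the three exon sequences from the question illustrates this. The variable and function names are only illustrative.

```python
def gc_count(seq):
    """Number of G and C bases in a sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


# Exon sequences taken from the example in the question.
exons = ["AGCGTCTC", "ACGCGG", "TAT"]

# GC fraction of each exon on its own.
per_exon = [gc_count(exon) / len(exon) for exon in exons]

# Plain average of the per-exon values (ignores exon length).
unweighted_mean = sum(per_exon) / len(per_exon)

# GC fraction of the joined transcript (equivalent to a length-weighted mean).
transcript_gc = sum(gc_count(exon) for exon in exons) / sum(len(exon) for exon in exons)

print(f"unweighted mean of exons: {unweighted_mean:.3f}")
print(f"joined transcript:        {transcript_gc:.3f}")
```

For this example the unweighted mean is 0.486 while the joined transcript has a GC fraction of 0.588, so per-exon results would need to be combined by length rather than simply averaged.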
participants (2)
- Jennifer Jackson
- 师云