Dear Jen,
I am not much of a Galaxy user yet. I learned about Galaxy a few
days ago and find it a really wonderful tool. However, I am stuck on a
simple question: how do I extract intron sequences from a GTF file?
Here is a sample of a GTF file:
1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1";
1 Cufflinks exon 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";
1 Cufflinks transcript 10 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1";
1 Cufflinks exon 10 15 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
1 Cufflinks exon 30 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
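For reference, the introns in this sample can be read off the exon records directly: within each transcript, sort the exons by position and take the gap between each pair of consecutive exons. The Python sketch below assumes tab-separated GTF (the sample above is shown with spaces) and GTF's 1-based inclusive coordinates; `introns_from_exons` and `SAMPLE_GTF` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# The sample records above, written with real tab separators as in a GTF file.
SAMPLE_GTF = """\
1\tCufflinks\ttranscript\t3\t22\t1000\t+\t.\tgene_id "CUFF.26"; transcript_id "CUFF.26.1";
1\tCufflinks\texon\t3\t22\t1000\t+\t.\tgene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";
1\tCufflinks\ttranscript\t10\t40\t1000\t-\t.\tgene_id "CUFF.204"; transcript_id "CUFF.204.1";
1\tCufflinks\texon\t10\t15\t1000\t-\t.\tgene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
1\tCufflinks\texon\t30\t40\t1000\t-\t.\tgene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
"""


def introns_from_exons(gtf_text):
    """Return (seqname, start, end, strand, transcript_id) for each intron (1-based, inclusive)."""
    exons = defaultdict(list)  # (seqname, strand, transcript_id) -> [(start, end), ...]
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define the intron boundaries
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    introns = []
    for (seqname, strand, transcript_id), blocks in exons.items():
        blocks.sort()
        # The gap between the end of one exon and the start of the next is an intron.
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
            if next_start > prev_end + 1:
                introns.append((seqname, prev_end + 1, next_start - 1, strand, transcript_id))
    return introns


for intron in introns_from_exons(SAMPLE_GTF):
    print(intron)  # → ('1', 16, 29, '-', 'CUFF.204.1')
```

Under these assumptions CUFF.26.1 has a single exon and so no intron, while CUFF.204.1 has one intron at 16-29.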
I want to extract the introns from this GTF file. I found two approaches
that I thought might work, but neither of them gives any result:
1. I used (Filter and Sort) -> Filter to split the GTF file into
two files, as follows:
File A (transcripts):
1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1";
1 Cufflinks transcript 10 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1";
File B (exons):
1 Cufflinks exon 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";
1 Cufflinks exon 10 15 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
1 Cufflinks exon 30 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
Then I used (Operate on Genomic Intervals) -> Subtract to subtract
File B from File A with the option "Return Non-overlapping pieces of
intervals". I expected it to return a file containing the introns, but
the result was an empty file.
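The subtraction idea in step 1 is sound for this sample: removing each transcript's exons from its transcript span should leave the introns, so a non-empty result is expected. The sketch below mimics that step in plain Python on the records of Files A and B; `subtract` and `introns_by_subtraction` are hypothetical helpers, not Galaxy's implementation. One design choice to note: it only subtracts exons that share the transcript's `transcript_id`. A strand-agnostic subtraction of the whole exon file would behave differently here, because the CUFF.26.1 exon (3-22, + strand) overlaps part of the CUFF.204.1 intron.

```python
# Files A and B from the example as (seqname, start, end, strand, transcript_id),
# using GTF 1-based inclusive coordinates.
TRANSCRIPTS = [("1", 3, 22, "+", "CUFF.26.1"), ("1", 10, 40, "-", "CUFF.204.1")]
EXONS = [
    ("1", 3, 22, "+", "CUFF.26.1"),
    ("1", 10, 15, "-", "CUFF.204.1"),
    ("1", 30, 40, "-", "CUFF.204.1"),
]


def subtract(interval, cuts):
    """Return the pieces of `interval` (start, end) not covered by any (start, end) in `cuts`."""
    start, end = interval
    pieces = []
    cursor = start  # first position not yet accounted for
    for cut_start, cut_end in sorted(cuts):
        if cut_end < cursor or cut_start > end:
            continue  # this cut does not overlap the remaining part of the interval
        if cut_start > cursor:
            pieces.append((cursor, cut_start - 1))  # uncovered stretch before this cut
        cursor = max(cursor, cut_end + 1)
    if cursor <= end:
        pieces.append((cursor, end))  # uncovered tail after the last cut
    return pieces


def introns_by_subtraction(transcripts, exons):
    """Subtract each transcript's own exons from its span; the leftovers are introns."""
    result = []
    for seqname, start, end, strand, transcript_id in transcripts:
        own_exons = [(s, e) for (sq, s, e, _, tid) in exons if tid == transcript_id and sq == seqname]
        for piece_start, piece_end in subtract((start, end), own_exons):
            result.append((seqname, piece_start, piece_end, strand, transcript_id))
    return result


print(introns_by_subtraction(TRANSCRIPTS, EXONS))  # → [('1', 16, 29, '-', 'CUFF.204.1')]
```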
2. I converted the GTF file to a BED file and used (Extract
Features) -> Gene BED To Exon/Intron/Codon BED, but it returned the same
result: an empty file.
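As the name suggests, Gene BED To Exon/Intron/Codon BED works on "gene" BED, that is, 12-column BED where each transcript's exons are stored as blocks (`blockCount`, `blockSizes`, `blockStarts`); introns are then the gaps between consecutive blocks. The sketch below assumes that layout and uses two BED12 lines written by hand from the sample (not the output of Galaxy's converter); `BED12_LINES` and `bed12_introns` are hypothetical names. If a converted file has only six columns, it carries no block information from which introns could be derived, which may be worth checking.

```python
# Hand-written BED12 versions of the two sample transcripts (0-based, half-open).
# Columns: chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd,
#          itemRgb, blockCount, blockSizes, blockStarts
BED12_LINES = [
    "1\t2\t22\tCUFF.26.1\t1000\t+\t2\t22\t0\t1\t20,\t0,",
    "1\t9\t40\tCUFF.204.1\t1000\t-\t9\t40\t0\t2\t6,11,\t0,20,",
]


def bed12_introns(line):
    """Return (chrom, start, end, name, strand) for each gap between blocks of one BED12 line."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start, name, strand = fields[0], int(fields[1]), fields[3], fields[5]
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]  # relative to chromStart
    introns = []
    for i in range(len(sizes) - 1):
        intron_start = chrom_start + starts[i] + sizes[i]  # end of block i
        intron_end = chrom_start + starts[i + 1]  # start of block i + 1
        if intron_end > intron_start:
            introns.append((chrom, intron_start, intron_end, name, strand))
    return introns


for bed_line in BED12_LINES:
    print(bed12_introns(bed_line))
# → []
# → [('1', 15, 29, 'CUFF.204.1', '-')]
```

Because BED is 0-based and half-open, the interval (15, 29) covers the same bases as 16-29 in GTF's 1-based inclusive coordinates.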
I think something must be wrong with my reasoning, so I really need
your help. Thank you very much.
Sincerely yours,
John