Thank you very much for your reply.

The file contains more than 5000 transcripts, so I am not pulling out the data one transcript at a time.

I did as you suggested and checked the format. I filtered the GFF file to get a new file containing only the exon records (I was wrong yesterday because I used the raw GTF file, as I mentioned in my previous mail), then converted the GTF to BED. With that, I could use (Extract Features)->Gene BED To Exon/Intron/Codon BED to get a BED file containing introns, like this:

1	9162341	9162884	CUFF.1911.1	-
1	22819814	22826251	CUFF.5109.1	+
1	25887852	25895755	CUFF.5509.1	-
1	25895822	25902258	CUFF.5509.1	-
1	39783161	39786032	CUFF.8086.1	+

Then I ran into another problem: I got an empty file when I used Extract Genomic DNA to fetch the sequences, whether the input file was in GTF format or not. The tool returned a correct result when I used the BED file downloaded from UCSC Main. I checked the format of my file and found nothing wrong.

The data downloaded from UCSC Main looks like this:

chr1	133903980	133904133	NM_214429_exon_0_0_chr1_133903981_f	+
chr1	133914112	133914267	NM_214429_exon_1_0_chr1_133914113_f	+
chr1	133917280	133917449	NM_214429_exon_2_0_chr1_133917281_f	+

Then I suddenly found the problem while I was trying to explain it. The input file for the tool (Extract Genomic DNA) requires chromosome names in the form 'chr1', for example, rather than '1'.

I worked on this all day. It is really inefficient when there is nobody to give instructions face to face.
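The chromosome-name mismatch described above can also be fixed outside Galaxy. The following is only a minimal sketch, assuming a tab-separated BED file whose chromosome names differ from the reference only by a missing 'chr' prefix; the function name `add_chr_prefix` is hypothetical. Some naming schemes differ in other ways as well (for example 'MT' versus 'chrM'), and this sketch does not handle those cases.

```python
def add_chr_prefix(bed_lines, prefix="chr"):
    """Yield BED lines with the prefix added to chromosome names that lack it."""
    for line in bed_lines:
        stripped = line.rstrip("\n")
        # Pass blank lines and header lines through unchanged.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            yield stripped
            continue
        fields = stripped.split("\t")
        # Assumption: the only difference is the missing prefix.
        if not fields[0].startswith(prefix):
            fields[0] = prefix + fields[0]
        yield "\t".join(fields)


# Two of the intron records from the message above.
example = [
    "1\t9162341\t9162884\tCUFF.1911.1\t-",
    "1\t22819814\t22826251\tCUFF.5109.1\t+",
]
fixed = list(add_chr_prefix(example))
for row in fixed:
    print(row)
```

Lines that already start with 'chr' are left as they are, so running the function twice on the same file does not double the prefix.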

Best,
John

From: 师云

Sent: Wednesday, August 21, 2013 6:50 PM

To: Jennifer Jackson

Subject: Re: [galaxy-user] Question about Extract intron sequences from [gtf file] + [genome FASTA file]

Hi Jen,

Thank you very much for your reply.

The file contains more than 5000 transcripts, so I am not pulling out the data one transcript at a time.

I did as you suggested and checked the format. I filtered the GFF file to get a new file containing only the exon records (I was wrong yesterday because I used the raw GTF file, as I mentioned in my previous mail), then converted the GTF to BED. With that, I could use (Extract Features)->Gene BED To Exon/Intron/Codon BED to get a BED file containing introns, like this:

1	9162341	9162884	CUFF.1911.1	-
1	22819814	22826251	CUFF.5109.1	+
1	25887852	25895755	CUFF.5509.1	-
1	25895822	25902258	CUFF.5509.1	-
1	39783161	39786032	CUFF.8086.1	+

The data downloaded from UCSC Main looks like this:

chr1	133903980	133904133	NM_214429_exon_0_0_chr1_133903981_f	+
chr1	133914112	133914267	NM_214429_exon_1_0_chr1_133914113_f	+
chr1	133917280	133917449	NM_214429_exon_2_0_chr1_133917281_f	+

I worked on this all day. It is really inefficient when there is nobody to give instructions face to face, so I need some of your tips.

Best,
John

From: Jennifer Jackson

Sent: Wednesday, August 21, 2013 1:45 AM

To: 师云

Cc: galaxy-user@lists.bx.psu.edu

Subject: Re: [galaxy-user] Question about Extract intron sequences from [gtf file] + [genome FASTA file]

I am not much of a Galaxy user yet. I learned about Galaxy a few days ago and found it a really wonderful tool. I am confused by a simple question: how to extract intron sequences from a [gtf file].

Here is a sample of a gtf file:

1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1";

1 Cufflinks exon 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";

1 Cufflinks transcript 10 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1";

1 Cufflinks exon 10 15 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";

1 Cufflinks exon 30 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";

I want to extract introns from the [gtf] file. I found 2 ways that might solve the problem, but neither of them worked:

1. I use (Filter and Sort) -> Filter to split the [gtf] file into 2 files, as follows:

File A (contains transcripts):

1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1";

1 Cufflinks transcript 10 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1";

File B (contains exons):

1 Cufflinks exon 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";

1 Cufflinks exon 10 15 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";

1 Cufflinks exon 30 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";

Then I use (Operate on Genomic Intervals)->Subtract to subtract File B from File A, with the option Return Non-overlapping pieces of intervals. I thought it would return a file containing the introns, but the result is an empty file.

2. I convert the [gtf] file to a [Bed] file and use (Extract Features)->Gene BED To Exon/Intron/Codon BED, and it returns the same result: an empty file.

I think there must be something wrong with my approach, so I really need your help. Thank you very much.
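For comparison, the intron coordinates that both approaches are trying to produce can be computed directly from the exon records. This is only a sketch, not what the Galaxy tools do internally: it assumes a tab-separated GTF with a `transcript_id` attribute, and the function name `introns_from_gtf` is hypothetical. It groups exons by transcript, sorts them, and reports the gaps between consecutive exons in BED coordinates (0-based start, end not included).

```python
import re
from collections import defaultdict


def introns_from_gtf(gtf_lines):
    """Return intron tuples (chrom, start, end, transcript_id, strand) in BED coordinates."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only exon records; transcript records are ignored.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], match.group(1), fields[6])
        exons[key].append((int(fields[3]), int(fields[4])))

    introns = []
    for (chrom, transcript, strand), spans in exons.items():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start - prev_end > 1:
                # GTF is 1-based inclusive, BED is 0-based half-open:
                # the intron covers prev_end+1 .. next_start-1 in GTF terms.
                introns.append((chrom, prev_end, next_start - 1, transcript, strand))
    return introns


def gtf_row(feature, start, end, strand, gene, transcript):
    """Build one tab-separated GTF line like the sample records above."""
    attrs = f'gene_id "{gene}"; transcript_id "{transcript}";'
    return "\t".join(["1", "Cufflinks", feature, str(start), str(end), "1000", strand, ".", attrs])


sample = [
    gtf_row("transcript", 3, 22, "+", "CUFF.26", "CUFF.26.1"),
    gtf_row("exon", 3, 22, "+", "CUFF.26", "CUFF.26.1"),
    gtf_row("transcript", 10, 40, "-", "CUFF.204", "CUFF.204.1"),
    gtf_row("exon", 10, 15, "-", "CUFF.204", "CUFF.204.1"),
    gtf_row("exon", 30, 40, "-", "CUFF.204", "CUFF.204.1"),
]
introns = introns_from_gtf(sample)
print(introns)
```

With the sample records, the single-exon transcript CUFF.26.1 yields no intron, and CUFF.204.1 yields one intron between its two exons (positions 16 to 29 in GTF coordinates).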

sincerely yours,

John