Hello,

There appears to be something odd with the formatting of the GTF file: the exon numbering is off in the second transcript's first exon. The exon_number "1" should be "2" (remember to count in reverse, since the transcript is on the negative strand). But that is a side issue. There are other things that do not quite make sense, but the entire dataset was not shared.

Run this again, but do the following:

1 - Make sure the files are in interval format and that the column assignments are correct (click on the pencil icon).
2 - Use strand assignment or, better, separate the (+) and (-) stranded transcripts into two files at the start and run the query in two workflows from there. Some GOPS tools work best this way.

Also, be aware that some of these transcripts will not have intron output. For example, the first transcript in your example is a single-exon transcript. Also, if you have genes with overlapping variant transcripts, these will interfere with the query (you will lose introns or fractions of introns), but I don't know how large a dataset you are working with. If you want to pull out data per transcript, the tools in the group "Filter and Sort" can be used to subset GFF/GTF files.

The last query that you ran is the ideal way to obtain this information in Galaxy, but the GFF to BED converter creates a BED6 file, not a BED12 file, and this is why the tool produced no output (see the tool form for required input). Having this tool accept GTF formatted input might be something to consider as an enhancement. I will run it by our development team and open a Trello ticket as appropriate.

Another method, which may not be available to you (judging from the chromosome identifiers, which are not UCSC chrom IDs) but could help in the future, or help others now, is to use the UCSC Table Browser.
It goes something like this:

1 - Click on "display at UCSC Main" for a GTF dataset. This loads the data as a custom track, with the default display in the assembly viewer.
2 - Once in UCSC, in the top bar, pick Tools -> Table Browser.
3 - In the Table Browser, change the track group to "Custom Tracks"; the user track you just loaded will be there.
4 - Change region = genome, then output = bed, make sure "Send output to Galaxy" is checked, and submit.
5 - On the next form, you will be given a list of regions to output in the BED6 output. Introns are one of them.

Best,

Jen
Galaxy team

On 8/20/13 9:29 AM, 师云 wrote:
Dear Jen,

I am not much of a Galaxy user yet. Some days ago I learned about Galaxy and found it a really wonderful tool. I am confused by a simple question: how to extract intron sequences from a GTF file.

Here is a sample of a GTF file:

1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1";
1 Cufflinks exon 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";
1 Cufflinks transcript 10 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1";
1 Cufflinks exon 10 15 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
1 Cufflinks exon 30 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";

I want to extract the introns from the GTF file. I found 2 ways that might solve the problem, but neither worked:

1. I used (Filter and Sort) -> Filter to split the GTF file into 2 files, as follows:

File A (contains transcripts):

1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1";
1 Cufflinks transcript 10 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1";

File B (contains exons):

1 Cufflinks exon 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";
1 Cufflinks exon 10 15 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
1 Cufflinks exon 30 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";

Then I used (Operate on Genomic Intervals) -> Subtract to subtract File B from File A, with "Return Non-overlapping pieces of intervals" selected. I thought it would return a file containing the introns, but the result was an empty file.

2. I converted the GTF file to a BED file and used (Extract Features) -> Gene BED To Exon/Intron/Codon BED, and it returned the same result, an empty file.

I think there must be something wrong with my approach, so I really need your help. Thank you very much.

Sincerely yours,
John
-- Jennifer Hillman-Jackson http://galaxyproject.org
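
For anyone who wants to sanity-check the expected intron output outside Galaxy, the per-transcript idea discussed in this thread can be sketched in a few lines of Python. This is only a minimal illustration, not what any Galaxy tool does internally: the names SAMPLE_GTF, parse_exons, and introns_as_bed6 are made up for this example, and it assumes the source and feature columns contain no spaces. Exons are grouped by transcript_id, so overlapping transcripts do not interfere with each other, and single-exon transcripts simply produce no rows.

```python
import re

# The example records from this thread (space-separated here for readability).
SAMPLE_GTF = """\
1 Cufflinks transcript 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1";
1 Cufflinks exon 3 22 1000 + . gene_id "CUFF.26"; transcript_id "CUFF.26.1"; exon_number "1";
1 Cufflinks transcript 10 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1";
1 Cufflinks exon 10 15 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
1 Cufflinks exon 30 40 1000 - . gene_id "CUFF.204"; transcript_id "CUFF.204.1"; exon_number "1";
"""


def parse_exons(gtf_text):
    """Return {transcript_id: (chrom, strand, [(start, end), ...])} for exon rows."""
    transcripts = {}
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split into the 9 GTF columns; the attribute column keeps its spaces.
        # Assumption: no whitespace inside the first 8 columns.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        entry = transcripts.setdefault(match.group(1), (fields[0], fields[6], []))
        entry[2].append((int(fields[3]), int(fields[4])))
    return transcripts


def introns_as_bed6(transcripts):
    """Return BED6-style tuples for the gaps between consecutive exons."""
    rows = []
    for tid, (chrom, strand, exons) in transcripts.items():
        exons = sorted(exons)
        pairs = zip(exons, exons[1:])
        # Introns are numbered in genomic order, not transcription order.
        for i, (left, right) in enumerate(pairs, start=1):
            left_end, right_start = left[1], right[0]
            # GTF is 1-based inclusive, so the intron is left_end+1 .. right_start-1.
            # BED is 0-based half-open: start = left_end, end = right_start - 1.
            if right_start - 1 > left_end:
                rows.append((chrom, left_end, right_start - 1,
                             f"{tid}.intron{i}", 0, strand))
    return rows


if __name__ == "__main__":
    for row in introns_as_bed6(parse_exons(SAMPLE_GTF)):
        print("\t".join(str(value) for value in row))
```

On the sample above this yields one intron for CUFF.204.1 (positions 16 to 29 in GTF coordinates) and nothing for the single-exon transcript CUFF.26.1. Note that it does not use or repair the exon_number attribute; it relies only on coordinates.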