I have some SAM/BAM files containing the alignments of some RNA-seq reads to hg19. I'm interested in calculating some mapping statistics, specifically, the percentage of reads mapping to exons, introns, and extragenic regions.
I gather that this can be done with bedtools, but I'm finding myself a little bit stuck just figuring out what files I need to get this information. I gather that I need a GTF (or possibly GFF) file, and I downloaded one from the UCSC browser using the settings in the attached image.
The first couple lines of the resulting file are pasted below. I see that the file has exon start and end sites. Is there a way to get what I need with this file, or do I need something else?
cat gencode.gtf | head -3
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
0 ENST00000237247.6 chr1 + 66999065 67210057 67000041 67208778 27 66999065,66999928,67091529,67098752,67099762,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67149789,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755, 66999090,67000051,67091593,67098777,67099846,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67149870,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210057, 0 SGIP1 cmpl cmpl -1,0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,1,0,1,1,2,2,0,2,1,1,
0 ENST00000371039.1 chr1 + 66999274 67210768 67000041 67208778 22 66999274,66999928,67091529,67098752,67105459,67108492,67109226,67136677,67137626,67138963,67142686,67145360,67154830,67155872,67160121,67184976,67194946,67199430,67205017,67206340,67206954,67208755, 66999355,67000051,67091593,67098777,67105516,67108547,67109402,67136702,67137678,67139049,67142779,67145435,67154958,67155999,67160187,67185088,67195102,67199563,67205220,67206405,67207119,67210768, 0 SGIP1 cmpl cmpl -1,0,1,2,0,0,1,0,1,2,1,1,1,0,1,1,2,2,0,2,1,1,