Hello everyone,
I have SAM/BAM files containing alignments of RNA-seq reads to hg19.
I'd like to calculate some mapping statistics, specifically the
percentage of reads mapping to exons, introns, and extragenic
(intergenic) regions.
I understand this can be done with bedtools, but I'm stuck on working out
which files I need to get this information. I believe I need a GTF (or
possibly GFF) annotation file, and I downloaded one from the UCSC browser
using the settings in the attached image.
The first few lines of the resulting file are pasted below. I see that
the file has exon start and end positions. Is there a way to get what I
need from this file, or do I need something else?
Any assistance would be much appreciated,
Thanks
Alex
cat gencode.gtf | head -3
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
0 ENST00000237247.6 chr1 + 66999065 67210057
67000041 67208778 27
66999065,66999928,67091529,67098752,67099762,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67149789,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,
66999090,67000051,67091593,67098777,67099846,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67149870,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210057,
0 SGIP1 cmpl cmpl
-1,0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,1,0,1,1,2,2,0,2,1,1,
0 ENST00000371039.1 chr1 + 66999274 67210768
67000041 67208778 22
66999274,66999928,67091529,67098752,67105459,67108492,67109226,67136677,67137626,67138963,67142686,67145360,67154830,67155872,67160121,67184976,67194946,67199430,67205017,67206340,67206954,67208755,
66999355,67000051,67091593,67098777,67105516,67108547,67109402,67136702,67137678,67139049,67142779,67145435,67154958,67155999,67160187,67185088,67195102,67199563,67205220,67206405,67207119,67210768,
0 SGIP1 cmpl cmpl
-1,0,1,2,0,0,1,0,1,2,1,1,1,0,1,1,2,2,0,2,1,1,
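
For reference, the columns in the header above (bin, name, chrom, strand, txStart, ..., exonStarts, exonEnds) look like a UCSC gene prediction table rather than GTF, which has nine columns per feature. The exon coordinates are still all there, so one option may be to expand each row into BED intervals. Below is a minimal Python sketch of that idea, not a tested pipeline. The function names are hypothetical, and it assumes whitespace-delimited fields in the column order shown in the header and 0-based, half-open coordinates as in BED.

```python
def row_to_bed_exons(line):
    """Expand one row of the table above into BED-style exon tuples.

    Assumes the column order from the header:
    bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount
    exonStarts exonEnds ...
    """
    fields = line.split()
    name, chrom, strand = fields[1], fields[2], fields[3]
    exon_count = int(fields[8])
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].rstrip(",").split(",")]
    ends = [int(x) for x in fields[10].rstrip(",").split(",")]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match exon lists for " + name)
    return [(chrom, s, e, name, 0, strand) for s, e in zip(starts, ends)]


def introns_from_exons(exons):
    """Derive introns as the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons, key=lambda rec: rec[1])
    introns = []
    for left, right in zip(ordered, ordered[1:]):
        chrom, _, left_end, name, score, strand = left
        right_start = right[1]
        if right_start > left_end:  # skip abutting or overlapping exons
            introns.append((chrom, left_end, right_start, name, score, strand))
    return introns


def to_bed_lines(records):
    """Format tuples as tab-separated BED6 lines."""
    return ["\t".join(str(v) for v in rec) for rec in records]
```

Writing the output of `to_bed_lines` for every row to an exon file and an intron file would give interval files that bedtools can intersect with the alignments. Note that this works per transcript, so overlapping isoforms (such as the two SGIP1 transcripts above) would need to be merged first to avoid counting a read twice.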