Hello everyone,
I have some SAM/BAM files containing alignments of RNA-seq reads
to hg19. I'm interested in calculating some mapping statistics,
specifically, the percentage of reads mapping to exons, introns, and
extragenic regions.
I understand that this can be done with bedtools, but I'm a little
stuck figuring out which files I need to get this information. I
believe I need a GTF (or possibly GFF) file, and I
downloaded one from the UCSC browser using the settings in the attached
The first couple lines of the resulting file are pasted below. I see that
the file has exon start and end sites. Is there a way to get what I need
with this file, or do I need something else?
Any assistance would be much appreciated,
cat gencode.gtf | head -3
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
0 ENST00000237247.6 chr1 + 66999065 67210057
67000041 67208778 27
0 SGIP1 cmpl cmpl
0 ENST00000371039.1 chr1 + 66999274 67210768
67000041 67208778 22
0 SGIP1 cmpl cmpl
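
The header pasted above has genePred-style columns (exonStarts, exonEnds, exonCount) rather than the nine columns of a GTF file. Below is a minimal sketch of how such a row might be turned into BED-style exon and intron intervals, which is the kind of input bedtools works with. It assumes the file is tab-separated, has the leading bin column shown in the header, and stores exonStarts and exonEnds as comma-separated lists. The function names and the sample row (TX_EXAMPLE, GENE_EXAMPLE, and its coordinates) are made up for illustration and are not taken from the real file.

```python
# Hypothetical sample row in the column order of the header above.
# All values here are invented for illustration.
SAMPLE_ROW = (
    "0\tTX_EXAMPLE\tchr1\t+\t100\t500\t150\t450\t3\t"
    "100,250,400,\t150,300,500,\t0\tGENE_EXAMPLE\tcmpl\tcmpl\t0,1,2,"
)


def genepred_row_to_exons(line):
    """Return BED6-style exon tuples for one genePred row with a bin column.

    Assumes tab-separated fields. genePred starts are 0-based and ends are
    exclusive, which matches BED, so no coordinate shift is applied.
    """
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[1], fields[2], fields[3]
    exon_count = int(fields[8])
    # The lists end with a trailing comma, so drop empty pieces.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match exon lists for " + name)
    return [(chrom, s, e, name, 0, strand) for s, e in zip(starts, ends)]


def introns_from_exons(exons):
    """Return the gaps between consecutive exons of a single transcript."""
    return [
        (left[0], left[2], right[1], left[3], 0, left[5])
        for left, right in zip(exons, exons[1:])
    ]


if __name__ == "__main__":
    exons = genepred_row_to_exons(SAMPLE_ROW)
    for interval in exons + introns_from_exons(exons):
        # Print one tab-separated BED line per interval.
        print("\t".join(str(value) for value in interval))
```

The printed lines could be redirected to separate exon and intron BED files. Overlapping transcripts of the same gene would still need to be merged before counting reads, which this sketch does not attempt.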