Dear All,
In order to figure out the Mean Inner Distance between Mate Pairs of my paired-end RNA-seq datasets, I ran Bowtie (Map with Bowtie for Illumina) with both forward and reverse datasets and mouse mm9 as reference genome. Below I list the Bowtie output for only one pair of reads (I put the fields on the left side):
For the forward read:
QNAME: SRR322837.8.1
FLAG: 99
RNAME: chr1
POS: 163761156
MAPQ: 255
CIGAR: 36M
MRNM: =
MPOS: 163761301
ISIZE: 181
SEQ: NTGGATACTATTTTGCCATAAAAAAATGAATAAAAT
QUAL: %(,,')(())@@@2235885<<22222@@@######
OPT: XA:i:1 MD:Z:0A35 NM:i:1
For the reverse read:
QNAME: SRR322837.8.2
FLAG: 147
RNAME: chr1
POS: 163761301
MAPQ: 255
CIGAR: 36M
MRNM: =
MPOS: 163761156
ISIZE: -181
SEQ: TATTATGTCAATCTATGAAGAAGGACGGCGAGGTGA
QUAL: GDBE@B>EEGDB=BD-=GG>GGGEDDG<GBGD8GB?
OPT: MD:Z:29A6 NM:i:1
Is ISIZE the insert size? The difference between POS and MPOS is 145 bp, which is 36 bp shorter than ISIZE (181). My question is: if ISIZE does mean insert size, how should I convert ISIZE into the Mean Inner Distance between Mate Pairs?
Thanks,
Jianguang Du
Hello Jianguang,
On the Bowtie tool form itself, please find this text:
Outputs
The output is in SAM format, and has the following columns:
Column    Description
--------  --------------------------------------------------------
1  QNAME  Query (pair) NAME
2  FLAG   bitwise FLAG
3  RNAME  Reference sequence NAME
4  POS    1-based leftmost POSition/coordinate of clipped sequence
5  MAPQ   MAPping Quality (Phred-scaled)
6  CIGAR  extended CIGAR string
7  MRNM   Mate Reference sequence NaMe ('=' if same as RNAME)
8  MPOS   1-based Mate POSition
9  ISIZE  Inferred insert SIZE
10 SEQ    query SEQuence on the same strand as the reference
11 QUAL   query QUALity (ASCII-33 gives the Phred base quality)
12 OPT    variable OPTional fields in the format TAG:VTYPE:VALUE
The value of ISIZE is the total insert size for this read pair.
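For the pair quoted in the question, this accounts for the 36 bp gap between ISIZE and the POS/MPOS difference: ISIZE runs from the first mapped base of the upstream read to the last mapped base of the downstream read, so it includes the full length of the downstream mate. A minimal Python sketch of that arithmetic follows. It assumes ungapped 36M alignments, so each read covers exactly 36 reference bases, and the function name `template_span` is made up for illustration.

```python
def template_span(pos, mpos, mate_len):
    """Count bases from the leftmost mapped base to the rightmost mapped base.

    Assumes `pos` is the start of the upstream (leftmost) mate, `mpos` is the
    start of the downstream mate, and the downstream mate covers `mate_len`
    reference bases (true for an ungapped CIGAR such as 36M).
    """
    rightmost = mpos + mate_len - 1   # last reference base covered by the mate
    return rightmost - pos + 1        # inclusive count of bases


# Values taken from the forward read in the question.
pos, mpos, read_len = 163761156, 163761301, 36

start_to_start = mpos - pos                  # the 145 bp difference noted in the question
span = template_span(pos, mpos, read_len)    # 145 + 36, matching the reported ISIZE of 181
print(start_to_start, span)
```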
Hopefully this helps!
Jen
Galaxy team
On 8/16/12 2:34 PM, Du, Jianguang wrote:
...snip...
Howdy Jianguang,
There's a more complete description of the SAM format in "The Sequence Alignment/Map format and SAMtools", Li et al., Bioinformatics (2009). You can find the latest specification for the format at samtools.sourceforge.net.
In the spec, the terminology for the ISIZE field has been changed to TLEN, template length, to allow for sequencing technologies that produce more than two sequenced segments. The description there is "the number of bases from the leftmost mapped base to the rightmost mapped base".
So I think to convert to "inner distance between mate pairs" you would typically take ISIZE and subtract the lengths of both mates. Note that for some technologies that value could be negative (which just means the mates overlap). You might also need to take into account whether the mates have been mapped in the proper orientation. For example, if an inversion has flipped one mate, it has also carried that mate closer to or farther from the other.
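The subtraction can be sketched in Python as below. This is only an illustration under stated assumptions: both mates are taken to have the same length as the SEQ of the leftmost mate, alignments are ungapped, and the function name `inner_distances` and its `proper_pairs_only` switch are hypothetical, not part of Bowtie or Galaxy. The two records are the ones from the question, rebuilt as tab-separated SAM lines.

```python
from statistics import mean


def inner_distances(sam_lines, proper_pairs_only=True):
    """Yield ISIZE minus the lengths of both mates, once per read pair.

    Assumptions (not from the SAM spec): both mates have the same length as
    the SEQ of the leftmost mate, and alignments are ungapped.
    """
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, isize, seq = int(fields[1]), int(fields[8]), fields[9]
        if isize <= 0:                    # count each pair once, via its leftmost mate
            continue
        if proper_pairs_only and not flag & 0x2:   # 0x2: mapped in a proper pair
            continue
        yield isize - 2 * len(seq)        # negative result means the mates overlap


forward = "\t".join([
    "SRR322837.8.1", "99", "chr1", "163761156", "255", "36M", "=",
    "163761301", "181", "NTGGATACTATTTTGCCATAAAAAAATGAATAAAAT",
    "%(,,')(())@@@2235885<<22222@@@######", "XA:i:1", "MD:Z:0A35", "NM:i:1",
])
reverse = "\t".join([
    "SRR322837.8.2", "147", "chr1", "163761301", "255", "36M", "=",
    "163761156", "-181", "TATTATGTCAATCTATGAAGAAGGACGGCGAGGTGA",
    "GDBE@B>EEGDB=BD-=GG>GGGEDDG<GBGD8GB?", "MD:Z:29A6", "NM:i:1",
])

distances = list(inner_distances([forward, reverse]))
print(distances, mean(distances))
```

For this pair the inner distance is 181 - 36 - 36 = 109. Over a whole dataset, the same generator could be fed the lines of the SAM output and averaged. Restricting to properly paired reads is one way to act on the orientation caveat above.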
Bob H
I am developing several tools that will need to read and write multiple data files at once. For example, Eisen Cluster produces a heatmap that consists of three files: a .cdt file, an .atr file, and a .gtr file, which hold the underlying heatmap, the array tree, and the gene tree, respectively. All three files need to be kept together. I guess I could wrap them in a zip file and pack and unpack them. The heatmap is not just a view-only object: some tools, such as cuttree, would extract one tree, aggregate genes (or arrays) below a certain depth, and create a new trio of files. Is there support for (or are there plans for creating) any aggregate data types?
Thanks,
Ted
CBSE, UCSC
On Tue, Aug 21, 2012 at 4:46 PM, Ted Goldstein ted@soe.ucsc.edu wrote:
...snip...
Hi Ted,
There is support for composite datatypes, so this should be possible. http://wiki.g2.bx.psu.edu/Admin/Datatypes/Composite%20Datatypes
This kind of discussion is normally directed to the galaxy-dev list (CC'd).
Peter
Hi All,
Thank you for your help. I understand how to do it now.
Jianguang
________________________________________
From: rsharris@bx.psu.edu [rsharris@bx.psu.edu]
Sent: Tuesday, August 21, 2012 11:15 AM
To: galaxy-user@lists.bx.psu.edu
Cc: Du, Jianguang
Subject: Re: [galaxy-user] run Bowtie to estimate Mean Inner Distance between Mate Pairs
...snip...