cufflinks FPKM problem
Hi,

I have SOLiD paired-end sequencing data, which I mapped with the BioScope tools (supported by AB). BioScope works better for mapping SOLiD data, so I did not use Bowtie. This gave me a BAM file. Now I want to use Cufflinks to calculate gene expression, but I get an error:

[15:08:06] Inspecting reads and determining fragment length distribution.
BAM record error: found spliced alignment without XS attribute
BAM record error: found spliced alignment without XS attribute

The BAM file looks like this:

323_358_2010 73 chr1 343 0 45M5H * 0 0 CCCTAACCCTACCCTAACCCTAACCCTAACCCTAACCCTAACCCT IIIIIIIIIII))C/1<DE''@DAHD379AID1;7BI+'7))I?3 RG:Z:20110328192522421 NH:i:0 CM:i:4 SM:i:2 CQ:Z:A=ABA<<>@?<4)='))415'-4118-'1)9>'+1'<6+'1)85+)-+6- CS:Z:T20023010023110230100030100230100230100030000200000
423_236_1955 81 chr1 550 0 8H42M = 699451 698945 GTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGG GF>IIII%%III))8IIII?IIII%%IIIIIIIIIIIIIIII RG:Z:20110328192522421 NH:i:2 CM:i:5 SM:i:3 CQ:Z:9BA<AAB>;?AB:55;A%9?AB,4:@@*/)7>2<%5@<:3,;-.%8.*;5 CS:Z:T20302222311033322303302232133302223222131122330223
298_1884_1495 113 chr1 562 0 7H43M chr3 199392032 0 ACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGG 5AI;6:>AIIII>?I7FIEIIIIIIIIIIIIIIIIIIIIIIII RG:Z:20110328192522421 NH:i:2 CM:i:0 SM:i:3 CQ:Z:BB@7<AB8@ABA=2;=>82:?A388.A&28(77;64.1*-/<&0:9/%3? CS:Z:T20221231112210030222231103332200330223213312222022
62_1428_1954 89 chr1 562 1 50M * 0 0 ACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAATGC *=AIII4/CII=%%I((=EIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII RG:Z:20110328192522421 NH:i:0 CM:i:4 SM:i:0 CQ:Z:@B@BABB=ABBB?@A=B>>@@?<;?>B>=<??'7(;A%&849+%0:@.4* CS:Z:T13130222022123111221003022223110331222033022321331

I have sorted the BAM file and the GTF file. The command I ran was:

cufflinks -G refGene_hg18.gtf -p 3 -r human_hg18.fa -o test test.pe.bam

(The Cufflinks version is v0.9.2.)

Does anyone know the reason, and what should I do?

Best wishes,
Shiyong Li
2011-04-11
lishiyong
Hi Li,

TopHat adds a custom tag, 'XS', to the end of spliced read alignments, which your pipeline is not aware of. The following is taken from http://cufflinks.cbcb.umd.edu/manual.html:

"Cufflinks takes a text file of SAM alignments as input. For more details on the SAM format, see the specification <http://samtools.sourceforge.net/SAM1.pdf>. The RNA-Seq read mapper TopHat <http://tophat.cbcb.umd.edu/> produces output in this format, and is recommended for use with Cufflinks. However Cufflinks will accept SAM alignments generated by any read mapper. Here's an example of an alignment Cufflinks will accept:

s6.25mer.txt-913508 16 chr1 4482736 255 14M431N11M * 0 0 \
CAAGATGCTAGGCAAGTCTTGGAAG IIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 XS:A:-

Note the use of the custom tag XS. This attribute, which must have a value of "+" or "-", indicates which strand the RNA that produced this read came from. While this tag can be applied to any alignment, including unspliced ones, it *must* be present for all spliced alignment records (those with a 'N' operation in the CIGAR string)."

Kind regards,
Paul
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Paul Korir www.paulkorir.com
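The rule quoted from the manual above (every record with an 'N' operation in its CIGAR string needs an XS tag of "+" or "-") can be checked before running Cufflinks. The following is only a minimal sketch: the function names `is_spliced` and `missing_xs` are hypothetical, and it assumes tab-separated SAM text with the CIGAR in column 6 and optional tags from column 12 onward.

```python
import re


def is_spliced(cigar: str) -> bool:
    """Return True if the CIGAR string contains an 'N' (skipped region) operation."""
    return "N" in re.findall(r"\d+([MIDNSHP=X])", cigar)


def missing_xs(sam_line: str) -> bool:
    """Return True for a spliced alignment record that has no XS:A:+ or XS:A:- tag."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, tags = fields[5], fields[11:]
    has_xs = any(tag in ("XS:A:+", "XS:A:-") for tag in tags)
    return is_spliced(cigar) and not has_xs


# The example record from the manual excerpt, rebuilt with tab separators.
manual_record = "\t".join([
    "s6.25mer.txt-913508", "16", "chr1", "4482736", "255", "14M431N11M",
    "*", "0", "0", "CAAGATGCTAGGCAAGTCTTGGAAG", "I" * 25, "NM:i:0", "XS:A:-",
])
print(missing_xs(manual_record))  # False: spliced, but XS is present
```

Alignment lines produced by `samtools view` on the BAM file could be passed to `missing_xs` one at a time to count the records Cufflinks would reject.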
Cufflinks requires an 'XS' tag on every spliced read in the BAM file. TopHat adds this tag automatically; your BioScope output does not have it. You can write a script to add it, or remap with TopHat. How much of a difference do you see between TopHat and BioScope?
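A script of the kind suggested above might look like the sketch below. It is not a tested BioScope converter. It assumes the library is strand-specific with the first read (or a single-end read) on the transcript strand and the second read on the opposite strand; if the protocol differs, the inferred strands will be wrong. The names `infer_strand` and `add_xs` are hypothetical. Note that the tag has to be a proper tab-separated SAM field such as `XS:A:+`, not a bare '+' or '-' at the end of the line.

```python
def infer_strand(flag: int) -> str:
    """Infer the transcript strand from SAM flag bits.

    ASSUMPTION: first/single read lies on the transcript strand,
    second read of a pair lies on the opposite strand.
    """
    reverse = bool(flag & 0x10)         # read aligned to the reverse strand
    second_in_pair = bool(flag & 0x80)  # read is the second mate
    return "-" if reverse != second_in_pair else "+"


def add_xs(sam_line: str) -> str:
    """Append an XS:A tag to a mapped alignment line that lacks one."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    if len(fields) < 11:
        return line  # not a complete alignment record
    flag = int(fields[1])
    if flag & 0x4:
        return line  # unmapped read: no strand to report
    if any(tag.startswith("XS:A:") for tag in fields[11:]):
        return line  # already tagged
    fields.append("XS:A:" + infer_strand(flag))
    return "\t".join(fields)


# Flag 81 = paired, reverse strand, first in pair (as in the second record of the original post).
example = "\t".join(["423_236_1955", "81", "chr1", "550", "0", "8H42M", "=",
                     "699451", "698945", "G" * 42, "I" * 42, "NH:i:2"])
print(add_xs(example).split("\t")[-1])  # XS:A:-
```

In use, the SAM text would be read line by line (for example from `samtools view -h`), passed through `add_xs`, and converted back to BAM afterwards.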
Thank you very much for your reply!

I'd like to know how to add this 'XS' tag, since far fewer reads map to the genome when we use TopHat. Can we just add a '+' or '-' at the end of each line?

2011-04-11
gaohuan
Since SOLiD reads are strand-specific, you can use the option '--library-type fr-secondstrand', and the strand information will automatically be added to the reads during the run.

-Adam
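Applied to the command from the original post, the suggestion above amounts to adding one option. The sketch below only assembles and prints the command line; it does not run Cufflinks. The helper name `cufflinks_command` is hypothetical, the file names are those from the original post, and it assumes the installed Cufflinks version accepts `--library-type`.

```python
import shlex


def cufflinks_command(bam, gtf, genome, out_dir, threads=3,
                      library_type="fr-secondstrand"):
    """Build the argument list for the Cufflinks run from the original post,
    with the suggested --library-type option added."""
    return [
        "cufflinks",
        "-G", gtf,
        "-p", str(threads),
        "-r", genome,
        "--library-type", library_type,
        "-o", out_dir,
        bam,
    ]


cmd = cufflinks_command("test.pe.bam", "refGene_hg18.gtf", "human_hg18.fa", "test")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.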
participants (5)
- Adam Roberts
- gaohuan
- lishiyong
- Paul Korir
- Ryan Golhar