SAM filtering and header/sorting issues
Dear All,

I would like to add some filtering steps to my RNA-Seq pipeline. To do so, I took the accepted_hits output from TopHat and applied a filter using NGS: SAM Tools > Filter SAM, selecting reads with bitwise flag 0x0002 set. This does the job. However, I am unable to run Cufflinks after this step; I get the following error message, which seems to indicate that the file contains no header and is unsorted. Is there a workaround?

Thanks a lot

http://main.g2.bx.psu.edu/u/dputhier/h/srx011549

Error running cufflinks. return code = 1
cufflinks: /lib64/libz.so.1: no version information available (required by cufflinks)
Command line:
cufflinks -q --no-update-check -s 20 -I 300000 -F 0.100000 -j 0.150000 -p 8 -m 200 -g /galaxy/main_pool/pool5/files/003/858/dataset_3858145.dat /galaxy/main_pool/pool1/files/003/858/dataset_3858306.dat
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File /galaxy/main_pool/pool1/files/003/858/dataset_3858306.dat doesn't appear to be a valid BAM file, trying SAM...
[14:11:28] Loading reference annotation.
[14:11:28] Inspecting reads and determining fragment length distribution.
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at chr10:181061, last one was at chr1:245006405
Cufflinks requires that if your file has SQ records in the SAM header that they appear in the same order as the chromosomes names in the alignments.
If there are no SQ records in the header, or if the header is missing, the alignments must be sorted lexicographically by chromsome name and by position.
-- ==================================================================== Denis Puthier laboratoire INSERM TAGC/INSERM U928 Parc Scientifique de Luminy case 928 163, avenue de Luminy 13288 MARSEILLE cedex 09 FRANCE Mail: puthier@tagc.univ-mrs.fr Tel: (National) 04 91 82 87 11 / (International) 33 4 91 82 87 11 Fax: (National) 04 91 82 87 01 / (International) 33 4 91 82 87 01 Web: http://tagc.univ-mrs.fr/puthier http://biologie.univ-mrs.fr/view-data.php?id=245 http://tagc.univ-mrs.fr/tbrowser ====================================================================
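For anyone scripting the same filter outside Galaxy, here is a minimal Python sketch of a flag filter that passes the header lines through untouched, so the output keeps its @SQ records. It is only an illustration, not what the Galaxy Filter SAM tool does internally; the function name filter_sam_lines and the sample records are made up for the example.

```python
PROPER_PAIR = 0x0002  # SAM FLAG bit: read mapped in a proper pair


def filter_sam_lines(lines, required_flag=PROPER_PAIR):
    """Yield header lines unchanged, plus alignments that have required_flag set.

    Hypothetical helper for illustration only.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@"):
            yield line  # header line (@HD, @SQ, ...): always keep
            continue
        fields = line.split("\t")
        # Column 2 of a SAM alignment line is the bitwise FLAG.
        if (int(fields[1]) & required_flag) == required_flag:
            yield line


# Made-up sample: two header lines and two alignments.
sam = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:249250621\n",
    "r1\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII\n",  # 99 has bit 0x2 set
    "r2\t73\tchr1\t150\t50\t4M\t*\t0\t0\tACGT\tIIII\n",      # 73 does not
]
kept = list(filter_sam_lines(sam))
```

With this sample, kept holds both header lines and the r1 alignment; r2 is dropped because its flag lacks 0x0002.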
Hi Denis,

In a similar situation I was able to move forward using NGS: Picard (beta) > Replace SAM/BAM Header to copy the header back from the original unfiltered BAM or SAM file.

Hope it helps,
Carlos

On Tue, Feb 28, 2012 at 3:15 PM, denis puthier <puthier@tagc.univ-mrs.fr> wrote:
> [...]
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Hi Carlos,

I eventually found a workaround by selecting the header lines matching ^@ and merging them with the filtered SAM file, but I think your solution is far more elegant. I'll try it.

Thanks

2012/2/28 Carlos Borroto <carlos.borroto@gmail.com>:
> [...]
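The ^@ workaround can be sketched in a few lines of Python, assuming both files are plain-text SAM and that the header should come from the original, unfiltered file. The function name restore_header and the sample records are hypothetical; this is a rough equivalent of the idea, not the Galaxy or Picard implementation.

```python
def restore_header(original_sam_lines, filtered_sam_lines):
    """Return the original file's header lines followed by the filtered alignments.

    Hypothetical helper for illustration only.
    """
    header = []
    for line in original_sam_lines:
        if not line.startswith("@"):
            break  # header lines only appear at the top of a SAM file
        header.append(line)
    # Drop any header lines the filtered file may still carry, to avoid duplicates.
    body = [line for line in filtered_sam_lines if not line.startswith("@")]
    return header + body


# Made-up sample: the original file has a header, the filtered one lost it.
original = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:249250621\n",
    "r1\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII\n",
    "r2\t73\tchr1\t150\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
]
filtered = ["r1\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII\n"]
merged = restore_header(original, filtered)
```

The alignments are written in the order they arrive, so this only helps if the filter preserved the original sort order.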
participants (2)
- Carlos Borroto
- denis puthier