Re: [galaxy-dev] Tophat didn't run
Hi Sarah,
On 4/11/13 8:02 AM, Sarah Maman wrote:
Thanks Jennifer,
Sorry, my previous mail contained an error: in fact, the reference genome from my history was in fasta format (the name said GTF file, but the format was fasta...). So, when I run tophat with a reference genome from my history, here is the error message (my reference genome is a FASTA file):
Error in tophat:
[2013-04-11 14:57:12] Beginning TopHat run (v2.0.5)
-----------------------------------------------
[2013-04-11 14:57:12] Checking for Bowtie
Bowtie version: 2.0.0.7
[2013-04-11 14:57:12] Checking for Samtools
Samtools version: 0.1.19.0
[2013-04-11 14:57:13] Checking for Bowtie index files
Error: Could not find Bowtie 2 index files (/tmp/1078173.1.workq/tmpzxEFNK/dataset_6485.*.bt2)
Settings: blablabla OK.....
Total time for backward call to driver() for mirror index: 00:00:57
TopHat v2.0.5
tophat -p 4 /tmp/1078173.1.workq/tmpzxEFNK/dataset_6485 /work/galaxy/database/files/006/dataset_6528.dat
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.
[bam_index_build2] fail to index the BAM file.
Epilog : job finished at jeu. avril 11 14:57:18 CEST 2013
And here are my bowtie and tophat versions:
$ which bowtie
bowtie -v 0.12.8
This is good.
$ which tophat
tophat -v 2.0.5
This is most likely the problem. There is probably a symbolic link pointing from tophat -> tophat2. You will want to remove that.
OK, now this looks like a tool/index mismatch problem, most likely rooted in a binary path issue. The tool wrappers will be looking for the correct binaries/indexes for the version each of them depends on. This means that if you are running Tophat2 for Illumina, you want both tophat2 and bowtie2 to be used, along with the bowtie2 indexes. This is detailed in the Tophat/Bowtie sections of the wikis I sent, for both the dependencies and the index setup.
My guess at this point, without seeing your exact files, is that you need to add the index path to the bowtie2 loc file, remove or adjust the symbolic link as I stated above, then restart and test again to see if that fixes the problem.
But we also have, available on our cluster: bowtie2 --version is 2.0.0-beta7
This is also good, if "which bowtie" == v0.12.8 and "which bowtie2" == v2.
If "bowtie" is pointing to v2 on your cluster nodes, then remove that symbolic link so that it points to the correct binary (v0.12.8) instead. Same for bowtie2: it should point to the v2 binary. bowtie/tophat and bowtie2/tophat2 are not the same executables and use different indexes; this is most likely why you had to use the bowtie v0.12.8 loc to get tophat2 going to begin with.
Hope it works this time! Please keep replies on the list to help us with tracking,
Jen Galaxy team
Could you please tell me how to point to the v2 binaries (how to change symbolic links)?
Thanks in advance, Sarah
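For the question above about changing the links, here is a minimal bash sketch. It assumes the links sit in a directory on the galaxy user's PATH and that each Bowtie version is installed in its own directory. The function name repoint_link and every path in the usage comments are hypothetical; adjust them to your cluster and apply the change on every node that runs Galaxy jobs.

```shell
#!/usr/bin/env bash
# repoint_link LINK TARGET  (hypothetical helper)
# Replace LINK (e.g. a "bowtie" link that wrongly resolves to bowtie2)
# with a symbolic link to TARGET. Refuses to overwrite LINK if it is a
# real file rather than a symlink, and refuses a TARGET that is not a file.
repoint_link() {
    local link=$1 target=$2
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing: $link is a real file, not a symlink" >&2
        return 1
    fi
    if [ ! -f "$target" ]; then
        echo "refusing: $target is not a file" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat LINK as a plain name even if it points at a directory.
    ln -sfn "$target" "$link"
}

# Hypothetical layout, adjust to your own install directories:
#   repoint_link /usr/local/bin/bowtie  /opt/bowtie-0.12.8/bowtie
#   repoint_link /usr/local/bin/bowtie2 /opt/bowtie2-2.0.0-beta7/bowtie2
# A "tophat" link that only points at tophat2 can simply be removed:
#   [ -L /usr/local/bin/tophat ] && rm /usr/local/bin/tophat
```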
Jennifer Jackson wrote:
Hi Sarah,
It still sounds like there is a path problem; this is why the tools are looking in the wrong loc file. When bowtie2/tophat2 is installed, it will create a symbolic link named just "bowtie" or "tophat" that points to the v2 binaries.
When you run these, what do you get?
$ which bowtie
$ which tophat
My guess is that these are symbolic links pointing to the v2 binaries. You will want to remove these. This is noted in the NGS set-up wiki, but easy to miss.
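To see exactly where these names resolve, a small bash sketch that extends "which" with link resolution. The helper name is hypothetical, and readlink -f assumes GNU coreutils. Running it as the galaxy user, on a node where jobs actually run, shows what the tool wrappers will pick up.

```shell
#!/usr/bin/env bash
# report_binaries NAME...  (hypothetical helper)
# For each command name, print where it is found on PATH and, if that
# entry is a symbolic link, the file it finally resolves to.
report_binaries() {
    local name path
    for name in "$@"; do
        path=$(command -v "$name") || { echo "$name: not found on PATH"; continue; }
        if [ -L "$path" ]; then
            echo "$name: $path -> $(readlink -f "$path")"
        else
            echo "$name: $path (not a symlink)"
        fi
    done
}

# Usage, as the galaxy user on a cluster node:
#   report_binaries bowtie bowtie2 tophat tophat2
```

If "tophat" or "bowtie" is reported as a link ending in a tophat2 or bowtie2 binary, that is the kind of link described above.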
For the custom _reference genome_ portion below, there is a mix-up here. A custom _reference genome_ is in fasta format, not GTF format. I think what you are doing is using a _reference annotation_ file with the process. Both can be used with RNA-seq tools, but the _reference genome_ is the one with the indexes. The link I sent about _custom reference genomes_ explains how to use one of these, if you still want to try.
I think it is worth reviewing the path and loc info, plus the binary commands above, unless this already helps you solve the problem on your own.
Thanks!
Jen Galaxy team
On 4/11/13 6:16 AM, Sarah Maman wrote:
Thanks a lot Jennifer,
Restart done, and full paths were OK.
I don't know why, but the 2nd version of Tophat (i.e. the tophat tool available from Galaxy) searches for indexes in the bowtie-index.loc file and not in bowtie2-index.loc. So I've added my bowtie2 index paths to the bowtie-index.loc file, and tophat ran...
But when I want to run tophat with a reference genome from my history, here is the error message (my reference genome is a GTF file):
Error in tophat:
[2013-04-11 14:57:12] Beginning TopHat run (v2.0.5)
-----------------------------------------------
[2013-04-11 14:57:12] Checking for Bowtie
Bowtie version: 2.0.0.7
[2013-04-11 14:57:12] Checking for Samtools
Samtools version: 0.1.19.0
[2013-04-11 14:57:13] Checking for Bowtie index files
Error: Could not find Bowtie 2 index files (/tmp/1078173.1.workq/tmpzxEFNK/dataset_6485.*.bt2)
Settings: blablabla OK.....
Total time for backward call to driver() for mirror index: 00:00:57
TopHat v2.0.5
tophat -p 4 /tmp/1078173.1.workq/tmpzxEFNK/dataset_6485 /work/galaxy/database/files/006/dataset_6528.dat
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.
[bam_index_build2] fail to index the BAM file.
Epilog : job finished at jeu. avril 11 14:57:18 CEST 2013
Thanks in advance,
Sarah
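The observation that the Galaxy tophat tool reads bowtie-index.loc rather than bowtie2-index.loc can be checked directly. A rough bash sketch, assuming the wrapper takes its index list from a data table declared in tool_data_table_conf.xml. The table name tophat2_indexes is an assumption; check it against the from_data_table attribute in your wrapper's XML. The function is also hypothetical and is a line-based awk filter expecting one tag per line, not an XML parser.

```shell
#!/usr/bin/env bash
# loc_files_for_table TABLE CONF  (hypothetical helper)
# Print the path attribute of every <file .../> inside the <table> whose
# name attribute is TABLE in the config file CONF.
loc_files_for_table() {
    awk -v t="$1" '
        index($0, "<table") && index($0, "name=\"" t "\"") { inside = 1 }
        inside && /<file / && match($0, /path="[^"]*"/) {
            # strip the leading path=" (6 chars) and the trailing quote
            print substr($0, RSTART + 6, RLENGTH - 7)
        }
        inside && /<\/table>/ { inside = 0 }
    ' "$2"
}

# Usage (table name is an assumption, see above):
#   loc_files_for_table tophat2_indexes tool_data_table_conf.xml
```

If this prints the Bowtie 1 loc file for the table the wrapper uses, that would explain why indexes were only found after being added there.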
Jennifer Jackson wrote:
Hi Sarah,
Let's try to sort this out. Your problem does not seem to be the same as in the question referenced, but we can see. First, just to double check: have you restarted the server since setting up the genome? If not, do that first and check whether that fixes the problem. Basically, you want to follow this checklist, and restarting is the final step: http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup
If the problem persists, then would you please send a few more details:
1 - full paths* on your system where you keep the .bt2 indexes, the sam index, and the .fa file. Maybe do an "ls -l" on these dirs so we can check that the symbolic links are in place and named correctly.
* as a note, these should be "hard paths" and not symbolic (except for the .fa links), and must have permissions set so that they are accessible to the "galaxy user"
2 - lines from your bowtie2_indices.loc and sam_fa_indices.loc files for this genome. I may have you double check your builds.txt file later. If this doesn't sound familiar, it could be the problem: the genome must be in there, too. See this wiki: http://wiki.galaxyproject.org/Admin/Data%20Integration
3 - the full error message you get when you try to run this using a genome in fasta format from your history. It really shouldn't be the same error; if it is, something is not right with the settings and a custom genome is not actually being used. Give it another try and see what happens, then send that info. This is a bit of a side case, and we should get your basic install correct first, but knowing how to do this is a good thing and easy to learn. http://wiki.galaxyproject.org/Support#Custom_reference_genome
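Items 1 and 2 can be partly checked with a script before sending anything. A minimal bash sketch; both function names are hypothetical. It assumes a small (non-"large") Bowtie 2 index, made of the six .bt2 files listed below, and the four TAB-separated columns documented in Galaxy's bowtie2_indices.loc.sample (<unique_build_id> <dbkey> <display_name> <file_base_path>). Run it as the galaxy user so the readability test reflects that user's permissions.

```shell
#!/usr/bin/env bash
# check_bt2_index BASE  (hypothetical helper)
# TopHat2 is given the index base name and looks for BASE.*.bt2, as the
# error messages in this thread show. Report missing or unreadable files.
check_bt2_index() {
    local base=$1 ext missing=0
    for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -r "$base.$ext" ]; then
            echo "missing or unreadable: $base.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# check_loc_file LOCFILE  (hypothetical helper)
# Each non-comment line must have exactly 4 TAB-separated fields; lines
# that use spaces instead of TABs are reported. The 4th field (the base
# path given to bowtie2-build) is then checked with check_bt2_index.
check_loc_file() {
    local loc=$1 status=0 id dbkey name base extra
    while IFS=$'\t' read -r id dbkey name base extra || [ -n "$id" ]; do
        case $id in ''|'#'*) continue ;; esac
        if [ -z "$base" ] || [ -n "$extra" ]; then
            echo "bad line for '$id': need exactly 4 TAB-separated fields"
            status=1
            continue
        fi
        check_bt2_index "$base" || status=1
    done < "$loc"
    return "$status"
}

# Usage (path is hypothetical):
#   check_loc_file /path/to/galaxy/tool-data/bowtie2_indices.loc
```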
It is OK to mask out anything like user names/groups you don't want to share. Please keep this on the list in case we need other feedback.
Thanks!
Jen Galaxy team
On 4/10/13 3:15 AM, Sarah Maman wrote:
Hello,
When I run tophat ("Tophat for Illumina Find splice junctions using RNA-seq data"), the job fails with truncated files. However, the index files are available, and I get exactly the same error message using a built-in index or one from my history.
/ Tool execution generated the following error message:
Error in tophat:
[2013-04-10 09:17:07] Beginning TopHat run (v2.0.5)
-----------------------------------------------
[2013-04-10 09:17:07] Checking for Bowtie
Bowtie version: 2.0.0.7
[2013-04-10 09:17:07] Checking for Samtools
Samtools version: 0.1.19.0
[2013-04-10 09:17:07] Checking for Bowtie index files
Error: Could not find Bowtie 2 index files (/work/galaxy/Danio_rerio.Zv9.62.dna.chromosome.22.fa.*.bt2)
The tool produced the following additional output:
TopHat v2.0.5
tophat -p 4 /work/galaxy/Danio_rerio.Zv9.62.dna.chromosome.22.fa /work/galaxy/database/files/006/dataset_6528.dat
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.
[bam_index_build2] fail to index the BAM file.
Epilog : job finished at mer. avril 10 09:17:12 CEST 2013
/
In this post (http://dev.list.galaxyproject.org/tophat-for-illumina-looking-in-wrong-direc...), no solution was found.
Do you have any idea?
Sarah Maman
--
--*--
Sarah Maman
INRA - LGC - SIGENAE
http://www.sigenae.org/
Chemin de Borde-Rouge - Auzeville - BP 52627
31326 Castanet-Tolosan cedex - FRANCE
Tel: +33(0)5.61.28.57.08
Fax: +33(0)5.61.28.57.53
--*--
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
-- Jennifer Hillman-Jackson Galaxy Support and Training http://galaxyproject.org
Thanks Jennifer,
Since I still had an error when running tophat2 with a fasta from my history as the reference, I modified line 105 of the tophat wrapper (bowtie2-build instead of bowtie-build in the command line). Now "Tophat for Illumina Find splice junctions using RNA-seq data" runs without error.
Thank you again for your help,
Sarah
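The fix described in this last message amounts to indexing the history FASTA with bowtie2-build, whose .bt2 output is what TopHat2 looks for, instead of bowtie-build, which writes Bowtie 1 .ebwt files. A minimal stand-alone sketch of that step outside Galaxy. The function name is hypothetical, and the BOWTIE2_BUILD override exists only so the sketch can be exercised without Bowtie 2 installed. It is not the wrapper's actual code.

```shell
#!/usr/bin/env bash
# index_history_fasta FASTA OUTBASE  (hypothetical helper)
# Build a Bowtie 2 index for FASTA under the prefix OUTBASE, then confirm
# that the OUTBASE.*.bt2 files TopHat2 searches for were produced.
index_history_fasta() {
    local fasta=$1 base=$2 ext
    "${BOWTIE2_BUILD:-bowtie2-build}" "$fasta" "$base" >/dev/null || return 1
    for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        [ -s "$base.$ext" ] || { echo "not produced: $base.$ext" >&2; return 1; }
    done
}

# Usage (paths are hypothetical); tophat then receives the same base:
#   index_history_fasta dataset_6485.dat /tmp/idx/dataset_6485
#   tophat -p 4 /tmp/idx/dataset_6485 reads.fastq
```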
participants (2): Jennifer Jackson, Sarah Maman