Salmon references and data manager
Dear Björn,
I just installed Salmon on our Galaxy instance and I have a couple of basic questions.
Currently the reference transcriptomes are put in the same data table as the genomes, would it be of interest to separate this and give the transcriptomes their own table? I could probably try to do this...
There is a data manager available that unfortunately has a bug. We fixed that and it now populates the reference genome data table. I would probably modify this as well to use the new table. Could this be useful? I'm not sure how to proceed... would I give you the modified Salmon wrapper for inclusion in the package?
Best regards,
Christopher
--
*Dr. Christopher Previti*
Genomics and Proteomics Core Facility
High Throughput Sequencing (W190)
Bioinformatician
German Cancer Research Center (DKFZ)
Foundation under Public Law
Im Neuenheimer Feld 580
69120 Heidelberg
Germany
Room: B2.102 (INF580/TP3)
Phone: +49 6221 42-4661
christopher.previti@dkfz.de <http://www.dkfz.de/>
www.dkfz.de <http://www.dkfz.de/>
Management Board: Prof. Dr. Michael Baumann, Prof. Dr. Josef Puchta
VAT-ID No.: DE143293537
Confidentiality notice: This message is intended exclusively for the persons to whom it is addressed. It may contain confidential information and/or information intended only for the recipient(s). If you are not the intended recipient, please contact the sender and delete the message. Any unauthorized use of the information in this message is prohibited.
Hi Christopher!
> Dear Björn,
> I just installed Salmon on our Galaxy instance and I have a couple of basic questions.
Sure, thanks for getting in touch!
> Currently the reference transcriptomes are put in the same data table as the genomes, would it be of interest to separate this and give the transcriptomes their own table? I could probably try to do this...
That I don't understand. Salmon is using this one here, isn't it? https://github.com/bgruening/galaxytools/blob/master/tools/salmon/salmon.xml...
> There is a data manager available that unfortunately has a bug. We fixed that and it now populates the reference genome data table.
Do you mean this one? https://github.com/ieguinoa/data_manager_salmon_index_builder
> I would probably modify this as well to use the new table. Could this be useful? I'm not sure how to proceed... would I give you the modified Salmon wrapper for inclusion in the package?
If you can, please feel free to create PRs to the repositories, so we can all review them. And then, when we merge, it gets automatically updated to the Tool Shed :)
Thanks!
Bjoern
> Best regards,
> Christopher
Hi Christopher and Björn, I have some comments about this because I also came up with these questions some time ago...
From: "Björn Grüning" <bjoern.gruening@gmail.com> To: "Previti" <christopher.previti@dkfz-heidelberg.de>, "galaxy-dev" <galaxy-dev@lists.galaxyproject.org> Sent: Friday, September 7, 2018 9:56:41 AM Subject: Re: [galaxy-dev] Salmon references and data manager
> Hi Christopher!
> > Dear Björn,
> > I just installed Salmon on our Galaxy instance and I have a couple of basic questions.
> Sure, thanks for getting in touch!
> > Currently the reference transcriptomes are put in the same data table as the genomes, would it be of interest to separate this and give the transcriptomes their own table? I could probably try to do this...
> That I don't understand? Salmon is using this one here, isn't it?
> https://github.com/bgruening/galaxytools/blob/master/tools/salmon/salmon.xml...
What he means, I think, is the table the index is built from. Data managers that take a transcriptome as input get it from the all_fasta table; I think that is what he means by the genomes table. As I said, at some point I also thought it might be useful to have a separate table (e.g. all_transcriptomes) so that the genome and transcriptome entries of the same build don't get mixed. I think it would be good to have a way of listing only the transcriptomes from the all_gff, but that would require some kind of naming standard to filter on. We had this in our instance at some point but it didn't help at all, so I just modified the data manager to use all_fasta, and that is what I published.
So, @Christopher... having a separate table is not the solution, although it would be easier for the GUI. For now, just giving the entries a descriptive name to indicate that they correspond to a transcriptome is enough and works OK for us. In any case this is not for users, and at least for us it's all handled through the API, so, again, it's just a matter of taking care of the entry names and you are fine with using the all_fasta table.
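To illustrate the naming approach described above, here is a minimal sketch. It assumes the all_fasta entries have already been fetched (for example through the API) as dictionaries with the table's value, dbkey, name and path columns. The "transcriptome" keyword in the name, the helper names and the sample entries are hypothetical conventions chosen for this example; they are not defined by Galaxy or by the data manager.

```python
def is_transcriptome(entry, keyword="transcriptome"):
    """Return True if the entry's display name marks it as a transcriptome."""
    return keyword.lower() in entry["name"].lower()


def split_entries(entries, keyword="transcriptome"):
    """Split all_fasta-style entries into (transcriptomes, genomes) by name."""
    transcriptomes = [e for e in entries if is_transcriptome(e, keyword)]
    genomes = [e for e in entries if not is_transcriptome(e, keyword)]
    return transcriptomes, genomes


# Hypothetical entries; real ones would come from the all_fasta table.
entries = [
    {"value": "hg38", "dbkey": "hg38",
     "name": "Human (hg38) genome", "path": "/data/hg38/hg38.fa"},
    {"value": "hg38_tx", "dbkey": "hg38",
     "name": "Human (hg38) transcriptome", "path": "/data/hg38/hg38_tx.fa"},
]

transcriptomes, genomes = split_entries(entries)
print([e["value"] for e in transcriptomes])
print([e["value"] for e in genomes])
```

Both entries share the same dbkey, so only the descriptive name distinguishes them, which is why consistent naming matters when everything lives in one table.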
> > There is a data manager available that unfortunately has a bug. We fixed that and it now populates the reference genome data table.
> Do you mean this one?
> https://github.com/ieguinoa/data_manager_salmon_index_builder
> > I would probably modify this as well to use the new table. Could this be useful? I'm not sure how to proceed... would I give you the modified Salmon wrapper for inclusion in the package?
> If you can, please feel free to create PRs to the repositories, so we can all review them. And then, when we merge, it gets automatically updated to the Tool Shed :)
As Björn said, if that's the one you are talking about, please create a PR or an issue, or contact me.
Cheers,
Ignacio
___________________________________________________________ Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: https://lists.galaxyproject.org/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/
Yeah, I got confused about the data tables. Sorry about this. I too would keep the transcriptome indices separate from the reference genomes, it just makes sense.
@Ignacio, I found that you need to insert the following (in red in the original message)
    if not os.path.exists( target_directory ):
        os.mkdir( target_directory )
    args = ['salmon','index']
in order for anything to happen. I think that's it... but I'll test some more.
Best regards,
Christopher
On 09/07/2018 10:46 AM, Ignacio EGUINOA wrote:
> Hi Christopher and Björn,
> I have some comments about this because I also came up with these questions some time ago...
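The fix Christopher describes can be sketched as follows. This is a minimal, self-contained illustration and not the data manager's actual code: the function name build_index_command and the -t/-i arguments appended to the command are assumptions made for the example. Only the directory check and the ['salmon', 'index'] prefix come from his message.

```python
import os
import tempfile


def build_index_command(fasta_filename, target_directory):
    """Create the target directory if needed and return the salmon command."""
    # The check from the message: the target directory must exist
    # before the index step can do anything.
    if not os.path.exists(target_directory):
        os.mkdir(target_directory)
    args = ["salmon", "index"]
    # Assumed arguments: -t for the transcriptome FASTA, -i for the index dir.
    args.extend(["-t", fasta_filename, "-i", target_directory])
    return args


# Usage with a temporary location; the command is only built, not executed.
base = tempfile.mkdtemp()
target = os.path.join(base, "salmon_index")
command = build_index_command("transcripts.fa", target)
print(command)
```

On Python 3, os.makedirs(target_directory, exist_ok=True) is a similar one-liner that also creates any missing parent directories.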
participants (3)
- Björn Grüning
- Ignacio EGUINOA
- Previti