Re: [galaxy-dev] DustMasker tool for ncbi_blast_plus
Adding galaxy-dev list in CC as suggested by Peter.
On Wed, 6 Feb 2013 at 16:57 +0000, Peter Cock wrote:
On Tue, Feb 5, 2013 at 11:45 AM, Nicola Soranzo <soranzo@crs4.it> wrote:
Dear Peter, I have created a simple Galaxy tool for DustMasker of the NCBI BLAST+ suite, which I think would be a useful addition to the ncbi_blast_plus repository you're maintaining in the Galaxy Tool Shed.
You can find it and hopefully pull it from:
https://bitbucket.org/nsoranzo/ncbi_blast_plus
Kind regards, Nicola
Hi Nicola,
Thanks for getting involved - we can discuss this on the galaxy-dev mailing list if you prefer? For now I have CC'd Edward Kirton as he is/was working on masking in BLAST databases for Galaxy.
I can see the new file tools/ncbi_blast_plus/ncbi_dustmasker_wrapper.xml; however, it refers to multiple new file formats - where are they defined?
* acclist
* maskinfo_asn1_bin
* maskinfo_asn1_text
* seqloc_asn1_bin
* seqloc_asn1_text
Hi Peter, I added these file formats mostly as placeholders for a future implementation. I have now changed the tool a bit by removing the acclist and seqloc_xml formats, since they are not recognized by the latest versions of dustmasker (I also sent an email to blast-help@ncbi.nlm.nih.gov to inform them of this bug). As before, you can find the new version at:
https://bitbucket.org/nsoranzo/ncbi_blast_plus
I stripped the old commit and did a new one, not a very good practice, sorry about that.
Have you looked at the (commented out) bits in the makeblastdb wrapper which would perhaps be relevant? This is something Edward Kirton wrote which I haven't integrated yet:
<!-- SEQUENCE MASKING OPTIONS -->
<!-- TODO
<repeat name="mask_data" title="Provide one or more files containing masking data">
    <param name="file" type="data" format="asnb" label="File containing masking data" help="As produced by NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)" />
</repeat>
<repeat name="gi_mask" title="Create GI indexed masking data">
    <param name="file" type="data" format="asnb" label="Masking data output file" />
</repeat>
-->
Perhaps all you need to offer in ncbi_dustmasker_wrapper.xml is 'fasta' and 'asnb' (binary ASN) formats? Edward - did you have an 'asnb' definition?
'fasta' and 'interval' are the ones I'm interested in for my use case. 'maskinfo_asn1_bin' is probably the one referred to as 'asnb' in the cited code (ASN.1 is a general data serialization format, like XML). A file in this format can be given as input to "makeblastdb -mask_data".
Nicola
--
Nicola Soranzo, Ph.D.
CRS4 Bioinformatics Program
Loc. Piscina Manna
09010 Pula (CA), Italy
http://www.bioinformatica.crs4.it/
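To make the maskinfo_asn1_bin point above concrete, here is a rough Python sketch of how the two steps might be chained: dustmasker writes a binary ASN.1 mask file, which is then passed to makeblastdb via -mask_data. The helper names and the file names (genome.fasta, genome_dust.asnb, genome_db) are made up for illustration, a nucleotide FASTA input is assumed, and the sketch only builds and prints the command lines rather than running them.

```python
import shlex


def dustmasker_cmd(fasta_path, mask_path):
    """Build a dustmasker command writing masks in maskinfo_asn1_bin format.

    Hypothetical helper: the function name and paths are illustrative only.
    """
    return [
        "dustmasker",
        "-in", fasta_path,
        "-infmt", "fasta",
        "-parse_seqids",
        "-outfmt", "maskinfo_asn1_bin",
        "-out", mask_path,
    ]


def makeblastdb_cmd(fasta_path, mask_path, db_name):
    """Build a makeblastdb command that attaches the mask file via -mask_data.

    Hypothetical helper: assumes a nucleotide database (-dbtype nucl).
    """
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", "nucl",
        "-parse_seqids",
        "-mask_data", mask_path,
        "-out", db_name,
    ]


def render(cmd):
    """Return a shell-quoted string for display or logging."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Example (made-up) file names; nothing is executed here.
    print(render(dustmasker_cmd("genome.fasta", "genome_dust.asnb")))
    print(render(makeblastdb_cmd("genome.fasta", "genome_dust.asnb", "genome_db")))
```

To actually run the steps, each list could be handed to subprocess.run(cmd, check=True), provided the BLAST+ binaries are on the PATH.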
On Wed, 6 Feb 2013 at 20:01 +0100, Nicola Soranzo wrote:
Hi Peter, I added these file formats mostly as placeholders for a future implementation. I have now changed the tool a bit by removing the acclist and seqloc_xml formats, since they are not recognized by the latest versions of dustmasker (I also sent an email to blast-help@ncbi.nlm.nih.gov to inform them of this bug). As before, you can find the new version at:
https://bitbucket.org/nsoranzo/ncbi_blast_plus
I stripped the old commit and did a new one, not a very good practice, sorry about that.
Hi Peter, I've added a new commit to this repo which updates the test output files to (recommended) BLAST 2.2.26+, since functional tests were returning errors.
Hope you find it useful.
Nicola
On Fri, Feb 8, 2013 at 4:30 PM, Nicola Soranzo <soranzo@crs4.it> wrote:
I stripped the old commit and did a new one, not a very good practice, sorry about that.
It seems to have confused the bitbucket page a little, but I have checked in your initial wrapper to my development repository (I use the tools branch): https://bitbucket.org/peterjc/galaxy-central/commits/2284d485e36f74f19b0dbe7...
Note I'm not going to include this in the Tool Shed release yet, we need to sort out the file format definitions first.
Hi Peter, I've added a new commit to this repo which updates the test output files to (recommended) BLAST 2.2.26+, since functional tests were returning errors.
Hope you find it useful.
Also applied to my branch, thank you - I'd forgotten to update that (but intend at some point to refresh the test files and dependency install to use BLAST 2.2.27+ instead):
https://bitbucket.org/peterjc/galaxy-central/commits/f1f912f63bb4174f434e3f4...
Sadly I've not actually got the unit tests to run at all yet, see:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-February/013245.html
Regards,
Peter
On Mon, 11 Feb 2013 at 13:19 +0000, Peter Cock wrote:
It seems to have confused the bitbucket page a little, but I have checked in your initial wrapper to my development repository (I use the tools branch): https://bitbucket.org/peterjc/galaxy-central/commits/2284d485e36f74f19b0dbe7...
Note I'm not going to include this in the Tool Shed release yet, we need to sort out the file format definitions first.
Hi Peter, I implemented minimal datatypes for maskinfo ASN.1 binary and text, plus some other improvements to ncbi_blast_plus, and I sent you a pull request through Bitbucket for your development repository. I think that would be easier for you, let me know if it is not.
Nicola
On Fri, Feb 15, 2013 at 6:10 PM, Nicola Soranzo <soranzo@crs4.it> wrote:
Hi Peter, I implemented minimal datatypes for maskinfo ASN.1 binary and text, plus some other improvements to ncbi_blast_plus, and I sent you a pull request through Bitbucket for your development repository. I think that would be easier for you, let me know if it is not.
Nicola
That looks very useful Nicola - I hope to have time to test that next week :)
Thank you,
Peter
Participants (2): Nicola Soranzo, Peter Cock