Adding the galaxy-dev list in CC, as suggested by Peter.
On Wed, 06/02/2013 at 16:57 +0000, Peter Cock wrote:
On Tue, Feb 5, 2013 at 11:45 AM, Nicola Soranzo <soranzo@crs4.it> wrote:
Dear Peter, I have created a simple Galaxy tool for DustMasker of the NCBI BLAST+ suite, which I think would be a useful addition to the ncbi_blast_plus repository you're maintaining in the Galaxy Tool Shed.
You can find it and hopefully pull it from:
https://bitbucket.org/nsoranzo/ncbi_blast_plus
Kind regards, Nicola
Hi Nicola,
Thanks for getting involved - we can discuss this on the galaxy-dev mailing list if you prefer? For now I have CC'd Edward Kirton as he is/was working on masking in BLAST databases for Galaxy.
I can see the new file tools/ncbi_blast_plus/ncbi_dustmasker_wrapper.xml; however, it refers to multiple new file formats. Where are they defined?
* acclist
* maskinfo_asn1_bin
* maskinfo_asn1_text
* seqloc_asn1_bin
* seqloc_asn1_text
Hi Peter,
I added these file formats mostly as placeholders for a future implementation. I have now changed the tool a bit by removing the acclist and seqloc_xml formats, since they are not recognized by the latest versions of dustmasker (I also sent an email to blast-help@ncbi.nlm.nih.gov to inform them of this bug).
As before, you can find the new version at:
https://bitbucket.org/nsoranzo/ncbi_blast_plus
I stripped the old commit and made a new one, which is not very good practice, sorry about that.
Have you looked at the (commented out) bits in the makeblastdb wrapper which would perhaps be relevant? This is something Edward Kirton wrote which I haven't integrated yet:
<!-- SEQUENCE MASKING OPTIONS -->
<!-- TODO
<repeat name="mask_data" title="Provide one or more files containing masking data">
    <param name="file" type="data" format="asnb" label="File containing masking data" help="As produced by NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)" />
</repeat>
<repeat name="gi_mask" title="Create GI indexed masking data">
    <param name="file" type="data" format="asnb" label="Masking data output file" />
</repeat>
-->
Perhaps all you need to offer in ncbi_dustmasker_wrapper.xml is 'fasta' and 'asnb' (binary ASN) formats? Edward - did you have an 'asnb' definition?
'fasta' and 'interval' are the ones I'm interested in for my use case. 'maskinfo_asn1_bin' is probably the one referred to as 'asnb' in the cited code (ASN.1 is a general data serialization format, like XML). A file in this format can be given as input to "makeblastdb -mask_data".
Nicola
-- 
Nicola Soranzo, Ph.D.
CRS4 Bioinformatics Program
Loc. Piscina Manna
09010 Pula (CA), Italy
http://www.bioinformatica.crs4.it/
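For reference, the hand-off described above (dustmasker writing 'maskinfo_asn1_bin', then "makeblastdb -mask_data" reading it) could be sketched roughly as follows. This is only an illustration, not the wrapper's actual code: the file names are hypothetical, and apart from -mask_data and the maskinfo_asn1_bin format name, the flags are my assumption of the usual BLAST+ options and should be checked against "dustmasker -help" and "makeblastdb -help" for the installed version.

```python
import shlex


def dustmasker_cmd(fasta_in, mask_out, outfmt="maskinfo_asn1_bin"):
    """Build a dustmasker command line that writes masking data.

    The flag names are assumptions to verify against `dustmasker -help`.
    """
    return [
        "dustmasker",
        "-in", fasta_in,
        "-infmt", "fasta",
        "-parse_seqids",
        "-outfmt", outfmt,
        "-out", mask_out,
    ]


def makeblastdb_cmd(fasta_in, mask_file, db_name):
    """Build a makeblastdb command line that consumes the masking data."""
    return [
        "makeblastdb",
        "-in", fasta_in,
        "-dbtype", "nucl",
        "-parse_seqids",
        "-mask_data", mask_file,
        "-out", db_name,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = (
        dustmasker_cmd("genome.fasta", "genome_dust.asnb"),
        makeblastdb_cmd("genome.fasta", "genome_dust.asnb", "genome_db"),
    )
    for cmd in steps:
        # Only print the commands; to execute them one could use
        # subprocess.run(cmd, check=True) with BLAST+ on the PATH.
        print(shlex.join(cmd))
```

The commands are built as argument lists rather than as a single shell string, so paths containing spaces do not need manual quoting.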