Dear all
I have several smallish BLAST databases that I would like to provide in a data library. I create them in a history with the makeblastdb tool and then try to add them to the library. I see that for each BLAST db an empty file is created (like /path/dataset_12345.dat), along with a folder of the same name (/path/dataset_12345_files/) that contains the actual db files (blastdb.n*).
In my library the blastdb shows up empty and I cannot import it back into another history. It does not seem to be aware of the _files folder, despite it being the right data type (blastdbn).
Any ideas what I am doing wrong?
Thanks a lot for your help
Ulf
**************************************************************************
The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of Public Health England, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses by Symantec.Cloud, but please re-sweep any attachments before opening or saving. http://www.gov.uk/PHE
**************************************************************************
On Wed, Jul 23, 2014 at 10:47 AM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Dear all
I have several smallish BLAST databases that I would like to provide in a data library. I create them in a history with the makeblastdb tool and then try to add them to the library. I see that for each BLAST db an empty file is created (like /path/dataset_12345.dat), along with a folder of the same name (/path/dataset_12345_files/) that contains the actual db files (blastdb.n*).
In my library the blastdb shows up empty and I cannot import it back into another history. It does not seem to be aware of the _files folder, despite it being the right data type (blastdbn).
Any ideas what I am doing wrong?
Thanks a lot for your help Ulf
Hi Ulf,
I've never tried that. It could be a bug in Galaxy importing composite datatypes into a library, or something in the BLAST database definition which needs fixing. Does importing an HTML report (with child files like images) into a library work for you? (This is another composite datatype so a useful comparison).
Rather than using Data Libraries, we just list all the locally installed shared BLAST databases via the BLAST *.loc files instead.
Note using the *.loc files makes the databases available to all the Galaxy users, while with a Data Library you can control access to specific groups/roles.
Regards,
Peter
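For anyone taking the *.loc route, here is a minimal sketch of generating the entries with a script. It assumes the three tab-separated columns (unique ID, caption, base path without extension) used in the BLAST+ wrappers' blastdb.loc sample file, so please check it against the sample shipped with your wrappers. The function name, the caption text, and the one-single-volume-database-per-.nin-file layout are illustrative assumptions, not anything from the thread.

```python
from pathlib import Path


def blastdb_loc_lines(db_root, suffix=".nin"):
    """Yield one tab-separated .loc line per nucleotide database under db_root.

    Assumption: every database is single-volume, so each *.nin index file
    marks exactly one database (multi-volume databases are not handled).
    """
    root = Path(db_root)
    for index_file in sorted(root.rglob("*" + suffix)):
        base = index_file.with_suffix("")  # path without the .nin extension
        # Build an ID from the path relative to db_root so that databases
        # sharing a base name (e.g. "blastdb") remain distinct.
        unique_id = "_".join(base.relative_to(root).parts)
        caption = f"{unique_id} (local)"
        yield "\t".join([unique_id, caption, str(base)])


if __name__ == "__main__":
    import sys

    for line in blastdb_loc_lines(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(line)
```

The output can be pasted into blastdb.loc. As noted above, databases listed this way are visible to all Galaxy users.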
Dear Peter
Thanks for your reply.
I can import an html report (e.g. FastQC output) successfully into a new history from a data library. But the .dat file for the html is not empty like the one for the blastdb. Makes me think that I could do this with a blast db as well, if only it would not check for size 0 at the time of importing it.
Thanks
Ulf
Interesting hypothesis - you may well be right.
Galaxy guys - who is the expert to talk to on this and/or where in the code should we be looking?
Thanks,
Peter
On Wed, Jul 23, 2014 at 11:22 AM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Dear Peter
Thanks for your reply.
I can import an html report (e.g. FastQC output) successfully into a new history from a data library. But the .dat file for the html is not empty like the one for the blastdb. Makes me think that I could do this with a blast db as well, if only it would not check for size 0 at the time of importing it.
Thanks Ulf
On Jul 23, 2014, at 6:42 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Interesting hypothesis - you may well be right.
Galaxy guys - who is the expert to talk to on this and/or where in the code should we be looking?
Thanks,
Peter
I think there's a bit of a mixup here - Peter, I believe you were asking if other composite types with an html primary dataset could be imported from the history to library, but Ulf, your test was the other direction (library->history). I'd be interested in knowing the outcome of the history->library test as well.
I am woefully ignorant about the blastdbn datatype. Is the primary file supposed to be html type but empty?
--nate
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
On Thu, Jul 24, 2014 at 2:50 PM, Nate Coraor <nate@bx.psu.edu> wrote:
On Jul 23, 2014, at 6:42 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Interesting hypothesis - you may well be right.
Galaxy guys - who is the expert to talk to on this and/or where in the code should we be looking?
Thanks,
Peter
I think there's a bit of a mixup here - Peter, I believe you were asking if other composite types with an html primary dataset could be imported from the history to library, but Ulf, your test was the other direction (library->history). I'd be interested in knowing the outcome of the history->library test as well.
Good catch - yes, that was what I was asking about. Ulf?
I am woefully ignorant about the blastdbn datatype. Is the primary file supposed to be html type but empty?
The BLAST databases are 'basic' composite datatypes, of which the most commonly used example is HTML (and some bits of the base class code seem to assume HTML). This means testing if something works with HTML is a good first step.
https://github.com/peterjc/galaxy_blast/tree/master/datatypes/blast_datatype...
Peter
Dear Nate, dear Peter
Sorry for the delay in replying.
I can import both HTML and blastdb from a history to a data library. If I try to get the data out of the library into another history, I am successful for the html but not for the blastdb. The problem seems to be that the primary data file (the /path/dataset_12345.dat) is empty for the blastdb, while the html primary file has something in it.
When I try to import the blastdb (from library to history) there is a message along the lines of "can't import empty file". I hypothesise (admittedly without having looked at a line of code) that there is a test for file size 0 somewhere that is either altogether unnecessary or, more likely, does not take into account that for composite datatypes it might be completely legitimate for the primary file to be empty.
Or is my primary blastdb file not supposed to be empty in the first place? I can blast against it just fine.
Thanks a lot for your help
Ulf
On Mon, Jul 28, 2014 at 8:28 AM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Dear Nate, dear Peter
Sorry for the delay in replying.
I can import both HTML and blastdb from a history to a data library. If I try to get the data out of the library into another history, I am successful for the html but not for the blastdb. The problem seems to be that the primary data file (the /path/dataset_12345.dat) is empty for the blastdb, while the html primary file has something in it.
OK. Can you tell where Galaxy thinks the library files are on disk, and check to see if the folder of BLAST database files is actually there?
When I try to import the blastdb (from library to history) there is a message along the lines of "can't import empty file". I hypothesise (admittedly without having looked at a line of code) that there is a test for file size 0 somewhere that is either altogether unnecessary or, more likely, does not take into account that for composite datatypes it might be completely legitimate for the primary file to be empty.
This guess makes sense - but I've not yet tried to trace through the code either.
Or is my primary blastdb file not supposed to be empty in the first place? I can blast against it just fine.
The BLAST databases do not define/populate a primary file, so Galaxy seems to create a dummy empty file on its own. I have wondered about altering the BLAST database datatype definition to have a human readable text file as the "primary file" (i.e. the information currently saved as a text log file when creating a database).
Thanks a lot for your help Ulf
You too - you've found an "interesting" bug... Peter
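To make the hypothesis concrete, here is a minimal sketch of the kind of size test Ulf describes, next to a variant that tolerates an empty primary file for composite datatypes. This is not Galaxy's code: nobody in the thread has located the real check, and both function names and their parameters are hypothetical.

```python
def naive_can_import(primary_size):
    """Hypothesised check: reject any dataset whose primary file is empty."""
    return primary_size > 0


def composite_aware_can_import(primary_size, extra_file_sizes, is_composite):
    """Accept an empty primary file if a composite dataset has real content.

    primary_size: size in bytes of the primary (.dat) file
    extra_file_sizes: sizes in bytes of the files in the _files directory
    is_composite: True for composite datatypes such as blastdbn
    """
    if primary_size > 0:
        return True
    # An empty primary file is only acceptable when the datatype is
    # composite and at least one extra file actually holds data.
    return bool(is_composite) and any(size > 0 for size in extra_file_sizes)
```

With a BLAST database as described above (empty .dat file, populated _files folder), the first function rejects the dataset while the second accepts it.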
Dear Nate, dear Peter
Again, sorry for the delay in replying.
Yes I can. It looks like this:
[galaxy@srv ~]$ cat /galaxy/database/files/081/dataset_81002.dat
[galaxy@srv ~]$ ls /galaxy/database/files/081/dataset_81002_files/
blastdb.nhd blastdb.nhi blastdb.nhr blastdb.nin blastdb.nog blastdb.nsd blastdb.nsi blastdb.nsq
I think the simplest solution would be to put something in the primary file. Just a short string that gets the file size above 0.
I personally have followed your initial suggestion and made the dbs available globally via the .loc file.
Thanks again
Ulf
On Wed, Jul 30, 2014 at 11:52 AM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Dear Nate, dear Peter
Again, sorry for the delay in replying.
Yes I can. It looks like this
[galaxy@srv ~]$ cat /galaxy/database/files/081/dataset_81002.dat
[galaxy@srv ~]$ ls /galaxy/database/files/081/dataset_81002_files/
blastdb.nhd blastdb.nhi blastdb.nhr blastdb.nin blastdb.nog blastdb.nsd blastdb.nsi blastdb.nsq
Good. Thanks for confirming that.
I think the simplest solution would be to put something in the primary file. Just a short string that gets the file size above 0.
That won't help with all the existing datasets out there - I think we rather need to fix something in the Galaxy code for composite files...
I personally have followed your initial suggestion and made the dbs available globally via the .loc file.
Thanks again Ulf
Great. Peter
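The cat/ls check quoted above can also be scripted, which may help when several datasets need checking. This is a minimal sketch that assumes only the on-disk layout reported in this thread (dataset_N.dat next to a dataset_N_files/ directory); the function name is hypothetical.

```python
import os


def describe_dataset(dat_path):
    """Return (primary_size, extra_files) for a dataset's primary .dat file.

    Assumes the layout reported in the thread: the extra files live in a
    sibling directory named like the .dat file but ending in "_files".
    """
    primary_size = os.path.getsize(dat_path)
    root, _ext = os.path.splitext(dat_path)
    extra_dir = root + "_files"
    if os.path.isdir(extra_dir):
        extra_files = sorted(os.listdir(extra_dir))
    else:
        extra_files = []
    return primary_size, extra_files


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        size, extras = describe_dataset(path)
        print(f"{path}: primary {size} bytes, {len(extras)} extra file(s)")
```

For the dataset shown above, a primary size of 0 together with a non-empty list of blastdb.n* files would match what Ulf saw.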
Thanks for tracking down the problem - it sounds like it is a Galaxy bug then so I have created a Trello card (https://trello.com/c/bNEKfOWR).
-John
On Mon, Jul 28, 2014 at 9:43 AM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
On Mon, Jul 28, 2014 at 8:28 AM, Ulf Schaefer <Ulf.Schaefer@phe.gov.uk> wrote:
Dear Nate, dear Peter
Sorry for the delay in replying.
I can import both HTML and blastdb from a history to a data library. If I try to get the data out of the library into another history, I am successful for the html but not for the blastdb. The problem seems to be that the primary data file (the /path/dataset_12345.dat) is empty for the blastdb, while the html primary file has something in it.
OK. Can you tell where Galaxy thinks the library files are on disk, and check to see if the folder of BLAST database files is actually there?
When I try to import the blastdb (from library to history) there is a message along the lines of "can't import empty file". I hypothesise (admittedly without having looked at a line of code) that there is a test for file size 0 somewhere that is either altogether unnecessary or, more likely, does not take into account that for composite datatypes it might be completely legitimate for the primary file to be empty.
This guess makes sense - but I've not yet tried to trace through the code either.
Or is my primary blastdb file not supposed to be empty in the first place? I can blast against it just fine.
The BLAST databases do not define/populate a primary file, so Galaxy seems to create a dummy empty file on its own. I have wondered about altering the BLAST database datatype definition to have a human readable text file as the "primary file" (i.e. the information currently saved as a text log file when creating a database).
Correction: I actually implemented this late last year (included in BLAST+ wrapper version v0.0.22 onwards, and the Galaxy BLAST datatypes version v0.0.18 onwards):
https://github.com/peterjc/galaxy_blast/commit/9b3f65cddcc60de26de63272c362c...
https://github.com/peterjc/galaxy_blast/commit/2ebfb790d5a1bbe310c3d7ccc2b95...
The makeblastdb wrapper will send the stdout (log information) to the dummy index file, see the end of the <command> tag in:
https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/nc...
The display_data method for a BLAST database will show any makeblastdb log information held in the dummy index file, see:
https://github.com/peterjc/galaxy_blast/blob/master/datatypes/blast_datatype...
i.e. Only older BLAST databases in histories should have empty dummy index files, which will mitigate the library problem:
https://trello.com/c/bNEKfOWR
Peter
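As an illustration of the idea only: the wrapper itself does this inside the tool's <command> tag, which is not reproduced here. Sending a command's stdout into the primary file can be sketched in Python as below. The function name is hypothetical, and the usage line runs a stand-in command instead of makeblastdb.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run command, writing its stdout to log_path; return the exit code.

    If the command prints anything, log_path ends up non-empty, which is
    the property that matters for the library import problem.
    """
    with open(log_path, "w") as log:
        completed = subprocess.run(command, stdout=log, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a real makeblastdb call, used purely for demonstration.
    demo = [sys.executable, "-c", "print('Building a new DB')"]
    print(run_and_log(demo, "dataset_demo.dat"))
```

As Peter notes, this only helps databases built with the newer wrapper. Existing datasets with an empty primary file are unaffected.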
participants (4)
- John Chilton
- Nate Coraor
- Peter Cock
- Ulf Schaefer