I've just been setting up our internal galaxy server and following the instructions to get the NCBI BLAST+ tool working but I kept getting stuck with it passing -db "" in rather than the actual database name. It turns out, because I had copied the examples in the blastdb_p.loc file, I had put two tabs in just as in the sample doc. This doesn't work. It has to be one tab. However, I would say that it would be better for the code to be fixed so it can handle a variable number of tabs rather than being so strict.
Shane
On Thu, Oct 4, 2012 at 8:56 PM, Shane Sturrock shane@biomatters.com wrote:
I've just been setting up our internal galaxy server and following the instructions to get the NCBI BLAST+ tool working but I kept getting stuck with it passing -db "" in rather than the actual database name. It turns out, because I had copied the examples in the blastdb_p.loc file, I had put two tabs in just as in the sample doc. This doesn't work. It has to be one tab.
You're seeing a double tab in the blastdb_p.loc.sample file? Which line? I should fix that.
However, I would say that it would be better for the code to be fixed so it can handle a variable number of tabs rather than being so strict.
But that would break valid situations where you might want an empty field in a table. OK, not likely with the BLAST databases, but it could apply to other *.loc files in Galaxy.
Regards,
Peter
Hi Peter,
Thanks for the quick reply.
On 5/10/2012, at 9:11 AM, Peter Cock wrote:
On Thu, Oct 4, 2012 at 8:56 PM, Shane Sturrock shane@biomatters.com wrote:
examples in the blastdb_p.loc file, I had put two tabs in just as in the sample doc. This doesn't work. It has to be one tab.
You're seeing a double tab in the blastdb_p.loc.sample file? Which line? I should fix that.
This section:
#Your blastdb_p.loc file should include an entry per line for each "base name" #you have stored. For example: # #nr_05Jun2010 NCBI NR (non redundant) 05 Jun 2010 /data/blastdb/05Jun2010/nr #nr_15Aug2010 NCBI NR (non redundant) 15 Aug 2010 /data/blastdb/15Aug2010/nr #...etc... # #See also blastdb.loc which is for any nucleotide BLAST database.
There are two tabs between the 2010 and /data
Like an idiot I just copied one of those lines and changed it to match my local database setup and then spent literally hours trying to figure out why blast from the command line worked fine, but I couldn't get it to work with Galaxy.
However, I would say that it would be better for the code to be fixed so it can handle a variable number of tabs rather than being so strict.
But that would break valid situations where you might want an empty field in a table. OK, not likely with the BLAST databases, but it could apply to other *.loc files in Galaxy.
Fair enough, so I would suggest updating the samples to specify that there must be only one tab between each section and that the sample actually follows this. The blastdb.loc file example is fine, just the blastdb_p.loc is wrong.
Shane
Regards,
Peter
-- For urgent cases, contact support@biomatters.com
Dr Shane Sturrock shane.sturrock@biomatters.com Senior Scientist Tel: +64 (0) 9 379 5064 76 Anzac Ave Auckland New Zealand
On Thursday, October 4, 2012, Shane Sturrock wrote:
Hi Peter,
Thanks for the quick reply.
No problem.
This section:
#Your blastdb_p.loc file should include an entry per line for each "base name" #you have stored. For example: # #nr_05Jun2010 NCBI NR (non redundant) 05 Jun 2010 /data/blastdb/05Jun2010/nr #nr_15Aug2010 NCBI NR (non redundant) 15 Aug 2010 /data/blastdb/15Aug2010/nr #...etc... # #See also blastdb.loc which is for any nucleotide BLAST database.
There are two tabs between the 2010 and /data
Sigh. I've been using emacs to edit these files on our server, and it doesn't show non-printing characters like tabs in a visible way (at least, not with the default settings - any tips?). Note I'm not using xemacs, just the terminal based one. And typing a raw tab involves a tedious control sequence... there probably is an easier way, but I don't edit these loc files very often.
(Note that user defined BLAST databases are coming soon to the BLAST+ wrappers, delayed by a hardware problem on our local system meaning I haven't finished testing it)
Like an idiot I just copied one of those lines and changed it to
match my local database setup and then spent literally hours
trying to figure out why blast from the command line worked
fine, but I couldn't get it to work with Galaxy.
Sorry about that - but by reporting this and getting it fixed you've hopefully saved anyone else suffering this way.
I'm at a meeting all day so I probably won't be able to update this till next week.
Peter
Hi Peter,
On 5/10/2012, at 8:43 PM, Peter Cock wrote:
On Thursday, October 4, 2012, Shane Sturrock wrote:
There are two tabs between the 2010 and /data
Sigh. I've been using emacs to edit these files on our server, and it doesn't show non-printing characters like tabs in a visible way (at least, not with the default settings - any tips?). Note I'm not
I'm a 'vi' man myself. It doesn't show tabs in a visible way either but I can see where they are when I move along the line.
using xemacs, just the terminal based one. And typing a raw tab involves a tedious control sequence... there probably is an easier way, but I don't edit these loc files very often.
Just hit 'Tab' in vi.
(Note that user defined BLAST databases are coming soon to the BLAST+ wrappers, delayed by a hardware problem on our local system meaning I haven't finished testing it)
Like an idiot I just copied one of those lines and changed it to match my local database setup and then spent literally hours trying to figure out why blast from the command line worked fine, but I couldn't get it to work with Galaxy.
Sorry about that - but by reporting this and getting it fixed you've hopefully saved anyone else suffering this way.
Thanks again, took me a couple of hours to figure it out so it would be good to save someone else that time.
I'm at a meeting all day so I probably won't be able to update this till next week.
Glad you're on it anyway.
Shane
Shane Sturrock shane@biomatters.com
On Sun, Oct 7, 2012 at 3:02 PM, Shane Sturrock shane@biomatters.com wrote:
using xemacs, just the terminal based one. And typing a raw tab involves a tedious control sequence... there probably is an easier way, but I don't edit these loc files very often.
Just hit 'Tab' in vi.
Somehow I don't think this will be enough to make someone switch from emacs to vi, ;-). Also at least in my case I have "set tabstop=4" and "set expandtab" in my "~/.vimrc", as I also use vim for coding and most people hate tabs in code. So every time I edit a Galaxy's .loc file I have to remember to first do ":set noexpandtab". As you can imagine I also have been bitten several time by a wrong formatting in my .loc files. I wish Galaxy would change to a different separator or at least make it configurable.
Regards, Carlos
On Fri, Oct 5, 2012 at 8:43 AM, Peter Cock p.j.a.cock@googlemail.com wrote:
On Thursday, October 4, 2012, Shane Sturrock wrote:
This section:
#Your blastdb_p.loc file should include an entry per line for each "base name" #you have stored. For example: # #nr_05Jun2010 NCBI NR (non redundant) 05 Jun 2010 /data/blastdb/05Jun2010/nr #nr_15Aug2010 NCBI NR (non redundant) 15 Aug 2010 /data/blastdb/15Aug2010/nr #...etc... # #See also blastdb.loc which is for any nucleotide BLAST database.
There are two tabs between the 2010 and /data
...
Sorry about that - but by reporting this and getting it fixed you've hopefully saved anyone else suffering this way.
Fixed on my branch: https://bitbucket.org/peterjc/galaxy-central/changeset/c05a680cbc80559b84c13...
That will be part of my next update to the BLAST+ wrappers on the Tool Shed.
Thanks,
Peter
Thanks Peter!
Shane
Sent from my iPad
On 20/10/2012, at 4:00 AM, Peter Cock p.j.a.cock@googlemail.com wrote:
On Fri, Oct 5, 2012 at 8:43 AM, Peter Cock p.j.a.cock@googlemail.com wrote:
On Thursday, October 4, 2012, Shane Sturrock wrote:
This section:
#Your blastdb_p.loc file should include an entry per line for each "base name" #you have stored. For example: # #nr_05Jun2010 NCBI NR (non redundant) 05 Jun 2010 /data/blastdb/05Jun2010/nr #nr_15Aug2010 NCBI NR (non redundant) 15 Aug 2010 /data/blastdb/15Aug2010/nr #...etc... # #See also blastdb.loc which is for any nucleotide BLAST database.
There are two tabs between the 2010 and /data
...
Sorry about that - but by reporting this and getting it fixed you've hopefully saved anyone else suffering this way.
Fixed on my branch: https://bitbucket.org/peterjc/galaxy-central/changeset/c05a680cbc80559b84c13...
That will be part of my next update to the BLAST+ wrappers on the Tool Shed.
Thanks,
Peter
On Oct 4, 2012, at 3:11 PM, Peter Cock p.j.a.cock@googlemail.com wrote:
On Thu, Oct 4, 2012 at 8:56 PM, Shane Sturrock shane@biomatters.com wrote:
I've just been setting up our internal galaxy server and following the instructions to get the NCBI BLAST+ tool working but I kept getting stuck with it passing -db "" in rather than the actual database name. It turns out, because I had copied the examples in the blastdb_p.loc file, I had put two tabs in just as in the sample doc. This doesn't work. It has to be one tab.
You're seeing a double tab in the blastdb_p.loc.sample file? Which line? I should fix that.
However, I would say that it would be better for the code to be fixed so it can handle a variable number of tabs rather than being so strict.
But that would break valid situations where you might want an empty field in a table. OK, not likely with the BLAST databases, but it could apply to other *.loc files in Galaxy.
Regards,
Peter
I agree with Peter. The same argument could be made for any horizontal whitespace (my editor for instance converts tabs to spaces), but the simple fact is you easily run into convoluted logic dealing with all of the edge cases when the easiest fix is to simply conform to the standard and use tabs.
chris
On Saturday, October 20, 2012, Fields, Christopher J wrote:
On Oct 4, 2012, at 3:11 PM, Peter Cock <p.j.a.cock@googlemail.comjavascript:;> wrote:
On Thu, Oct 4, 2012 at 8:56 PM, Shane Sturrock <shane@biomatters.comjavascript:;>
wrote:
I've just been setting up our internal galaxy server and following the instructions to get the NCBI BLAST+ tool working but I kept getting stuck with it passing -db "" in rather than the actual database name. It turns out, because I had copied the examples in the blastdb_p.loc file, I had put two tabs in just as in the sample doc. This doesn't work. It has to be one tab.
You're seeing a double tab in the blastdb_p.loc.sample file? Which line? I should fix that.
However, I would say that it would be better for the code to be fixed so it can handle a variable number of tabs rather than being so strict.
But that would break valid situations where you might want an empty field in a table. OK, not likely with the BLAST databases, but it could apply to other *.loc files in Galaxy.
Regards,
Peter
I agree with Peter. The same argument could be made for any horizontal whitespace (my editor for instance converts tabs to spaces), but the simple fact is you easily run into convoluted logic dealing with all of the edge cases when the easiest fix is to simply conform to the standard and use tabs.
chris
Another idea would be for Galaxy to issue a clear error if a loc file has an inconsistent number of fields/tabs per line.
Peter
That would certainly have saved me a fair bit of time since the only indication I got was an empty set of quotes in the error log. I always feel like tabs aren't the best separators for these sorts of files anyway since they are usually invisible but since that isn't likely to change, making it more obvious what the problem is would be the best approach.
Shane
On 21/10/2012, at 1:44 AM, Peter Cock wrote:
Another idea would be for Galaxy to issue a clear error if a loc file has an inconsistent number of fields/tabs per line.
Peter
-- For urgent cases, contact support@biomatters.com
Dr Shane Sturrock shane.sturrock@biomatters.com Senior Scientist Tel: +64 (0) 9 379 5064 76 Anzac Ave Auckland New Zealand
Yes, have to agree there, an error would be more informative.
chris
On Oct 20, 2012, at 2:38 PM, Shane Sturrock shane@biomatters.com wrote:
That would certainly have saved me a fair bit of time since the only indication I got was an empty set of quotes in the error log. I always feel like tabs aren't the best separators for these sorts of files anyway since they are usually invisible but since that isn't likely to change, making it more obvious what the problem is would be the best approach.
Shane
On 21/10/2012, at 1:44 AM, Peter Cock wrote:
Another idea would be for Galaxy to issue a clear error if a loc file has an inconsistent number of fields/tabs per line.
Peter
-- For urgent cases, contact support@biomatters.com
Dr Shane Sturrock shane.sturrock@biomatters.com Senior Scientist Tel: +64 (0) 9 379 5064 76 Anzac Ave Auckland New Zealand
FAO the Galaxy dev team,
I've tested a patch (at end of email) which issues a warning if loading a loc file with inconsistent numbers of tabs. In the case of blastdb_p.loc this would result in showing the warning three times, since currently Galaxy appears to reload a *.loc file for each tool using it.
(If the error happens to be on the first list, then the warning is triggered for all the following lines in the file - because they won't have the same number of fields as the first line.)
Could someone apply this to the trunk as is, or tell me if you would prefer it as a pull request on bitbucket?
Furthermore, would a more invasive change to treat this as an error condition be acceptable?
Thanks,
Peter
On Sun, Oct 21, 2012 at 5:53 AM, Fields, Christopher J cjfields@illinois.edu wrote:
Yes, have to agree there, an error would be more informative.
chris
On Oct 20, 2012, at 2:38 PM, Shane Sturrock shane@biomatters.com wrote:
That would certainly have saved me a fair bit of time since the only indication I got was an empty set of quotes in the error log. I always feel like tabs aren't the best separators for these sorts of files anyway since they are usually invisible but since that isn't likely to change, making it more obvious what the problem is would be the best approach.
Shane
On 21/10/2012, at 1:44 AM, Peter Cock wrote:
Another idea would be for Galaxy to issue a clear error if a loc file has an inconsistent number of fields/tabs per line.
Peter
Suggested patch:
$ hg diff lib/galaxy/tools/parameters/dynamic_options.py diff -r c05a680cbc80 lib/galaxy/tools/parameters/dynamic_options.py --- a/lib/galaxy/tools/parameters/dynamic_options.py Fri Oct 19 15:56:51 2012 +0100 +++ b/lib/galaxy/tools/parameters/dynamic_options.py Mon Oct 22 11:59:04 2012 +0100 @@ -471,6 +471,7 @@
def parse_file_fields( self, reader ): rval = [] + field_count = None for line in reader: if line.startswith( '#' ) or ( self.line_startswith and not line.startswith( self.line_startswith ) ): continue @@ -478,6 +479,17 @@ if line: fields = line.split( self.separator ) if self.largest_index < len( fields ): + if not field_count: + field_count = len(fields) + elif field_count != len(fields): + try: + name = reader.name + except AttributeError: + name = "a configuration file" + #Perhaps this should be an error, but even a warning is useful + log.warn("Inconsistent number of fields (%i vs %i) in %s using " + "separator %r, check line: %r" \ + % (field_count, len(fields), name, self.separator, line)) rval.append( fields ) return rval
Hello Peter,
Thanks very much for this - I'm sure many will find it very helpful. I've applied your patch to changeset revision 8096:0c56502c7fd7, which will be included in the next Galaxy release currently scheduled for next Friday (not today's currently scheduled release).
Thanks again,
Greg Von Kuster
On Oct 22, 2012, at 7:03 AM, Peter Cock wrote:
FAO the Galaxy dev team,
I've tested a patch (at end of email) which issues a warning if loading a loc file with inconsistent numbers of tabs. In the case of blastdb_p.loc this would result in showing the warning three times, since currently Galaxy appears to reload a *.loc file for each tool using it.
(If the error happens to be on the first list, then the warning is triggered for all the following lines in the file - because they won't have the same number of fields as the first line.)
Could someone apply this to the trunk as is, or tell me if you would prefer it as a pull request on bitbucket?
Furthermore, would a more invasive change to treat this as an error condition be acceptable?
Thanks,
Peter
On Sun, Oct 21, 2012 at 5:53 AM, Fields, Christopher J cjfields@illinois.edu wrote:
Yes, have to agree there, an error would be more informative.
chris
On Oct 20, 2012, at 2:38 PM, Shane Sturrock shane@biomatters.com wrote:
That would certainly have saved me a fair bit of time since the only indication I got was an empty set of quotes in the error log. I always feel like tabs aren't the best separators for these sorts of files anyway since they are usually invisible but since that isn't likely to change, making it more obvious what the problem is would be the best approach.
Shane
On 21/10/2012, at 1:44 AM, Peter Cock wrote:
Another idea would be for Galaxy to issue a clear error if a loc file has an inconsistent number of fields/tabs per line.
Peter
Suggested patch:
$ hg diff lib/galaxy/tools/parameters/dynamic_options.py diff -r c05a680cbc80 lib/galaxy/tools/parameters/dynamic_options.py --- a/lib/galaxy/tools/parameters/dynamic_options.py Fri Oct 19 15:56:51 2012 +0100 +++ b/lib/galaxy/tools/parameters/dynamic_options.py Mon Oct 22 11:59:04 2012 +0100 @@ -471,6 +471,7 @@
def parse_file_fields( self, reader ): rval = []
field_count = None for line in reader: if line.startswith( '#' ) or ( self.line_startswith and
not line.startswith( self.line_startswith ) ): continue @@ -478,6 +479,17 @@ if line: fields = line.split( self.separator ) if self.largest_index < len( fields ):
if not field_count:
field_count = len(fields)
elif field_count != len(fields):
try:
name = reader.name
except AttributeError:
name = "a configuration file"
#Perhaps this should be an error, but even a
warning is useful
log.warn("Inconsistent number of fields (%i
vs %i) in %s using "
"separator %r, check line: %r" \
% (field_count, len(fields), name,
self.separator, line)) rval.append( fields ) return rval
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:
On Wed, Oct 24, 2012 at 4:46 PM, Greg Von Kuster greg@bx.psu.edu wrote:
Hello Peter,
Thanks very much for this - I'm sure many will find it very helpful. I've applied your patch to changeset revision 8096:0c56502c7fd7, which will be included in the next Galaxy release currently scheduled for next Friday (not today's currently scheduled release).
Thanks again,
Greg Von Kuster
Excellent, and thanks for taking care of the line wrapping in my email (does the mailing list object to patches as attachments?).
Regards,
Peter
galaxy-dev@lists.galaxyproject.org