Setting job_working_directory on per-tool basis? (cuffdiff/cummerbund SQLite db error)
Hello

I've encountered an issue with running the cuffdiff tool on our local production Galaxy instance.

Specifically when a cummerbund SQLite database is requested as output then the tool fails with an error:

Creating database ./cummeRbund.sqlite
Error in sqliteSendQuery(con, statement, bind.data) : error in statement: disk I/O error
Error in sqliteSendQuery(con, statement, bind.data) : error in statement: disk I/O error
Calls: readCufflinks ... .local -> sqliteGetQuery -> sqliteSendQuery -> .Call
Execution halted

Our Galaxy is configured to use a LustreFS scratch directory for the 'job_working_directory', which I understand is where each job will have its own working directory created, and I'm wondering if the SQLite error above is a manifestation of a general problem with SQLite on LustreFS (see e.g. the answer to this BioStars question https://www.biostars.org/p/115452/).

The obvious workaround is to change 'job_working_directory', preferably just for the cuffdiff tool. Does anyone know if it's possible to do this i.e. set 'job_working_directory' on a per-tool basis?

(Also, has anyone else seen this specific problem with cuffdiff/cummerbund/SQLite on their local Galaxy instances?)

Thanks for your help - any suggestions greatly appreciated,

Best wishes

Peter

--
Peter Briggs peter.briggs@manchester.ac.uk
Bioinformatics Core Facility University of Manchester
B.1083 Michael Smith Bldg Tel: (0161) 2751482
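One way to check whether the filesystem itself is the culprit, independently of cuffdiff and cummeRbund, might be to try creating a small SQLite database in the directory in question. The following is only a minimal sketch using Python's standard sqlite3 module; the function name sqlite_works_in and the file name probe.sqlite are made up for illustration, and a passing result would not rule out locking problems that only show up under heavier use.

```python
import os
import sqlite3
import tempfile


def sqlite_works_in(directory):
    """Return True if a small SQLite database can be created, written
    and read back inside `directory`; False if SQLite raises an error."""
    db_path = os.path.join(directory, "probe.sqlite")  # hypothetical file name
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE probe (id INTEGER PRIMARY KEY, value TEXT)")
            conn.execute("INSERT INTO probe (value) VALUES (?)", ("ok",))
            conn.commit()
            rows = conn.execute("SELECT value FROM probe").fetchall()
        finally:
            conn.close()
        return rows == [("ok",)]
    except sqlite3.Error:
        # I/O and locking failures are reported as subclasses of sqlite3.Error
        return False
    finally:
        # Clean up the probe file whether or not the test succeeded
        if os.path.exists(db_path):
            os.remove(db_path)


if __name__ == "__main__":
    # Replace the temporary directory with the one to test,
    # e.g. a path under the configured job_working_directory.
    with tempfile.TemporaryDirectory() as scratch:
        print(scratch, sqlite_works_in(scratch))
```

Running the same check against a directory on local disk and one on the Lustre scratch area should show whether the two behave differently.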
Just to follow up on my own questions, in case it's of interest to others:

-- Re specifying different job working directories on a per-tool basis:

Having spent a bit of time looking at the "advanced" sample job_conf.xml and also at the code for setting up and dispatching jobs, it looks to me like it is not possible to do this (i.e. specify different job working directories within different job destinations).

In fact as far as I can tell (I'm slightly lost within the Galaxy code at this point), it looks like the working directory is created within the job wrapper (via the objectstore interface) before the handler is assigned.

I'm wondering if a possible workaround might be to use Pulsar running on the same server, but it looks like a lot of overhead for my current problem (especially as I don't have any experience currently with using Pulsar).

-- Re cuffdiff cummeRbund SQLite database error:

It looks like changing the job_working_directory to a non-lustre filesystem fixes this problem.

Best wishes

Peter

On 08/03/16 09:37, Peter Briggs wrote:
[...]
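The observation that a non-Lustre working directory avoids the error suggests a general pattern: build the SQLite file on local disk and only move the finished file onto the shared filesystem. The sketch below illustrates that idea in plain Python. It is not how cuffdiff, cummeRbund or Galaxy work internally, and the function name build_sqlite_on_local_disk and its parameters are hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile


def build_sqlite_on_local_disk(populate, final_path, local_tmp=None):
    """Create a SQLite database in a local temporary directory, then move
    the finished file to `final_path` (which may be on Lustre).

    `populate` is a caller-supplied function that takes an open sqlite3
    connection; `local_tmp` is the parent directory for the temporary
    directory (None means the system default)."""
    tmp_dir = tempfile.mkdtemp(dir=local_tmp)
    try:
        tmp_db = os.path.join(tmp_dir, os.path.basename(final_path))
        conn = sqlite3.connect(tmp_db)
        try:
            populate(conn)
            conn.commit()
        finally:
            conn.close()
        # SQLite never opens the destination; only a plain file move touches it
        shutil.move(tmp_db, final_path)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return final_path


if __name__ == "__main__":
    def populate(conn):
        # Toy schema for illustration only
        conn.execute("CREATE TABLE genes (gene_id TEXT)")
        conn.execute("INSERT INTO genes VALUES ('XLOC_000001')")

    out_dir = tempfile.mkdtemp()
    print(build_sqlite_on_local_disk(populate, os.path.join(out_dir, "example.sqlite")))
```

The point of the design is that all SQLite locking and journalling happens on the local filesystem, so the shared filesystem only ever sees an ordinary file copy.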
Yeah - your discoveries are exactly right. 16.04 has added the ability to configure a bunch of extra things on a per-job-destination basis but this isn't included yet - probably should be though.

Pulsar would be a way to go - hopefully soon Pulsar will be included with Galaxy directly and it will be as easy as declaring a new job destination (https://github.com/galaxyproject/galaxy/issues/1378) but I don't think that will make it into 16.04.

-John

On Wed, Mar 9, 2016 at 2:41 PM, Peter Briggs <peter.briggs@manchester.ac.uk> wrote:
[...]
participants (2)
- John Chilton
- Peter Briggs