On Fri, Nov 2, 2012 at 6:03 PM, Oleksandr Moskalenko <om@hpc.ufl.edu> wrote:
Here is the blastn rule procedure code and the relevant snippet of the default runner procedure. I just added the database-based multiplier, so this part is very simple at the moment. As an example, I set a bogus multiplier of "4" for the "nt_*" databases.
import logging
import os

log = logging.getLogger(__name__)

MB = 1024 * 1024

def ncbi_blastn(job):
    nodes = 1
    ppn = 4
    walltime = '167:00:00'
    # Collect the job's input datasets (regular and library) by name.
    inp_data = dict([(da.name, da.dataset) for da in job.input_datasets])
    inp_data.update([(da.name, da.dataset) for da in job.input_library_datasets])
    query_file = inp_data["query"].file_name
    query_size = os.path.getsize(query_file)  # size in bytes
    inp_params = dict([(da.name, da.value) for da in job.parameters])
    # db_opts is stored as the string form of a dict; eval() rebuilds it.
    db_dict = eval(inp_params['db_opts'])
    db = db_dict['database']
    # Database-based multiplier: bogus example value of 4 for the "nt*" databases.
    db_multiplier = 1
    if db.startswith('nt'):
        db_multiplier = 4
    # Per-core memory (pmem) tier chosen from the query file size.
    if query_size <= 20 * MB:
        pmem, pmem_unit = 500, 'mb'
    elif query_size <= 50 * MB:
        pmem, pmem_unit = 750, 'mb'
    elif query_size <= 100 * MB:
        pmem, pmem_unit = 1500, 'mb'
    elif query_size <= 500 * MB:
        pmem, pmem_unit = 2, 'gb'
    elif query_size <= 1000 * MB:
        pmem, pmem_unit = 4, 'gb'
    elif query_size <= 2000 * MB:
        pmem, pmem_unit = 10, 'gb'
    else:
        pmem, pmem_unit = 20, 'gb'
        log.debug('OM: blastn query size is in the bigmem category: %s bytes', query_size)
    if db_multiplier > 1:
        pmem = int(pmem * db_multiplier)
    pmem_str = "%d%s" % (pmem, pmem_unit)
    log.debug('OM: blastn query: %s bytes, db: %s, pmem: %s', query_size, db, pmem_str)
    return {'nodes': nodes, 'ppn': ppn, 'pmem': pmem_str, 'walltime': walltime}
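The if/elif ladder in ncbi_blastn() is a threshold lookup, so the same tiers can also be kept in two parallel lists and searched with the standard-library bisect module. This is a minimal sketch, not the code Alex runs: blastn_pmem() is a hypothetical name, and it only reproduces the size tiers and the example "nt" multiplier from the rule above.

```python
import bisect

MB = 1024 * 1024

# Inclusive upper bounds of each query-size tier, in bytes, and the
# per-core memory request for each tier (same values as ncbi_blastn()).
TIER_LIMITS = [20 * MB, 50 * MB, 100 * MB, 500 * MB, 1000 * MB, 2000 * MB]
TIER_PMEM = [(500, 'mb'), (750, 'mb'), (1500, 'mb'), (2, 'gb'),
             (4, 'gb'), (10, 'gb'), (20, 'gb')]


def blastn_pmem(query_size, db):
    """Return a Torque pmem string such as '750mb' for a query size in bytes."""
    # bisect_left gives the index of the first limit >= query_size, so a
    # size exactly on a boundary stays in the lower tier, matching the
    # '<=' comparisons of the if/elif version. Sizes above the last limit
    # land on the final (20gb) entry.
    tier = bisect.bisect_left(TIER_LIMITS, query_size)
    pmem, unit = TIER_PMEM[tier]
    # Same bogus example multiplier for the 'nt*' databases.
    if db.startswith('nt'):
        pmem *= 4
    return '%d%s' % (pmem, unit)
```

Keeping the tiers as data makes them easier to tune, or to pair with a per-database table of multipliers, without touching the control flow.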
def default_runner(tool_id, job):
    ...
    elif tool_id_src.startswith('ncbi_blastn_wrapper'):
        request = ncbi_blastn(job)
    ...
    drmaa = 'drmaa://%s%s%s/' % (queue_str, group_str, request_str)
    return drmaa
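The snippet elides how queue_str, group_str and request_str are built. As an illustration only, here is one way the dict returned by ncbi_blastn() could be rendered as Torque "-l" options and wrapped in a drmaa:// URL, with the native options placed between the scheme and the trailing slash. torque_native_spec() and drmaa_url() are hypothetical helpers, and group handling is left out.

```python
def torque_native_spec(request):
    """Render a {'nodes', 'ppn', 'pmem', 'walltime'} dict as Torque -l options.

    Hypothetical helper: the original default_runner() builds request_str
    in code that is not shown.
    """
    return '-l nodes=%d:ppn=%d,pmem=%s,walltime=%s' % (
        request['nodes'], request['ppn'], request['pmem'], request['walltime'])


def drmaa_url(native_spec, queue=None):
    """Wrap native options in a drmaa:// runner URL (assumed layout)."""
    # An optional queue goes in front of the resource options.
    queue_str = '-q %s ' % queue if queue else ''
    return 'drmaa://%s%s/' % (queue_str, native_spec)
```

For example, drmaa_url(torque_native_spec(request), queue='bigmem') with a 2gb request yields 'drmaa://-q bigmem -l nodes=1:ppn=4,pmem=2gb,walltime=167:00:00/'.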
Hi Alex,

This is great and definitely helped me get going! I found a few issues related to my local configuration. For example, I'm using the ncbi_blastn_wrapper that was migrated to the Tool Shed, so I had to use:

elif 'ncbi_blastn_wrapper' in tool_id_src:

instead of:

elif tool_id_src.startswith('ncbi_blastn_wrapper'):

because the id of the tool installed from the Tool Shed is:

toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastn_wrapper/0.0.13

Hopefully this won't break later on.

I also need to go back and properly configure our local grid engine (SGE), as I did only a very bare-bones installation and I'm running into this error:

DeniedByDrmException: code 17: unknown resource "nodes"

which I realize is a configuration issue in my SGE setup.

Last, and this was my mistake: I didn't initially realize that the example you shared here assumes all tools call default_runner(), which in turn calls a specific function to figure out the DRMAA options to set. I was trying to use the lines from your previous email:

ncbi_blastn_wrapper = dynamic:///python/ncbi_blastn
ncbi_blastp_wrapper = dynamic:///python/ncbi_blastp

But, a note to myself and anybody else following the thread: the function being called from universe_wsgi.ini needs to return a proper drmaa/pbs URL.

Thanks again!
Carlos
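Two of the points above can be sketched in plain Python. Both helpers are hypothetical and not part of Galaxy. short_tool_id() pulls the bare tool id out of a Tool Shed GUID, assuming the .../repos/<owner>/<repository>/<tool_id>/<version> layout of the id quoted above, so an exact comparison can replace the substring test. sge_native_spec() re-expresses the Torque-style request dict (which uses the "nodes" resource SGE rejected) as SGE options, assuming a parallel environment named "smp" exists and that h_vmem is set up at your site as a per-slot limit. Treat both as a starting point, not a tested configuration.

```python
def short_tool_id(tool_id):
    """Reduce a Tool Shed GUID to the bare tool id.

    Assumes GUIDs of the form host/repos/owner/repository/tool_id/version,
    so the tool id is the second-to-last path component.
    """
    if '/repos/' in tool_id:
        return tool_id.rstrip('/').split('/')[-2]
    # Tools that did not come from the Tool Shed keep their plain id.
    return tool_id


def sge_native_spec(request, pe_name='smp'):
    """Express a {'nodes', 'ppn', 'pmem', 'walltime'} dict as SGE options.

    pe_name is a site-specific assumption. SGE memory values use single
    letter suffixes (M, G), so '500mb' becomes '500M' and '2gb' becomes '2G'.
    """
    slots = request['nodes'] * request['ppn']
    pmem = request['pmem']
    h_vmem = pmem[:-2] + pmem[-2].upper()
    return '-pe %s %d -l h_vmem=%s,h_rt=%s' % (
        pe_name, slots, h_vmem, request['walltime'])
```

A rule function would then return something like 'drmaa://%s/' % sge_native_spec(request), which is the kind of complete URL the last point above calls for.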