Re: [galaxy-dev] Inform tool interface with data specific to selected dataset

10 May 2014

Sure.  I'll try to be concise; the approach was sketched out about a month ago on the board.  I'll be uploading our generalized reporting tool, which can serve as an example of this, once it has tests, but for now here are the bare bones:

Background: we wanted to launch a BLAST search on a number of FASTA sequences and have the results displayed in an HTML form, organized by query and hits.  A user can then select hits for particular queries and have them show up in their own datasets, each ready to feed a phylogenetic tree visualization pipeline of tools.  The reason an HTML form was called for is that one can see various columns of information for each hit, which lets you decide whether you want that hit in the next stage.

So first we have a dataset containing choice information, say this combo of BLAST nucleotide sequence search and hit info. (search query row indicated by "1" in query column):

Accession ID	pident	length	sequence	Query	Row			
Assembly_67_BCC1	-	-	AGGAC...TGCA	1	1				
gi|158343637|gb|EU057648.1|	99.55	442	AGGAC...TGCA	0	2
gi|158343987|gb|EU057686.1|	99.10	442	AGGAC...TGCA	0	3
gi|158343677|gb|EU057652.1|	98.87	387	TGGAC...TGCA	0	4 
...
Assembly_67_BCC8	-	-	ATGG...CCC	1	5
...
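
For illustration, the layout above can be read into per-query groups roughly like this.  It is only a sketch: the name group_hits_by_query is made up for this writeup, and it assumes (from the sample shown) that the Query flag is the second-to-last column and the Row number is the last one.

```python
def group_hits_by_query(lines):
    """Group tab-separated rows under the query row that precedes them.

    Assumption (taken from the sample table): the second-to-last column is
    the Query flag ("1" marks a query row) and the last column is the Row
    number.
    """
    groups = []
    for line in lines:
        cols = line.rstrip('\n').split('\t')
        # Drop empty trailing cells left by stray tabs at the end of a line.
        while cols and cols[-1].strip() == '':
            cols.pop()
        if len(cols) < 2:
            continue
        if cols[-2] == '1':
            # A query row starts a new group.
            groups.append({'query': cols, 'hits': []})
        elif groups:
            # Any other row is a hit for the most recent query.
            groups[-1]['hits'].append(cols)
    return groups
```

Rows that appear before the first query row (such as the header line) are skipped, since there is no group to attach them to yet.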

Tool A: "Selection Form": takes in above info, provides an HTML report in which an HTML form provides the necessary input to Tool B.

Tool B: "Selection Tool": takes in the same dataset as above, but generates an output file that includes only the selected rows of data (and only the desired columns).  (The nice thing about Tool B is that it can be set up to work directly on the above dataset without needing to be fed by Tool A; it's just that when called up directly, it only offers a selection list as provided by its own XML form spec.)

Tool A:

Starting in tool XML, we indicate a) input type of data to select in history, b) html output file where form is built, c) some useful ids related to the input data file (don't confuse id with hid or dataset_id!).  "tool_input_dataset_file.id" is the one we need to pass to Tool B. 

<tool id="bccdcBLASTreporting" name="BLAST Reporting" version="1.0.4">
	...
	<command interpreter="python">
my_python.py $tool_input_dataset_file $html_file $tool_input_dataset_file.hid:$tool_input_dataset_file.dataset_id:$tool_input_dataset_file.id
-f "
	...
	</command>
	...
	<inputs>
		<param name="tool_input_dataset_file" type="data" format="[e.g. tabular, or whatever type in history]" label="My insightful results"/> 
	...
	</inputs>	
	<outputs>
		...
		<data format="html" name="html_file" label="HTML report for data $tool_input_dataset_file.hid" />
	</outputs>

Tool A builds the html form.  The only trick here is that you have to load the Tool B form in galaxy, and view its frame's source code to see the right values for tool_id and tool_state (an initial tool_state value seems to work fine).  I use a dictionary lookup to store these, and combine with string replacement in a multi-line string for simple html templating.  Below is code slightly adapted for this writeup. 

	in_file, out_html_file, selection_file_data = args
	sel_file_fields = selection_file_data.split(':')

	self.lookup = {
		'timestamp': time.strftime('%Y/%m/%d'),
		'tool_id': 'bccdcSelectSubset',
		'tool_state':'800.....................71002e',
		'select_row':0,
		'dataset_selection_id': sel_file_fields[2]
	}

	form_html = """

		<div style="float:right" id="buttonPrint" class="nonprintable">
			<button onclick="window.print()">Print</button>
		</div>

		<form id="tool_form" name="tool_form" action="../../../tool_runner"  target="galaxy_main" method="post" enctype="application/x-www-form-urlencoded">
	   		<input type="hidden" name="refresh" value="refresh"/>
	            	<input type="hidden" name="tool_id" value="%(tool_id)s"/>
	                <input type="hidden" name="tool_state" value="%(tool_state)s">
			<input type="hidden" name="input" value="%(dataset_selection_id)s"/>			
			<input type="hidden" name="incl_excl" value="1"/>

			<input type="submit" class="btn btn-primary nonprintable" name="runtool_btn" value="Submit">

			""" % self.lookup

	with open(out_html_file, 'w') as fp_out:
		fp_out.write(HTML_REPORT_HEADER_FILE)
		fp_out.write(form_html)
		...
		# And now write out all the table stuff for each row in the input file, with a checkbox selector:
		with open(in_file) as f_in:
			for line in f_in:
				rowdata = line.rstrip('\n').split('\t')
				self.lookup['select_row'] += 1
				tdTags = ''
				for (col, field) in enumerate(self.display_columns):
					self.lookup['value'] = rowdata[col]

					if col == 0:
						# First column carries the checkbox; its value is the row number.
						tdTags += '<td><input type="checkbox" name="select" value="%(select_row)s" />%(value)s</td>' % self.lookup
					else:
						tdTags += '<td>%(value)s</td>' % self.lookup

				fp_out.write("""\n\t\t\t<tr>%s</tr>""" % tdTags)
		...

		fp_out.write(HTML_REPORT_FOOTER_FILE)
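
If you want to try the templating idea outside of Galaxy, a minimal self-contained sketch could look like the following.  This is not the reporting tool's actual code: render_form_header and render_row are hypothetical names, the form is cut down to the hidden fields that prime Tool B, and the escape step is my own addition so that field values containing "<" or "&" don't break the report.

```python
try:
    from html import escape   # Python 3
except ImportError:
    from cgi import escape    # Python 2

# Cut-down version of the form header; only the fields that prime Tool B.
FORM_HEADER = (
    '<form id="tool_form" name="tool_form" action="../../../tool_runner" '
    'target="galaxy_main" method="post">\n'
    '<input type="hidden" name="tool_id" value="%(tool_id)s"/>\n'
    '<input type="hidden" name="tool_state" value="%(tool_state)s"/>\n'
    '<input type="hidden" name="input" value="%(dataset_selection_id)s"/>\n'
)


def render_form_header(lookup):
    """Fill the form header from a dictionary, as in the lookup approach."""
    return FORM_HEADER % lookup


def render_row(row_number, fields):
    """Build one table row; the first cell carries the selection checkbox."""
    cells = []
    for col, field in enumerate(fields):
        value = escape(field)
        if col == 0:
            cells.append(
                '<td><input type="checkbox" name="select" value="%d" />%s</td>'
                % (row_number, value))
        else:
            cells.append('<td>%s</td>' % value)
    return '<tr>%s</tr>' % ''.join(cells)
```

Each checkbox shares the name "select", so the row numbers of all ticked boxes are submitted together under that one parameter.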

Tool B:

To keep it simple, this one just produces a single output dataset, but I can show a multiple-output-dataset version, with one dataset for each set of query hits selected above, if you want.  ' force_history_refresh="True" ' is supposed to refresh the history list after the tool finishes all of its file writing, but for some reason that doesn't seem to work on my Galaxy.

 <tool id="bccdcSelectSubset" name="Select subsets" force_history_refresh="True">
	<command interpreter="python">
        select_subset.py $input $output1 $incl_excl $select
	</command>
	<inputs>
		<param name="input" type="data" format="tabular" label="Numbered tabular input file"/>
		<param name="incl_excl" type="select" format="text" label="Include or exclude selection?">
			<option value="1">Include selection</option>
			<option value="0">Exclude selection</option>
		</param>
		<param name="select" type="select" multiple="true" display="checkboxes" label="Select lines below">
				<options from_dataset="input">
					<column name="name" index="0"/>
					<column name="value" index="-1"/>
				</options>
		</param>
	</inputs>
	<outputs>
		<data name="output1" format="tabular" metadata_source="input" label="$tool.name on data $input.hid"/>
	</outputs>
	<help>

.. class:: infomark

**What it does**

This tool produces a tabular file with a subset of the lines in its input tabular file.
	</help>
</tool>

And the python:
'''
python select_subset.py $input $output $incl_excl $select
'''

import sys

def stop_err( msg ):
    sys.stderr.write("%s\n" % msg)
    sys.exit(1)

try:
    input, output, incl_excl, select = sys.argv[1:]
except ValueError:
    stop_err('You must provide the arguments input, output, incl_excl and select.')

# Row numbers chosen in the form arrive as a comma-separated list.
try:
    lines = set(int(num) for num in select.split(','))
except ValueError:
    stop_err('Did you remember to number the input dataset?')

include = bool(int(incl_excl))
if include:
    print('Including selected lines...')
else:
    print('Excluding selected lines...')

with open(input) as f_in, open(output, 'w') as f_out:
    for line in f_in:
        cols = line.rstrip('\n').split('\t')
        try:
            num = int(cols[-1])  # the row number is the last column
        except ValueError:
            stop_err('Did you remember to number the input dataset?')
        # Keep the row if its selection status matches the include/exclude mode;
        # the row number column itself is dropped from the output.
        if (num in lines) == include:
            f_out.write('\t'.join(cols[:-1]) + '\n')

print('Done.')
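
To check the include/exclude logic without running anything through Galaxy, the core of the script can be pulled into a small function.  This is just a sketch of the same idea; filter_numbered_lines is a hypothetical name and not part of the tool as posted.

```python
def filter_numbered_lines(lines, select, include=True):
    """Yield rows (minus the trailing number column) chosen by `select`.

    `select` is the comma-separated string of row numbers, as the script
    receives it; `lines` is any iterable of tab-separated text lines whose
    last column is the row number.
    """
    wanted = set(int(num) for num in select.split(','))
    for line in lines:
        cols = line.rstrip('\n').split('\t')
        # Keep the row when its membership in `wanted` matches the mode.
        if (int(cols[-1]) in wanted) == include:
            yield '\t'.join(cols[:-1])
```

For example, list(filter_numbered_lines(open('numbered.tsv'), '2,3')) would keep only rows numbered 2 and 3, where numbered.tsv stands in for whatever numbered tabular file you have on hand.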

________________________________________
From: Igor Topcin [igortopcin@gmail.com]
Sent: Friday, May 09, 2014 1:05 PM
To: Dooley, Damion
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Inform tool interface with data specific to selected dataset

Hi Damion,
Would you mind sharing your approach with us all?
Thanks!
Igor

On May 9, 2014 1:51 PM, "Dooley, Damion" <Damion.Dooley@bccdc.ca> wrote:
Hello, Eric,

If the dynamic filters approach doesn't work out, I can send you an approach that worked for me.  It involves creating a tool-generated HTML report that contains a form which provides selection choices; the form is set to submit to a second tool of your choice (it contains the necessary fields to prime that tool).  Not sure if it works on every breed of Galaxy out there, though.

d.