Hi,

I have created a tool that fetches sequences for selected IDs from a tabular file containing multiple IDs and additional info. I want the tool config to scan the first column of the tab file for IDs and present the user with a selection box where they can select one or more IDs and get output for all of those selected. The following does this:

    <param name="tabfile" type="data" format="tabular" label="ID File"/>
    <param name="selection" type="select" multiple="true" accept_default="true" label="ID">
        <options from_dataset="tabfile">
            <column name="name" index="0"/>
            <column name="value" index="0"/>
        </options>
    </param>

The issue is that if the top file in my history is a SAM file containing ~30,000 IDs in the first column, the tool initially attempts to load all of these into the selection box and effectively crashes my local instance. I only want to use this on tab files that will have ~100 IDs at most.

I have got around this by creating a new datatype, indexfile, as a subclass of Tabular in tabular.py:

    class IndexFile( Tabular ):
        file_ext = 'indexfile'

        def sniff( self, filename ):
            return False

and changing the input param to:

    <param name="tabfile" type="data" format="indexfile" label="ID File"/>

This means I must first set the tabular file to type indexfile; it will then be the only dataset shown under tabfile.

Selecting options from a file is really useful. I was wondering if there is a better workaround for this, or if a similar indexfile datatype could be included in Galaxy.

Thanks,
Shaun

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
On Fri, May 6, 2011 at 10:17 AM, SHAUN WEBB <swebb1@staffmail.ed.ac.uk> wrote:
Hi, I have created a tool that will fetch sequences for selected IDs from a tabular file containing multiple IDs and additional info.
I want the tool config to scan the first column of the tab file for IDs and provide the user with a selection box where they can select a single ID or multiple IDs and get output for all selected.
The following method does this:
    <param name="tabfile" type="data" format="tabular" label="ID File"/>
    <param name="selection" type="select" multiple="true" accept_default="true" label="ID">
        <options from_dataset="tabfile">
            <column name="name" index="0"/>
            <column name="value" index="0"/>
        </options>
    </param>
The issue is that if the top file in my history is a SAM file containing ~30,000 IDs in the first column, the tool initially attempts to load all of these into the selection box and effectively crashes my local instance.
I only want to use this on tab files that will have ~100 IDs at most.
Maybe the <options from_dataset="tabfile"> tag could have a max setting? E.g. <options from_dataset="tabfile" max="100"> could load just the first 100 entries in the tabular file. That seems much more general than the new filetype idea.

It could have a default max value, which should be useful. Thinking of the example of a tabular file of gene IDs for an organism, you might well want 20 to 30 thousand entries. Since Galaxy puts a search function on the selection, the UI should be OK. It's just the performance we need to worry about.

Peter
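To make the suggested cap concrete, here is a minimal standalone sketch of the idea in Python. It is not Galaxy code and does not use any Galaxy API: the function name `first_column_options` and the `max_entries` parameter are hypothetical, chosen only to mirror the proposed `max` attribute. It collects distinct first-column values and stops reading as soon as the cap is reached, so a very large file costs little more than a small one.

```python
def first_column_options(path, max_entries=100):
    """Return up to max_entries distinct first-column values from a tab-separated file.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    options = []
    seen = set()
    with open(path) as handle:
        for line in handle:
            # Stop before doing any more work once the cap is reached.
            if len(options) >= max_entries:
                break
            # Take only the text before the first tab.
            value = line.rstrip("\n").split("\t", 1)[0]
            # Skip blank first columns and values already collected.
            if not value or value in seen:
                continue
            seen.add(value)
            options.append(value)
    return options
```

With a default such as 100, a SAM file that happens to be at the top of the history would yield only a short list, while a gene ID table could be opened up by passing a larger value, for example `first_column_options("genes.tab", max_entries=30000)` (the file name here is made up).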
Participants (2): Peter Cock, SHAUN WEBB