I’m toying around a little in galaxy-dist with the dataset collections feature. Since I know this is work in progress, I have a few questions that I haven’t found answered online.
Running a tool on a list of datasets works really well, with a new job run for each list item. But when I want to reduce to a smaller number of list items, I understand I need to write some sort of merge tool myself, depending on the data (all proteomics data here currently). This works well for reducing a list to a single file, but I am not sure how to reduce to a new, smaller collection. In the tool I’m writing, I let the user choose the size of the collection.
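To make the grouping step concrete, here is a minimal sketch of the kind of reduction I mean. It is plain Python and not part of any Galaxy API; the function name `partition` and the choice of contiguous, near-equal groups are assumptions made for illustration only. Each resulting group would then be merged into one output file by the data-specific merge code.

```python
def partition(paths, n_groups):
    """Split an ordered list of input paths into n_groups contiguous groups.

    Hypothetical helper: group sizes differ by at most one, and the
    original order of the inputs is preserved.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if not paths:
        return []
    # Never create more groups than there are inputs.
    n_groups = min(n_groups, len(paths))
    base, extra = divmod(len(paths), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups each take one additional item.
        size = base + (1 if i < extra else 0)
        groups.append(paths[start:start + size])
        start += size
    return groups
```

Here `n_groups` stands in for the collection size chosen by the user in the tool form.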
Is there some way to tell Galaxy dynamically how many outputs to expect AND put them in a collection? Something like:
<output type="data_collection" amount_of_files="3"/>
where the 3 would also be set by the user in a param.
Also, when running with two or more lists as input, is there some sort of correlation between the lists? It seems to take the files in dataset number order, so I’m just checking.
By the way, thanks very much to John and everyone else involved in collections for doing and pushing this stuff. If there are smaller issues I can help with, I’d be thrilled. I can’t stress enough how much this feature means for Galaxy adoption in our lab and possibly our field.
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden