Hi,
I'm trying to wrap my own tool in Galaxy. The input to my tool includes a set of ESTs (for example, the entire human EST collection). I tried the UCSC Genome Browser, but it doesn't seem to let me download the whole human collection because of the size of the data.
So I tried to implement my own FTP client and wrap it in Galaxy. The idea is for the FTP client to download data directly from NCBI's FTP server and return the downloaded files as output files that feed back into Galaxy. I intend to keep the FTP client somewhat generic, so that it doesn't enforce any particular file type. In my case, though, I would be downloading gzipped GenBank files.
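Roughly, the download part looks like this minimal sketch, using Python's standard ftplib with an anonymous login. The helper names are my own, and the file selection is a plain shell-style glob so the client stays generic about file types:

```python
import fnmatch
import os
from ftplib import FTP


def select_names(names, pattern):
    """Keep the listed names whose base name matches a shell-style glob, sorted."""
    return sorted(n for n in names if fnmatch.fnmatch(os.path.basename(n), pattern))


def download_matching(host, remote_dir, pattern, out_dir):
    """Anonymously fetch every file in remote_dir matching pattern; return local paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # The context manager sends QUIT (or closes the socket) when the block exits.
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(remote_dir)
        for name in select_names(ftp.nlst(), pattern):
            local_path = os.path.join(out_dir, os.path.basename(name))
            with open(local_path, "wb") as fh:
                # Binary transfer; each received chunk is written straight to disk.
                ftp.retrbinary("RETR " + name, fh.write)
            written.append(local_path)
    return written
```

Something like download_matching("ftp.ncbi.nlm.nih.gov", "/genbank", "gbest*.seq.gz", out_dir) is what I have in mind, but the directory and file pattern there are illustrative guesses that I'd still need to check on the server.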
But Galaxy's support for multiple output files tripped me up. I'm not sure what to do, since Galaxy seems to require a strict naming convention for the outputs, according to http://gmod.827538.n3.nabble.com/Multiple-output-not-known-until-tool-run-td1734071.html (in my case, obviously, the number of files won't be known until run time).
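If I do go with multiple outputs, my reading of that thread is that the extra files must end up in the directory Galaxy passes as $__new_file_path__, named primary_<id>_<designation>_visible_<ext>, where <id> is the id of the tool's first declared output (e.g. $output1.id in the tool XML). A minimal sketch of the renaming step under that assumption; the pattern, the "no underscores in the designation" rule and the helper names are my assumptions, not a confirmed Galaxy API:

```python
import os
import re
import shutil


def primary_name(dataset_id, designation, ext):
    """Build a name of the assumed form primary_<id>_<designation>_visible_<ext>."""
    # Assumption: Galaxy splits this name on "_", so strip anything but letters
    # and digits from the designation to keep the fields unambiguous.
    clean = re.sub(r"[^A-Za-z0-9]", "", designation)
    return "primary_%d_%s_visible_%s" % (dataset_id, clean, ext)


def publish_extra_outputs(paths, dataset_id, new_file_path, ext):
    """Move each downloaded file into new_file_path under a primary_* name."""
    published = []
    for path in paths:
        name = primary_name(dataset_id, os.path.basename(path), ext)
        target = os.path.join(new_file_path, name)
        shutil.move(path, target)
        published.append(target)
    return published
```

The ext value would have to be a type Galaxy actually knows, which runs into the datatype question below.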
I guess it doesn't really matter what the files are named, as long as I can pass them on to a gzip decompressor (I'm planning a simple wrapper for one, just enough to handle my data). Then everything should work out fine.
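The decompressor wrapper would be roughly this minimal sketch (the function name is hypothetical). It uses Python's gzip module and streams the data in chunks, so a large EST file is never held in memory all at once:

```python
import gzip
import shutil


def gunzip_file(src, dst, chunk_size=1024 * 1024):
    """Decompress the gzip file at src into dst, copying chunk_size bytes at a time."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
```

The tool's command line would then run a tiny script that calls gunzip_file(sys.argv[1], sys.argv[2]) with the input and output dataset paths.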
Alternatively, I could just ask users to download the files from the NCBI FTP server themselves, decompress them, and upload them to Galaxy.
What's the best approach here?
I also noticed that the file types include neither GenBank nor gzip. Is there some generic type I could use? Just the Data class?
Timothy