Colleagues,
In our adaptation of Galaxy for large-scale natural language
processing, a fairly common usage pattern is to invoke a workflow on a
potentially large number of text files. Hence, I am wondering about
facilities for uploading an archive (in ‘.zip’ or ‘.tgz’ format, say)
containing several files, where I would like the upload tool to
extract the files from the archive, import each individually into my
history, and (perhaps optionally) create a list collection for the set
of files.
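
For concreteness, this is roughly the client-side workaround we
script today against the Galaxy API. It is a minimal sketch using
BioBlend (the Python client library for the Galaxy API); the server
URL, API key, archive name, and history name below are placeholders,
and it assumes a BioBlend version with dataset-collection support:

  import zipfile

  from bioblend.galaxy import GalaxyInstance
  from bioblend.galaxy.dataset_collections import (
      CollectionDescription, HistoryDatasetElement)

  gi = GalaxyInstance(url='https://galaxy.example.org', key='MY-API-KEY')
  history = gi.histories.create_history(name='archive import')

  # unpack the archive locally and upload each member individually
  elements = []
  with zipfile.ZipFile('texts.zip') as archive:
      archive.extractall('texts')
      for name in archive.namelist():
          if name.endswith('/'):      # skip directory entries
              continue
          upload = gi.tools.upload_file('texts/' + name, history['id'])
          dataset = upload['outputs'][0]
          elements.append(HistoryDatasetElement(name=name, id=dataset['id']))

  # gather the uploaded datasets into a flat list collection
  gi.histories.create_dataset_collection(
      history['id'],
      CollectionDescription(name='texts', type='list', elements=elements))

What we are after, in effect, is the equivalent behavior server-side,
directly in the upload tool.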
In my current Galaxy instance (running version 2015.03), when I upload
a multi-file ‘.zip’ archive, part of the above actually happens;
however, the upload tool imports only the first file extracted from
the archive (and helpfully shows a warning message on the
corresponding history entry). Have there been relevant changes in this
area in more recent Galaxy releases?
Related to the above, we have started to experiment with potentially
large collections and are beginning to worry about the scalability of
the collection mechanism. In principle, we would like to operate on
collections comprising tens or hundreds of thousands of individual
datasets. What are common collection sizes (in the number of
components, not so much in the aggregate file size) in other Galaxy
instances to date? What is the gut reaction of Galaxy developers to
the idea of a collection containing, say, one hundred thousand
entries?
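
To make that question concrete, here is the kind of probe we have
been sketching: build a list collection whose elements all reference
a single existing dataset, so that only the collection machinery
itself is exercised. This is again a BioBlend-based sketch; the ids
are placeholders, and whether repeated references to one dataset make
for a fair stress test is an assumption on our part:

  import time

  from bioblend.galaxy import GalaxyInstance
  from bioblend.galaxy.dataset_collections import (
      CollectionDescription, HistoryDatasetElement)

  gi = GalaxyInstance(url='https://galaxy.example.org', key='MY-API-KEY')
  history_id = 'HISTORY-ID'   # placeholder
  dataset_id = 'DATASET-ID'   # placeholder: any dataset in that history

  # one hundred thousand elements, all pointing at the same dataset,
  # under distinct element names
  n = 100000
  elements = [HistoryDatasetElement(name='element-%06d' % i, id=dataset_id)
              for i in range(n)]

  start = time.time()
  gi.histories.create_dataset_collection(
      history_id,
      CollectionDescription(name='stress test', type='list',
                            elements=elements))
  print('created a %d-element collection in %.1f seconds'
        % (n, time.time() - start))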
With thanks in advance,