Looking for ideas to download multiple files
Hi all, I have some skilled end users who would like to download multiple files from our local Galaxy data library with command-line utilities (i.e. they would like to "wget" some files from histories).

One way to do this is to copy the item link within the history and paste it as a wget argument. Unfortunately, this has to be done for every item. Another way is to select all files from the Data Library and download them as an archive (but you can't copy the link...), and this takes so much time when you have to download BAM files (or any other huge ones)...

I was looking at the source code, wondering how difficult it would be to add a piece of functionality to the data library, namely: for selected items, "get the public link", which saves a text file with all the public links (or copies them to the pasteboard, or simply displays them in another window...) so that one can use that info to script the download somewhere else (and at any time)...

Any suggestion is welcome! :-)

d

/* Davide Cittaro
Cogentech - Consortium for Genomic Technologies
via Adamello 16, 20139 Milano, Italy
tel.: +39(02)574303007
e-mail: davide.cittaro@ifom-ieo-campus.it */
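What the users are after boils down to this: given a plain-text file with one download link per line, fetch every file non-interactively ("wget -i links.txt" does exactly that from the shell). Below is a minimal Python sketch of the same workflow; the links.txt filename and the downloads/ directory are assumptions for illustration only.

    import os
    import urllib.request

    # Read one Galaxy download URL per line from a link list.
    with open("links.txt") as fh:
        urls = [line.strip() for line in fh if line.strip()]

    os.makedirs("downloads", exist_ok=True)

    for url in urls:
        # Use the last path component as a local filename (illustrative only).
        filename = url.rstrip("/").rsplit("/", 1)[-1] or "dataset"
        dest = os.path.join("downloads", filename)
        print(f"Fetching {url} -> {dest}")
        urllib.request.urlretrieve(url, dest)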
Davide Cittaro wrote:

For selected items, "get the public link", which saves a text file with all the public links (or copies them to the pasteboard, or simply displays them in another window...) so that one can use that info to script the download somewhere else (and at any time)... Any suggestion is welcome! :-)
Hi Davide,

I wrote up a proof of concept for this that basically does the latter of your suggestions: it displays the links on the resulting page after clicking "Go" in the library. One problem, though, is that unless the server is using HTTP Authentication, users will not be able to access any non-public files this way.

We could work around this by generating a link to the API that includes the user's API key, if the API is enabled and the user has generated a key (and if the API had a library dataset download method, which it currently does not). But this worries me, since users may not be clear that sharing such a URL with others would be like sharing their account details.

--nate
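To make the worry concrete: a link of the kind described would carry the user's API key in the query string, so holding the URL is holding the credential. A hypothetical sketch follows; the route and the "key" parameter are illustrative inventions, since (as noted above) the API currently has no library dataset download method.

    from urllib.parse import urlencode

    def make_api_download_url(base_url, dataset_id, api_key):
        """Build a download URL that embeds the user's API key.

        Anyone holding this URL can act as the key's owner, which is
        exactly the sharing hazard described above.
        """
        # Hypothetical route; Galaxy's real API layout is an assumption here.
        query = urlencode({"key": api_key})
        return f"{base_url}/api/libraries/datasets/{dataset_id}/download?{query}"

    # The printed URL leaks the key to anyone who sees it.
    print(make_api_download_url("https://galaxy.example.org", "dataset123", "SECRET-KEY"))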
On Fri, Sep 10, 2010 at 03:19:54PM -0400, Nate Coraor wrote:
I wrote up a proof of concept for this that basically does the latter of your suggestions: it displays the links on the resulting page after clicking "Go" in the library. One problem, though, is that unless the server is using HTTP Authentication, users will not be able to access any non-public files this way.
Another way around that is to have the public link be a 128-random-bits-as-hex URL which gets stashed in a table suitable for lookup. Then anyone with the non-guessable URL to that download/result can pull it using a copy/paste-able, wget-able URL, and if they share it, they're only sharing that result, which is what they think they're doing.

That is, of course, some work to code, though.

--
Ry4an Brase 612-626-6575
University of Minnesota Supercomputing Institute for Advanced Computational Research
http://www.msi.umn.edu
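Ry4an's scheme in miniature: mint a random 128-bit token, stash the token-to-dataset mapping in a table, and resolve downloads by token lookup. A minimal sketch, with an in-memory dict standing in for the database table and all names illustrative rather than Galaxy code:

    import secrets

    # token -> dataset path; in a real deployment this would be a database table.
    _public_links = {}

    def create_public_link(dataset_path):
        """Mint a non-guessable 128-bit hex token for one dataset."""
        token = secrets.token_hex(16)  # 16 bytes = 128 random bits as hex
        _public_links[token] = dataset_path
        return f"/link/{token}"

    def resolve_public_link(token):
        """Look the token up; None means unknown (or revoked)."""
        return _public_links.get(token)

    def revoke_public_link(token):
        """Manual invalidation, as discussed later in the thread."""
        _public_links.pop(token, None)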
Ry4an Brase wrote:

Another way around that is to have the public link be a 128-random-bits-as-hex URL which gets stashed in a table suitable for lookup. Then anyone with the non-guessable URL to that download/result can pull it using a copy/paste-able, wget-able URL, and if they share it, they're only sharing that result, which is what they think they're doing. That is, of course, some work to code, though.
It makes me a bit nervous, as that is essentially an end-run around our established security model. It seems like creating public links to private data is something a user should do very explicitly, with warnings, and there should be a clear way to disable the links. Perhaps the links should only be valid for 1 download (or a user-defined count that defaults to 1).

--nate
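Nate's refinement amounts to attaching a remaining-use counter to each token. A sketch of that bookkeeping, again with an in-memory table and illustrative names; the default of a single download follows his suggestion:

    import secrets

    # token -> (dataset path, downloads remaining)
    _limited_links = {}

    def create_limited_link(dataset_path, max_downloads=1):
        """Create a link valid for a user-defined number of downloads (default 1)."""
        token = secrets.token_hex(16)
        _limited_links[token] = (dataset_path, max_downloads)
        return f"/link/{token}"

    def consume_limited_link(token):
        """Return the dataset path and burn one use; None once exhausted."""
        entry = _limited_links.get(token)
        if entry is None:
            return None
        path, remaining = entry
        if remaining <= 1:
            del _limited_links[token]  # that was the last permitted download
        else:
            _limited_links[token] = (path, remaining - 1)
        return path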
On Mon, Sep 13, 2010 at 01:01:36PM -0400, Nate Coraor wrote:
It makes me a bit nervous, as that is essentially an end-run around our established security model. [...] Perhaps the links should only be valid for 1 download (or a user-defined count that defaults to 1).
Making the 'publish' URLs something one requests the creation of, and something one can invalidate manually, are certainly positives. With both of those in place you've pretty much described the "private" class of data in Google's Docs, Maps, and Picasa products.

In other contexts I've tried each of:

- URLs that are only valid N times
- URLs that are only valid N minutes after their creation
- URLs that are only valid N minutes after their first use
- URLs that are only valid X times in N minutes after their first use

and concluded they're just unworkable, for a bunch of annoying reasons:

- people want to test URLs before they email them
- content-filtering proxy servers sometimes test URLs before the browser gets a crack at them
- browser/wget download resume re-accesses the same URL (with a range request), with the bug reports showing up as "it worked and then it didn't!"

Not worth it.

--
Ry4an Brase 612-626-6575
University of Minnesota Supercomputing Institute for Advanced Computational Research
http://www.msi.umn.edu
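For concreteness, one of the schemes in that list, a URL valid only for N minutes after its first use, might look like the sketch below (names and in-memory storage are illustrative). The failure mode described above falls out directly: a download paused and resumed after the window closes re-requests the same URL and is refused, i.e. "it worked and then it didn't!".

    import time
    import secrets

    # token -> (dataset path, window in seconds, time of first use or None)
    _expiring_links = {}

    def create_expiring_link(dataset_path, window_minutes):
        token = secrets.token_hex(16)
        _expiring_links[token] = (dataset_path, window_minutes * 60, None)
        return token

    def check_expiring_link(token):
        """Valid for N minutes after first use; None means refused/expired."""
        entry = _expiring_links.get(token)
        if entry is None:
            return None
        path, window, first_use = entry
        now = time.time()
        if first_use is None:
            # First access starts the clock.
            _expiring_links[token] = (path, window, now)
            return path
        if now - first_use > window:
            # A resumed download's range request lands here and fails.
            del _expiring_links[token]
            return None
        return path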
participants (3)

- Davide Cittaro
- Nate Coraor
- Ry4an Brase