On Thu, Sep 15, 2011 at 10:32 AM, Timothy Wu <2huggie@gmail.com> wrote:
On Thu, Sep 15, 2011 at 4:58 PM, Peter Cock <p.j.a.cock@googlemail.com> wrote:
Perhaps I have misunderstood you, but I'd just use the provided "Upload Data" tool, and paste in the FTP URL for the file, e.g. an NCBI FTP URL.
I wasn't aware that the "Upload Data" tool could take an FTP URL, so thanks for letting me know.
Unfortunately, it doesn't accept wildcards.
At a minimum, I need to be able to give a path specification like "ftp://ftp.ncbi.nih.gov/genbank/gbest*.seq.gz".
Actually my tool is more versatile (though I don't need it for this particular application).
For example, I could specify:
ftp://ftp.ncbi.nih.gov/genomes/*/*/NC_*.fna
and grab all the FASTA files for all chromosomes of all species under the genomes directory. I thought it would be a nice tool to have in my Galaxy arsenal.
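To make the wildcard idea concrete, here is a rough sketch of how such an expansion could work. It is not the actual tool discussed in this thread: the names expand_ftp_glob and make_ftp_lister are made up for illustration, and it assumes shell-style wildcards matched one path segment at a time with Python's fnmatch. The directory listing is passed in as a function so the matching logic can be tried without touching the network.

```python
import fnmatch
import ftplib
import posixpath
from urllib.parse import urlparse


def expand_ftp_glob(url, list_dir):
    """Expand a wildcard FTP URL into the server paths that match it.

    list_dir(path) is a caller-supplied function (an assumption of this
    sketch) returning the entry names found in directory `path`, or an
    empty list if `path` is not a directory.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    candidates = [""]  # paths matched so far, relative to the server root
    for pattern in segments:
        matched = []
        for base in candidates:
            for name in sorted(list_dir(base or "/")):
                # Case-sensitive shell-style match on a single path segment.
                if fnmatch.fnmatchcase(name, pattern):
                    matched.append(base + "/" + name)
        candidates = matched
    return candidates


def make_ftp_lister(ftp):
    """Wrap an ftplib.FTP connection as a list_dir function (hypothetical helper)."""
    def list_dir(path):
        try:
            # Servers differ on whether NLST returns bare names or full paths,
            # so keep only the last path component.
            return [posixpath.basename(n) for n in ftp.nlst(path)]
        except ftplib.error_perm:
            # Typically means "not a directory" or "no files found".
            return []
    return list_dir
```

Against a real server this would be used roughly as expand_ftp_glob(url, make_ftp_lister(ftp)), where ftp is a logged-in ftplib.FTP connection. Note that a pattern like genomes/*/*/NC_*.fna issues one listing per matched directory, so it can mean a lot of round trips on a large site.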
Timothy
That volume of data shouldn't really be uploaded into individual Galaxy users' histories (not unless you have a Galaxy setup with an unusually high disk quota per user - lucky you). This seems ideal for the Galaxy data library functionality, where the Galaxy admin loads the big data sets and makes them available to all the Galaxy users (or a subset, using access controls). In the user's history the files are just linked to, so there is only one copy on disk.

http://wiki.g2.bx.psu.edu/Admin/Data%20Libraries/Libraries

However, we'd also like easy access to (some of) the files on ftp://ftp.ncbi.nih.gov/genomes/ so a new "NCBI Genomes FTP-site Data Source Tool" as part of Galaxy would be nice (like the existing UCSC data source etc).

Peter