Hi Greg, Anton and all

Just wondering if there has been any progress on this front. I am sorry I was not able to follow up on Assaf's suggestion due to other things at work.

I did try the latest version of Galaxy, and it looks like files are still transferred over HTTP before they can be used in the Galaxy workspace. I would also like to highlight again that many labs might want to run a local instance of Galaxy and would prefer to point it to a local path where the file is already stored. That way we get both the benefits of a cool GUI and the ability to process data stored locally.

Let me know if you guys need some feedback or have more questions. I will be happy to discuss them.


On Tue, Jul 21, 2009 at 4:26 PM, Greg Von Kuster <ghv2@psu.edu> wrote:
Hello Abhishek,

We are currently in the process of significantly enhancing the current Galaxy upload utilities, and the new version should eliminate the issue you've raised about the time needed to upload large files via HTTP ( not for making an initial copy of the file in the Galaxy environment ). However, it will probably not be ready for release for a few more weeks, so if you can take advantage of Assaf's script in the meantime, that's great.  I can't guarantee that all Galaxy features will function correctly if you do this though.

Assaf, have you found that using your script breaks anything?

Also, if you upload a file to a library rather than a history, multiple users can "import" the library dataset into their history for analysis, but there is only 1 file on disk ( users are pointing to it from their histories ).  But uploading a file to a history will create a new copy of the file each time it is uploaded.

Greg Von Kuster
Galaxy Development Team

Abhishek Pratap wrote:
Hi All

@Greg : Please find my comments below.

On Tue, Jul 21, 2009 at 10:44 AM, Greg Von Kuster<ghv2@psu.edu> wrote:
Hello Abhi,

Can you clarify the steps you took that produced the behavior? See my
comments below.

Anton Nekrutenko wrote:

Let's talk. This is an area of active development. We are looking
at implementing a universal fastq-like format or supporting multiple
formats. Perhaps we should join efforts in ironing out specifications.

galaxy team

On Jul 20, 2009, at 5:18 PM, Abhishek Pratap wrote:

Hi All

I recently came to know about NGS analysis on Galaxy during ISMB.
Getting excited, I tried a couple of things, basically to play with it.

A few comments: I may have interpreted something described below in the
wrong way. My apologies beforehand.

On a standalone installation of Galaxy, I was trying to explore
one FASTQ (sequence) file. It takes considerable time (> 20 min) for a
2 GB FASTQ file to get uploaded.
Are you using the Galaxy upload utility to create an item in your history
that points to the dataset file on disk?

Yes that is precisely correct, I am trying to upload a solexa FASTQ
file but on a standalone galaxy installation from my local file

I am not sure what is the rationale
behind that. Ideally I think there should be no need to upload such
heavy files into the workspace.
A data file that originates from a place external to Galaxy must be uploaded
into Galaxy so that the disk file can be placed in the location configured
in the Galaxy config file. Also, when data is uploaded to Galaxy ( either
to a history or a library ), several database table settings are created
that are used by various Galaxy features.

They could actually be used straight

Thanks for the clarification, but I am not sure this will help a lot of
people who are interested in installing and running Galaxy locally, mainly
for the following reasons. Maybe it is just local to me.

A. We already have one instance of the data saved on the local file system
B. Making another copy via Galaxy will eat away a lot of space in the long run.
C. The time needed to import the files into galaxy space is huge

away by the path specified.
What do you mean by "the path specified"?

Well, what I meant was a way to specify the path of the file/run on the
local file system so that Galaxy could directly pick it up from there
rather than uploading it into its own space. Now I understand this
might not work based on the way the system was designed.
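
To make the request concrete, here is a minimal sketch of the difference
between copying a file into a workspace and pointing at it in place. It is
not Galaxy code: the function names register_by_copy and register_by_link
and the "workspace" directory are hypothetical, and a symbolic link is only
one assumed way such a pointer could be implemented.

```python
import os
import shutil
import tempfile


def register_by_copy(source_path, workspace_dir):
    """Copy the file into the workspace; a second full copy lands on disk."""
    target = os.path.join(workspace_dir, os.path.basename(source_path))
    shutil.copyfile(source_path, target)
    return target


def register_by_link(source_path, workspace_dir):
    """Create a symbolic link in the workspace pointing at the original file."""
    target = os.path.join(workspace_dir, os.path.basename(source_path))
    os.symlink(os.path.abspath(source_path), target)
    return target


if __name__ == "__main__":
    # Hypothetical stand-ins for a local data directory and a workspace.
    data_dir = tempfile.mkdtemp()
    workspace = tempfile.mkdtemp()

    fastq = os.path.join(data_dir, "reads.fastq")
    with open(fastq, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")

    linked = register_by_link(fastq, workspace)
    print("is a link:", os.path.islink(linked))
    print("resolves to original:",
          os.path.realpath(linked) == os.path.realpath(fastq))
```

The link is created without reading the file contents, so it avoids both
the second copy (point B) and the transfer time (point C). It does not
create any of the database records Greg mentions, which is why this alone
would not be enough.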

Also is there any way to access the
scripts for analysis on the command line. I know this undermines the
main aim of working with Galaxy but right now I am concerned about the
You should be able to run any Galaxy tool from the command line as long as
you have all of the tool's required binaries in your path. However, running
a tool from within Galaxy should generally not be any slower than running it
outside of Galaxy, depending, of course, on what you are doing.

OK, I was under the impression that running from the shell would eliminate
the step of uploading them into the Galaxy file space.

I will be happy to discuss more about this in case you have some
comments/questions for me.



Abhishek Pratap

Bioinformatics Software Engineer

Institute for Genome Sciences

School of Medicine, Univ of Maryland

801, W. Baltimore Street, Baltimore, MD 21209

Ph: (+1)-410-706-2296

galaxy-user mailing list
Anton Nekrutenko
