Re: [galaxy-user] Experience with Loading NGS data on standalone instance of galaxy
Hi Greg
Based on testing the new feature, I am not sure it is working as expected. I gave Galaxy a path to a GERALD folder (an analysis folder generated by the Illumina pipeline) and selected the file type "fastqsolexa". It took about two hours for the process to complete. Somehow it went through every file and uploaded everything into the library. Ideally it should pick up only the relevant fastq files and upload a total of 8 files (1 per lane) present in the folder. This should be very fast, since it only has to record the paths.
Thanks,
-Abhi
On Mon, Oct 5, 2009 at 11:37 AM, Abhishek Pratap <abhishek.vit@gmail.com> wrote:
I made sure I selected just one file format (fastqsolexa). The path I gave was to a GERALD folder, basically a folder with a bunch of analysis files plus 8 fastq files. I would say it is still taking a long time to register the paths of just those 8 fastq files with the Galaxy system.
-Abhi
On Mon, Oct 5, 2009 at 11:21 AM, Greg Von Kuster <ghv2@psu.edu> wrote:
This logging looks correct. I replied to your previous message. Regarding this issue, are you setting the datatype manually or allowing "auto-detect"? Manually setting the datatype would probably be a good idea to speed things up. If the files being uploaded are not all of the same type, pick one and then fix the others later using the "Edit this dataset's information" feature.
Abhishek Pratap wrote:
Hi Greg
I hope you had a good weekend. Carrying on from where we left it. I am not able to get to the required page from where I believe I should be able to upload datasets without them getting copied to galaxy filepath.
This is what I did:
1. Admin -> Data -> Manage Data Libraries
2. Create new data library datasets -> selected "Upload Files from Filesystem Paths"
3. Pasted the path in the text box
4. Set the "Copy Data into Galaxy?" checkbox to NO
The pseudo-upload of the datasets to the Galaxy filesystem is taking very long. I can see the following entries accumulating in the log file. Is this the way it is supposed to be?
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:42,189 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:43,892 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:45,598 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:47,380 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:49,022 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:50,446 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:52,029 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:53,491 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:55,134 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:56,788 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:11:58,433 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:00,112 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:01,752 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:03,712 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:05,525 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:07,235 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:09,062 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:10,864 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:12,914 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:14,567 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:16,240 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
galaxy.tools.actions.upload_common DEBUG 2009-10-05 11:12:18,057 DEBUGDEBUG: In Test_GA_Data_Set, found a folder name match: 8:Temp
Thanks, -Abhi
On Fri, Oct 2, 2009 at 4:26 PM, Abhishek Pratap <abhishek.vit@gmail.com> wrote:
I found the error. I had multiple test instances running and was looking at the wrong one to see the changes. My bad.
-A
On Fri, Oct 2, 2009 at 4:20 PM, Greg Von Kuster <ghv2@psu.edu> wrote:
Hmm...ok, can you attach your universe_wsgi.ini file so I can take a look?
Abhishek Pratap wrote:
I see
Analyze Data  Workflow Data Libraries Help User
-A
On Fri, Oct 2, 2009 at 4:13 PM, Greg Von Kuster <ghv2@psu.edu> wrote:
What do you see in the top menu bar?
Do you see this?
Analyze Data  Workflow  Data Libraries  Admin  Help
Abhishek Pratap wrote:
Based on testing the new feature, I am not sure it is working as expected. I gave Galaxy a path to a GERALD folder (an analysis folder generated by the Illumina pipeline) and selected the file type "fastqsolexa". It took about two hours for the process to complete. Somehow it went through every file and uploaded everything into the library. Ideally it should pick up only the relevant fastq files and upload a total of 8 files (1 per lane) present in the folder. This should be very fast, since it only has to record the paths.
Hi Abhi,
The file type selector doesn't determine what type of files to search for - partially because it predates any sort of multiple upload feature, and partially because it's not always possible to trust a file's extension (if it even has a recognizable extension). If such a feature would be useful, a feature request can certainly be created for it.
If you only need 8 files out of a large directory of files, it'd be best to simply paste the path to those 8 files in the upload box.
Also, be sure you're checking "No" to the "Copy data into Galaxy?" box to ensure data is only "linked" and not copied.
--nate
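For the eight-file case, the paths to paste can be gathered with a short script. This is only a minimal sketch and not part of Galaxy: the function name `list_matching_files` is made up for illustration, and the default pattern `s_*_sequence.txt` is an assumption about how GERALD names its per-lane fastq files, so adjust it to whatever the run folder actually contains.

```python
import fnmatch
import os


def list_matching_files(directory, pattern="s_*_sequence.txt"):
    """Return sorted absolute paths of regular files in `directory`
    whose names match the shell-style `pattern`.

    The default pattern is an assumption about GERALD output naming.
    Subdirectories are not searched.
    """
    names = sorted(
        name
        for name in os.listdir(directory)
        if fnmatch.fnmatch(name, pattern)
        and os.path.isfile(os.path.join(directory, name))
    )
    return [os.path.abspath(os.path.join(directory, name)) for name in names]
```

Printing each returned path on its own line gives a list of only the wanted files, which can then be pasted into the upload box instead of the path to the whole folder.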
Hi Nate
Explicitly specifying the file names makes the upload work the way I expected. I am planning to discuss a few such changes with Anton/Greg sometime next month, especially in the context of NGS-related data.
Also, when data is exported from a library to a history for analysis, is it actually copied? I see that part taking considerable time.
Thanks,
-Abhi
On Thu, Oct 8, 2009 at 3:37 PM, Nate Coraor <nate@bx.psu.edu> wrote:
Abhishek Pratap wrote:
Based on testing the new feature, I am not sure it is working as expected. I gave Galaxy a path to a GERALD folder (an analysis folder generated by the Illumina pipeline) and selected the file type "fastqsolexa". It took about two hours for the process to complete. Somehow it went through every file and uploaded everything into the library. Ideally it should pick up only the relevant fastq files and upload a total of 8 files (1 per lane) present in the folder. This should be very fast, since it only has to record the paths.
Hi Abhi,
The file type selector doesn't determine what type of files to search for - partially because it predates any sort of multiple upload feature, and partially because it's not always possible to trust a file's extension (if it even has a recognizable extension). If such a feature would be useful, a feature request can certainly be created for it.
If you only need 8 files out of a large directory of files, it'd be best to simply paste the path to those 8 files in the upload box.
Also, be sure you're checking "No" to the "Copy data into Galaxy?" box to ensure data is only "linked" and not copied.
--nate
Abhishek Pratap wrote:
Explicitly specifying the file names makes the upload work the way I expected. I am planning to discuss a few such changes with Anton/Greg sometime next month, especially in the context of NGS-related data.
Also, when data is exported from a library to a history for analysis, is it actually copied? I see that part taking considerable time.
No, it is effectively linked. In Galaxy terminology, the Library dataset object and History dataset object point to the same Dataset object, which points to the actual file on disk.
--nate
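The relationship described here can be pictured with a toy model. This is a hedged sketch only: the classes and the `import_to_history` helper below are simplified stand-ins invented for illustration, not Galaxy's actual model code, and the file path is made up.

```python
class Dataset:
    """Stand-in for the record that points to the actual file on disk."""

    def __init__(self, file_path):
        self.file_path = file_path


class LibraryDataset:
    """Stand-in for a library entry; it references a Dataset, it does not own a copy."""

    def __init__(self, name, dataset):
        self.name = name
        self.dataset = dataset


class HistoryDataset:
    """Stand-in for a history entry; it also just references a Dataset."""

    def __init__(self, name, dataset):
        self.name = name
        self.dataset = dataset


def import_to_history(library_item):
    """Create a history entry that shares the library entry's Dataset.

    No file is read or written; only a new reference is created.
    """
    return HistoryDataset(library_item.name, library_item.dataset)


# Hypothetical example path, for illustration only.
lib_item = LibraryDataset("lane 1", Dataset("/data/run/s_1_sequence.txt"))
hist_item = import_to_history(lib_item)
```

In this picture both `lib_item.dataset` and `hist_item.dataset` are the same object, so moving data from a library into a history adds a reference rather than duplicating the file.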
participants (2): Abhishek Pratap, Nate Coraor