Re: [galaxy-dev] Weird problem with importing datasets
On 09/03/10 15:42, Chris Cole wrote:
On 09/03/10 14:53, Nate Coraor wrote:
Chris Cole wrote:
Anyone got any ideas on this?
It's still not working (following the most recent updates) and we've got some NGS data to import into Galaxy.
Hi Chris,
To use this feature, the contents of library_import_dir should themselves be directories (or symlinks to the same). Selecting that directory from the dropdown in the library upload form will then import the contents of those directories.
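As a rough illustration of the layout described above, the sketch below lists which entries of an import directory would qualify as selectable subdirectories. It is not Galaxy's code: the function name `list_import_choices` is hypothetical, and the only behaviour assumed is the one stated in this message (entries must be directories, or symlinks to directories).

```python
import os


def list_import_choices(library_import_dir):
    """Return the entries that are directories or symlinks to directories.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    choices = []
    for name in sorted(os.listdir(library_import_dir)):
        path = os.path.join(library_import_dir, name)
        # os.path.isdir() follows symlinks, so a link pointing at a
        # directory is accepted, while a plain file such as README is not.
        if os.path.isdir(path):
            choices.append(name)
    return choices
```

Run against an import directory that holds only a README file and a symlink to a data directory, this would return just the symlink's name.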
Ah, right. That's not clear in the docs.
How come it works with locally symlinked files, then?
Just checked it again (sorry I shouldn't have sent the previous email) with a directory and it still isn't working.

www-galaxy@ge-002: tmp> cd ~/data_import/
www-galaxy@ge-002: data_import> ls
README
www-galaxy@ge-002: data_import> ln -s /homes/pschofield/data/TOH
www-galaxy@ge-002: data_import> ls TOH/
0min_1.txt 30m_1.txt 60min_1.txt Cd_inuc_1.txt Cmono_1.txt Dd_inuc_1.txt Dmono_1.txt processed
0min_2.txt 30m_2.txt 60min_2.txt Cd_inuc_2.txt Cmono_2.txt Dd_inuc_2.txt Dmono_2.txt
www-galaxy@ge-002: data_import> ls -l
total 96
-rw-r--r-- 1 www-galaxy barton 142 Mar 9 14:14 README
lrwxrwxrwx 1 www-galaxy barton 26 Mar 9 15:59 TOH -> /homes/pschofield/data/TOH

I got the TOH option in the 'Upload directory of files', which I selected, but the upload failed. Again, this is the error I get when selecting one of the filenames in the dataset:

Traceback (most recent call last):
  File "/homes/www-galaxy/galaxy_devel/tools/data_source/upload.py", line 326, in
    __main__()
  File "/homes/www-galaxy/galaxy_devel/tools/data_source/upload.py", line 318, in __main__
    add_file( dataset, error

Thanks for help.
Chris
Hi Chris,

Chances are, the files in /homes/pschofield/data/TOH are not writable by Galaxy. Galaxy was trying to convert newlines to UNIX on those files as part of the import process. This behavior was not intentional.

I've just committed 3517:0c9e154e9176, which will prevent Galaxy from ever attempting to modify the import files. When copying in to Galaxy, they will still be converted, but no longer "in-place" (a temp file is used). When not copying (i.e. symlinking), no newline conversion will be performed since doing so would require modifying the import file.

--nate
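To make the "temp file rather than in-place" idea concrete, here is a minimal sketch under stated assumptions. It is not the code from the changeset mentioned above: the function name `copy_with_unix_newlines` is hypothetical, and it only shows one way to normalise newlines while opening the source file strictly for reading.

```python
import os
import tempfile


def copy_with_unix_newlines(src_path, dest_dir):
    """Copy src_path into dest_dir with UNIX newlines; never write to src_path.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    # Create the temp file inside dest_dir so the final rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix="upload_")
    # latin-1 maps every byte to exactly one character, so the content
    # round-trips unchanged apart from the newline translation.
    with open(src_path, "r", encoding="latin-1", newline=None) as src, \
            os.fdopen(fd, "w", encoding="latin-1", newline="\n") as out:
        # newline=None on the reader turns "\r\n" and "\r" into "\n";
        # newline="\n" on the writer disables any further translation.
        for line in src:
            out.write(line)
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    os.replace(tmp_path, dest_path)
    return dest_path
```

Because the source is only ever opened for reading, this works on files the importing user cannot write to. In the symlink case there is no copy to convert, which matches the statement that no newline conversion is performed when not copying.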
On 11/03/10 17:01, Nate Coraor wrote:
Hi Chris,
Chances are, the files in /homes/pschofield/data/TOH are not writable by Galaxy. Galaxy was trying to convert newlines to UNIX on those files as part of the import process. This behavior was not intentional.
Yup. That's correct. Galaxy has read-only access to the original files. This is necessary as users have their own locations for their raw NGS data which they want to upload to Galaxy.
I've just committed 3517:0c9e154e9176, which will prevent Galaxy from ever attempting to modify the import files. When copying in to Galaxy, they will still be converted, but no longer "in-place" (a temp file is used). When not copying (i.e. symlinking), no newline conversion will be performed since doing so would require modifying the import file.
Great. I'll look out for the latest updates and try it again, then. Again, thanks very much for the fixes. Cheers, Chris
On 3/12/10 11:33 AM, "Chris Cole" <chris@compbio.dundee.ac.uk> wrote:
On 11/03/10 17:01, Nate Coraor wrote:
Hi Chris,
Chances are, the files in /homes/pschofield/data/TOH are not writable by Galaxy. Galaxy was trying to convert newlines to UNIX on those files as part of the import process. This behavior was not intentional.
Yup. That's correct. Galaxy has read-only access to the original files. This is necessary as users have their own locations for their raw NGS data which they want to upload to Galaxy.
Hi Nate, Hi Chris

I was hoping this would also explain our weird problem (first mentioned in my e-mail to the list Jan 21st)... but even if the files are writable by galaxy it doesn't work. E.g., if I try to import the following files (sorry, I have to blank out the full path):

galaxy@erbium:/****/test$ ls -l
total 8
-rw-rw-rw- 1 haruhotz gbioinfo 841 2010-03-14 16:18 P51003.fasta
-rw-r--r-- 1 galaxy galaxy 828 2010-03-14 16:18 P51004.fasta
galaxy@erbium:/****/test$

it doesn't work, with the following error message (for both files):

"
Traceback (most recent call last):
  File "/***/galaxy_dist/tools/data_source/upload.py", line 311, in
    __main__()
  File "/***/galaxy_dist/tools/data_source/upload.py", line 302, in __main__
    add_fil error
"

Only if I delete the file P51003.fasta, e.g. as user galaxy:

galaxy@erbium:/****/test$ rm P51003.fasta
galaxy@erbium:/****/test$ ls -l
total 4
-rw-r--r-- 1 galaxy galaxy 828 2010-03-14 16:18 P51004.fasta
galaxy@erbium:/****/test$

the remaining file gets imported (and I guess 'rewritten' by galaxy):

galaxy@erbium:/****/test$ ls -l
total 4
-rw------- 1 galaxy galaxy 828 2010-03-14 16:26 P51004.fasta
galaxy@erbium:/****/test

Nevertheless, I put my hope into the next update (including "3517:0c9e154e9176"), which I will do in the next few weeks.

Regards,
Hans
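When debugging a case like the one above, it can help to see ownership next to the permission bits, since the failing file in the listing is world-writable but owned by a different user. The sketch below is only a diagnostic aid under that assumption; the function name `describe_import_files` is hypothetical, and nothing in the thread confirms that ownership is the actual cause.

```python
import os
import stat


def describe_import_files(directory):
    """Report mode, ownership and access for each entry in a directory.

    Hypothetical diagnostic helper (POSIX only); not part of Galaxy.
    """
    report = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        st = os.stat(path)
        report.append({
            "name": name,
            # Same rendering as the first column of `ls -l`.
            "mode": stat.filemode(st.st_mode),
            # Operations such as chmod need ownership, not just write access.
            "owned_by_me": st.st_uid == os.geteuid(),
            # os.access() checks against the real uid/gid of this process.
            "readable": os.access(path, os.R_OK),
            "writable": os.access(path, os.W_OK),
        })
    return report
```

Running it as the user that the Galaxy server runs as shows, per file, whether that user owns it and can read or write it.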
_______________________________________________ galaxy-dev mailing list galaxy-dev@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-dev
Sorry for the delay, only just got back to looking at this.

Following the latest updates (tip is now 3560:4c95f1a101f1) I can now successfully update files not belonging to Galaxy. However, softlinks in the import directory aren't being dereferenced correctly.

e.g. This is the contents of the import directory:

caterpillar: /homes/www-galaxy/data_import> ls -l
total 96
-rw-r--r-- 1 www-galaxy barton 142 Mar 9 14:14 README
lrwxrwxrwx 1 ccole barton 65 Mar 25 15:47 solexa_files -> /homes/ccole/projects/Dicty_RNAseq/raw_data/NOBACK/SOLEXA results

I deleted the softlink following upload, but now the files are no longer accessible. FASTQ Groomer for example reports this error:

Traceback (most recent call last):
  File "/homes/www-galaxy/galaxy_devel/tools/fastq/fastq_groomer.py", line 37, in ?
    if __name__ == "__main__": main()
  File "/homes/www-galaxy/galaxy_devel/tools/fastq/fastq_groomer.py", line 18, in main
    for read_count, fastq_read in enumerate( fastqReader( open( input_filename ), format = input_type ) ):
IOError: [Errno 2] No such file or directory: '/homes/www-galaxy/data_import/solexa_files/Ax2.txt'

So, it looks like Galaxy didn't dereference the 'solexa_files' softlink (to me at least).

Cheers,
Chris
Hi Chris, This wasn't part of the original design. We were handling when the contents of import directories were symlinks, but not when the import directories themselves were symlinks. This has been added in 3629:6b93e705c8a4. --nate
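The difference between dereferencing links inside an import subdirectory and dereferencing the subdirectory itself can be sketched as follows. This is an illustration only, not the code from the changeset above; the function name `resolve_import_paths` is hypothetical.

```python
import os


def resolve_import_paths(import_subdir):
    """Yield (name, real_path) for each file, resolving every symlink.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    # Resolve the subdirectory first, so the recorded paths do not pass
    # through a link such as data_import/solexa_files.
    real_dir = os.path.realpath(import_subdir)
    for name in sorted(os.listdir(real_dir)):
        # Resolve each entry too, in case the file itself is a symlink.
        real_path = os.path.realpath(os.path.join(real_dir, name))
        if os.path.isfile(real_path):
            yield name, real_path
```

With paths recorded this way, deleting the link in the import directory after upload leaves the recorded locations valid, because they point at the original files rather than at the link.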
On 12/04/10 22:26, Nate Coraor wrote:
Hi Chris,
This wasn't part of the original design. We were handling when the contents of import directories were symlinks, but not when the import directories themselves were symlinks. This has been added in 3629:6b93e705c8a4.
Oh, right. I got the impression that all the symlinks were dereferenced once. Thanks for the update. I'll check it out when it gets to the main tree. Thanks again for your help. Cheers, Chris
Chris Cole wrote:
Oh, right. I got the impression that all the symlinks were dereferenced once.
They were, but only the ones *in* the subdirectories of the import directory. It wasn't dereferencing if the actual subdirectories were symlinks. --nate
participants (3)
- Chris Cole
- Hotz, Hans-Rudolf
- Nate Coraor