Problem with executing larger files
Hi,
Earlier I tried to upload larger BAM files (3.5GB, 3.4GB and 4GB) to my Galaxy account, but the upload failed. Your advice was to use FTP upload, as described at http://wiki.g2.bx.psu.edu/FTPUpload. I followed the screencast on the website and did exactly as it advised: I used the FileZilla FTP client, uploaded the files to my Galaxy account and executed. Now the problem is at the execution step. For example, my 3.5GB file uploads accurately, but once I execute, the file I get is 2.5GB. The file seems to be somehow truncated. Please advise!
Thanks
Dhanushki
Dhanushki,
I have observed that Galaxy cannot handle anything over 1GB. I have tried and tried without success. Maybe Jennifer or Mike can address this, but it might just be for small files.
Scott
Scott Tighe
Advanced Genome Technology Lab
Vermont Cancer Center at the University of Vermont
149 Beaumont Avenue, Health Science Research Bd RM 305
Burlington, Vermont USA 05405
lab 802-656-AGTC (2482)
cell 802-999-6666
In reply to Dhanushki Samaranayake (4/29/2012 11:53 AM)
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Just wanted to check whether Cufflinks is working: Cufflinks is not detecting my SAM or BAM files.
Hello Ateequr,
To confirm, you are working on the main public Galaxy instance at http://usegalaxy.org (https://main.g2.bx.psu.edu/)? There are currently no known problems with Cufflinks.
If your datasets are not recognized by the tool, double-checking that the datatype is assigned correctly is the first place to start. Click on the pencil icon for the dataset, scroll down to the datatype on the 'Edit Attributes' form, and reassign it if necessary.
BAM data is somewhat difficult to load into, or keep in, a history without it being detected and assigned as BAM, so a wrong datatype is unlikely to be the problem there. SAM data, however, can end up reassigned as tabular, especially if you have been performing text manipulations on it and it has no header lines. Just confirm the format and change the datatype back to SAM so that Cufflinks will accept it as input.
A metadata error (shown in a yellow box within the dataset) may indicate that something is wrong, although one round of 'attempt to manually correct', using the link provided, is worth trying and can often fix the issue.
Hopefully this helps,
Jen
Galaxy team
In reply to Ateequr Rehman (4/30/12 7:12 AM)
-- Jennifer Jackson http://galaxyproject.org
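Before reassigning a datatype in Galaxy, it can help to confirm locally what the file actually is. The following is a minimal Python sketch, not a Galaxy feature: the function name is hypothetical, and it only peeks at the first bytes. It relies on two format facts: BAM is stored in BGZF, a gzip-compatible container whose decompressed stream starts with `BAM\x01`, and a SAM file either starts with an `@HD`/`@SQ`/`@RG`/`@PG`/`@CO` header line or, when headerless (the case most likely to be reassigned as tabular), has at least 11 tab-separated columns with integer FLAG and POS fields.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # BGZF (used by BAM) is a gzip-compatible container
BAM_MAGIC = b"BAM\x01"     # first four bytes of a decompressed BAM stream
SAM_HEADER_TAGS = ("@HD", "@SQ", "@RG", "@PG", "@CO")


def guess_alignment_format(path: str) -> str:
    """Heuristically classify a file as 'bam', 'sam', or 'unknown'.

    Only the first bytes / first line are inspected, so this is a quick
    sanity check, not a full validation of the file.
    """
    with open(path, "rb") as fh:
        lead = fh.read(2)

    if lead == GZIP_MAGIC:
        try:
            with gzip.open(path, "rb") as gz:
                return "bam" if gz.read(4) == BAM_MAGIC else "unknown"
        except (OSError, EOFError):
            # Corrupt or truncated gzip data
            return "unknown"

    with open(path, "r", encoding="ascii", errors="replace") as fh:
        first = fh.readline().rstrip("\n")
    if first.startswith(SAM_HEADER_TAGS):
        return "sam"
    fields = first.split("\t")
    # A headerless SAM record has at least 11 tab-separated fields, with an
    # integer FLAG (column 2) and an integer POS (column 4).
    if len(fields) >= 11 and fields[1].isdigit() and fields[3].isdigit():
        return "sam"
    return "unknown"
```

A result of "sam" for a dataset Galaxy shows as tabular suggests that changing the datatype back to SAM, as described above, is the right fix.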
Hi Dhanushki,
To clarify: was the FTP transfer successful, meaning the logs in FileZilla report that the entire file transferred? If the file was part of a .zip archive, did the archive contain only a single file? Only the first file from a .zip archive is loaded; this is a known limitation, noted on the wiki but easy to overlook. Meaning, if you have a .bam and a .fastq in a .zip archive, only the .bam would be loaded by Galaxy. Also, you do not need to load .bam.bai indexes; Galaxy creates these locally for all .bam datasets.
If the transfer was interrupted and the file is still in the FTP transfer area (not yet moved into a history), connect to Galaxy again first and then resume the transfer to completion. If you are not sure whether the transfer was interrupted or whether the file has already been moved, running the FTP transfer again and watching the log is recommended.
If the FTP logs state that the transfer was complete, but moving the file into a history produced an error, we would like to examine it. Please send in a bug report from the red error dataset by clicking on the green bug icon. If you use a different email address for your Galaxy account, please note this address in the comments so that we can link the bug report to this question.
If no error was produced, then it would be very odd for a successfully transferred single file of 3.5GB in the FTP transfer area to have a size of 2.5GB as a Galaxy history dataset (although the reverse may occur when transferring gzip or other compressed data). If you still think your data was truncated at this step after checking for FTP interruptions, please help us troubleshoot by loading the file by FTP into the transfer area, but not moving it into a history. Then send me (directly, off list) your Galaxy account email, the file name, and the expected file size, and we can troubleshoot from there.
Apologies for the detail; this help is for both you and others who may have had similar issues.
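The checks above (one file per .zip archive, a complete transfer) can be done locally before or after uploading. Below is a minimal Python sketch using only the standard library; the helper names are hypothetical and are not part of Galaxy or FileZilla. It assumes `expected_bytes` is the byte count you read from the FileZilla transfer log, and the MD5 helper only helps if you have two copies of the file to compare, for example the original and a re-downloaded copy.

```python
import hashlib
import os
import zipfile
from typing import List


def zip_file_members(path: str) -> List[str]:
    """Return the file entries (directories skipped) inside a .zip archive, in archive order."""
    with zipfile.ZipFile(path) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]


def size_matches(path: str, expected_bytes: int) -> bool:
    """Compare a local file's size with a byte count reported elsewhere (e.g. an FTP client log)."""
    return os.path.getsize(path) == expected_bytes


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 in 1 MiB chunks, so multi-GB files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `zip_file_members` returns more than one name, only the first would be loaded, so it is simpler to upload each file separately. Matching sizes and MD5 digests between two copies is strong evidence the bytes are identical; note this says nothing about what Galaxy does afterwards (such as re-sorting a BAM), which can legitimately change the size.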
Scott, if you have had problems with files over 1GB and FTP, please also check for interrupted FTP transfers and try restarting them. It sounds like your problem was with the original transfer, but if it occurred at the step where you were moving the data from the FTP transfer area into a history, sending in the same type of information would be appropriate and we can troubleshoot that too.
I will watch for the replies,
Jen
Galaxy team
In reply to Dhanushki Samaranayake (4/29/12 8:53 AM)
Hi Dhanushki,
Thank you for providing the original data for our testing. I worked with the first dataset (labeled "1" by you), the 3.5GB .bam data file that transferred completely by FTP but, when loaded into a history, ended up with a size of 2.5GB.
First, I should explain that when BAM data is loaded into a Galaxy history, two things occur:
1 - the file is sorted using Samtools sort
2 - the file is indexed to create the .bam.bai
Next, I can let you know that the 2.5GB file loaded into the history is the complete original dataset. The difference in size is due to the sorting and the new Samtools compression. I am not sure what tools you used to create the data, but I was incorrect in stating that a .bam file would be unlikely to decrease so significantly in size. Here is how this was confirmed.
I verified the content two ways:
1 - I counted the number of alignments in original.bam and history.bam using 'samtools view -c'. Both were the same:
$ samtools view -c original.bam
43232174
$ samtools view -c history.bam
43232174
2 - I directly compared the content. Because history.bam was sorted by the process that loaded it into the history, I used 'samtools sort' on original.bam as well, so that I could compare the two. (Here the second argument is an output prefix, so the sorted file is written as original.bam.sorted.bam.)
$ samtools sort original.bam original.bam.sorted
At this point, the sorted copy of original.bam was 2.5GB, down from 3.5GB. Meaning, it is the sorting by samtools that reduced the overall size of the file. But I wanted to go one step further and directly compare the exact contents. So, I used 'samtools view' to extract the alignments, then performed a diff. Diff will report even a single-character difference between files.
$ samtools view original.bam.sorted.bam > original.bam.sorted.view
$ samtools view history.bam > history.bam.view
$ diff original.bam.sorted.view history.bam.view > diff.out
$ more diff.out
< nothing, meaning no differences: exactly the same content >
The same can be done on any of your other files, locally in a terminal, once you have samtools installed.
To download a large Galaxy dataset from a history, use the following command (currently, 'wget' is not an option for fetching):
1. Right-click on the disk icon for the dataset and choose 'copy link location'.
2. Type into the terminal prompt:
$ curl -0 'paste_in_the_copied_link_location' > filename.out
Samtools: http://samtools.sourceforge.net
Samtools manual: http://samtools.sourceforge.net/samtools.shtml
Hopefully this helps and you can proceed with your analysis with confidence that your data is intact.
Best,
Jen
Galaxy team
In reply to Dhanushki Samaranayake (4/29/12 8:53 AM)
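If you want a comparison that does not depend on record order (so no sort step is needed before comparing), one option is to treat each saved `samtools view` output as a multiset of lines. This is a minimal Python sketch of that swapped-in technique, not what Jen ran: the function names are hypothetical, and it assumes both views were already written to text files with `samtools view` as in the message.

```python
from collections import Counter
from typing import Dict, Iterator


def alignment_records(path: str) -> Iterator[str]:
    """Yield alignment lines from SAM text (e.g. saved `samtools view` output), skipping '@' header lines."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if not line.startswith("@"):
                yield line.rstrip("\n")


def compare_alignment_views(path_a: str, path_b: str) -> Dict[str, int]:
    """Compare two SAM text files as multisets of records, ignoring record order.

    Counter keeps every distinct record in memory, so this suits small or
    subsampled files; for tens of millions of alignments, the sort-then-diff
    approach in the message is lighter on memory.
    """
    a = Counter(alignment_records(path_a))
    b = Counter(alignment_records(path_b))
    return {
        "records_a": sum(a.values()),   # plays the role of `samtools view -c`
        "records_b": sum(b.values()),
        "only_in_a": sum((a - b).values()),
        "only_in_b": sum((b - a).values()),
    }
```

Equal record counts with zero records only in either file means the two files hold the same alignments, possibly in a different order.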
participants (4)
- Ateequr Rehman
- Dhanushki Samaranayake
- Jennifer Jackson
- Scott Tighe