Hi Dhanushki,

Thank you for providing the original data for our testing. I worked with the first dataset (labeled "1" by you), the 3.5GB .bam file that transferred completely over FTP but, once loaded into a history, ended up with a size of 2.5GB.

First, I should explain that when BAM data is loaded into a Galaxy history, two things occur:

1 - the file is sorted using Samtools sort
2 - the file is indexed to create the .bam.bai

Next, I can let you know that the 2.5GB file loaded into the history is the complete original dataset. The difference in size is due to the sorting and the new Samtools compression. I am not sure what tools you used to create the data, but I was incorrect in stating that a .bam file would be unlikely to decrease so significantly in size, and I will explain how this was confirmed.

I verified the content two ways:

1 - I counted the number of alignments in original.bam and history.bam using 'samtools view -c'. Both were the same:

  $ samtools view -c original.bam
  43232174
  $ samtools view -c history.bam
  43232174

2 - I directly compared the content. Because the history.bam file was sorted by the process that loaded it into the history, I decided to 'samtools sort' the original.bam file as well, so that I could compare.

  $ samtools sort original.bam original.bam.sorted

At this point, the sorted copy of original.bam had shrunk from 3.5GB to 2.5GB. In other words, it is the sorting by samtools that reduced the overall size of the file.

But I wanted to go one step further and directly compare the exact contents. So I used 'samtools view' to extract the alignments and then ran a diff. Diff will report even a single-character difference between files.
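As a small aside on how diff reports its result: besides printing any differences, it returns exit status 0 when the files are identical and 1 when they differ, so the check can be scripted instead of read by eye. Below is a minimal sketch on toy text files. The function name files_match and the file names are made up for illustration; they are not part of samtools or Galaxy.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if two text files have identical content.
# diff exits 0 when the files are identical and 1 when they differ.
files_match() {
    diff "$1" "$2" > /dev/null
}

# Toy stand-ins for the text output of 'samtools view' on two BAM files.
tmpdir=$(mktemp -d)
printf 'read1\t0\tchr1\t100\nread2\t16\tchr1\t250\n' > "$tmpdir/a.view"
printf 'read1\t0\tchr1\t100\nread2\t16\tchr1\t250\n' > "$tmpdir/b.view"
# Same as above except for one character in the last field (250 -> 251).
printf 'read1\t0\tchr1\t100\nread2\t16\tchr1\t251\n' > "$tmpdir/c.view"

if files_match "$tmpdir/a.view" "$tmpdir/b.view"; then
    echo "a and b: identical"
else
    echo "a and b: different"
fi

if files_match "$tmpdir/a.view" "$tmpdir/c.view"; then
    echo "a and c: identical"
else
    echo "a and c: different"
fi

rm -rf "$tmpdir"
```

With real data, the two arguments would be the .view files written by the 'samtools view' commands.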
The commands were:

  $ samtools view original.bam.sorted.bam > original.bam.sorted.view
  $ samtools view history.bam > history.bam.view
  $ diff original.bam.sorted.view history.bam.view > diff.out
  $ more diff.out
  < nothing, meaning no differences, exactly the same content >

(Note: with the 'samtools sort' syntax used here, the second argument is an output prefix and .bam is appended, so the sorted file on disk is original.bam.sorted.bam.)

The same can be done on any of your other files, by you locally, at a terminal prompt, once you have samtools installed.

To download a large Galaxy dataset from a history, do the following (currently, the command 'wget' is not a fetching option):

1. right click on the disk icon for the dataset and choose 'copy link location'
2. type into the terminal prompt

  $ curl -0 'paste_in_the_copied_link_location' > filename.out

Samtools: http://samtools.sourceforge.net
Samtools manual: http://samtools.sourceforge.net/samtools.shtml

Hopefully this helps and you can proceed with your analysis with confidence that your data is intact.

Best,

Jen
Galaxy team

On 4/29/12 8:53 AM, Dhanushki Samaranayake wrote:
Hi,
Earlier I tried to upload larger BAM files (3.5GB, 3.4GB and 4GB) to my Galaxy account, but the uploads failed. Your advice was to use FTP upload, as described at http://wiki.g2.bx.psu.edu/FTPUpload. I followed the screencast on that page exactly. I used the FileZilla FTP client, uploaded the files to my Galaxy account, and executed the upload. The problem is at the execution step. For example, my 3.5GB file uploads correctly, but once I execute, the file I get is 2.5GB. The file seems to be truncated somehow. Please advise!
Thanks
Dhanushki
-- Jennifer Jackson http://galaxyproject.org