Hello,

I am using the Subtract (whole dataset) tool. I converted my FASTQ file to tabular format with 2 columns: 1. identifier and 2. sequence. I then "selected (a few) lines that match an expression" from this initial tabular file, and I am trying to get a final dataset that is devoid of the reads on those selected lines; thus I subtract the dataset of selected lines from the initial dataset. This tool works when I run the workflow on a relatively small file (1/50 the size of a whole sequencing experiment) but repeatedly fails when I input the full FASTQ file. Any idea why this is so?

Jose
Hello,

Using the 'Subtract' tool between FASTQ datasets can be memory intensive, since it literally involves sorting and then comparing each character between the two files. This is likely not necessary. I have seen queries such as yours run successfully on even very large datasets by eliminating the Subtract step and instead using 'Select' with 'NOT Matching' on the original dataset.

Example:

Current dataflow:
1 - original file A
2 - select positive match expression 'X' to create file B
3 - subtract file B from file A to create file C

Better:
1 - original file A
2 - select negative match expression 'X' to create file C

If this failure is on the public main Galaxy server and you do not wish to change your query, then moving to a cloud instance and experimenting with larger memory options is one suggestion:
http://usegalaxy.org/cloud

Hopefully this helps,

Jen
Galaxy team

On 4/29/12 6:16 PM, Xianrong Wong wrote:
Hello,

I am using the Subtract (whole dataset) tool. I converted my FASTQ file to tabular format with 2 columns: 1. identifier and 2. sequence. I then "selected (a few) lines that match an expression" from this initial tabular file, and I am trying to get a final dataset that is devoid of the reads on those selected lines; thus I subtract the dataset of selected lines from the initial dataset. This tool works when I run the workflow on a relatively small file (1/50 the size of a whole sequencing experiment) but repeatedly fails when I input the full FASTQ file. Any idea why this is so?

Jose
-- Jennifer Jackson http://galaxyproject.org
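
For readers working outside Galaxy, the "better" dataflow described in the reply can be sketched as a single streaming pass over the tabular file. This is only an illustration under stated assumptions, not how Galaxy's Select tool is implemented: the function names `select_not_matching` and `filter_file` are hypothetical, and the pattern `GGGG` merely stands in for expression 'X'. The point is that a negative match needs one line in memory at a time, with no sorting and no second dataset.

```python
import re
from typing import Iterable, Iterator


def select_not_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield only the lines that do NOT match `pattern` (hypothetical helper).

    Lines are processed one at a time, so memory use does not grow with
    the size of the input.
    """
    regex = re.compile(pattern)
    for line in lines:
        if not regex.search(line):
            yield line


def filter_file(src_path: str, dst_path: str, pattern: str) -> None:
    """Write file C (lines of file A that do not match) in one pass."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(select_not_matching(src, pattern))


if __name__ == "__main__":
    # Two-column tabular records: identifier <TAB> sequence.
    sample = [
        "read1\tACGTACGT\n",
        "read2\tTTTTGGGG\n",
        "read3\tACGTTTTT\n",
    ]
    for kept in select_not_matching(sample, r"GGGG"):
        print(kept, end="")
```

Because the filter is a generator, the same function works on an open file handle for a full sequencing run as well as on the small in-memory list used here.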