Hello,
Other types of statistics can be generated with the "Join, Subtract and Group -> Group" tool by adding one extra file manipulation step.
First, run "Compute sequence length". Next, run "Add column to an existing dataset" and set the added value to be the same for all rows: "1" or something else simple. Next, run "Group", grouping on the column that contains the "Add column" value, and then click on "Add new Operation". Set "Type" to "Mode" and "On column" to the column containing the sequence length.
Very large files, usually when run with multiple operations, are occasionally too large to run with this tool on the public main Galaxy instance. If this occurs, the first option is to simplify the query. If that doesn't work, the recommendation would be to move to a cloud instance.
Good luck with your project,
Jen
Galaxy team
On 3/15/12 3:17 PM, Elad Firnberg wrote:
Hi Jen,
Thank you, this was very helpful. Is there a way to get some more statistical information such as the mode of read lengths? The summary statistics tool only seems to provide the mean.
Thank you, Elad
On Thu, Mar 15, 2012 at 3:02 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hi Elad,
Start with the tool "FASTA manipulation -> Compute sequence length" to generate a length value for each sequence.
Next, use "Statistics -> Summary Statistics for any numerical column" on the result to generate specific R function statistics (see the tool form for which functions are available and how to enter the expression).
To visualize the distribution, use "Graph/Display Data -> Histogram of a numeric column".
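For anyone who wants to check the result outside Galaxy, the same three steps (a length per sequence, summary statistics, a length distribution) can be sketched in plain Python. This is only an illustration under stated assumptions, not what the Galaxy tools run internally: the function names read_lengths and length_summary are hypothetical, the input is assumed to be ordinary multi-line FASTA, and the input must contain at least one sequence because statistics.mean raises an error on empty data.

```python
from collections import Counter
from statistics import mean, multimode


def read_lengths(fasta_lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None  # stays None until the first ">" header is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # emit the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length  # emit the final record


def length_summary(fasta_lines):
    """Return count, mean, mode(s) and a length -> count table."""
    lengths = list(read_lengths(fasta_lines))
    return {
        "count": len(lengths),
        "mean": mean(lengths),
        "modes": multimode(lengths),  # a list, since ties are possible
        "histogram": Counter(lengths),
    }


if __name__ == "__main__":
    # Tiny made-up input; replace with open("reads.fasta") for a real file
    # ("reads.fasta" is a placeholder name).
    example = [">read1", "ACGT", "AC", ">read2", "ACG", ">read3", "ACGTAC"]
    summary = length_summary(example)
    print("reads:", summary["count"])
    print("mean length:", summary["mean"])
    print("mode(s):", summary["modes"])
    for length, n in sorted(summary["histogram"].items()):
        print(length, n)
```

The length-to-count table printed at the end is the read length distribution in tabular form; it can be passed to any plotting tool to draw the histogram.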
Hopefully this helps.
Best,
Jen
Galaxy team
On 3/15/12 9:24 AM, Elad Firnberg wrote:
Hi,
Is there a tool or easy way to obtain a read length distribution on a set of 454 reads in fasta format? I can't seem to find such a tool in Galaxy.
Thank you, Elad
_____________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org <http://usegalaxy.org>. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Chemical and Biomolecular Engineering Johns Hopkins University 3400 N. Charles St. Baltimore, MD 21218 Tel# (410) 516-3937