Help with Summary Statistics
Hello, I am attempting to use Galaxy to calculate the mean sequence read length and identify the range of read lengths for my 454 data. The data has already been organized and sorted by species. The format of the data is as follows:
HD4AU5D01BHBCQCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC HD4AU5D01A093MCTCTGTCGCTCTGTCTCTCTTCTCTCTCTCTCTCTCT
etc... for each species.
I have attempted to use the "Summary Statistics" button; however, it appears to work only on numerical data and not on sequence data. Is this tool/task available via Galaxy?
Thank you,
Dominique Cowart
User name: dac330
On Thu, Aug 2, 2012 at 7:50 PM, D. A. Cowart <dac330@psu.edu> wrote:
> Hello,
> I am attempting to use Galaxy to calculate the mean sequence read length and identify the range of read lengths for my 454 data. The data has already been organized and sorted by species. The format of the data is as follows:
That was probably FASTA format (but mangled in the email).
> I have attempted to use the "Summary Statistics" button, however it appears to only be for numerical data and not sequence data. Is this tool/task available via Galaxy?
Use the "Compute sequence length" tool to compute the read lengths, and then you should be able to compute some statistics about the lengths.
Peter
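
For readers who want to check the same numbers outside Galaxy, here is a minimal sketch of what a "compute sequence length" step does on FASTA input. It is not Galaxy's implementation; the function name `fasta_lengths` is hypothetical, and the two sample records assume the data was FASTA with the read IDs quoted in this thread.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record_id: sequence_length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Header line: the ID is the first whitespace-delimited token.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence line: sequences may wrap, so lengths are accumulated.
            lengths[current] += len(line)
    return lengths


# Assumed FASTA layout of the two reads quoted in this thread.
example = """>HD4AU5D01BHBCQC
TCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC
>HD4AU5D01A093MC
TCTGTCGCTCTGTCTCTCTTCTCTCTCTCTCTCTCT
""".splitlines()

# Print a two-column table: read ID, read length.
for read_id, length in fasta_lengths(example).items():
    print(read_id, length, sep="\t")
```

The output is a two-column table (ID, length), which is the kind of numerical column a summary statistics step can then work on.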
Hello, I am attempting to use Galaxy to calculate the mean sequence read length and identify the range of read lengths for my 454 data. The data has already been divided into columns:
HD4AU5D01BHBCQC TCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC
HD4AU5D01A093MC TCTGTCGCTCTGTCTCTCTTCTCTCTCTCTCTCTCT
I have attempted to use the "Summary Statistics" button; however, it appears to work only on numerical data and not on sequence data. Is this tool/task available via Galaxy?
Thank you in advance,
Dominique Cowart
Hi,
"Summary Statistics" is fine, but first you need to run the 'Compute sequence length' tool.
Ciao,
Bjoern
___________________________________________________________ The Galaxy User List is being replaced by the Galaxy Biostar User Support Forum at https://biostar.usegalaxy.org/
Posts to this list will be disabled in May 2014. In the meantime, you are encouraged to post all new questions to Galaxy Biostar.
For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
To search Galaxy mailing lists use the unified search at:
Hi Dominique,
There are a few ways to do this. If you already have the data in FASTA format in your history, skip the FASTA-to-tabular conversion and use that dataset with the tool "FASTA manipulation -> Compute sequence length". Then run the statistics tool on the output.
Or, if this tabular format is what you have to start with, the tool "Text Manipulation -> Compute" can be used with the option "length(c2)" to generate the length of column 2. Adjust the "c2" portion as needed if this is not your complete file or if you had to do extra manipulations to isolate these columns (potentially skip those steps and use the earlier file).
Thanks!
Jen
Galaxy team
ps. This mailing list has moved to Galaxy Biostar and will be closing soon. Please join us there! Here is how to get set up: https://wiki.galaxyproject.org/Support/Biostar
--
Jennifer Hillman-Jackson
http://galaxyproject.org
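
As a rough illustration of the tabular route described above, the sketch below appends the length of column 2 to each row of a two-column (ID, sequence) file. It mirrors the idea of computing a length on "c2" but is plain Python, not the Galaxy tool; `append_column_length` is a hypothetical name, and the rows are assumed to be separated by tabs or spaces.

```python
from typing import Iterable, Iterator


def append_column_length(rows: Iterable[str], column: int = 2) -> Iterator[str]:
    """Yield each row with the length of the chosen 1-based column appended."""
    for raw in rows:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip empty rows
        fields = line.split()  # splits on tabs or runs of spaces
        # Raises IndexError if a row has fewer columns than requested.
        value = fields[column - 1]
        yield "\t".join(fields + [str(len(value))])


# The two rows quoted in this thread.
rows = [
    "HD4AU5D01BHBCQC TCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC",
    "HD4AU5D01A093MC TCTGTCGCTCTGTCTCTCTTCTCTCTCTCTCTCTCT",
]

for out in append_column_length(rows):
    print(out)
```

Changing `column` plays the same role as adjusting the "c2" portion when the sequence is not in the second column.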
Hi Dominique,
I'd use the original FASTA file and input it into the 'Fasta Manipulation > Compute Sequence Length' tool. Then, using the output, run the 'Statistics > Summary Statistics for any numerical column' tool on c2. That will give you all the info you're after.
Cheers,
Graham
Dr. Graham Etherington
Bioinformatics Support Officer,
The Sainsbury Laboratory,
Norwich Research Park, Norwich NR4 7UH, UK
Tel: +44 (0)1603 450601
Twitter: @bioinformatiks
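
To make the last step concrete, here is a small sketch of the statistics asked for in the original question (mean read length and range), computed from a two-column table of IDs and lengths. The function name `length_summary` and the sample rows are hypothetical; this uses Python's standard `statistics` module and is not what the Galaxy tool runs internally.

```python
import statistics
from typing import Dict, Iterable


def length_summary(rows: Iterable[str], column: int = 2) -> Dict[str, float]:
    """Summarise the integer values in a 1-based column of a tabular file."""
    values = [int(line.split()[column - 1]) for line in rows if line.strip()]
    if not values:
        raise ValueError("no rows to summarise")
    return {
        "count": len(values),
        "mean": float(statistics.mean(values)),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical ID/length table, e.g. the output of a sequence-length step.
table = ["read1\t40", "read2\t36", "read3\t44"]
summary = length_summary(table)
print(summary)
```

The minimum and maximum together give the range of read lengths.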
participants (6)
- Björn Grüning
- D. A. Cowart
- Dominique Cowart
- graham etherington (TSL)
- Jennifer Jackson
- Peter Cock