I received a server error and would like more detailed information:
Server Error
An error occurred. See the error logs for more information. (Turn debug on to display exception reports here)
How do I 'turn debug on'? I am using the public Galaxy instance http://main.g2.bx.psu.edu/
Thanks,
Peter Andrews
-- -------------- Peter Andrews Programmer Computational Genetics Lab Dartmouth Hitchcock Medical Center (603) 653-9963
Hi list,
Is there a tool in Galaxy to trim the end of FASTQ reads based on their quality, say to remove all base pairs at the end of a read that have a quality smaller than 20? I know about the tool that trims an arbitrary number of base pairs at the end of reads and the filter tool that can filter out sequences that have some base pairs with a quality value below some threshold, but they are different from what I need.
Regards,
Florent
Hi Florent,
You are correct that there is currently no tool in Galaxy that trims directly by quality. At present, the Summary statistics and boxplot tools are used to determine a good cutoff for use in the trim by column tool; trimming by percentage of read length can be more useful on variable-length reads. However, a tool that directly trims reads based upon a threshold quality score seems like a natural fit for Galaxy, for cases where reads are not of uniform length at the start and/or need not be at the end, and the percentage-of-read-length method is not sufficient.
Let's verify that you are looking for something like this, where 'x' is a low quality base and 'o' is a high quality base:
Start with: xxxooooxxooooxxx
After trimming ends for 'x': ooooxxoooo
So trimming happens only from the ends and stops as soon as a base above the threshold is found; internal low quality bases are not considered.
Thanks,
Dan
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
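The end-only trimming Dan describes can be sketched in a few lines of Python. This is only an illustration, not the Galaxy tool: the function name trim_ends, the default threshold of 20 (taken from Florent's question) and the numeric scores standing in for 'x' and 'o' are all assumptions.

```python
def trim_ends(seq, quals, threshold=20):
    """Drop bases from both ends while their quality is below `threshold`.

    Hypothetical helper: `quals` is a list of integer quality scores,
    one per base in `seq`. Internal low quality bases are kept.
    """
    start = 0
    end = len(quals)
    # Walk in from the left until a base meets the threshold.
    while start < end and quals[start] < threshold:
        start += 1
    # Walk in from the right, never crossing the left boundary.
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end], quals[start:end]


# 'x' = low quality (score 10), 'o' = high quality (score 30)
pattern = "xxxooooxxooooxxx"
scores = [10 if c == "x" else 30 for c in pattern]
trimmed, _ = trim_ends(pattern, scores)
print(trimmed)  # ooooxxoooo
```

A read whose bases are all below the threshold comes back empty, which matches the case of reads that should be removed entirely.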
Thanks for your reply, Daniel.
That's right... I did not even think about using the boxplot tool to find how much to trim the ends. My reads all have the same length, but still, it seems more natural to trim only as much as needed and no more. For example, I have some reads that are completely low quality and should be entirely trimmed/removed, whereas some might be of good quality over almost all their length.
It's probably better to use a short sliding window (of, say, 5 bp) and trim the ends until the window has no more than, say, zero low quality base pairs. So, the following sequence would be converted from:
xxxoxooooooxxooooooxoxxx
to:
ooooooxxoooooo
Florent
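Florent's sliding-window rule could look roughly like the sketch below. It is one possible reading of the description, not the tool that was later added: the name window_trim, the max_low parameter and the choice to return an empty read when no window qualifies (including reads shorter than the window) are assumptions.

```python
def window_trim(seq, quals, threshold=20, window=5, max_low=0):
    """Keep the span from the first to the last acceptable window.

    A window is acceptable when it contains at most `max_low` scores
    below `threshold`. Hypothetical helper for illustration only.
    """
    def acceptable(i):
        low = sum(q < threshold for q in quals[i:i + window])
        return low <= max_low

    # Start positions of every full-length window that passes the check.
    starts = [i for i in range(len(quals) - window + 1) if acceptable(i)]
    if not starts:
        return "", []  # no clean window anywhere: drop the whole read
    start = starts[0]
    end = starts[-1] + window
    return seq[start:end], quals[start:end]


# 'x' = low quality (score 10), 'o' = high quality (score 30)
pattern = "xxxoxooooooxxooooooxoxxx"
scores = [10 if c == "x" else 30 for c in pattern]
trimmed, _ = window_trim(pattern, scores)
print(trimmed)  # ooooooxxoooooo
```

Compared with trimming base by base, the isolated 'o' bases near each end are removed as well, because no clean 5 bp window covers them.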
Hi all,
I have a question regarding the trimming length.
I would like to trim the end of my 76 bp reads down to 36 bp, so that I can compare the results between the 36 bp reads (the first part of the 76 bp reads) and the full 76 bp reads. Can I download the trimmed reads from Galaxy?
Many thanks!
Wei
-- Yanwei Tan Institute of Neurobiology 1.OG, AG Bading Im Neuenheimer Feld 364 University of Heidelberg 69120 Heidelberg Germany Tel:+49-6221-548319 Fax:+49-6221-546700
Hi Wei,
You can download your results from Galaxy at any time by clicking on the Save (disk) icon associated with your dataset, shown after clicking on the dataset's name to expand the view.
Thanks for using Galaxy,
Dan
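For reference, the fixed-length truncation Wei describes (keeping the first 36 bases of each 76 bp read) can also be sketched outside Galaxy. The function name truncate_fastq and the assumption of plain four-line FASTQ records are illustrative only.

```python
def truncate_fastq(lines, keep=36):
    """Yield FASTQ lines with sequence and quality cut to `keep` characters.

    Assumes simple four-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record hold the sequence and the qualities.
        if i % 4 in (1, 3):
            line = line[:keep]
        yield line


# One made-up 76 bp record
record = ["@read1", "ACGT" * 19, "+", "I" * 76]
short = list(truncate_fastq(record))
print(len(short[1]), len(short[3]))  # 36 36
```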
Hi Florent,
Thanks very much for the comments. A sliding window sounds like an excellent approach: allow users to specify the window size, step size, an aggregation action to perform on the window (min, max, sum, mean, etc.), a comparison method (<, <=, ==, etc.) and a threshold quality value. Allowing users to specify which ends to trim (both, or only one or the other) would also likely be useful. Would it also be desirable to allow specifying a number of quality scores that can be excluded from the aggregation action (the zero low quality base pairs in your example)? A window size of 1 would handle the simple case of only trimming the very ends while allowing the user to configure more complex windowing schemes. Thoughts?
Thanks,
Dan
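The options Dan lists (window size, step size, aggregation, comparison, threshold, ends) could be combined along the lines of the sketch below. The parameter names, the "left"/"right"/"both" values and the exact semantics are assumptions: here a window is trimmed whenever compare(aggregate(window), threshold) is true, and trimming from an end stops at the first window for which it is false. The option to exclude some scores from the aggregation is left out.

```python
import operator
from statistics import mean

# Hypothetical option names; the thread lists the options, not their spelling.
AGGREGATORS = {"min": min, "max": max, "sum": sum, "mean": mean}
COMPARATORS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
               ">=": operator.ge, ">": operator.gt}


def bases_to_trim(quals, window, step, aggregate, compare, threshold):
    """Count how many leading scores of `quals` should be dropped."""
    offset = 0
    while offset < len(quals):
        scores = quals[offset:offset + window]  # may be shorter near the end
        if not compare(aggregate(scores), threshold):
            break  # first window that passes: stop trimming
        offset += step  # step must be >= 1
    return min(offset, len(quals))


def trim_read(seq, quals, window=1, step=1, aggregate="min",
              compare="<", threshold=20, ends="both"):
    """Trim one or both ends of a read using a configurable window."""
    agg = AGGREGATORS[aggregate]
    cmp = COMPARATORS[compare]
    left = right = 0
    if ends in ("both", "left"):
        left = bases_to_trim(quals, window, step, agg, cmp, threshold)
    if ends in ("both", "right"):
        # Reverse what is left so the same routine handles the other end.
        right = bases_to_trim(quals[left:][::-1], window, step,
                              agg, cmp, threshold)
    end = len(quals) - right
    return seq[left:end], quals[left:end]


def demo(pattern, **options):
    # 'x' = low quality (score 10), 'o' = high quality (score 30)
    scores = [10 if c == "x" else 30 for c in pattern]
    return trim_read(pattern, scores, **options)[0]


print(demo("xxxooooxxooooxxx"))                    # ooooxxoooo
print(demo("xxxoxooooooxxooooooxoxxx", window=5))  # ooooooxxoooooo
```

With the defaults (window size 1, min, <, 20) this reduces to the simple end trimming discussed earlier in the thread, while window=5 reproduces Florent's sliding-window example.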
That sounds perfect, Daniel. Advanced options for advanced users, and safe defaults for everyone else will do it.
Florent
Hi Florent,
I've added a tool which will do this to the repository. It is currently available on the test server and will be on the main server next week.
Thanks,
Dan
participants (4)
- Daniel Blankenberg
- Florent Angly
- Peter Andrews
- Yanwei Tan