Hi All, I have a small how-to question and was hoping somebody knows the answer.
I have a metagenomic dataset with reads between 100 and 1000 bp long. From it I want to create a new dataset in which every sequence is exactly 200 bp. I know I can use the filter tool to extract all reads longer than 199 bp from the original dataset. But I then want to cut off the part of each sequence beyond 200 bp, so that I end up with a dataset of reads that are all exactly 200 bp. Does anybody know how I can do that in Galaxy? I was thinking of some of the EMBOSS tools, but they seem to see only the first sequence and not the other sequences in my FASTA file.
Any ideas are welcome. Cheers
Thomas
Dr. Thomas H.A. Haverkamp Centre for Ecological and Evolutionary Synthesis (CEES) Dept. of Biology University of Oslo P.O. Box 1066 Blindern 0316 Oslo Norway
Phone: +47 22 85 44 00 Mobile: +47 48 09 49 32 E-mail: thhaverk@bio.uio.no Skype: Thomieh73
Thomas: It is a nice coincidence, but I am just about to commit a tool for this specific purpose. It will be on the test site shortly.
anton
galaxy team
In reply to Thomas Haverkamp's message of Sep 30, 2009, 7:32 AM.
_______________________________________________ galaxy-user mailing list galaxy-user@bx.psu.edu http://mail.bx.psu.edu/cgi-bin/mailman/listinfo/galaxy-user
Anton Nekrutenko http://nekrut.bx.psu.edu http://galaxyproject.org
Hi Thomas,
Unless you can code it yourself (it's a fairly trivial thing to do), I would recommend the FASTX-Toolkit, which includes a tool for this (amongst many other things): http://hannonlab.cshl.edu/fastx_toolkit/
Cheers,
Chris
-- Dr Chris Cole Senior Bioinformatics Research Officer School of Life Sciences Research University of Dundee Dow Street Dundee DD1 5EH Scotland, UK url: http://network.nature.com/profile/drchriscole e-mail: chris@compbio.dundee.ac.uk Tel: +44 (0)1382 388 721 The University of Dundee is a registered Scottish charity, No: SC015096
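For readers wondering what the "code it yourself" route might look like, here is a minimal Python sketch. It is not the Galaxy tool or the FASTX-Toolkit implementation; the function names (read_fasta, trim_to_exact_length, write_fasta) and the example reads are made up for illustration. It assumes plain FASTA input, keeps only reads of at least 200 bp, and cuts each of them to its first 200 bases.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def trim_to_exact_length(records, length=200):
    """Drop reads shorter than `length`; cut longer reads down to `length`."""
    for header, seq in records:
        if len(seq) >= length:
            yield header, seq[:length]


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Tiny in-memory demonstration with made-up reads (300 bp and 120 bp).
example = io.StringIO(">read1\n" + "ACGT" * 75 + "\n>read2\n" + "ACGT" * 30 + "\n")
for name, seq in trim_to_exact_length(read_fasta(example)):
    print(name, len(seq))
```

For a real dataset, the same functions could be fed file handles from open() instead of the in-memory example; the file names would be whatever your own input and output files are called.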
Thanks Chris: This tool is already in Galaxy on the test site (http://test.g2.bx.psu.edu) under the "NGS: QC and manipulation" section.
a.
In reply to Chris Cole's message of Sep 30, 2009, 11:07 AM.
Hi all, Thanks for the answers. I do see the Trim sequence tool on the Galaxy test website, but when I run the tool on a small FASTA file, it seems to crash constantly. Is that because it does not recognize my FASTA file? Furthermore, I am not a programmer (I will have to become one, I guess), so I would like to try the FASTX program. I will try that too.
Cheers
Thomas
In reply to Anton Nekrutenko's message of Sep 30, 2009, 5:32 PM.
On 30/09/09 12:32, Thomas Haverkamp wrote:
[...] I was thinking of some of the EMBOSS tools, but they only see the first sequence and not all the other sequences in my Fasta file?
It depends on the EMBOSS tool: some read only one sequence, but many will read and process all of them. seqret -send 200 will do what you ask. It truncates each sequence after base 200 (shorter sequences stay unchanged).
regards,
Peter Rice
EMBOSS team
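Because truncation as described here leaves shorter reads untouched, the length filter mentioned in the original question is still needed to end up with reads of exactly 200 bp. The short Python sketch below only illustrates that behaviour on made-up sequences; truncate_after is a hypothetical helper written for this example, not EMBOSS code.

```python
def truncate_after(seq, end=200):
    """Keep bases 1..end; a sequence shorter than `end` is returned unchanged."""
    return seq[:end]


# Made-up reads of 100, 200 and 1000 bp.
reads = {"r1": "A" * 100, "r2": "C" * 200, "r3": "G" * 1000}

# Truncation alone: r1 keeps its 100 bp, r2 and r3 end up at 200 bp.
truncated = {name: truncate_after(seq) for name, seq in reads.items()}

# Adding the length filter leaves only the reads that are exactly 200 bp.
exact = {name: seq for name, seq in truncated.items() if len(seq) == 200}

for name, seq in truncated.items():
    print(name, len(seq), "kept" if name in exact else "filtered out")
```

The order of the two steps does not matter for the result: filtering on length first and truncating afterwards gives the same set of 200 bp reads.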
participants (4)
- Anton Nekrutenko
- Chris Cole
- Peter Rice
- Thomas Haverkamp