On 30/09/09 12:32, Thomas Haverkamp wrote:
Hi all, I have a little how-to question and was hoping somebody knows the answer.
I have a metagenomic data set with reads between 100 and 1000 bp long. From it I want to create a new data set in which every sequence is exactly 200 bp. I know I can use the filter tool to extract all reads longer than 199 bp from the original data set. I then want to cut off the part of each sequence beyond base 200, so that I end up with a data set of reads that are all exactly 200 bp. Does anybody know how I can do that in Galaxy? I was thinking of some of the EMBOSS tools, but they seem to read only the first sequence and not the other sequences in my FASTA file.
It depends on the EMBOSS tool: some read only one sequence, but many will read and process all of them. seqret -send 200 will do what you ask. It truncates each sequence after base 200; shorter sequences stay unchanged, so you still need your length filter to drop reads under 200 bp.

Regards,
Peter Rice
EMBOSS team
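
For anyone who wants to check the result outside Galaxy, here is a minimal Python sketch of the same two steps: keep only reads of at least 200 bases, then truncate each one to exactly 200. It is not how seqret or the Galaxy filter tool work internally. The function names (read_fasta, trim_reads, write_fasta) and the tiny in-memory example are made up for illustration, and it assumes a plain FASTA file with ">" header lines.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_reads(records, length=200):
    """Drop reads shorter than `length`; truncate the rest to `length`."""
    for header, seq in records:
        if len(seq) >= length:
            yield header, seq[:length]


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical input: one read of 150 bp and one of 250 bp.
    example = io.StringIO(
        ">short\n" + "A" * 150 + "\n>long\n" + "C" * 250 + "\n"
    )
    out = io.StringIO()
    write_fasta(trim_reads(read_fasta(example), length=200), out)
    print(out.getvalue(), end="")
```

With a real file you would pass open("reads.fasta") and open("reads_200.fasta", "w") instead of the StringIO objects; both file names are placeholders.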