Query Fastq files for particular sequence elements
Greetings all,

I've been trying to find a way to query fastq files for particular sequence elements. Our data was mapped using BWA by our collaborator, and repetitive elements 'ignored', but we are now interested in determining whether a couple of specific repetitive elements of interest are differentially represented in the raw read files.

Are there any tools that anyone has developed to do anything like this -- and that perhaps I'm simply missing as I explore the available tools? In the short term, I've written a very crude Python script to begin exploring the question, but I'm sure there has to be a much better way.

If there are no such tools available, I'm hopeful that someone might have some helpful suggestions, or that perhaps it could be explored during the upcoming conference and/or training day in July.

Thanks and Best Regards,
Jane
--
Jane E. Dorweiler, PhD
Hi Jane,

I recommend mapping the data again yourself.

Alternatively, you might want to play with 'grep' (if you have the Galaxy Unix tools installed on your Galaxy server), or use the tool 'Select lines that match an expression'. I would do a Fastq to Tab on your data first. Or you can try the EMBOSS tool 'fuzznuc' on the Fasta version of your data.

...but assuming you are talking about 'big' fastq files, mapping the data again yourself is most likely the way to go.

Regards,
Hans

On 06/14/2012 07:12 PM, Jane Dorweiler wrote:
> I've been trying to find a way to query fastq files for particular sequence elements. [...]
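For anyone who wants to try the grep-style approach outside Galaxy, here is a minimal Python sketch of the idea. It is only an illustration under stated assumptions, not a tool from this thread: it assumes plain four-line FASTQ records (no wrapped sequence lines), does exact substring matching only, and the function names and the example motif are hypothetical placeholders. It counts reads containing the motif or its reverse complement and returns the total read count too, so the fraction can be compared between samples of different depth.

```python
import gzip
from itertools import islice

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_reads_with_motif(lines, motif):
    """Return (matching_reads, total_reads) for an iterable of FASTQ lines.

    Assumes four lines per record, so the sequence is the second line
    of every record. Matching is exact and case-insensitive.
    """
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = 0
    total = 0
    # Take lines 1, 5, 9, ... (0-based), i.e. the sequence line of each record.
    for line in islice(lines, 1, None, 4):
        seq = line.strip().upper()
        if not seq:
            continue
        total += 1
        if motif in seq or rc_motif in seq:
            hits += 1
    return hits, total


def count_in_file(path, motif):
    """Hypothetical convenience wrapper: accepts plain or gzipped FASTQ."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_reads_with_motif(handle, motif)


if __name__ == "__main__":
    # Tiny in-memory example; "TTAGGG" is just a placeholder motif.
    example = [
        "@r1\n", "ACGTTTAGGG\n", "+\n", "IIIIIIIIII\n",
        "@r2\n", "CCCTAAACCC\n", "+\n", "IIIIIIIIII\n",
        "@r3\n", "GGGGGGGGGG\n", "+\n", "IIIIIIIIII\n",
    ]
    hits, total = count_reads_with_motif(example, "TTAGGG")
    print(hits, total, hits / total)
```

Because the match is exact, reads carrying a mismatch or only part of the element are missed; mapping the reads against the repeat sequences, as recommended above, handles those cases.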
participants (2): Hans-Rudolf Hotz, Jane Dorweiler