Hi Scott,

The tool "Metagenomic analyses -> Find diagnostic hits" can be used to isolate the conserved sequences. Then use the tool "Join, Subtract and Group -> Compare" with "Non Matching rows of 1st dataset" to filter out anything that you think is spurious for your analysis (put the original file first and the output of diagnostic hits second) before moving forward with the other summary tools.

You will probably want to run the "Find diagnostic hits" tool more than once. The choice is yours whether to run "Compare" after each run, or to "Text Manipulation -> Concatenate" all the results together first and then run "Compare" once. The first approach might be faster; it depends on the size of your datasets (how much filtering occurred before this step, etc.). The "Compare" tool sorts and holds data in memory. Even if you need to break the data up and run it in smaller chunks, the results should be the same in the end. None of these jobs require the data to be in one lump.

Others are welcome to add their own strategies; I am sure there are other ways to do this. Some of the public servers specializing in metagenomics may also have tools or options for this, and some of those servers may have donated tools to the Tool Shed for local or cloud use. It may be worth a look: http://wiki.galaxyproject.org/PublicGalaxyServers

Good question!

Jen
Galaxy team

On 9/18/13 7:03 AM, Scott W. Tighe wrote:
Dear Galaxy,
When running HiSeq shotgun metagenomics samples from the environment against megablast and taxonomic representation, how do I filter/remove all the 16S and other conserved sequences?
The problem is that when blasting a single organism that has a fraction of conserved sequence, the results will align with E. coli 10,000 times more than with the possible target organism. This data would be wrong and misleading. For example, a 100 mg sample that was negative for E. coli using the MUG test gives thousands of hits with Galaxy.
1) Is there a "filter conserved sequences" setting?
2) Is there a "remove model organisms" setting?
Scott Tighe
-- Jennifer Hillman-Jackson http://galaxyproject.org
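
For readers who want to see the logic of the "Compare" step from the reply above outside of Galaxy, here is a minimal Python sketch of "Non Matching rows of 1st dataset". It is only an illustration, not Galaxy's implementation: the function name non_matching_rows, the tab-separated layout, the choice of the first column as the comparison key, and the read and taxon names are all assumptions made up for the example.

```python
def non_matching_rows(first, second, key_col=0, sep="\t"):
    """Keep rows of `first` whose key column never appears in `second`.

    Hypothetical helper: column index and separator are assumptions.
    """
    def key(line):
        return line.rstrip("\n").split(sep)[key_col]

    # Collect the keys found in the second dataset (the rows to filter out).
    seen = {key(line) for line in second if line.strip()}
    # Preserve the order of the first dataset and skip blank lines.
    return [line for line in first if line.strip() and key(line) not in seen]


# Made-up data: the original hit table first, diagnostic-hits output second.
original = ["read1\tTaxonA\n", "read2\tTaxonB\n", "read3\tTaxonA\n"]
diagnostic_hits = ["read1\tTaxonA\n", "read3\tTaxonA\n"]

kept = non_matching_rows(original, diagnostic_hits)

# Splitting the first dataset into chunks and filtering each chunk
# against the same second dataset yields the same rows.
chunked = (non_matching_rows(original[:2], diagnostic_hits)
           + non_matching_rows(original[2:], diagnostic_hits))

print("".join(kept), end="")
```

In this sketch the second dataset is held in memory as a set of keys, so chunking only the first dataset keeps memory use bounded without changing the result, which matches the point in the reply that the data does not need to be processed in one lump.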