Dear Galaxy,
We have a 454 metagenomic dataset. We have used the barcode splitter to divide the dataset into its constituent amplicons. We have also been using a clustering application (dnaclust) in Galaxy to subdivide the dataset
by similarity. My question is: are there Galaxy tools that allow these two outputs to be combined, sorted and counted? For example, can each cluster, and then each sequence within that cluster, be given an identifier, so that one can then split the
output by barcode and summarise the data along the lines of: amplicon/barcode X has N sequences within cluster 1, M sequences within cluster 2, and so on? Am I making any sense?
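To make the kind of summary I mean concrete, here is a minimal sketch in plain Python (outside Galaxy, and not a Galaxy tool). It assumes the clustering output has one cluster per line with whitespace-separated sequence identifiers, and that a lookup from sequence identifier to barcode is available. The function name, the identifiers and the data are all invented for illustration.

```python
from collections import Counter


def count_by_barcode_and_cluster(cluster_lines, barcode_of):
    """Count sequences per (barcode, cluster number) pair.

    cluster_lines: iterable of strings, one cluster per line, sequence
                   identifiers separated by whitespace (assumed layout).
    barcode_of:    dict mapping sequence identifier -> barcode name
                   (hypothetical lookup built from the barcode split).
    """
    counts = Counter()
    # Number the clusters 1, 2, 3, ... in the order they appear.
    for cluster_number, line in enumerate(cluster_lines, start=1):
        for seq_id in line.split():
            counts[(barcode_of[seq_id], cluster_number)] += 1
    return counts


# Made-up example data: two clusters, two barcodes.
cluster_lines = ["seq1 seq2 seq3", "seq4 seq5"]
barcode_of = {
    "seq1": "BC1",
    "seq2": "BC1",
    "seq3": "BC2",
    "seq4": "BC2",
    "seq5": "BC2",
}

counts = count_by_barcode_and_cluster(cluster_lines, barcode_of)

# One row per barcode/cluster combination that actually occurs.
for (barcode, cluster_number), n in sorted(counts.items()):
    print(f"{barcode}\tcluster {cluster_number}\t{n}")
```

Each printed row gives a barcode, a cluster number and the number of sequences from that barcode falling in that cluster, which is the table I would like to obtain.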
This is the sort of problem that sounds as though it is solvable in Excel and, indeed, a UK colleague of mine has been doing just that. But is there a straightforward way to do this in Galaxy? Nothing
suitable is obvious to me among the Filtering or Sorting tools.
Best wishes,
Simon