Deconvoluting NGS samples with multiple barcodes

Hi,

I have a sequence file containing 454 reads for 64 barcoded samples. I also have a second 'query': a file listing the names of all 64 samples with the corresponding 'sample identifier sequence' (19 bp) in the following format:

(AGGTTGATTGAATGGCTTA)|(GATGAAGAACGCAGAACCT)

(I need to search for either the forward or the reverse identifier.)

I want to 'join' the two queries by searching each sequence in the first query for the 'sample identifier sequence' from the second query, so that I end up with a copy of the first query with a new column containing the sample names. But the 'join' command only returns perfect matches between columns. How can I join two queries on a partial match? (Obviously only 19 bp of each full sequence will overlap with the identifier sequence.)

I could use the 'manipulate fastq' command, but as far as I can tell I would have to run 64 separate steps.

I would really appreciate any help with what is probably quite a simple problem!

Thanks very much,
Pip Griffin
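
For illustration only, here is a minimal sketch of the partial-match 'join' described above, written as a standalone Python script rather than a Galaxy tool. It assumes the reads are available as (read id, sequence) pairs and the sample table as (sample name, pattern) pairs, with each pattern in the '(FORWARD)|(REVERSE)' format shown above, which can be used directly as a regular expression. The function names, the second sample, and the example reads are all hypothetical, and only exact occurrences of the 19 bp identifier are matched (no tolerance for sequencing errors).

```python
import re

def compile_identifiers(sample_table):
    """Compile each '(FORWARD)|(REVERSE)' pattern once, keeping the sample name."""
    return [(name, re.compile(pattern)) for name, pattern in sample_table]

def assign_sample(read_sequence, compiled):
    """Return the one sample whose identifier occurs in the read.

    Returns None when no identifier matches or when more than one sample
    matches (an ambiguous read).
    """
    hits = [name for name, regex in compiled if regex.search(read_sequence)]
    return hits[0] if len(hits) == 1 else None

def annotate_reads(reads, sample_table):
    """Yield (read_id, sequence, sample_name) rows: the reads plus a new column."""
    compiled = compile_identifiers(sample_table)
    for read_id, sequence in reads:
        yield read_id, sequence, assign_sample(sequence, compiled)

# Hypothetical example data. Only the first pattern comes from the post;
# the second sample and all three reads are made up for the sketch.
samples = [
    ("sample_01", "(AGGTTGATTGAATGGCTTA)|(GATGAAGAACGCAGAACCT)"),
    ("sample_02", "(ACGTACGTACGTACGTACG)|(TTGGCCAATTGGCCAATTG)"),
]
reads = [
    ("read_1", "CCC" + "AGGTTGATTGAATGGCTTA" + "GGGTTT"),  # forward id of sample_01
    ("read_2", "AAAA" + "TTGGCCAATTGGCCAATTG" + "CC"),      # reverse id of sample_02
    ("read_3", "ACGTTTTTGGGGCCCC"),                         # no identifier
]

if __name__ == "__main__":
    for row in annotate_reads(reads, samples):
        print("\t".join(str(field) for field in row))
```

This checks every read against all 64 patterns in one pass, so no per-sample step is needed. Reads that match no identifier, or more than one, get an empty sample column and can be filtered out afterwards.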