Deconvoluting NGS samples with multiple barcodes

21 Jun 2010

Hi,

I have a sequence file that has 454 reads for 64 barcoded samples.

I also have a second 'query': a file with the names of all 64
samples and, for each sample, the corresponding 'sample identifier
sequence' (19 bp) in the following format:
(AGGTTGATTGAATGGCTTA)|(GATGAAGAACGCAGAACCT)

(I need to search for either the forward or the reverse identifier.)

I want to 'join' the two queries by searching each read in the first
query for the 'sample identifier sequence' from the second query, so
that I end up with a copy of the first query with a new column
containing the sample names.

But the 'join' command only returns perfect matches between columns.
How can I join two queries on a partial match? (Obviously, only 19 bp
of the full read sequence will overlap with the identifier
sequence.)

I could use the 'manipulate fastq' command, but I would have to do 64
separate steps, as far as I can tell.
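
To make the goal concrete, the logic I am after could be sketched in plain Python roughly as follows. This is only an illustration under assumptions that are not part of my actual data: both files are assumed to be tab-separated, the identifier file is assumed to have the sample name in column 1 and the pattern in column 2, the reads are assumed to have the sequence in column 2, and the function names (load_barcodes, assign_sample, annotate_reads) and the second barcode pair are made up.

```python
import re


def load_barcodes(lines):
    """Parse 'sample_name<TAB>(FWD)|(REV)' lines into (name, regex) pairs.

    The two-column tab-separated layout is an assumption for this sketch.
    """
    barcodes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, pattern = line.split("\t")
        # The '(FWD)|(REV)' format is already a valid regular expression.
        barcodes.append((name, re.compile(pattern)))
    return barcodes


def assign_sample(read_seq, barcodes):
    """Return the first sample whose forward or reverse identifier
    occurs anywhere in the read, or 'unassigned' if none matches."""
    for name, regex in barcodes:
        if regex.search(read_seq):
            return name
    return "unassigned"


def annotate_reads(read_lines, barcodes, seq_column=1):
    """Yield each read row with the sample name appended as a new column.

    seq_column is the 0-based index of the sequence field (assumed layout).
    """
    for line in read_lines:
        fields = line.rstrip("\n").split("\t")
        sample = assign_sample(fields[seq_column].upper(), barcodes)
        yield fields + [sample]


if __name__ == "__main__":
    # The first pattern is from my identifier file; the second is made up.
    barcode_lines = [
        "sampleA\t(AGGTTGATTGAATGGCTTA)|(GATGAAGAACGCAGAACCT)",
        "sampleB\t(ACGTACGTACGTACGTACG)|(TTGCAATTGCAATTGCAAT)",
    ]
    read_lines = [
        "read1\tCCCCAGGTTGATTGAATGGCTTAGGGG",
        "read2\tGGGGGGGGGGGG",
    ]
    for row in annotate_reads(read_lines, load_barcodes(barcode_lines)):
        print("\t".join(row))
```

In this sketch a read that happens to contain identifiers from more than one sample is given the first matching sample, and only exact 19 bp matches are found, so sequencing errors inside the identifier would leave a read unassigned.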

I would really appreciate any help with what is probably quite a simple problem!

Thanks very much,
Pip Griffin
