Hi,
I have a sequence file that has 454 reads for 64 barcoded samples.
I also have a second 'query': a file listing the names of all 64 samples, each with its corresponding 19 bp 'sample identifier sequence', in the following format: (AGGTTGATTGAATGGCTTA)|(GATGAAGAACGCAGAACCT)
(I need to search for the forward or the reverse identifier).
I want to 'join' the two queries by searching each sequence in the first query for a match to a 'sample identifying sequence' from the second query. The result should be a copy of the first query with a new column containing the sample names.
However, the 'join' command only returns exact matches between columns. How can I join two queries on a partial match? (Only 19 bp of each read sequence will overlap with the identifying sequence.)
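A minimal Python sketch of this partial-match join, run outside Galaxy. It assumes both queries are exported as tab-delimited text. In the reads file, column 1 holds the read ID and column 2 the sequence. In the sample file, column 1 holds the sample name and column 2 the `(FWD)|(REV)` pattern. These column layouts, the function names and the script name `tag_reads.py` are hypothetical. The identifier format shown above is already a regular-expression alternation, so each pattern can be compiled as-is and searched for anywhere in a read. A read that matches more than one sample is labelled `ambiguous` rather than silently assigned.

```python
from __future__ import annotations

import re
import sys
from typing import Iterable, Iterator


def load_samples(lines: Iterable[str]) -> list[tuple[str, re.Pattern]]:
    """Parse 'name<TAB>(FWD)|(REV)' lines into (name, compiled regex) pairs.

    Assumed layout: sample name in column 1, identifier pattern in column 2.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blanks and header lines
            continue
        name, pattern = line.split("\t")[:2]
        samples.append((name, re.compile(pattern.upper())))
    return samples


def annotate(read_lines: Iterable[str],
             samples: list[tuple[str, re.Pattern]],
             seq_col: int = 1) -> Iterator[str]:
    """Yield each read line with the matching sample name appended as a new column.

    seq_col is the 0-based index of the sequence column (assumed to be column 2).
    """
    for line in read_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        seq = line.split("\t")[seq_col].upper()
        # Substring search: the 19 bp identifier may sit anywhere in the read.
        hits = [name for name, pat in samples if pat.search(seq)]
        if len(hits) == 1:
            label = hits[0]
        elif hits:
            label = "ambiguous"
        else:
            label = "unmatched"
        yield f"{line}\t{label}"


def main(reads_path: str, samples_path: str) -> None:
    with open(samples_path) as fh:
        samples = load_samples(fh)
    with open(reads_path) as fh:
        for out_line in annotate(fh, samples):
            print(out_line)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like `python tag_reads.py reads.tabular samples.tabular > annotated.tabular`, after which the annotated file can be uploaded back into Galaxy. Each read is tested against all 64 patterns in a single pass, which avoids running 64 separate filtering steps.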
I could use the 'manipulate fastq' command, but I would have to do 64 separate steps, as far as I can tell.
I would really appreciate any help with what is probably quite a simple problem!
Thanks very much,
Pip Griffin
galaxy-user@lists.galaxyproject.org