I have been trying to analyze some recently acquired WGS reads (re-sequencing with MiSeq) but I am having problems with both Picard and GATK tools and I don't know where the problem is.
My fastq reads are already in the Sanger / Illumina 1.9 format, as reported by the FastQC tool. I changed the attributes of the read files from fastq to fastqsanger and successfully ran a BWA mapping against my reference sequence. I then filtered the resulting SAM file with "NGS: SAM Tools, Filter SAM" to keep only paired-mapped reads, and reordered the file with "NGS: Picard, Reorder SAM/BAM", enabling the option "Truncate sequence names after first whitespace". Since my reads are highly duplicated (according to the FastQC output), I ran the "NGS: Picard, Mark Duplicate reads" tool, but it removed only 2 duplicate reads. I went on to add a Read Group with "NGS: Picard, Add or Replace Groups" and started the SNP calling with GATK, using the Realigner Target Creator tool. Here I got an empty file, and I started to think something is wrong.
So I tried the mapping again (as the GATK wiki suggests when someone gets an empty file like mine), running the same steps on reads from a different sample, but I always get the same strange results from the de-duplication step and the Realigner tool. I think something is going wrong during the BWA mapping step, or even in my fastq reads, but I cannot work out what it is.
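One way to sanity-check the SAM file between steps is to count how many records carry each FLAG bit (paired, properly paired, unmapped, duplicate). The sketch below is only an illustration in plain Python, not part of any Galaxy tool. The function name `tally_sam_flags` and the example records are made up; the FLAG bit values are the standard ones from the SAM specification.

```python
from collections import Counter

# Standard SAM FLAG bits (values from the SAM specification).
FLAG_BITS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "duplicate": 0x400,
}


def tally_sam_flags(lines):
    """Count how many alignment records have each FLAG bit set.

    `lines` is any iterable of SAM text lines (hypothetical helper,
    not a Picard or samtools function).
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        counts["total"] += 1
        for name, bit in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
    return counts


# Made-up records, for illustration only.
example = [
    "@SQ\tSN:ref\tLN:1000",
    "r1\t99\tref\t1\t60\t4M\t=\t50\t53\tACGT\tIIII",
    "r1\t147\tref\t50\t60\t4M\t=\t1\t-53\tACGT\tIIII",
    "r2\t1123\tref\t1\t60\t4M\t=\t50\t53\tACGT\tIIII",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

if __name__ == "__main__":
    for name, n in sorted(tally_sam_flags(example).items()):
        print(f"{name}\t{n}")
```

On a real file this could be called as `tally_sam_flags(open("mapped.sam"))`, where the file name is a placeholder. If almost no records have the paired or proper-pair bits set, the problem is more likely in the mapping step than in the duplicate marking.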
Also, which read quality format do Galaxy tools accept? I know it's PHRED+33, but what does it look like?
or Example 2:
I did the BWA mapping with both types and it worked, but maybe the problem lies somewhere here.
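To make the PHRED+33 question concrete: in that encoding each quality character is the Phred score plus 33, written as an ASCII character, so `!` is score 0 and `I` is score 40. Below is a minimal sketch in plain Python. The function names and the quality strings are made up for illustration, and the offset guess is only a rough heuristic (characters below `;` cannot occur in the older +64 encodings, and characters above `J` are unusual in +33 data), not what Galaxy or FastQC actually do internally.

```python
def decode_phred33(quality_string):
    """Convert a PHRED+33 quality string into a list of Phred scores."""
    return [ord(ch) - 33 for ch in quality_string]


def guess_offset(quality_strings):
    """Rough heuristic guess of the ASCII offset (33 or 64).

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        return 33  # below ';' is impossible with a +64 offset
    if highest > 74:
        return 64  # above 'J' (score 41 in +33) suggests +64
    return None  # ambiguous range


if __name__ == "__main__":
    # Made-up quality strings, not taken from real reads.
    print(decode_phred33("II5+!"))
    print(guess_offset(["II5+!"]))
    print(guess_offset(["hhhh^^ff"]))
```

Running the guess over the quality lines of the first few thousand reads is usually enough to tell whether a file really is PHRED+33.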
I hope someone can help me!
Thank you!!!! Debora