Hi

I have received Illumina paired-end genome sequence data as a .tar file. When unpacked, the data for each genome accession are split into about 100 fastq files, totalling about 37 Gbp per genome.

Can you recommend the best way to organise this data prior to mapping to a reference genome?

I can concatenate the unpacked files from the DOS command line into one forward and one reverse file before uploading. Is this the best approach? Is there a tool that will start with the .tar file?
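
To make the concatenation idea concrete, here is a rough Python sketch that reads the members straight out of the .tar and writes one forward and one reverse file. It is only an illustration under several assumptions that I have not confirmed for my data: the file names carry an Illumina-style `_R1`/`_R2` tag (e.g. `sample_L001_R1_001.fastq.gz`), all members are in the same format (all gzipped or all plain text), and each plain-text file ends with a newline. The function name `concat_pairs` and the output file names are made up for the example.

```python
import re
import shutil
import tarfile

# Assumption: read direction is encoded as _R1 / _R2 in the file name.
READ_TAG = re.compile(r"_R([12])(?=[_.])")
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def concat_pairs(tar_path, out_r1, out_r2):
    """Stream the FASTQ members of a tar archive into one R1 and one R2 file.

    Members are copied byte for byte. This is valid for plain FASTQ and for
    gzip (concatenated gzip streams form a valid multi-member gzip file), but
    the two formats must not be mixed in one output.
    """
    counts = {"1": 0, "2": 0}
    with tarfile.open(tar_path) as tar, \
            open(out_r1, "wb") as f1, open(out_r2, "wb") as f2:
        outputs = {"1": f1, "2": f2}
        members = [
            m for m in tar.getmembers()
            if m.isfile() and m.name.lower().endswith(FASTQ_SUFFIXES)
        ]
        # Sort on the name with the read tag neutralised, so that the R1 and
        # R2 chunks are appended in the same order and mates stay in sync.
        members.sort(key=lambda m: READ_TAG.sub("_R", m.name))
        for member in members:
            base_name = member.name.rsplit("/", 1)[-1]
            match = READ_TAG.search(base_name)
            if match is None:
                continue  # not recognisably forward or reverse; skip it
            with tar.extractfile(member) as src:
                shutil.copyfileobj(src, outputs[match.group(1)])
            counts[match.group(1)] += 1
    if counts["1"] != counts["2"]:
        raise ValueError(f"unequal number of R1 and R2 files: {counts}")
    return counts


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(concat_pairs("accession.tar", "accession_R1.fastq.gz",
                       "accession_R2.fastq.gz"))
```

The output extension should match the format of the members (`.fastq.gz` if they are gzipped, `.fastq` otherwise), since nothing is recompressed along the way.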

 

Andrew