Hi all,

I am trying to use my local Galaxy instance to pre-process data stored at the NCBI SRA.

Does anyone have a working pipeline that they could share?

Here is the pipeline I am using, with some questions.

1/ Download the SRA files to my server.

2/ Convert them to FASTQ using the SRA Toolkit.

3/ Upload them to Galaxy using 'add to data library'.

4/ Run the FASTQ Groomer so that the FASTQ files can be used in Galaxy.

Note: I guess that the data at SRA are already in FASTQ Sanger format, so it would be nice to be able to skip this step (it took 10 hours to groom a 25 GB FASTQ file).
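One way to test that guess before spending hours on grooming is to look at the range of quality characters in the file. The following is only a rough sketch: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the thresholds are a common heuristic rather than anything Galaxy itself does. The function name and return labels are made up for illustration.

```python
from typing import Iterable


def guess_quality_offset(fastq_lines: Iterable[str], max_records: int = 10000) -> str:
    """Guess the FASTQ quality encoding from the range of quality characters.

    Heuristic (an assumption, not a guarantee):
      - any character below ';' (ASCII 59) can only occur with a +33 offset
      - otherwise, characters above 'J' (ASCII 74) suggest a +64 offset
    """
    lowest, highest = 255, 0
    records = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        quals = line.rstrip("\r\n")
        if quals:
            codes = [ord(c) for c in quals]
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))
        records += 1
        if records >= max_records:
            break

    if highest == 0:
        return "unknown"  # no quality characters were seen
    if lowest < 59:
        return "phred33"  # Sanger-style encoding
    if highest > 74:
        return "phred64"  # older Illumina-style encoding
    return "ambiguous"
```

If this reports "phred33" on a large sample of reads, the file is probably already Sanger-encoded, although the Groomer also validates the file, which this sketch does not.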

5/ Map with Bowtie --> FASTQ to SAM

6/ Filter the SAM

7/ SAM to BAM
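Outside Galaxy, steps 2 and 5 to 7 above could be sketched as a short list of commands. This is only an illustration under several assumptions: single-end reads, Bowtie 1, a samtools version that accepts `-o`, and "filter" meaning "drop unmapped reads" (`-F 4`). The function name, file names, and index prefix are hypothetical.

```python
from pathlib import Path
from typing import List


def build_commands(sra_file: str, bowtie_index: str, workdir: str = ".") -> List[List[str]]:
    """Return the argv lists for SRA -> FASTQ -> SAM -> filtered BAM."""
    stem = Path(sra_file).stem
    out = Path(workdir)
    fastq = str(out / f"{stem}.fastq")  # assumes fastq-dump writes <stem>.fastq (single-end)
    sam = str(out / f"{stem}.sam")
    bam = str(out / f"{stem}.bam")
    return [
        # step 2: convert the SRA archive to FASTQ
        ["fastq-dump", "--outdir", str(out), sra_file],
        # step 5: align with Bowtie 1 and write SAM (-S)
        ["bowtie", "-S", bowtie_index, fastq, sam],
        # steps 6 and 7: drop unmapped reads (-F 4) and write BAM (-b)
        ["samtools", "view", "-b", "-F", "4", "-o", bam, sam],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` in turn. Nothing is executed by the function itself, so the commands can be inspected first.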

Problems:

* The SRA data I got are RNA-seq. I have heard that Bowtie is a poor fit because it cannot handle splicing (so Bowtie is fine for genomic reads but not for RNA-seq). What is the best way to align RNA-seq reads? TopHat? The problem is that I have heard that although TopHat can handle gaps, it loses information about deletions. Someone told me it could be better to use BWA and then add a further step to deal with the splicing and the gaps. Any information?
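Whichever aligner is used, one way to check what it actually reports is to look at the CIGAR column of its SAM output: in the SAM format, `N` marks a skipped reference region (for example an intron in a spliced alignment) and `D` marks a deletion. The sketch below just totals the bases per CIGAR operation; the function name is made up, and it says nothing about how any particular aligner behaves.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by a SAM operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar: str) -> Counter:
    """Total the lengths per CIGAR operation, e.g. N (skipped) vs D (deleted)."""
    totals: Counter = Counter()
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return totals
    consumed = 0
    for match in _CIGAR_OP.finditer(cigar):
        if match.start() != consumed:
            raise ValueError(f"malformed CIGAR string: {cigar!r}")
        length, op = match.groups()
        totals[op] += int(length)
        consumed = match.end()
    if consumed != len(cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return totals
```

For instance, `summarize_cigar("20M1D10M500N30M")` separates the 1-base deletion (`D`) from the 500-base skipped region (`N`), so counting both over a whole file shows whether deletions and splice gaps are reported at all.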

* To view my data in IGV, an index (BAI) has to be created. Normally IGV can create it itself, but that did not work. I have heard that the data must be sorted: the SAM I got from Bowtie is ordered by read name, whereas it should be ordered by chromosome and position. Is that right? If so, I could apply Galaxy's sort tool to the SAM before converting it to BAM. Is that right?
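To make "ordered by chromosome and position" concrete, here is a small sketch of a coordinate sort on SAM text. It assumes the file fits in memory and that chromosome order should follow the `@SQ` header lines; unmapped reads (RNAME `*`) go last. It is only an illustration of the ordering: it does not update the `SO:` tag in the `@HD` line, and for real data a dedicated tool such as `samtools sort` followed by `samtools index` would be the usual route.

```python
from typing import Dict, Iterable, List, Tuple


def coordinate_sort_sam(lines: Iterable[str]) -> List[str]:
    """Return header lines unchanged, followed by records sorted by (reference, position)."""
    header: List[str] = []
    records: List[str] = []
    ref_order: Dict[str, int] = {}

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("@"):  # header line
            fields = line.split("\t")
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        # remember the order in which references are declared
                        ref_order.setdefault(field[3:], len(ref_order))
            header.append(line)
        else:
            records.append(line)

    def sort_key(record: str) -> Tuple[bool, int, str, int]:
        cols = record.split("\t")
        rname, pos = cols[2], int(cols[3])  # RNAME and 1-based POS columns
        # unmapped last; references missing from the header fall back to name order
        return (rname == "*", ref_order.get(rname, len(ref_order)), rname, pos)

    records.sort(key=sort_key)  # stable sort: ties keep their input order
    return header + records
```

Note that the reference order comes from the header, not from alphabetical order, so a header that lists chr2 before chr1 yields chr2 records first.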

Any other related hints are welcome.
Is there a simple tutorial or screencast about this process? I guess most Galaxy users have already done this.

Thanks,

Colin
--

Colin Molter

University of Brussels - InSilico Team - http://insilico.ulb.ac.be/