From NCBI SRA to UCSC viewer pipeline.

7 Jul 2011

Hi all,
I am trying to use my local Galaxy instance to pre-format data stored
at the NCBI SRA.
Does anyone have a working pipeline that they could share?

Here is the pipeline I am using, with some questions.

1/ Download the SRA files to my server.
2/ Convert them to FASTQ using the SRA Toolkit.
3/ Upload them into Galaxy using 'add to data library'.
4/ Run the FASTQ Groomer so that the FASTQ files can be used in Galaxy.
Note: I guess that the data at SRA are already in the FASTQ Sanger format,
so it would be nice to be able to skip this step (it took 10 hours to groom
a 25 GB FASTQ file).
5/ Map with Bowtie --> FASTQ to SAM
6/ Filter the SAM
7/ Convert SAM to BAM
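
Regarding the note in step 4, one way to check whether a FASTQ file already looks like Sanger (Phred+33) before spending hours grooming it is to inspect the quality characters of the first reads. The sketch below is only a heuristic and not something Galaxy provides: the function name guess_fastq_offset, the max_reads parameter, and the ASCII thresholds (59 and 74) are my own assumptions, and it assumes unwrapped 4-line FASTQ records.

```python
import itertools


def guess_fastq_offset(path, max_reads=10000):
    """Guess the quality offset (33 or 64) of a FASTQ file, or None if unclear.

    Heuristic only: characters below ';' (ASCII 59) cannot occur in the
    offset-64 encodings, so seeing one implies Phred+33 (Sanger). If no
    such character is seen and some character is above 'J' (ASCII 74),
    an offset-64 encoding is the more likely explanation.
    """
    lowest, highest = 255, 0
    with open(path) as handle:
        # Quality strings are every 4th line, assuming unwrapped 4-line records.
        quality_lines = itertools.islice(handle, 3, 4 * max_reads, 4)
        for line in quality_lines:
            codes = [ord(char) for char in line.rstrip("\n")]
            if codes:
                lowest = min(lowest, min(codes))
                highest = max(highest, max(codes))
    if lowest < 59:
        return 33
    if highest > 74:
        return 64
    # Either no reads were found or every character fits both encodings.
    return None
```

If this returns 33 for a file, it may be reasonable to set the datatype to fastqsanger directly instead of running the groomer, although the groomer also validates the file, which this sketch does not.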

Problems:
* The SRA data I got are RNA-seq. I heard that Bowtie is not a good choice because
it cannot handle splicing (so Bowtie is fine for genomic reads but not for RNA-seq) ==>
what is the best way to align RNA-seq reads? TopHat? The problem is that I heard
that although TopHat can deal with gaps, it loses information about deletions.
Someone told me that it could be better to use BWA and then add a further
step to deal with the splicing and the gaps. Any information?

* To see my data in IGV, an index (BAI) has to be created. Normally, IGV
can create it itself, but that didn't work. I heard that the data should be
sorted. The SAM I got from Bowtie is ordered by name, and it should be
ordered by chromosome and position. Is that right? In that case I could use the
sort tool of Galaxy and apply it to the SAM before converting it to BAM.
Is that right?
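
Outside Galaxy, the convert, sort, and index steps could be sketched with samtools as below. This is a minimal sketch under assumptions: the helper names and the output file naming are hypothetical, samtools is assumed to be on the PATH, and the "sort -o" form is the one used by samtools 1.x (older 0.1.x releases take an output prefix instead).

```python
import subprocess


def sam_to_indexed_bam_commands(sam_path, prefix):
    """Build the samtools commands to go from a SAM file to an indexed BAM.

    The file names derived from `prefix` are illustrative only.
    """
    unsorted_bam = prefix + ".unsorted.bam"
    sorted_bam = prefix + ".sorted.bam"
    return [
        # SAM -> BAM (-S marks the input as SAM; newer samtools detects this itself)
        ["samtools", "view", "-b", "-S", "-o", unsorted_bam, sam_path],
        # Sort by reference sequence and position (samtools 1.x syntax)
        ["samtools", "sort", "-o", sorted_bam, unsorted_bam],
        # Write the BAI index next to the sorted BAM
        ["samtools", "index", sorted_bam],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, run_all(sam_to_indexed_bam_commands("reads.sam", "reads")) should leave reads.sorted.bam and its index reads.sorted.bam.bai, which are the two files to load into IGV.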

Any other/related hints are welcome.
Isn't there a simple tutorial/screencast about this process, which I guess
most Galaxy users have already gone through?

thx
colin
-- 
Colin Molter
University of Brussels - InSilico Team - http://insilico.ulb.ac.be/
