Tendai:

I'm forwarding your e-mail to the dev forum. This is the best place to ask these kinds of questions.

---------- Forwarded message ----------
From: Mutangadura, Tendai <tendai@missouri.edu>
Date: Thu, Oct 15, 2015 at 7:17 PM
Subject: Seeking Help with Galaxy
To: "anton@bx.psu.edu" <anton@bx.psu.edu>
Cc: "Mutangadura, Tendai" <tendai@missouri.edu>

Dr. Nekrutenko,

I have been learning to use Galaxy on the Galaxy Main server for some time, and I have just convinced my boss to let me convert a computer that was lying idle into a Linux machine, where I plan to install my own local instance of Galaxy. I have been watching some of your Vimeo instructional videos on using Galaxy, and I wanted to find out if I could from time to time ask you or your mentees questions as I try to set up my Galaxy instance.

My first question is this: is there a difference between downloading and installing tools to the cloud and doing the same installation on my Linux desktop? (I downloaded 2 of your videos yesterday on doing this on the cloud).

My Linux computer runs CentOS 7 and has 500 GB of storage and 8 GB of RAM. I also have a 5 TB external hard drive that I can use in conjunction with the desktop. Can I accomplish much more with this than I can with my Galaxy Main server account?

The main focus of our lab is the use of whole genome sequencing (WGS) to discover disease-causing mutations in dogs, and our lab has made several discoveries this way. Like many reference genomes, the dog reference genome has quite a few gaps. My immediate aim after installing my local instance of Galaxy is to take an 88,200 bp BAC sequence that bridges a gap in the dog reference, which our collaborators made available to us (they filled the gap by Sanger sequencing/chromosome walking), and use it as the reference sequence against which to align reads from the various dogs for which we have whole genome sequences. If you have the time, could you please give me some guidance as to the best way to do this?

The BAC sequence is already in FASTA format. I have access to WGS files of raw sequence reads in their original, unprocessed state (the form in which the reads came off the Illumina sequencing machine), and also in versions produced after our bioinformatician's quality control and error correction (using MaSuRCA). My plan is to start practicing (before my local Galaxy is up and running) by:

1. Uploading the BAC sequence to use as my ‘reference’

2. Uploading all the paired reads for one dog

3. Aligning the paired reads using appropriate software (BWA?)

4. Calling variants using FreeBayes?

5. Doing this for several dogs and comparing the VCFs of the various dogs

6. Looking for disease-causing variants that we may have missed because the reads previously had no reference sequence to align to

7. After this, I hope to come up with a workflow that I can save and export to my local Galaxy.
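
For reference, steps 1 to 4 above could be sketched outside Galaxy roughly as follows. This is only a sketch under assumptions, not a recommended pipeline: it assumes command-line BWA, samtools and FreeBayes, the function name `plan_for_dog` and all file names are hypothetical, and it only builds the command lines rather than running them. Inside Galaxy the same steps would be done with the corresponding tools in the tool panel.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]        # command and its arguments
    stdout: Optional[str]  # file that should capture stdout, or None


def plan_for_dog(reference: str, reads_1: str, reads_2: str, sample: str) -> List[Step]:
    """Build (but do not run) the commands for one dog.

    All file names here are hypothetical placeholders.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    return [
        # Index the BAC reference for BWA and for FreeBayes.
        # (For several dogs this only needs to be done once.)
        Step(["bwa", "index", reference], None),
        Step(["samtools", "faidx", reference], None),
        # Align the paired reads; bwa mem writes SAM to stdout.
        Step(["bwa", "mem", reference, reads_1, reads_2], sam),
        # Sort and index the alignments.
        Step(["samtools", "sort", "-o", bam, sam], None),
        Step(["samtools", "index", bam], None),
        # Call variants; freebayes writes VCF to stdout.
        Step(["freebayes", "-f", reference, bam], vcf),
    ]


if __name__ == "__main__":
    for step in plan_for_dog("bac.fasta", "dog1_R1.fastq", "dog1_R2.fastq", "dog1"):
        target = f" > {step.stdout}" if step.stdout else ""
        print(" ".join(step.argv) + target)
```

Calling `plan_for_dog` once per dog yields one VCF per dog, which is the input for the comparison in step 5.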

(I already have the above sequences uploaded into my Galaxy main server account).

Thank you for your time and for posting the instructional videos that I have already learnt so much from.

Tendai Mutangadura

(I publish papers as Tendai Mhlanga-Mutangadura)

Anton Nekrutenko
Professor of Biochemistry and Molecular Biology
http://nekrut.bx.psu.edu
(814) 826-3051