[galaxy-dev] Fwd: Seeking Help with Galaxy

16 Oct 2015

Tendai:

I'm forwarding your e-mail to the dev forum. This is the best place to ask
these kinds of questions.

a.

---------- Forwarded message ----------
From: Mutangadura, Tendai <tendai@missouri.edu>
Date: Thu, Oct 15, 2015 at 7:17 PM
Subject: Seeking Help with Galaxy
To: "anton@bx.psu.edu" <anton@bx.psu.edu>
Cc: "Mutangadura, Tendai" <tendai@missouri.edu>

Dr. Nekrutenko,

I have been trying to learn how to use Galaxy on the Galaxy Main server
for some time, and I have just convinced my boss to let me convert an idle
computer into a Linux machine, where I plan to install my own local
instance of Galaxy. I have been watching some of your Vimeo instructional
videos on using Galaxy, and I wanted to find out whether I could, from time
to time, ask you or your mentees questions as I try to set up my Galaxy
instance.

My first question is this: is there a difference between downloading and
installing tools to the cloud and doing the same installation on my Linux
desktop? (I downloaded 2 of your videos yesterday on doing this on the
cloud).

My Linux computer runs CentOS 7 and has 500 GB of storage and 8 GB of
RAM. I also have a 5 TB external hard drive that I can use in conjunction
with the desktop. Can I accomplish much more with this than I can with my
Galaxy Main server account?

The main focus of our lab is the use of whole genome sequencing (WGS) to
discover disease-causing mutations in dogs, and our lab has made several
discoveries this way. Like many reference genomes, the dog reference genome
has quite a few gaps. My immediate aim after installing my local instance
of Galaxy is to take an 88,200 bp BAC sequence that bridges a gap in the dog
reference, which our collaborators made available to us (they filled the gap
by Sanger sequencing/chromosome walking), and use it as the reference
sequence against which to align reads from the various dogs for which we
have whole genome sequences. If you have the time, could you please give me
some guidance as to the best way to do this?

The BAC sequence is already in FASTA format. I have access to WGS files of
raw sequence reads in their original, unprocessed state (as they came off
the Illumina sequencing machine), and also to versions produced after our
bioinformatician’s quality control and error correction (using MaSuRCA).
My plan is to start practicing (before my local Galaxy is up and running)
as follows:

1.       Upload the BAC sequence and use it as my ‘reference’

2.       Upload all the paired reads for one dog

3.       Align the paired reads using appropriate software (BWA?)

4.       Call variants using FreeBayes?

5.       Do this for several dogs and compare the VCFs of the various dogs

6.       Hope to identify disease-causing variants that we may have
missed because the reads had no reference sequence to align to.

7.       After this I hope to come up with a workflow which I can save and
export to my local Galaxy.

(I already have the above sequences uploaded into my Galaxy main server
account).
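
As an illustration of the comparison in step 5, here is a minimal Python
sketch of one way the per-dog VCFs could be compared outside Galaxy. It is
not a method taken from this thread. It assumes single-sample VCFs in which
every record is a called variant, and it ignores genotype (heterozygous vs.
homozygous) and variant normalization. The function names, the dog names,
and the contig name "BAC" are hypothetical and made up for the example.

```python
from collections import Counter


def variant_keys(vcf_text):
    """Return the set of (chrom, pos, ref, alt) tuples found in VCF text."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one key per alternate allele
            keys.add((chrom, pos, ref, alt))
    return keys


def compare_dogs(vcfs_by_dog):
    """Return (shared_by_all, private) for a dict of dog name -> VCF text.

    shared_by_all: variants present in every dog.
    private: for each dog, the variants seen in that dog only.
    """
    per_dog = {dog: variant_keys(text) for dog, text in vcfs_by_dog.items()}
    counts = Counter(key for keys in per_dog.values() for key in keys)
    shared_by_all = {key for key, n in counts.items() if n == len(per_dog)}
    private = {
        dog: {key for key in keys if counts[key] == 1}
        for dog, keys in per_dog.items()
    }
    return shared_by_all, private


# Hypothetical example data: two dogs, one contig with the made-up name "BAC".
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
dog_a = HEADER + "BAC\t1200\t.\tA\tG\t50\t.\t.\nBAC\t4500\t.\tC\tT\t60\t.\t.\n"
dog_b = HEADER + "BAC\t1200\t.\tA\tG\t55\t.\t.\nBAC\t9000\t.\tG\tA,C\t40\t.\t.\n"

shared, private = compare_dogs({"dog_a": dog_a, "dog_b": dog_b})
print(sorted(shared))
print({dog: sorted(keys) for dog, keys in private.items()})
```

With real data, the VCF text would come from files exported from the Galaxy
history rather than from inline strings, and a dedicated tool such as
bcftools would likely be a better choice for anything beyond a quick check.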

Thank you for your time and for posting the instructional videos that I
have already learnt so much from.

Tendai Mutangadura

(I publish papers as Tendai Mhlanga-Mutangadura)

-- 
Anton Nekrutenko
Professor of Biochemistry
and Molecular Biology
http://nekrut.bx.psu.edu
(814) 826-3051