Re: [galaxy-user] galaxy-user Digest, Vol 58, Issue 3

6 Apr 2011

      Hello,

Did you have a question we can help with? This digest was posted from your
email address to the galaxy-user mailing list. Next time, it would be great
if you could post a new question directly, with a new subject line and
without the complete digest, rather than as a reply to a prior
question/post/digest.

Please let us know how we can help,

Best,

Jen
Galaxy team

On 4/5/11 6:22 PM, youngor Cheung wrote:
...
<< Graduate Student: Yanfeng Zhang
Comparative Genomics Group.
Kunming Institute of Zoology,Chinese Academy of Sciences.
...
...
...
From: galaxy-user-request@lists.bx.psu.edu
Subject: galaxy-user Digest, Vol 58, Issue 3
To: galaxy-user@lists.bx.psu.edu
Date: Tue, 5 Apr 2011 11:33:23 -0400
Send galaxy-user mailing list submissions to
galaxy-user@lists.bx.psu.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.bx.psu.edu/listinfo/galaxy-user
or, via email, send a message with subject or body 'help' to
galaxy-user-request@lists.bx.psu.edu
You can reach the person managing the list at
galaxy-user-owner@lists.bx.psu.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of galaxy-user digest..."
HEY! This is important! If you reply to a thread in a digest, please
1. Change the subject of your response from "Galaxy-user Digest Vol
..." to the original subject for the thread.
2. Strip out everything else in the digest that is not part of the
thread you are responding to.
Why?
1. This will keep the subject meaningful. People will have some idea
from the subject line if they should read it or not.
2. Not doing this greatly increases the number of emails that match
search queries, but that aren't actually informative.
Today's Topics:
1. Re: MAF (Eccles, David)
2. Re: Analyzing Targeted Resequencing data with Galaxy
(Anton Nekrutenko)
3. Re: MAF (Anton Nekrutenko)
4. convert formats (Sher, Falak)
5. Re: convert formats (Daniel Blankenberg)
6. Subject: Analyzing Targeted Resequencing data with Galaxy
(Jackie Lighten)
7. Re: Analyzing Targeted Resequencing data with Galaxy
(Anton Nekrutenko)
----------------------------------------------------------------------
Message: 1
Date: Tue, 5 Apr 2011 14:33:35 +0200
From: "Eccles, David" <david.eccles@mpi-muenster.mpg.de>
To: "Ross" <ross.lazarus@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] MAF
Message-ID:
<B4B747BF2FE2BB43A6D483192168CBD172D0D5@VSM.exc.top.gwdg.de>
Content-Type: text/plain; charset="iso-8859-1"
On Tue, Apr 5, 2011 at 5:19 AM, Laura Iacolina <liacolina@uniss.it>
wrote:
...
Considering I have to analyse 200 samples with 50K markers is there
any way
to tell R to analyse each SNP one after the other?
From: Ross [mailto:ross.lazarus@gmail.com]
...
There are some Galaxy wrappers for plink
(http://pngu.mgh.harvard.edu/~purcell/plink/) that may be
useful for
some kinds of analysis available in the rgenetics tools if you have
linkage pedigree genotype and map files.
I would also advise using plink for this. Calculating SNP marker statistics
[1] is one of the things that it has been designed to do. The main problem
is getting data into a format supported by plink, either linkage (one line
per individual) or transposed pedigree (one line per marker). There are
details on these formats in the plink documentation [2].
[1] http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#freq
[2] http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr
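To illustrate what "one line per marker" buys you, here is a minimal Python sketch that walks a transposed pedigree file one SNP after the other and reports a minor allele frequency for each. It is not plink's implementation, only an illustration of the per-marker loop. It assumes the usual transposed layout (chromosome, SNP identifier, genetic distance, base-pair position, then two alleles per individual) and that missing alleles are coded as "0"; the function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(tped_line, missing="0"):
    """Return (snp_id, {allele: frequency}) for one transposed-pedigree line."""
    fields = tped_line.split()
    # Columns 1-4: chromosome, SNP identifier, genetic distance, bp position.
    snp_id = fields[1]
    # Remaining columns: two alleles per individual; drop missing calls.
    alleles = [a for a in fields[4:] if a != missing]
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        return snp_id, {}
    return snp_id, {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(freqs):
    """Smallest allele frequency, or 0.0 for a monomorphic or empty marker."""
    if len(freqs) < 2:
        return 0.0
    return min(freqs.values())


def scan_markers(lines):
    """Yield (snp_id, maf) for each non-empty line, one SNP after the other."""
    for line in lines:
        if line.strip():
            snp_id, freqs = allele_frequencies(line)
            yield snp_id, minor_allele_frequency(freqs)
```

Because each marker is handled independently, passing an open file handle to `scan_markers` keeps only one line in memory at a time, which matters little for 200 samples and 50K markers but scales to much larger sets.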
--
David Eccles (gringer)
------------------------------
Message: 2
Date: Tue, 5 Apr 2011 09:56:44 -0400
From: Anton Nekrutenko <anton@bx.psu.edu>
To: Lali <laurafe@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Analyzing Targeted Resequencing data with
Galaxy
Message-ID: <8172FBD2-4CDA-4312-B54F-DCC730A40AB9@bx.psu.edu>
Content-Type: text/plain; charset=us-ascii
Lali:
In your case the workflow for capture re-sequencing should look like
this:
1. QC data (groom fastq files and plot quality distribution)
2. Map the reads (use bwa)
3. Generate and filter pileup
4. Intersect pileup with coordinates of SureSelect baits.
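Steps 3 and 4 can be sketched outside Galaxy as follows. This is a minimal Python illustration, not the Galaxy tools themselves. It assumes a basic six-column, tab-separated pileup (chromosome, 1-based position, reference base, read depth, read bases, qualities) and bait coordinates in BED format (0-based start, end-exclusive); the function names and the depth threshold are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def load_baits(bed_lines):
    """Return {chrom: sorted, merged list of (start, end)} from BED lines."""
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    baits = {}
    for chrom, intervals in raw.items():
        merged = []
        # Merge overlapping baits so a single binary search is sufficient.
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        baits[chrom] = merged
    return baits


def in_bait(baits, chrom, pos1):
    """True if the 1-based position pos1 falls inside a bait on chrom."""
    pos0 = pos1 - 1  # BED coordinates are 0-based
    intervals = baits.get(chrom, [])
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


def filter_pileup(pileup_lines, baits, min_depth):
    """Yield pileup lines with depth >= min_depth that lie inside a bait."""
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: depth is column 4 of a basic six-column pileup.
        chrom, pos, depth = fields[0], int(fields[1]), int(fields[3])
        if depth >= min_depth and in_bait(baits, chrom, pos):
            yield line
```

If the pileup was generated with consensus calling, the depth sits in a different column, so the index used in `filter_pileup` would need adjusting.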
However, before you dive in please understand basic Galaxy
functionality by taking a look at http://usegalaxy.org/galaxy101 and
watching *all* Illumina-related Galaxy quickies (black boxes on the
front page on Galaxy). Next, take a look at
http://usegalaxy.org/heteroplasmy.
Note that we are working on bringing "industrial-strength" diploid
genotyping functionality to Galaxy in the next two to three months that
will include more sophisticated genotypers, recalibration and
realignment tools, and novel visualization approaches.
Thanks for using Galaxy.
anton
galaxy team
On Apr 5, 2011, at 2:44 AM, Lali wrote:
...
Hi!
I am having problems with my sequencing results, but I am a newbie
at this; so I am thinking there is something wrong with my analysis. So
far, I've tried Galaxy and CLC Workbench, but with CLC I could not align
to the whole genome, only to individual chromosomes (maybe there is a
way, but by the time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an
Illumina. The files the lab sent me are FastQ Illumina 1.5 files, my
samples were indexed, and I got a series of files each representing an
Index.
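For context on the file format: Illumina 1.5 FASTQ files encode base qualities with an ASCII offset of 64, whereas Sanger-style FASTQ uses an offset of 33, which is why such files need to be "groomed" before mapping. The Python sketch below shows only that re-encoding step. It is not Galaxy's own implementation, the function names are hypothetical, and it assumes simple four-line FASTQ records with no wrapped sequence or quality lines.

```python
def illumina15_to_sanger(quality):
    """Convert a Phred+64 quality string to Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            raise ValueError("character %r is not valid Phred+64" % ch)
        converted.append(chr(score + 33))
    return "".join(converted)


def groom_fastq(lines):
    """Yield FASTQ lines, re-encoding the 4th line of every record."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Assumption: records are exactly four lines (header, bases, +, quals).
        yield illumina15_to_sanger(line) if i % 4 == 3 else line
```

A quality string that raises `ValueError` here is a hint that the file is already in Sanger encoding and should not be converted again.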
What would be the standard workflow for this kind of data?
Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping
adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage
is below a certain threshold?
I know, I know, a lot of things, but I am very lost.
Any help is appreciated.
L
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org. Please keep all replies on the list by
using "reply all" in your mail client. For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org
------------------------------
Message: 3
Date: Tue, 5 Apr 2011 10:09:44 -0400
From: Anton Nekrutenko <anton@bx.psu.edu>
To: Laura Iacolina <liacolina@uniss.it>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] MAF
Message-ID: <0E6E6AC6-0300-4B5E-B462-BA6715FC1F6A@bx.psu.edu>
Content-Type: text/plain; charset=windows-1252
Laura:
SNP identification and analysis is a very complex subject, and without
knowing what you are trying to do it is very difficult to point you in
the right direction. Perhaps a good place to start would be the
supplement to last year's report from the 1000 Genomes Consortium
(Nature. 467(7319): p. 1061-1073). Some of the steps you can perform
through Galaxy, while others are still in development.
Thanks!
anton
galaxy team
On Apr 5, 2011, at 5:19 AM, Laura Iacolina wrote:
...
Dear all,
I'm analysing SNP data for the first time. I tried the few software
packages I found in the literature, but they can only manage small datasets.
I am currently trying the "genetics" package in R, but the Geno function
takes into account one marker at a time. Considering I have to analyse 200
samples with 50K markers, is there any way to tell R to analyse each SNP
one after the other?
Thank you very much for the help.
Laura
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org
------------------------------
Message: 4
Date: Tue, 5 Apr 2011 10:21:38 -0400
From: "Sher, Falak" <Falak.Sher@childrens.harvard.edu>
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] convert formats
Message-ID:
<28032B153244774DA100823F4D4764210CBC2B2D6F@CHEXCCRV4.CHBOSTON.ORG>
Content-Type: text/plain; charset="iso-8859-1"
Hi colleagues,
I used MACS for peak finding through Galaxy, and I want to convert the
format of the resulting wig files to bigwig using the Galaxy tool "convert
formats".
The job has been submitted but is not running. I redid it, and even then it
does not start. It is stuck with the message "job is waiting to run".
Logging out and logging back in does not help.
Any suggestions/information, please?
F
------------------------------
Message: 5
Date: Tue, 5 Apr 2011 10:30:47 -0400
From: Daniel Blankenberg <dan@bx.psu.edu>
To: "Sher, Falak" <Falak.Sher@childrens.harvard.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] convert formats
Message-ID: <B7D82135-EF7A-4DD1-81F2-9A99C1BA2023@bx.psu.edu>
Content-Type: text/plain; charset=us-ascii
Hi Falak,
Because the underlying wigToBigWig executable can use huge amounts of
RAM, a single large-memory node is allocated for these jobs. This has
the unfortunate side effect that wigToBigWig jobs may need to wait for
a significant amount of time before being executed.
Please be patient, although if you suspect a problem and have waited for
a very long period of time, please do report it.
Thanks for using Galaxy,
Dan
On Apr 5, 2011, at 10:21 AM, Sher, Falak wrote:
...
Hi colleagues,
I used MACS for peak finding through Galaxy, and I want to convert the
format of the resulting wig files to bigwig using the Galaxy tool "convert
formats".
The job has been submitted but is not running. I redid it, and even then it
does not start. It is stuck with the message "job is waiting to run".
Logging out and logging back in does not help.
Any suggestions/information, please?
F
------------------------------
Message: 6
Date: Tue, 05 Apr 2011 10:02:12 -0300
From: Jackie Lighten <jc807177@dal.ca>
To: <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Subject: Analyzing Targeted Resequencing data
with Galaxy
Message-ID: <C9C09924.23C2%jc807177@dal.ca>
Content-Type: text/plain; charset="US-ASCII"
You can do all the quality filtering with Galaxy, but it may involve various
manipulations of the data. If I am not mistaken, the "metagenomics" workflow
may help you out a little. It's designed for 454 data but should give you an
idea of how to go about things. There is a video tutorial on the site for
this workflow.
A good place for you to start, however, may be here:
http://edwards.sdsu.edu/prinseq_beta/#
Prinseq is easy to use and will give you a full breakdown of your raw data
and enables you to filter by quality/length etc.
FastQC is an Illumina-specialized preliminary analysis tool:
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
I myself was not very impressed with CLC at all. It lacks very rudimentary
yet critical functions.
I am currently working on population amplicon data, so I couldn't really help
you too much with the latest advances in mapping to a reference, but I found
the Lasergene SeqMan mapping and de novo assembler much better than the CLC
assembler.
Good luck
Jack
On 11
...
Message: 6
Date: Tue, 5 Apr 2011 08:44:06 +0200
From: Lali <laurafe@gmail.com>
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] Analyzing Targeted Resequencing data with
Galaxy
Message-ID: <BANLkTin1ShWLQQ46+mFFBcxS-dO1GJuw_A@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi!
I am having problems with my sequencing results, but I am a newbie
at this;
...
so I am thinking there is something wrong with my analysis. So far,
I've
tried Galaxy and CLC Workbench, but with CLC I could not align to
the whole
genome, only to individual chromosomes (maybe there is a way, but
by the
time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an
Illumina.
The files the lab sent me are FastQ Illumina 1.5 files, my samples were
indexed, and I got a series of files each representing an Index.
What would be the standard workflow for this kind of data?
Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping
adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage
is below
a certain threshold?
I know, I know, a lot of things, but I am very lost.
Any help is appreciated.
L
------------------------------
Message: 7
Date: Tue, 5 Apr 2011 11:33:18 -0400
From: Anton Nekrutenko <anton@bx.psu.edu>
To: Lali <laurafe@gmail.com>
Cc: galaxy-user <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] Analyzing Targeted Resequencing data with
Galaxy
Message-ID: <C33F2B26-C075-47FC-B2F7-9D468ED3ED0E@bx.psu.edu>
Content-Type: text/plain; charset="us-ascii"
Lali:
Please, always CC mailing list when you reply.
> > My only problem with Galaxy is that I have to keep on clearing
my cache in order to get the history to display correctly, is there
another way of solving this issue?
...
Which browser/OS are you using?
Thanks,
anton
galaxy team
On Apr 5, 2011, at 11:25 AM, Lali wrote:
...
Thanks so much for the tips Anton!
I am very excited about the newer developments.
I did watch the quickies and they were very useful for a beginner
like me, I actually did my first try at the alignment by following the
Illumina single-end tutorial video step by step, but you need to watch
the paired-end too, for some of the first steps, which are explained
better on that one.
...
...
I have been playing around a lot with Galaxy, and I have several
workflows, my department just started doing sequencing, so we don't have
standard procedures set in place. I was assigned to evaluate Galaxy
and CLC, and so far CLC has not impressed me, except for the fact that
it can generate reports easily.
I think Galaxy is the way to go for me (us, if I can convince them
to run a local server), since I am not a bioinformatician, and just the
fact that you can queue up actions and just walk away is fantastic
(amongst other things).
But because I am a beginner, I am not 100% sure of the settings I have
chosen and my data is not looking too good so far, but I am having a
bioinformatician come over and help me on Thursday and I think your tips
will be of help.
My only problem with Galaxy is that I have to keep on clearing my
cache in order to get the history to display correctly, is there another
way of solving this issue?
Best regards,
L
On Tue, Apr 5, 2011 at 3:56 PM, Anton Nekrutenko <anton@bx.psu.edu>
wrote:
Lali:
In your case the workflow for capture re-sequencing should look
like this:
1. QC data (groom fastq files and plot quality distribution)
2. Map the reads (use bwa)
3. Generate and filter pileup
4. Intersect pileup with coordinates of SureSelect baits.
However, before you dive in please understand basic Galaxy
functionality by taking a look at http://usegalaxy.org/galaxy101 and
watching *all* Illumina-related Galaxy quickies (black boxes on the
front page on Galaxy). Next, take a look at
http://usegalaxy.org/heteroplasmy.
Note, that we are working on bringing "industrial-strength" diploid
genotyping functionality in Galaxy in the next two-three months that
will include more sophisticated genotypers, recalibration and
realignment tools, and novel visualization approaches.
Thanks for using Galaxy.
anton
galaxy team
...
On Apr 5, 2011, at 2:44 AM, Lali wrote:
...
Hi!
I am having problems with my sequencing results, but I am a
newbie at this; so I am thinking there is something wrong with my
analysis. So far, I've tried Galaxy and CLC Workbench, but with CLC I
could not align to the whole genome, only to individual chromosomes
(maybe there is a way, but by the time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an
Illumina. The files the lab sent me are FastQ Illumina 1.5 files, my
samples were indexed, and I got a series of files each representing an
Index.
What would be the standard workflow for this kind of data?
Which tools/settings?
Does anyone have an example Galaxy workflow for preparing
(clipping adapters, quality trimming) and mapping Targeted Resequencing
Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the
coverage is below a certain threshold?
I know, I know, a lot of things, but I am very lost.
Any help is appreciated.
L
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org

Re: [galaxy-user] galaxy-user Digest, Vol 58, Issue 3

Jennifer Jackson