New subject: galaxy-user Digest, Vol 58, Issue 3

5 Apr 2011


      << Graduate Student: Yanfeng Zhang
   Comparative Genomics Group.
   Kunming Institute of Zoology,Chinese Academy of Sciences.
...
...
...
From: galaxy-user-request@lists.bx.psu.edu
Subject: galaxy-user Digest, Vol 58, Issue 3
To: galaxy-user@lists.bx.psu.edu
Date: Tue, 5 Apr 2011 11:33:23 -0400
Send galaxy-user mailing list submissions to
  galaxy-user@lists.bx.psu.edu
To subscribe or unsubscribe via the World Wide Web, visit
  http://lists.bx.psu.edu/listinfo/galaxy-user
or, via email, send a message with subject or body 'help' to
  galaxy-user-request@lists.bx.psu.edu
You can reach the person managing the list at
  galaxy-user-owner@lists.bx.psu.edu
When replying, please edit your Subject line so it is more specific
than "Re: Contents of galaxy-user digest..."
HEY!  This is important!  If you reply to a thread in a digest, please
1. Change the subject of your response from "Galaxy-user Digest Vol ..." to the original subject for the thread.
2. Strip out everything else in the digest that is not part of the thread you are responding to.
Why?
1. This will keep the subject meaningful.  People will have some idea from the subject line if they should read it or not.
2. Not doing this greatly increases the number of emails that match search queries, but that aren't actually informative.
Today's Topics:
   1. Re: MAF (Eccles, David)
   2. Re: Analyzing Targeted Resequencing data with Galaxy
      (Anton Nekrutenko)
   3. Re: MAF (Anton Nekrutenko)
   4. convert formats (Sher, Falak)
   5. Re: convert formats (Daniel Blankenberg)
   6. Subject: Analyzing Targeted Resequencing data with> Galaxy
      (Jackie Lighten)
   7. Re: Analyzing Targeted Resequencing data with Galaxy
      (Anton Nekrutenko)
----------------------------------------------------------------------
Message: 1
Date: Tue, 5 Apr 2011 14:33:35 +0200
From: "Eccles, David" <david.eccles@mpi-muenster.mpg.de>
To: "Ross" <ross.lazarus@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] MAF
Message-ID:
  <B4B747BF2FE2BB43A6D483192168CBD172D0D5@VSM.exc.top.gwdg.de>
Content-Type: text/plain;	charset="iso-8859-1"
On Tue, Apr 5, 2011 at 5:19 AM, Laura Iacolina <liacolina@uniss.it> wrote:
...
Considering I have to analyse 200 samples with 50K markers, is there any way
to tell R to analyse each SNP one after the other?
From: Ross [mailto:ross.lazarus@gmail.com]
...
There are some Galaxy wrappers for plink
(http://pngu.mgh.harvard.edu/~purcell/plink/), available in the rgenetics
tools, that may be useful for some kinds of analysis if you have linkage
pedigree genotype and map files.
I would also advise using plink for this. Calculating SNP marker statistics
[1] is one of the things it was designed to do. The main
problem is getting data into a format supported by plink: either linkage (one
line per individual) or transposed pedigree (one line per marker). There are
details on these formats in the plink documentation [2].
[1] http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#freq
[2] http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr
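For the transposed format, a minimal Python sketch of writing the two plink input files (TPED and TFAM) might look like this. It assumes the genotypes are already available as allele pairs per individual, codes missing alleles as 0, and leaves genetic distance, parents, sex and phenotype unknown. The function names are hypothetical, not part of plink or Galaxy.

```python
from typing import Dict, List, Sequence, Tuple

# One genotype = two alleles; plink reads "0" as a missing allele.
Genotype = Tuple[str, str]


def to_tped_lines(
    markers: Sequence[Tuple[str, str, int]],
    calls: Dict[str, List[Genotype]],
) -> List[str]:
    """One TPED line per marker (hypothetical helper).

    markers: (chromosome, snp_id, bp_position) in output order.
    calls:   snp_id -> one allele pair per individual, in TFAM order.
    """
    lines = []
    for chrom, snp_id, pos in markers:
        alleles = [allele for pair in calls[snp_id] for allele in pair]
        # chromosome, SNP id, genetic distance (0 = unknown), bp position, alleles
        lines.append(" ".join([chrom, snp_id, "0", str(pos)] + alleles))
    return lines


def to_tfam_lines(individuals: Sequence[str]) -> List[str]:
    """One TFAM line per individual: family ID, individual ID, father, mother, sex, phenotype.

    Each individual is its own family; parents and sex are unknown (0), phenotype is missing (-9).
    """
    return [f"{iid} {iid} 0 0 0 -9" for iid in individuals]


if __name__ == "__main__":
    markers = [("1", "rs1", 1000), ("1", "rs2", 2000)]
    calls = {"rs1": [("A", "G"), ("G", "G")], "rs2": [("C", "C"), ("0", "0")]}
    print("\n".join(to_tped_lines(markers, calls)))
    print("\n".join(to_tfam_lines(["s1", "s2"])))
```

Saved as, say, mydata.tped and mydata.tfam, the files can be read with plink's --tfile mydata option, and adding --freq should report allele frequencies for all markers in one run; check the exact options against the plink documentation linked above.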
--
David Eccles (gringer)
------------------------------
Message: 2
Date: Tue, 5 Apr 2011 09:56:44 -0400
From: Anton Nekrutenko <anton@bx.psu.edu>
To: Lali <laurafe@gmail.com>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] Analyzing Targeted Resequencing data with
  Galaxy
Message-ID: <8172FBD2-4CDA-4312-B54F-DCC730A40AB9@bx.psu.edu>
Content-Type: text/plain; charset=us-ascii
Lali:
In your case the workflow for capture re-sequencing should look like this:
1. QC data (groom fastq files and plot quality distribution)
2. Map the reads (use bwa)
3. Generate and filter pileup.
4. Intersect the pileup with the coordinates of the SureSelect baits.
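Steps 3 and 4 amount to keeping pileup positions that are well covered and fall inside a bait. Galaxy's own pileup and interval tools do this for you; the following is only a minimal Python sketch of the logic, assuming a tab-separated pileup whose second column is the 1-based position and whose fourth column is the read depth (adjust `depth_col` for pileups with extra consensus columns), and baits supplied as BED lines (0-based start, exclusive end). The helper names are hypothetical.

```python
import bisect
from typing import Dict, Iterable, Iterator, List, Tuple

# chrom -> (sorted starts, matching ends) of merged, BED-style intervals
Baits = Dict[str, Tuple[List[int], List[int]]]


def load_baits(bed_lines: Iterable[str]) -> Baits:
    """Parse BED lines (chrom, 0-based start, exclusive end) and merge overlaps per chromosome."""
    raw: Dict[str, List[Tuple[int, int]]] = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    baits: Baits = {}
    for chrom, intervals in raw.items():
        merged: List[Tuple[int, int]] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching bait: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        baits[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return baits


def in_bait(baits: Baits, chrom: str, pos1: int) -> bool:
    """True if the 1-based pileup position pos1 lies inside a bait."""
    if chrom not in baits:
        return False
    starts, ends = baits[chrom]
    pos0 = pos1 - 1  # convert to BED's 0-based coordinates
    i = bisect.bisect_right(starts, pos0) - 1  # last bait starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def filter_pileup(
    pileup_lines: Iterable[str], baits: Baits, min_depth: int, depth_col: int = 3
) -> Iterator[str]:
    """Yield pileup lines with depth >= min_depth that fall inside a bait."""
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, pos1, depth = fields[0], int(fields[1]), int(fields[depth_col])
        if depth >= min_depth and in_bait(baits, chrom, pos1):
            yield line
```

Overlapping baits are merged up front so that a single binary search over the start coordinates is enough to test each pileup position.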
However, before you dive in, please get familiar with basic Galaxy functionality by looking at http://usegalaxy.org/galaxy101 and watching *all* Illumina-related Galaxy quickies (black boxes on the front page of Galaxy). Next, take a look at http://usegalaxy.org/heteroplasmy.
Note that we are working on bringing "industrial-strength" diploid genotyping functionality to Galaxy in the next two to three months; it will include more sophisticated genotypers, recalibration and realignment tools, and novel visualization approaches.
Thanks for using Galaxy.
anton
galaxy team
On Apr 5, 2011, at 2:44 AM, Lali wrote:
...
Hi!
I am having problems with my sequencing results, but I am a newbie at this; so I am thinking there is something wrong with my analysis. So far, I've tried Galaxy and CLC Workbench, but with CLC I could not align to the whole genome, only to individual chromosomes (maybe there is a way, but by the time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an Illumina. The files the lab sent me are FastQ Illumina 1.5 files, my samples were indexed, and I got a series of files each representing an Index.
What would be the standard workflow for this kind of data?
Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage is below a certain threshold?
I know, I know, a lot of things, but I am very lost.
Any help is appreciated.
L
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists,
please use the interface at:
http://lists.bx.psu.edu/
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org
------------------------------
Message: 3
Date: Tue, 5 Apr 2011 10:09:44 -0400
From: Anton Nekrutenko <anton@bx.psu.edu>
To: Laura Iacolina <liacolina@uniss.it>
Cc: galaxy-user@lists.bx.psu.edu
Subject: Re: [galaxy-user] MAF
Message-ID: <0E6E6AC6-0300-4B5E-B462-BA6715FC1F6A@bx.psu.edu>
Content-Type: text/plain; charset=windows-1252
Laura:
SNP identification and analysis is a very complex subject, and without knowing what you are trying to do it is very difficult to point you in the right direction. A good place to start might be the supplement to last year's report from the 1000 Genomes Consortium (Nature 467(7319): 1061-1073). Some of the steps can be performed in Galaxy; others are still in development.
Thanks!
anton
galaxy team
On Apr 5, 2011, at 5:19 AM, Laura Iacolina wrote:
...
Dear all,
I'm analysing SNP data for the first time. I tried the few software packages I found in the literature, but they can only handle small datasets. I am currently trying the 'genetics' package in R, but the Geno function handles only one marker at a time. Considering I have to analyse 200 samples with 50K markers, is there any way to tell R to analyse each SNP one after the other?
Thank you very much for the help.
Laura
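The question is essentially a per-marker loop: compute a statistic for one SNP, then repeat it for every marker. The sketch below shows that pattern in Python rather than R, assuming biallelic markers, genotypes written like "A/G", and None for a missing call; in R the same idea would be applying a single-marker function over the genotype columns (for example with lapply). The function names are hypothetical and not part of the genetics package.

```python
from collections import Counter
from typing import Dict, List, Optional


def minor_allele_frequency(genotypes: List[Optional[str]]) -> Optional[float]:
    """MAF of one biallelic SNP from calls like 'A/G'; None is a missing call."""
    counts: Counter = Counter()
    for call in genotypes:
        if call is not None:
            counts.update(call.split("/"))
    total = sum(counts.values())
    if total == 0:
        return None  # no genotyped individuals at this marker
    if len(counts) == 1:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / total


def maf_per_snp(snps: Dict[str, List[Optional[str]]]) -> Dict[str, Optional[float]]:
    """Run the single-marker calculation on every SNP in turn."""
    return {snp_id: minor_allele_frequency(calls) for snp_id, calls in snps.items()}
```

With 200 samples and 50K markers, this is 50K calls over lists of 200 genotypes, which is small enough to run in one pass.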
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org
------------------------------
Message: 4
Date: Tue, 5 Apr 2011 10:21:38 -0400
From: "Sher, Falak" <Falak.Sher@childrens.harvard.edu>
To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] convert formats
Message-ID:
  <28032B153244774DA100823F4D4764210CBC2B2D6F@CHEXCCRV4.CHBOSTON.ORG>
Content-Type: text/plain; charset="iso-8859-1"
Hi colleagues,
I used MACS for peak finding through Galaxy, and I want to convert the resulting wig files into bigWig using the Galaxy tool "Convert Formats".
The job is submitted but not running; even when I rerun it, it does not start. It is stuck with the message "job is waiting to run".
Logging out and back in does not help.
Any suggestions or information, please?
F
------------------------------
Message: 5
Date: Tue, 5 Apr 2011 10:30:47 -0400
From: Daniel Blankenberg <dan@bx.psu.edu>
To: "Sher, Falak" <Falak.Sher@childrens.harvard.edu>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] convert formats
Message-ID: <B7D82135-EF7A-4DD1-81F2-9A99C1BA2023@bx.psu.edu>
Content-Type: text/plain; charset=us-ascii
Hi Falak,
Because the underlying wig-to-bigWig executable can use huge amounts of RAM, a single large-memory node is allocated for these jobs. The unfortunate side effect is that wigToBigWig jobs may have to wait a significant amount of time before they are executed. Please be patient; however, if you suspect a problem and have waited for a very long time, please do report it.
Thanks for using Galaxy,
Dan
On Apr 5, 2011, at 10:21 AM, Sher, Falak wrote:
...
Hi colleagues,
I used MACS for peak finding through Galaxy, and I want to convert the resulting wig files into bigWig using the Galaxy tool "Convert Formats".
The job is submitted but not running; even when I rerun it, it does not start. It is stuck with the message "job is waiting to run".
Logging out and back in does not help.
Any suggestions or information, please?
F
------------------------------
Message: 6
Date: Tue, 05 Apr 2011 10:02:12 -0300
From: Jackie Lighten <jc807177@dal.ca>
To: <galaxy-user@lists.bx.psu.edu>
Subject: [galaxy-user] Subject: Analyzing Targeted Resequencing data
  with> Galaxy
Message-ID: <C9C09924.23C2%jc807177@dal.ca>
Content-Type: text/plain;	charset="US-ASCII"
You can do all the quality filtering with Galaxy, but it may involve various
manipulations of the data. If I am not mistaken, the "metagenomics" workflow
may help you out a little. It's designed for 454 data but should give you an
idea of how to go about things. There is a video tutorial on the site for
this workflow.
A good place for you to start, however, may be here:
http://edwards.sdsu.edu/prinseq_beta/#
Prinseq is easy to use; it will give you a full breakdown of your raw data
and lets you filter by quality, length, etc.
FastQC is a preliminary analysis tool specialized for Illumina data:
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
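The kind of quality/length filtering these tools offer can be sketched in a few lines of Python. This is not Prinseq's implementation; it is a minimal sketch assuming plain four-line FASTQ records and Phred+64 qualities, as in Illumina 1.5 files (groomed Sanger FASTQ uses an offset of 33). The thresholds and function names are placeholders.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from plain four-line FASTQ (no wrapped sequence lines)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual: str, offset: int) -> float:
    """Mean Phred score; offset is 64 for Illumina 1.3-1.5 FASTQ and 33 for Sanger."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(
    records: Iterable[Record], min_length: int, min_mean_quality: float, offset: int = 64
) -> Iterator[Record]:
    """Keep reads at least min_length long whose mean quality reaches min_mean_quality."""
    for header, seq, qual in records:
        if len(seq) >= min_length and mean_quality(qual, offset) >= min_mean_quality:
            yield header, seq, qual


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nhhhh\n@r2\nAC\n+\nhh\n@r3\nACGT\n+\nBBBB\n")
    for header, _, _ in filter_reads(read_fastq(demo), min_length=4, min_mean_quality=20):
        print(header)  # prints @r1: r2 is too short, r3 has mean quality 2
```

Getting the offset wrong silently shifts every score by 31, which is why Galaxy asks you to groom Illumina 1.5 files before running Sanger-based tools.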
I myself was not very impressed with CLC at all. It lacks very rudimentary
yet critical functions.
I am currently working on population amplicon data, so I can't really help
you much with the latest advances in mapping to a reference, but I found the
Lasergene SeqMan mapping and de novo assembler much better than the CLC
assembler.
Good luck
Jack
On 11
...
Message: 6
Date: Tue, 5 Apr 2011 08:44:06 +0200
From: Lali <laurafe@gmail.com>
To: galaxy-user@lists.bx.psu.edu
Subject: [galaxy-user] Analyzing Targeted Resequencing data with
Galaxy
Message-ID: <BANLkTin1ShWLQQ46+mFFBcxS-dO1GJuw_A@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
Hi!
I am having problems with my sequencing results, but I am a newbie at this;
so I am thinking there is something wrong with my analysis. So far, I've
tried Galaxy and CLC Workbench, but with CLC I could not align to the whole
genome, only to individual chromosomes (maybe there is a way, but by the
time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an Illumina.
The files the lab sent me are FastQ Illumina 1.5 files, my samples were
indexed, and I got a series of files each representing an Index.
What would be the standard workflow for this kind of data?
Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping
adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage is below
a certain threshold?
I know, I know, a lot of things, but I am very lost.
Any help is appreciated.
L
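On the coverage questions: once reads are mapped and a pileup is generated, a per-target coverage report can be derived from it. A minimal Python sketch, assuming a tab-separated pileup with the 1-based position in column 2 and read depth in column 4, and targets given as BED-style (chrom, 0-based start, exclusive end) tuples. It flags low-coverage targets rather than discarding reads; the names and thresholds are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Target = Tuple[str, int, int]  # chrom, 0-based start, exclusive end (BED-style)


def depth_by_position(pileup_lines: Iterable[str], depth_col: int = 3) -> Dict[Tuple[str, int], int]:
    """Map (chrom, 1-based position) -> read depth from tab-separated pileup lines."""
    depth: Dict[Tuple[str, int], int] = {}
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        depth[(fields[0], int(fields[1]))] = int(fields[depth_col])
    return depth


def coverage_report(
    targets: List[Target], depth: Dict[Tuple[str, int], int], min_depth: int
) -> List[Tuple[str, int, int, float, float]]:
    """Per target: (chrom, start, end, mean depth, fraction of bases with depth >= min_depth).

    Positions missing from the pileup had no reads, so they count as depth 0.
    """
    report = []
    for chrom, start, end in targets:
        if end <= start:
            continue  # skip empty intervals
        values = [depth.get((chrom, pos), 0) for pos in range(start + 1, end + 1)]
        mean = sum(values) / len(values)
        covered = sum(v >= min_depth for v in values) / len(values)
        report.append((chrom, start, end, mean, covered))
    return report


def low_coverage_targets(report, min_fraction: float) -> List[Target]:
    """Targets where less than min_fraction of bases reach the depth cutoff."""
    return [(c, s, e) for c, s, e, _, covered in report if covered < min_fraction]
```

The flagged targets could then be used as an interval list to exclude those regions from downstream analysis.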
------------------------------
Message: 7
Date: Tue, 5 Apr 2011 11:33:18 -0400
From: Anton Nekrutenko <anton@bx.psu.edu>
To: Lali <laurafe@gmail.com>
Cc: galaxy-user <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] Analyzing Targeted Resequencing data with
  Galaxy
Message-ID: <C33F2B26-C075-47FC-B2F7-9D468ED3ED0E@bx.psu.edu>
Content-Type: text/plain; charset="us-ascii"
Lali:
Please always CC the mailing list when you reply.
...
My only problem with Galaxy is that I have to keep on clearing my cache in order to get the history to display correctly, is there another way of solving this issue?
Which browser/OS are you using?
Thanks,
anton
galaxy team
On Apr 5, 2011, at 11:25 AM, Lali wrote:
...
Thanks so much for the tips Anton!
I am very excited about the newer developments.
I did watch the quickies and they were very useful for a beginner like me. I actually made my first attempt at the alignment by following the Illumina single-end tutorial video step by step, but you also need to watch the paired-end one, because some of the first steps are explained better there.
I have been playing around a lot with Galaxy and I have several workflows. My department just started doing sequencing, so we don't have standard procedures in place. I was assigned to evaluate Galaxy and CLC, and so far CLC has not impressed me, except for the fact that it can generate reports easily.
I think Galaxy is the way to go for me (for us, if I can convince them to run a local server), since I am not a bioinformatician, and just the fact that you can queue up actions and walk away is fantastic (amongst other things).
But because I am a beginner, I am not 100% sure of the settings I have chosen, and my data is not looking too good so far. However, I am having a bioinformatician come over to help me on Thursday, and I think your tips will be of help.
My only problem with Galaxy is that I have to keep on clearing my cache in order to get the history to display correctly, is there another way of solving this issue?
Best regards,
L
On Tue, Apr 5, 2011 at 3:56 PM, Anton Nekrutenko <anton@bx.psu.edu> wrote:
Lali:
In your case the workflow for capture re-sequencing should look like this:
1. QC data (groom fastq files and plot quality distribution)
2. Map the reads (use bwa)
3. Generate and filter pileup.
4. Intersect the pileup with the coordinates of the SureSelect baits.
However, before you dive in, please get familiar with basic Galaxy functionality by looking at http://usegalaxy.org/galaxy101 and watching *all* Illumina-related Galaxy quickies (black boxes on the front page of Galaxy). Next, take a look at http://usegalaxy.org/heteroplasmy.
Note that we are working on bringing "industrial-strength" diploid genotyping functionality to Galaxy in the next two to three months; it will include more sophisticated genotypers, recalibration and realignment tools, and novel visualization approaches.
Thanks for using Galaxy.
anton
galaxy team
On Apr 5, 2011, at 2:44 AM, Lali wrote:
...
Hi!
I am having problems with my sequencing results, but I am a newbie at this; so I am thinking there is something wrong with my analysis. So far, I've tried Galaxy and CLC Workbench, but with CLC I could not align to the whole genome, only to individual chromosomes (maybe there is a way, but by the time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an Illumina. The files the lab sent me are FastQ Illumina 1.5 files, my samples were indexed, and I got a series of files each representing an Index.
What would be the standard workflow for this kind of data?
Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage is below a certain threshold?
I know, I know, a lot of things, but I am very lost.
Any help is appreciated.
L
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org
