 
Hello,
Did you have a question we can help with? This digest was posted from your email address to the galaxy-user mailing list. Next time, it would be great if you could post a new question directly, with a new subject line and without the complete digest, rather than as a reply to a prior question, post, or digest.
Please let us know how we can help.
Best,
Jen
Galaxy team
On 4/5/11 6:22 PM, youngor Cheung wrote:
Graduate Student: Yanfeng Zhang, Comparative Genomics Group, Kunming Institute of Zoology, Chinese Academy of Sciences.
From: galaxy-user-request@lists.bx.psu.edu Subject: galaxy-user Digest, Vol 58, Issue 3 To: galaxy-user@lists.bx.psu.edu Date: Tue, 5 Apr 2011 11:33:23 -0400
Send galaxy-user mailing list submissions to galaxy-user@lists.bx.psu.edu
To subscribe or unsubscribe via the World Wide Web, visit http://lists.bx.psu.edu/listinfo/galaxy-user or, via email, send a message with subject or body 'help' to galaxy-user-request@lists.bx.psu.edu
You can reach the person managing the list at galaxy-user-owner@lists.bx.psu.edu
When replying, please edit your Subject line so it is more specific than "Re: Contents of galaxy-user digest..."
HEY! This is important! If you reply to a thread in a digest, please 1. Change the subject of your response from "Galaxy-user Digest Vol ..." to the original subject for the thread. 2. Strip out everything else in the digest that is not part of the thread you are responding to.
Why? 1. This will keep the subject meaningful. People will have some idea from the subject line if they should read it or not. 2. Not doing this greatly increases the number of emails that match search queries, but that aren't actually informative.
Today's Topics:
1. Re: MAF (Eccles, David)
2. Re: Analyzing Targeted Resequencing data with Galaxy (Anton Nekrutenko)
3. Re: MAF (Anton Nekrutenko)
4. convert formats (Sher, Falak)
5. Re: convert formats (Daniel Blankenberg)
6. Subject: Analyzing Targeted Resequencing data with Galaxy (Jackie Lighten)
7. Re: Analyzing Targeted Resequencing data with Galaxy (Anton Nekrutenko)
----------------------------------------------------------------------
Message: 1 Date: Tue, 5 Apr 2011 14:33:35 +0200 From: "Eccles, David" <david.eccles@mpi-muenster.mpg.de> To: "Ross" <ross.lazarus@gmail.com> Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] MAF Message-ID: <B4B747BF2FE2BB43A6D483192168CBD172D0D5@VSM.exc.top.gwdg.de> Content-Type: text/plain; charset="iso-8859-1"
On Tue, Apr 5, 2011 at 5:19 AM, Laura Iacolina <liacolina@uniss.it> wrote:
Considering I have to analyse 200 samples with 50K markers is there any way to tell R to analyse each SNP one after the other?
From: Ross [mailto:ross.lazarus@gmail.com]
There are some Galaxy wrappers for plink (http://pngu.mgh.harvard.edu/~purcell/plink/), available in the rgenetics tools, that may be useful for some kinds of analysis if you have linkage pedigree genotype and map files.
I would also advise using plink for this. Calculating SNP marker statistics [1] is one of the things that it has been designed to do. The main problem is getting data into a format supported by plink, either linkage (one line per individual), or transposed pedigree (one line per marker). There are details on these formats in the plink documentation [2].
[1] http://pngu.mgh.harvard.edu/~purcell/plink/summary.shtml#freq [2] http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#tr
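To illustrate why the transposed pedigree layout suits marker-by-marker analysis, here is a minimal Python sketch that computes allele frequencies from a single TPED line. It assumes the usual TPED column order (chromosome, SNP identifier, genetic distance, base-pair position, then two alleles per individual) and "0" as the missing-allele code. The function names are hypothetical and are not part of plink; plink's own frequency report remains the recommended route.

```python
from collections import Counter


def tped_allele_frequencies(line, missing="0"):
    """Return (snp_id, {allele: frequency}) for one TPED line.

    Assumed layout: chrom, snp_id, genetic distance, bp position,
    followed by two allele columns per individual.
    """
    fields = line.split()
    snp_id = fields[1]
    # Drop missing calls so they do not count towards the total.
    alleles = [a for a in fields[4:] if a != missing]
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = {a: n / total for a, n in counts.items()} if total else {}
    return snp_id, freqs


def minor_allele_frequency(freqs):
    """Frequency of the rarest observed allele, or 0.0 if monomorphic."""
    if len(freqs) < 2:
        return 0.0
    return min(freqs.values())


# Three individuals at one marker; the third has a missing genotype.
example = "1 rs1 0 1000 A A A G 0 0"
snp, freqs = tped_allele_frequencies(example)
maf = minor_allele_frequency(freqs)
```

Because each marker sits on its own line, a 50K-marker file can be processed one SNP at a time by looping over the lines of the file.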
-- David Eccles (gringer)
------------------------------
Message: 2 Date: Tue, 5 Apr 2011 09:56:44 -0400 From: Anton Nekrutenko <anton@bx.psu.edu> To: Lali <laurafe@gmail.com> Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] Analyzing Targeted Resequencing data with Galaxy Message-ID: <8172FBD2-4CDA-4312-B54F-DCC730A40AB9@bx.psu.edu> Content-Type: text/plain; charset=us-ascii
Lali:
In your case the workflow for capture re-sequencing should look like this:
1. QC data (groom fastq files and plot quality distribution)
2. Map the reads (use bwa)
3. Generate and filter pileup
4. Intersect pileup with coordinates of SureSelect baits.
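Steps 3 and 4 above can be sketched outside Galaxy as follows. This is only an illustration under stated assumptions: the pileup is the simple six-column, tab-separated layout (chromosome, 1-based position, reference base, depth, read bases, qualities), the bait coordinates are in BED format (0-based start, exclusive end), and the names `load_baits`, `in_bait`, `filter_pileup` and `min_depth` are hypothetical. It is not how the Galaxy tools are implemented.

```python
def load_baits(bed_lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    baits = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        baits.setdefault(chrom, []).append((int(start), int(end)))
    return baits


def in_bait(baits, chrom, pos1):
    """True if the 1-based position pos1 lies inside any bait on chrom."""
    pos0 = pos1 - 1  # convert to 0-based to match BED coordinates
    # Linear scan: simple and correct, but slow for very large bait sets.
    return any(start <= pos0 < end for start, end in baits.get(chrom, ()))


def filter_pileup(pileup_lines, baits, min_depth=1):
    """Yield pileup lines that are on target and meet the depth threshold."""
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, pos, depth = fields[0], int(fields[1]), int(fields[3])
        if depth >= min_depth and in_bait(baits, chrom, pos):
            yield line


baits = load_baits(["chr1\t100\t200"])
pileup = [
    "chr1\t101\tA\t12\t............\tIIIIIIIIIIII",  # on target, deep enough
    "chr1\t150\tG\t3\t...\tIII",                      # on target, too shallow
    "chr1\t201\tC\t30\t.\tI",                         # just past the bait end
    "chr2\t150\tT\t40\t.\tI",                         # no baits on chr2
]
kept = list(filter_pileup(pileup, baits, min_depth=8))
```

The `min_depth` argument also shows one way to discard positions whose coverage is below a chosen threshold.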
However, before you dive in please understand basic Galaxy functionality by taking a look at http://usegalaxy.org/galaxy101 and watching *all* Illumina-related Galaxy quickies (black boxes on the front page on Galaxy). Next, take a look at http://usegalaxy.org/heteroplasmy.
Note that we are working on bringing "industrial-strength" diploid genotyping functionality to Galaxy in the next two to three months; it will include more sophisticated genotypers, recalibration and realignment tools, and novel visualization approaches.
Thanks for using Galaxy.
anton galaxy team
On Apr 5, 2011, at 2:44 AM, Lali wrote:
Hi! I am having problems with my sequencing results, but I am a newbie at this; so I am thinking there is something wrong with my analysis. So far, I've tried Galaxy and CLC Workbench, but with CLC I could not align to the whole genome, only to individual chromosomes (maybe there is a way, but by the time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an Illumina. The files the lab sent me are FastQ Illumina 1.5 files, my samples were indexed, and I got a series of files each representing an Index.
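For context on the file type mentioned here: Illumina 1.5 FASTQ files store base qualities with an ASCII offset of 64, whereas the Sanger convention expected by most mappers uses an offset of 33. A grooming step therefore shifts every quality character down by 31. The Python sketch below shows only that conversion; the function names are hypothetical and this is not the Galaxy groomer itself.

```python
ILLUMINA_OFFSET = 64  # Phred+64, used by Illumina 1.3 to 1.7 pipelines
SANGER_OFFSET = 33    # Phred+33, the Sanger convention


def illumina15_to_sanger(quality):
    """Convert a Phred+64 quality string to Phred+33."""
    converted = []
    for char in quality:
        score = ord(char) - ILLUMINA_OFFSET
        if score < 0:
            # A character below '@' cannot be a Phred+64 quality.
            raise ValueError(f"not a Phred+64 quality character: {char!r}")
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


def groom_record(record):
    """Re-encode the quality line of a (header, seq, plus, qual) record."""
    header, seq, plus, qual = record
    return header, seq, plus, illumina15_to_sanger(qual)


groomed = groom_record(("@read1", "ACG", "+", "hhB"))
```

In this example `h` (quality 40) becomes `I` and `B` (quality 2, which Illumina 1.5 uses as a low-quality indicator) becomes `#`.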
What would be the standard workflow for this kind of data? Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage is below a certain threshold?
I know, I know, a lot of things, but I am very lost. Any help is appreciated.
L
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
------------------------------
Message: 3 Date: Tue, 5 Apr 2011 10:09:44 -0400 From: Anton Nekrutenko <anton@bx.psu.edu> To: Laura Iacolina <liacolina@uniss.it> Cc: galaxy-user@lists.bx.psu.edu Subject: Re: [galaxy-user] MAF Message-ID: <0E6E6AC6-0300-4B5E-B462-BA6715FC1F6A@bx.psu.edu> Content-Type: text/plain; charset=windows-1252
Laura:
SNP identification and analysis is a very complex subject, and without knowing what you are trying to do it is very difficult to point you in the right direction. Perhaps a good place to start would be the supplement to last year's report from the 1000 Genomes Consortium (Nature. 467(7319): p. 1061-1073). Some of the steps you can perform through Galaxy, yet some are in development.
Thanks!
anton galaxy team
On Apr 5, 2011, at 5:19 AM, Laura Iacolina wrote:
Dear all, I'm analysing SNP data for the first time. I tried the few software packages I found in the literature, but they can only manage small datasets. I am currently trying the 'genetics' package in R, but the Geno function takes into account one marker at a time. Considering I have to analyse 200 samples with 50K markers, is there any way to tell R to analyse each SNP one after the other?
Thank you very much for the help.
Laura
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org
------------------------------
Message: 4 Date: Tue, 5 Apr 2011 10:21:38 -0400 From: "Sher, Falak" <Falak.Sher@childrens.harvard.edu> To: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu> Subject: [galaxy-user] convert formats Message-ID: <28032B153244774DA100823F4D4764210CBC2B2D6F@CHEXCCRV4.CHBOSTON.ORG> Content-Type: text/plain; charset="iso-8859-1"
Hi colleagues,
I used MACS for peak finding through Galaxy, and I want to convert the resulting wig files into bigWig using the Galaxy tool "convert formats". The job is submitted but not running; I resubmitted it and it still does not start. It is stuck with the message "job is waiting to run". Logging out and back in does not help.
any suggestion/information please ?
F
------------------------------
Message: 5 Date: Tue, 5 Apr 2011 10:30:47 -0400 From: Daniel Blankenberg <dan@bx.psu.edu> To: "Sher, Falak" <Falak.Sher@childrens.harvard.edu> Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu> Subject: Re: [galaxy-user] convert formats Message-ID: <B7D82135-EF7A-4DD1-81F2-9A99C1BA2023@bx.psu.edu> Content-Type: text/plain; charset=us-ascii
Hi Falak,
Because the underlying wigToBigWig executable can use huge amounts of RAM, a single large-memory node is allocated for these jobs. This has the unfortunate side effect that wigToBigWig jobs may need to wait for a significant amount of time before being executed. Please be patient, although if you suspect a problem and have waited for a very long period of time, please do report it.
Thanks for using Galaxy,
Dan
On Apr 5, 2011, at 10:21 AM, Sher, Falak wrote:
Hi colleagues,
I used MACS for peak finding through Galaxy, and I want to convert the resulting wig files into bigWig using the Galaxy tool "convert formats". The job is submitted but not running; I resubmitted it and it still does not start. It is stuck with the message "job is waiting to run". Logging out and back in does not help.
any suggestion/information please ?
F
------------------------------
Message: 6 Date: Tue, 05 Apr 2011 10:02:12 -0300 From: Jackie Lighten <jc807177@dal.ca> To: <galaxy-user@lists.bx.psu.edu> Subject: [galaxy-user] Subject: Analyzing Targeted Resequencing data with Galaxy Message-ID: <C9C09924.23C2%jc807177@dal.ca> Content-Type: text/plain; charset="US-ASCII"
You can do all the quality filtering with Galaxy, but it may involve various manipulations of the data. If I am not mistaken, the "metagenomics" workflow may help you out a little. It's designed for 454 data but should give you an idea of how to go about things. There is a video tutorial on the site for this workflow.
A good place for you to start, however, may be here: http://edwards.sdsu.edu/prinseq_beta/#
Prinseq is easy to use, will give you a full breakdown of your raw data, and enables you to filter by quality/length etc.
FastQC is an Illumina-specialized preliminary analysis tool: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
I myself was not very impressed with CLC at all. It lacks very rudimentary yet critical functions.
I am currently working on population amplicon data, so I can't really help you much with the latest advances in mapping to a reference, but I found the Lasergene SeqMan mapping and de novo assembler much better than the CLC assembler. Good luck
Jack
On 11
Message: 6 Date: Tue, 5 Apr 2011 08:44:06 +0200 From: Lali <laurafe@gmail.com> To: galaxy-user@lists.bx.psu.edu Subject: [galaxy-user] Analyzing Targeted Resequencing data with Galaxy Message-ID: <BANLkTin1ShWLQQ46+mFFBcxS-dO1GJuw_A@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1"
Hi! I am having problems with my sequencing results, but I am a newbie at this; so I am thinking there is something wrong with my analysis. So far, I've tried Galaxy and CLC Workbench, but with CLC I could not align to the whole genome, only to individual chromosomes (maybe there is a way, but by the time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an Illumina. The files the lab sent me are FastQ Illumina 1.5 files, my samples were indexed, and I got a series of files each representing an Index.
What would be the standard workflow for this kind of data? Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage is below a certain threshold?
I know, I know, a lot of things, but I am very lost. Any help is appreciated.
L
------------------------------
Message: 7 Date: Tue, 5 Apr 2011 11:33:18 -0400 From: Anton Nekrutenko <anton@bx.psu.edu> To: Lali <laurafe@gmail.com> Cc: galaxy-user <galaxy-user@lists.bx.psu.edu> Subject: Re: [galaxy-user] Analyzing Targeted Resequencing data with Galaxy Message-ID: <C33F2B26-C075-47FC-B2F7-9D468ED3ED0E@bx.psu.edu> Content-Type: text/plain; charset="us-ascii"
Lali:
Please, always CC mailing list when you reply.
> My only problem with Galaxy is that I have to keep on clearing my cache in order to get the history to display correctly, is there another way of solving this issue?
Which browser/OS are you using?
Thanks,
anton galaxy team
On Apr 5, 2011, at 11:25 AM, Lali wrote:
Thanks so much for the tips Anton! I am very excited about the newer developments. I did watch the quickies and they were very useful for a beginner like me. I actually did my first try at the alignment by following the Illumina single-end tutorial video step by step, but you need to watch the paired-end one too, for some of the first steps, which are explained better in that one.
I have been playing around a lot with Galaxy, and I have several workflows. My department just started doing sequencing, so we don't have standard procedures in place. I was assigned to evaluate Galaxy and CLC, and so far CLC has not impressed me, except for the fact that it can generate reports easily. I think Galaxy is the way to go for me (us, if I can convince them to run a local server), since I am not a bioinformatician, and just the fact that you can queue up actions and walk away is fantastic (amongst other things). But because I am a beginner, I am not 100% sure of the settings I have chosen, and my data is not looking too good so far. I am having a bioinformatician come over and help me on Thursday, and I think your tips will be of help. My only problem with Galaxy is that I have to keep on clearing my cache in order to get the history to display correctly, is there another way of solving this issue?
Best regards,
L
On Tue, Apr 5, 2011 at 3:56 PM, Anton Nekrutenko <anton@bx.psu.edu> wrote:
Lali:
In your case the workflow for capture re-sequencing should look like this:
1. QC data (groom fastq files and plot quality distribution)
2. Map the reads (use bwa)
3. Generate and filter pileup
4. Intersect pileup with coordinates of SureSelect baits.
However, before you dive in please understand basic Galaxy functionality by taking a look at http://usegalaxy.org/galaxy101 and watching *all* Illumina-related Galaxy quickies (black boxes on the front page on Galaxy). Next, take a look at http://usegalaxy.org/heteroplasmy.
Note that we are working on bringing "industrial-strength" diploid genotyping functionality to Galaxy in the next two to three months; it will include more sophisticated genotypers, recalibration and realignment tools, and novel visualization approaches.
Thanks for using Galaxy.
anton galaxy team
On Apr 5, 2011, at 2:44 AM, Lali wrote:
Hi! I am having problems with my sequencing results, but I am a newbie at this; so I am thinking there is something wrong with my analysis. So far, I've tried Galaxy and CLC Workbench, but with CLC I could not align to the whole genome, only to individual chromosomes (maybe there is a way, but by the time the trial ended I had not found it).
I used SureSelect capture kit and did single end sequencing on an Illumina. The files the lab sent me are FastQ Illumina 1.5 files, my samples were indexed, and I got a series of files each representing an Index.
What would be the standard workflow for this kind of data? Which tools/settings?
Does anyone have an example Galaxy workflow for preparing (clipping adapters, quality trimming) and mapping Targeted Resequencing Data?
Is there a way to obtain a coverage report through Galaxy?
Is it possible to ignore/discard the reads mapped when the coverage is below a certain threshold?
I know, I know, a lot of things, but I am very lost. Any help is appreciated.
L
Anton Nekrutenko http://nekrut.bx.psu.edu http://usegalaxy.org