Begin forwarded message:
> From: Dan Jones <djones(a)psu.edu>
> Date: July 13, 2010 11:02:50 AM EDT
> To: Anton Nekrutenko <anton(a)bx.psu.edu>
> Subject: galaxy tool suggestion
> Hi Anton,
> This is Dan (from your bioinformatics class a couple of years ago). I have been playing around in Galaxy with a couple of new 454 metagenomics datasets. I have been going back and forth between the 'Build base quality distribution' and 'filter FASTQ' tools to assess the quality of my data and to see how it is affected by filtering sequences by length and quality (using FASTQ lets me operate on the sequences and quality scores simultaneously). I am mainly trying to understand a systematic decrease in quality that begins at about 50% of the sequence length. However, to go back and build a base quality distribution boxplot, I need to extract the quality scores from the FASTQ file, and I currently can't find a way to do this in Galaxy (unless I am missing something obvious, which is very possible! I see an option to convert FASTQ to FASTA, but I don't get the .qual file with it). I wrote a short Python script to do this (attached), and I think a tool like it for extracting a .qual file from FASTQ would be a nice addition to the Galaxy toolbox.
> Hope all is well!
> Daniel Jones
> PhD Candidate, Penn State University
> Department of Geosciences
> 242 Deike Building
> University Park, PA 16802
> cell: 651-245-2775
> lab: 814-865-9340
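Dan's attached script is not reproduced in this archive. As a minimal sketch of the conversion he describes (not his script, and not an existing Galaxy tool), the Python below reads FASTQ records of exactly four lines each and writes a FASTA-style .qual file of space-separated decimal Phred scores. It assumes Sanger-encoded qualities (ASCII offset 33, the encoding Galaxy's FASTQ tools work with after grooming) and unwrapped records; the function names `read_fastq` and `fastq_to_qual` are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def fastq_to_qual(fastq: TextIO, out: TextIO, offset: int = 33) -> int:
    """Write each record's Phred scores as a FASTA-style .qual entry; return the record count."""
    count = 0
    for ident, _seq, qual in read_fastq(fastq):
        # Each quality character encodes Phred score = ASCII code - offset.
        scores = [ord(ch) - offset for ch in qual]
        out.write(f">{ident}\n{' '.join(map(str, scores))}\n")
        count += 1
    return count


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nII5#\n@read2\nGG\n+\n!!\n"
    buf = io.StringIO()
    print(fastq_to_qual(io.StringIO(example), buf))  # 2
    print(buf.getvalue(), end="")  # >read1 / 40 40 20 2 / >read2 / 0 0
```

For Illumina 1.3+ style files, which use an ASCII offset of 64, pass `offset=64`; the resulting .qual file could then be fed to whatever boxplot step expects per-base scores.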
I have a suggestion to make Galaxy even more useful.
You could enable direct loading of next-generation sequence data from NCBI's Sequence Read Archive (SRA). As it is now, anyone who wants to use the NextGen tools has to upload a large data file from their desktop. You have enabled very rapid data loading from UCSC, so perhaps you can work the same magic to get data from SRA.
— Stuart Brown
I'm working for Dr. Richard McCombie at Cold Spring Harbor Laboratory.
I have tried to install Galaxy locally on my Mac, but I ran into some issues and the installation does not complete.
Are there any special instructions for the Mac, or could you point out the key things to check during the installation process?
Thanks, and I look forward to your answer.
Cold Spring Harbor Laboratory
We have our local Galaxy instance running, and it works fine.
In the Get Data section, the Microbes tool has no local NCBI data, although the public instance has it.
What is the best (or easiest) way to get that data into our local Galaxy instance? I have been browsing the wikis and looked through the library and dataset documentation, but I was unable to resolve this at first glance.
Any help or guidance is appreciated.
Arun, if you have a PLINK-format genotype file (pbed is best, since it is
compressed) in your history, it should appear among the input files
for the eigenstrat tool; it seems to be OK on Main right now.
They can be uploaded using the Get Data upload tool; you have to
manually set the datatype to pbed so the three file parts can be
The eigensoft tool uses a subset since WGA SNPs contain a lot of
redundant information; an LD-independent set is more efficient and
loses little information.
If you select a pbed, it should be automatically converted, and it will
run more quickly the second time.
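To illustrate why an LD-independent subset of SNPs loses little information, here is a minimal sketch of greedy pairwise LD pruning. It is a simplified relative of PLINK's `--indep-pairwise` option, not the Rgenetics or eigensoft code, and the window size and r² threshold are illustrative assumptions. Genotypes are assumed to be coded 0/1/2 per individual with no missing calls; SNPs are visited in map order, and each is kept only if its squared correlation r² with every already-kept SNP among the previous `window` SNPs is below `r2_max`.

```python
from math import sqrt
from typing import List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as independent here
    r = cov / sqrt(var_a * var_b)
    return r * r


def ld_prune(genotypes: List[Sequence[int]], window: int = 50, r2_max: float = 0.5) -> List[int]:
    """Return the indices of SNPs kept by greedy pruning.

    genotypes[i] holds SNP i's genotypes across individuals, with SNPs in map order.
    """
    kept: List[int] = []
    for i, snp in enumerate(genotypes):
        # Only compare against kept SNPs that lie within the preceding window.
        recent = [j for j in kept if i - j <= window]
        if all(r_squared(snp, genotypes[j]) < r2_max for j in recent):
            kept.append(i)
    return kept


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 1, 0, 2],  # SNP 0
        [0, 1, 2, 1, 0, 2],  # SNP 1: identical to SNP 0 (r^2 = 1), so it is pruned
        [2, 0, 1, 0, 2, 1],  # SNP 2: r^2 = 0.25 with SNP 0, so it is kept
    ]
    print(ld_prune(snps, window=5, r2_max=0.5))  # [0, 2]
```

A fully redundant SNP adds nothing once its partner is kept, which is the intuition behind running eigenstrat on a pruned set.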
On Tue, Jul 13, 2010 at 12:03 AM, <galaxy-user-request(a)lists.bx.psu.edu> wrote:
> Message: 7
> Date: Fri, 9 Jul 2010 13:37:16 -0400
> From: "Arun Tiwari" <Arun_Tiwari(a)camh.net>
> To: <galaxy-user(a)bx.psu.edu>
> Subject: [galaxy-user] Rgenetics tool
> Message-ID: <ACF8150B8AE2E042834D0198F418098040AB99(a)camhems-4.camh.ca>
> I was wondering in what format I should upload my genotype data file obtained from PLINK. I tried uploading the binary files (bed, bim, fam) as well as the ped/map files, but I am unable to run any analysis because they never appear in the input file section. I wanted to run eigenstrat on my data using the Rgenetics tools.
> Thanks a lot for your help,
> Arun Tiwari
> Centre for Addiction and Mental Health,
> Toronto, Canada
I am pleased to point out that GMOD has a large presence at the BOSC
and ISMB meetings this year. The following SIGs, posters and talks
are related to GMOD projects, and there will likely be an informal
GMOD lunch during ISMB (I'm still working on the details).
NGS Analysis with Galaxy on the Cloud
Monday 3:30-3:55, TT26, Anton Nekrutenko
Sample Tracking and automated data processing in Galaxy
Monday 4:00-4:25, TT28, Anton Nekrutenko
Saturday 4:00-4:15, Lincoln Stein, part of BOSC SIG
GMOD Presents GBrowse 2.0 and JBrowse
Tuesday 11:15-11:40, TT32, Scott Cain
Demonstration of the Pathway Tools Software and BioCyc Databases
Tuesday 2:45-3:10, TT38, Peter Karp
ISGA - An Intuitive Web Server for Prokaryotic Genome Annotation and
Mon 12:40-2:30, I10, Christopher Hemmerich
Online Quantitative Transcriptome Analysis
Mon 12:40-2:30, J60, Regina Bohnert
Galaxy NGS functionality from sample tracking to SNP calling: An
Mon 12:40-2:30, U60, Ramakrishna Chakrabarty
AGeS: A Software System for Annotation and Analysis of Genome Sequences
Sun 12:40-2:30, I01, Nela Zavaljevski
GBrowse and Next Generation Sequencing Data
I15, Scott Cain
ZFNGenome: A GBrowse-based tool for identifying Zinc Finger Nuclease
target sites in model organisms
Mon 12:40-2:30, E18, Deepak Reyon
Choosing a Genome Browser for a Model Organism Database: Surveying the
Mon 12:40-2:30, E30, Taner Sen
WebGBrowse - A Web Server for GBrowse
Mon 12:30-2:30, Z02, Ram Podicheti
An Advanced Web Query Interface for Biological Database
Mon 12:40-2:30, E02, Peter Karp
Galaxy: Analyze, Visualize, Communicate
Saturday 1:30-5:30, Galaxy Team
I look forward to seeing all GMODers at the meeting!
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
I am having problems uploading large files to a local Galaxy install. How
can I troubleshoot this? Where are the uploaded files stored?
I am on 64-bit Linux with ample disk space, so I am not sure
what went wrong.
I am trying to find SNPs and/or indel variants that differ between two
groups of samples. The data are 454 sequence capture results spanning a
region of interest for a non-model organism without a complete genome. I
have mapped these to my reference sequence of the region and generated
BAM and pileup files for them. Does anyone know of a tool or method
(either in Galaxy or elsewhere) that will allow me to compare datasets
and pick out variation between them (as opposed to differences
from the reference sequence)?
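One simple way to approach the group comparison asked about above, sketched here as an assumption rather than an existing Galaxy tool, is to take the majority base at each position for each group and report positions where the two groups disagree, regardless of what the reference says. The sketch assumes the basic six-column samtools pileup layout (chromosome, 1-based position, reference base, depth, read bases, base qualities), where `.` and `,` mark matches to the reference. It skips read-start markers (`^` plus a mapping-quality character), indel notation (`+N`/`-N` followed by N bases), read ends (`$`), and deleted bases (`*`), so it compares single-base substitutions only and ignores base qualities. The names `count_bases`, `majority_calls`, `differing_sites`, `min_depth`, and `bases_col` are hypothetical; for consensus-style pileup output with extra columns, `bases_col` would need adjusting.

```python
import io
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]
INDEL = re.compile(r"[+-](\d+)")


def count_bases(ref: str, bases: str) -> Counter:
    """Count substitution-level base calls in one pileup read-bases string."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":  # read start: skip '^' and the following mapping-quality character
            i += 2
            continue
        if ch in "+-":  # indel: skip the length digits and the inserted/deleted bases
            m = INDEL.match(bases, i)
            i = m.end() + int(m.group(1)) if m else i + 1
            continue
        if ch in ".,":
            counts[ref.upper()] += 1
        elif ch.upper() in "ACGT":
            counts[ch.upper()] += 1
        # '$', '*', 'N' and any other symbol are ignored
        i += 1
    return counts


def majority_calls(lines: Iterable[str], min_depth: int = 3, bases_col: int = 4) -> Dict[Site, str]:
    """Map (chrom, pos) to the most frequent base; ties go to the base seen first."""
    calls: Dict[Site, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= bases_col:
            continue
        chrom, pos, ref = fields[0], int(fields[1]), fields[2]
        counts = count_bases(ref, fields[bases_col])
        if sum(counts.values()) >= min_depth:
            calls[(chrom, pos)] = counts.most_common(1)[0][0]
    return calls


def differing_sites(group_a: Dict[Site, str], group_b: Dict[Site, str]) -> Dict[Site, Tuple[str, str]]:
    """Sites covered in both groups whose majority bases differ."""
    return {
        site: (group_a[site], group_b[site])
        for site in sorted(group_a.keys() & group_b.keys())
        if group_a[site] != group_b[site]
    }


if __name__ == "__main__":
    a = io.StringIO("ctg1\t10\tA\t4\t..,.\tIIII\nctg1\t11\tC\t3\t,.,\tIII\n")
    b = io.StringIO("ctg1\t10\tA\t4\tGGgG\tIIII\nctg1\t11\tC\t4\t.+2AG.,^I,\tIIII\n")
    print(differing_sites(majority_calls(a), majority_calls(b)))  # {('ctg1', 10): ('A', 'G')}
```

Pooling each group's reads into one pileup before calling is the simplest design; a per-sample call followed by a per-group vote would be more robust to one deeply covered sample dominating the counts.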
Our metagenomics group is experimenting with splitting assemblies into several chunks and then combining the outputs. Is it possible to model a process in Galaxy that splits up and then converges again after all the branches have finished?
Genome Science/Joint Genome Institute (B-6)
Bioscience Division MS M888
Los Alamos National Laboratory
Los Alamos, NM 87545
phone: (505) 606-2153
fax: (505) 665-3024
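The split-and-converge process asked about above is commonly called scatter/gather. As a plain-Python sketch of the pattern only (it does not use or describe any Galaxy workflow feature, and the round-robin chunking and per-chunk step are hypothetical stand-ins for real assembly and merge tools), the code below splits FASTA-like records into chunks, processes each chunk in parallel, and runs the combine step only after every branch has returned.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (identifier, sequence)


def split_records(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Scatter: deal records round-robin into n_chunks lists (some may be empty)."""
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks


def process_chunk(chunk: List[Record]) -> Dict[str, int]:
    """Hypothetical per-branch step: here it only reports each contig's length."""
    return {ident: len(seq) for ident, seq in chunk}


def combine(results: List[Dict[str, int]]) -> Dict[str, int]:
    """Gather: merge the per-chunk outputs into one result."""
    merged: Dict[str, int] = {}
    for part in results:
        merged.update(part)
    return merged


def scatter_gather(records: List[Record], n_chunks: int = 4) -> Dict[str, int]:
    chunks = split_records(records, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # list(pool.map(...)) blocks until every branch has finished, so combine()
        # only ever sees the complete set of chunk outputs, in chunk order.
        results = list(pool.map(process_chunk, chunks))
    return combine(results)


if __name__ == "__main__":
    contigs = [("c1", "ACGT"), ("c2", "GG"), ("c3", "TTTAAA"), ("c4", "C"), ("c5", "AC")]
    print(scatter_gather(contigs, n_chunks=2))
```

Threads keep the sketch self-contained; for CPU-heavy assembly steps, each branch would more typically be a separate process or cluster job, with the same barrier before the combine step.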