I'm cc'ing galaxy-user because your questions are more about use than bugs.
> Do you have any suggestions for mapping cDNA reads using Lastz?
> I imagine the gaps caused by splicing cause problems for mapping cDNA reads using the commonly used setting?
You're correct here.
> Maybe setting up a "cDNA" mapping setting would be helpful, if such a thing is possible with LastZ.
I looked through the LastZ documentation but didn't see anything that would support mapping cDNA to a reference genome.
> I suppose an alternative would be to map with bowtie, trimming all the reads to be equal length?
You could try this, but splicing is likely to prevent good mapping. A better alternative is to use Tophat: http://tophat.cbcb.umd.edu/ Tophat is a splice junction mapper that, among other outputs, provides a list of mapped reads in SAM format. We have a very simple version of Tophat running on our test server that you can try out: http://test.g2.bx.psu.edu/ We'll have a more complete version of Tophat up in the next day or two.
A caveat: Tophat is designed for Illumina data, so you may not get optimal results when using 454 data.
Jelle, thanks for pointing out this dead-end link - now fixed - for
the record, bitbucket marks any link preceded by 'wiki:' to be an
For the most up-to-date information on the WGA/SNP tools,
http://rgenetics.org is a good place to look.
> Message: 6
> Date: Mon, 26 Apr 2010 09:03:14 +0200
> From: Jelle Scholtalbers <j.scholtalbers(a)gmail.com>
> To: galaxy <galaxy-user(a)bx.psu.edu>
> Subject: [galaxy-user] rgenetics and atlas
> Content-Type: text/plain; charset=ISO-8859-1
> I was wondering where I could find all the rgenetics dependencies as
> the link at the bottom of:
> doesn't seem to bring me anywhere.
> Furthermore it seems that I'm missing the module atlas for genetrack,
> although I haven't seen anything about the need for this dependency.
> Should this be added as a dependency or is this an egg that should be
> automatically fetched?
I am a new Galaxy user and I am trying to find out more about creation and
usage of workflows. Once you have a workflow, its usage seems to be quite
straightforward. But for the workflow creation, I am still looking for more
documentation. I am especially interested in these topics:
* How can I start a workflow with data from my computer if the upload step
is not available for workflows?
* What should I do for my new tools (added in my own Galaxy installation) to
be able to participate in workflows? Anything special to add in their XML
description file perhaps? [Perhaps this question is more for the galaxy-dev
* Is there perhaps more documentation on Galaxy workflows anywhere?
Thanks for any help,
I am mapping with Bowtie and it is taking an unexpectedly long time. I usually map
data with Bowtie in 30 minutes or so, but this time it is taking more than 18
So I wanted to know: is the Galaxy server down, or is the problem at my end?
I am forwarding your e-mail to galaxy-user list. To answer your
question - you will soon be able to upload tarred gzipped datasets
that may contain multiple files.
On Apr 16, 2010, at 10:29 AM, Caiti Smukowski wrote:
> Hi Anton,
> I am a graduate student in Mohamed Noor's lab at Duke University - I
> believe you met at a conference where the Galaxy tool was discussed.
> Mohamed suggested I contact you with a quick question. I am
> interested in using Galaxy to do some linear regressions, but I am
> encountering a problem. I have 640 individual datasets I would like
> to look at. It seems I would have to individually upload each one or
> alternatively upload them all as one file. I have decided to upload
> them as one file, and then I was hoping to find a tool where I could
> separate the file by row into different files once in Galaxy (file
> consists of 3 columns and thousands of rows, the first column has
> names in it that I would like to sort). So ideally, I would like to
> sort the file by column one (name) into individual files. Is there a
> way to do this?
> Caiti Smukowski
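Until such a tool is available, one way to do the split described above is to run a small script on your own machine before uploading. The following is only a minimal sketch: it assumes a tab-separated file with the name in column one, and that the names are safe to use as file names. The function name split_by_first_column and the ".txt" extension are made up for illustration and are not part of Galaxy.

```python
import os


def split_by_first_column(path, out_dir, sep="\t"):
    """Write one file per distinct value in column one of a tabular file.

    Assumes the names in column one are safe to use as file names.
    Returns the sorted list of names that were found.
    """
    os.makedirs(out_dir, exist_ok=True)
    groups = {}
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            name = line.split(sep, 1)[0].strip()
            groups.setdefault(name, []).append(line)
    # Rows keep their original order within each output file.
    for name, lines in groups.items():
        with open(os.path.join(out_dir, name + ".txt"), "w") as out:
            out.writelines(lines)
    return sorted(groups)
```

Calling split_by_first_column("all_data.txt", "by_name") would create one file per name inside the "by_name" directory (both names here are placeholders).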
> I am asking this question because I used to use Maq's
> sol2sanger (I guess it is just similar to your "Solexa") to convert all
> data generated by Illumina 1.5.
The different fastq formats are broadly summarised by:
S - Sanger        Phred+33,  41 values (0, 40)
I - Illumina 1.3  Phred+64,  41 values (0, 40)
X - Solexa        Solexa+64, 68 values (-5, 62)
However, at least in my version of the MAQ software (some months old),
sol2sanger conversion converts from X to S and NOT from I to S. So if
you feed I to the MAQ converter you are going to get slightly
incorrect Sanger qualities (because it is expecting the input
qualities to have been calculated using the Solexa formula but they
have in fact been calculated using Phred). If you search on
seqanswers.com you will find a post that details how you need to
modify the MAQ conversion script to make the conversion from I to S.
Could this explain the discrepancies you observe?
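To make the difference concrete, here is a rough sketch of the two conversions. It is not the MAQ or FASTQ Groomer code; the function names are made up for illustration. It relies only on the offsets listed above and on the standard relation between the two scales, Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).

```python
import math


def solexa_to_phred(q_solexa):
    """Rescale a Solexa-scaled score to a Phred-scaled score."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def illumina13_to_sanger(qual):
    """I -> S: both are Phred-scaled, so only the ASCII offset changes."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


def solexa_to_sanger(qual):
    """X -> S: rescale each score to Phred, then apply the +33 offset."""
    return "".join(
        chr(int(round(solexa_to_phred(ord(c) - 64))) + 33) for c in qual
    )
```

For the quality string "hB@" (scores 40, 2 and 0 at offset 64), illumina13_to_sanger gives "I#!" while solexa_to_sanger gives "I%$". The two agree for high scores and diverge for low ones, so treating Illumina 1.3+ data as Solexa inflates the low qualities slightly.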
> Message: 4
> Date: Fri, 16 Apr 2010 11:20:09 -0400
> From: "Yao, Jianchao" <jyao(a)cshl.edu>
> To: <galaxy-user(a)bx.psu.edu>
> Subject: [galaxy-user] Question about FASTQ Groomer
> Content-Type: text/plain; charset="us-ascii"
> To Whom It May Concern:
> I am a new user to Galaxy. In the function of "FASTQ Groomer", I noticed
> there is an option for "Input FASTQ quality scores type". My question is
> what different conversions you will do when I choose "Solexa" or
> "Illumina 1.3+". I am asking this question because I used to use Maq's
> sol2sanger (I guess it is just similar to your "Solexa") to convert all
> data generated by Illumina 1.5. It seems like, based on your options, I
> should have chosen other conversion (e.g., your "Illumina 1.3+") to
> convert data generated by Illumina 1.5.
> Also, it looks like "Solexa" and "Illumina 1.3+" just differ in the
> quality score calculation. But, when I use BWA and SAMtools to do
> mapping and call SNPs, I notice the size of the bam or pileup files are
> very different between those two different conversions. Also, it looks
> like even the coverage for some of the bases are different when choosing
> different conversions.
> Can you tell me how the conversion can affect the final result in terms
> of coverage?
> All your help will be greatly appreciated!
> -Jianchao Yao
I followed the advice for multiple installed python versions to set
~galaxy/galaxy-python/python and set the environment for the galaxy user
to have that in the PATH.
The problem is that when a job is started on GridEngine, it does not use the
"-V" flag to inherit the environment. So it runs with a different python
version and, more importantly, it does not set LD_LIBRARY_PATH and fails for
some tools which need a special library (libRblas in this case).
Where can I fix that?
I ran into a researcher the other day who said her 'boss does not want
me to put our data on galaxy'.
What can I tell her to reassure them of privacy and safety?
Computational Genetics Lab
Dartmouth Hitchcock Medical Center