Galaxy Roadshow | San Diego, CA | Jan 2011
by Anton Nekrutenko
Dear Galaxy Users and Developers:
Dan Blankenberg (dan(a)bx.psu.edu) from the Galaxy Team will be in San Diego between the 14th and 20th of January. If anyone in the SD area runs a local Galaxy install and would like a bit of face-to-face time with Dan (he is one of the senior developers on the project), please e-mail him directly.
Thanks!
anton
Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org
11 years, 6 months
Re: [galaxy-dev] [galaxy-user] Read shuffler and code contributions
by Florent Angly
On 16/12/10 18:50, Peter wrote:
>> Hi Peter,
>>
>>> Are you asking for a tool to interleave two FASTQ or FASTA files with
>>> matching entries (with matching names in the same order) into one
>>> file which alternates forward then reverse read?
>> Yes, indeed, this is what I am proposing.
>>
>>> Would you prefer it with or without error checking?
>> Error checking is best.
> I'd agree.
>
>>> I think the scripts in velvet are fast but will fail horribly with
>>> bad input... note there is a simple Biopython script to do this
>>> included with velvet already (simple version with no error checking,
>>> I have written a more robust version too - it looks like I haven't
>>> sent it to Daniel to include in velvet though).
>> I rolled my own FASTQ paired read interlacer and deinterlacer today, using
>> the Galaxy Python modules in lib/galaxy_utils/. I must say these modules
>> made it quite convenient and efficient to implement error-checking in the
>> (de)interlacing. You can find the scripts here if you're interested:
>> http://bitbucket.org/fangly/galaxy-central
>> I'll make the XML wrappers tomorrow and test them. Hopefully after this is
>> done, my changes can be pulled into the official Galaxy repository.
> For the deinterlacer, I previously offered to write something like
> that for Galaxy and was told to submit it to the Tool Shed initially
> (although it may be merged into the official repository at some
> point). See "Divide FASTQ file into paired and unpaired reads"
> on http://community.g2.bx.psu.edu/ for my tool.
> I also note you've changed the return behaviour of the Galaxy
> FASTQ library method get_paired_identifier - that API change
> could break other parts of Galaxy or 3rd party tools.
Yes, you're right about breaking the API. I realized that and reverted
my change. I am now running the Galaxy tests to make sure everything is
alright.
> Looking at that Galaxy lib, perhaps I can offer some of my
> code for identifying Sanger read pairs and the .f .r suffixes
> to enhance the class fastqJoiner (looks like it only does
> Illumina /1 and /2 right now which I think is too narrow).
I am not familiar with the nomenclature for Sanger mate pairs /
paired reads, but that's a good point.
Florent
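The interleaving with error checking discussed in this thread can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the bitbucket fork and not the Galaxy FASTQ library; the function names `read_fastq` and `interleave` are hypothetical, and it assumes four-line records with Illumina-style /1 and /2 suffixes only.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("Expected '@' at start of record, got: %r" % header)
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("Truncated record: %s" % header)
        if not plus.startswith("+"):
            raise ValueError("Missing '+' line in record: %s" % header)
        if len(seq) != len(qual):
            raise ValueError("Sequence/quality length mismatch in: %s" % header)
        yield header[1:], seq, qual


def interleave(forward, reverse):
    """Yield FASTQ records alternating forward then reverse, checking each pair."""
    for fwd, rev in zip_longest(read_fastq(forward), read_fastq(reverse)):
        if fwd is None or rev is None:
            raise ValueError("Input files contain different numbers of reads")
        f_name, r_name = fwd[0], rev[0]
        # Assumption: only the Illumina /1 and /2 naming scheme is handled here.
        if not (f_name.endswith("/1") and r_name.endswith("/2")
                and f_name[:-2] == r_name[:-2]):
            raise ValueError("Reads are not a pair: %s, %s" % (f_name, r_name))
        for name, seq, qual in (fwd, rev):
            yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

Supporting the Sanger .f/.r suffixes mentioned above would mean replacing the single suffix test with a lookup over several naming schemes.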
11 years, 6 months
rpy2 integration
by henry@mpi-cbg.de
I have a local installation of galaxy running and I'm trying to run a
custom python script "sequence_logo.py" from galaxy. We have many other
custom scripts installed and running perfectly well, but this is the first
to import rpy2 modules.
If galaxy runs the code it gives the following error:
Traceback (most recent call last):
  File "/home/galaxy/galaxy_dist/tools/MPItools/sequence_logo.py", line 3, in <module>
    import rpy2.robjects as robjects
  File "/usr/local/lib/python2.6/dist-packages/rpy2-2.1.9_20101216-py2.6-linux-x86_64.egg/rpy2/robjects/__init__.py", line 14, in <module>
    import rpy2.rinterface as rinterface
  File "/usr/local/lib/python2.6/dist-packages/rpy2-2.1.9_20101216-py2.6-linux-x86_64.egg/rpy2/rinterface/__init__.py", line 79, in <module>
    from rpy2.rinterface.rinterface import *
ImportError: libR.so: cannot open shared object file: No such file or directory
However if I run the script from the command line on our galaxy machine
the import statements work and the code runs.
Originally when I installed rpy2 via easy_install the script ran neither
from the command line nor galaxy. I read from Rpy help that this was
because RPy could not find libR.so and I followed the following fix:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH\:R_HOME/bin
As a result I can now run the script from the command line but still not
from galaxy. The export is now in my .bashrc too.
Does anyone have any idea why the rpy2 import statements work at the
command line and in the Python console, but not from within Galaxy? Any
help would be greatly appreciated.
The import line that fails is as follows:
import rpy2.robjects as robjects
Thanks,
Ian Henry
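One way to narrow this down is to check what environment the failing process actually sees. The sketch below is only a diagnostic aid with a hypothetical helper name, `find_shared_library`; it assumes the dynamic loader is being pointed at libR.so through LD_LIBRARY_PATH. Two things are worth keeping in mind, both stated as possibilities rather than a confirmed diagnosis: a .bashrc export is normally only picked up by interactive shells, so a Galaxy server started some other way may not inherit it; and the export line as quoted has no `$` in front of R_HOME, so if that is what was typed, the shell appends the literal text R_HOME/bin.

```python
import os


def find_shared_library(name, env=None):
    """Return the LD_LIBRARY_PATH directories (as seen by `env`) that contain `name`."""
    env = os.environ if env is None else env
    search_path = env.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Run this once from the shell and once as a Galaxy tool, then compare.
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    print("R_HOME =", os.environ.get("R_HOME", "<unset>"))
    print("libR.so found in:", find_shared_library("libR.so"))
```

If the two runs print different values, the difference lies in how the Galaxy process is started rather than in rpy2 itself.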
11 years, 6 months
Re: [galaxy-dev] [galaxy-user] Setting up a local Galaxy instance
by Kelly Vincent
Michael,
(I'm CCing galaxy-dev because this has to do with a local
installation, not our public server.)
There is a wiki page discussing how to install several of the NGS
tools, including Bowtie:
https://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup
Note that this page is slightly out of date regarding working with
the loc files. We have introduced data tables since the page was
written and it hasn't yet been updated. There is a basic page on data
tables: https://bitbucket.org/galaxy/galaxy-central/wiki/DataTables
TopHat and Cufflinks are not detailed on that page, but the process
should be very similar to Bowtie.
Let us know if you have further questions.
Regards,
Kelly
On Dec 16, 2010, at 4:12 PM, Weiner, Michael wrote:
> I have been asked by a research fellow to set up a local instance of
> Galaxy. In addition, he would like the latest versions of packages
> like bowtie, cufflinks, tophat, etc. to be installed and available
> through that instance. Being a newbie to Galaxy, and just a systems
> administrator responsible for building this environment, I am unsure
> how exactly to go about doing this.
>
> I have an instance running, and worked my way through the
> fastx-toolkit installation/update, but I cannot seem to find a similar
> how-to for the other packages. Could someone point me in the right
> direction please?
>
> Thank you in advance
> Michael Weiner
> UNIX Systems Administrator
> Lerner Research Institute
> Cleveland Clinic
11 years, 6 months
boolean type param (for an input tickbox) does not work correctly
by Marina Gourtovaia
Hello
The boolean widget in the following tool config always behaves as if it
were not ticked, while the select drop-down box in the same situation (the
commented-out XML) works OK. What's wrong with my tickbox?
<tool id="bam_to_fastq" name="BAM-to-FASTQ" version="1.0.0">
  <requirements>
    <requirement type="package">picard</requirement>
  </requirements>
  <description>converts BAM format to FASTQ format</description>
  <command>
    java -jar ${GALAXY_DATA_INDEX_DIR}/shared/jars/SamToFastq.jar
    VALIDATION_STRINGENCY=SILENT
    QUIET=true
    INPUT=$bam_in
    FASTQ=$fastq1_out
    #if $sPaired == "paired":
      SECOND_END_FASTQ=$fastq2_out
    #end if
  </command>
  <inputs>
    <param name="sPaired" type="boolean" truevalue="paired" falsevalue="single" checked="yes" label="Reads are paired" />
    <!--<param name="sPaired" type="select" label="Is this library mate-paired?">
      <option value="single">Single-end</option>
      <option value="paired">Paired-end</option>
    </param>-->
    <param name="bam_in" type="data" format="bam" label="BAM File to Convert" />
  </inputs>
  <outputs>
    <data name="fastq1_out" format="fastqsanger" />
    <data name="fastq2_out" format="fastqsanger">
      <filter>sPaired == "paired"</filter>
    </data>
  </outputs>
  <tests>
  </tests>
  <help>
  </help>
</tool>
Regards
Marina Gourtovaia
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
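One plausible explanation for the tickbox behaviour, offered as an assumption and not as a confirmed diagnosis of Galaxy's internals, is that the template sees a wrapper object for the boolean parameter rather than a plain string, so comparing it directly with "paired" never succeeds. The sketch below uses a hypothetical class, `BooleanParamWrapper`, that is not Galaxy's own, purely to show how the three comparisons differ in Python.

```python
class BooleanParamWrapper(object):
    """Hypothetical stand-in for a template-side wrapper around a boolean parameter."""

    def __init__(self, checked, truevalue, falsevalue):
        self.value = checked          # the raw Python boolean
        self.truevalue = truevalue    # text emitted when ticked
        self.falsevalue = falsevalue  # text emitted when not ticked

    def __str__(self):
        return self.truevalue if self.value else self.falsevalue


s_paired = BooleanParamWrapper(True, "paired", "single")

# Object compared with a string: falls back to identity, so never equal.
direct_comparison = (s_paired == "paired")
# Converting to text first compares "paired" with "paired".
string_comparison = (str(s_paired) == "paired")
# The raw value is a boolean, so comparing it with a string also fails.
raw_value_comparison = (s_paired.value == "paired")
```

If the assumption holds, writing the template test as `#if str($sPaired) == "paired"` and making the output filter test the boolean itself instead of the string "paired" would be worth trying.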
11 years, 6 months
fastqx-toolkit related problem in Galaxy?
by Marina Gourtovaia
Hello
I am using the Galaxy instance on http://main.g2.bx.psu.edu/root
For an input fastq file starting with
@IL14_1008:2:1:800:71/1
AAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAA
+
>>>>>>>>>>>>>>>>>>>><>>>>->><<>><>><<
I get the following error when running FASTQ to FASTA on this input:
'An error occurred running this job: fastq_to_fasta: Invalid quality
score value (char '-' ord 45 quality value -19) on line 4'
When I try to use the 'Clip adapter sequences' with this input file I
get the following error against the 'Library to clip' widget:
'History does not include a dataset of the required format / build'
When I run the command-line version of the fastx-toolkit tools on a
64-bit Linux machine, I have similar problems:
mg8@sf-2-1-01:~/gal_data$ fastq_to_fasta -v -n -i 10000.fastq -o my.fa
fastq_to_fasta: Invalid quality score value (char '-' ord 45 quality value -19) on line 4
and
mg8@sf-2-1-01:~/gal_data$ fastx_clipper -i 10000.fastq -o clipped.fastq -a ACACTCTTTCCCTACACGACGCTCTTCCGATCT
fastx_clipper: Invalid quality score value (char '-' ord 45 quality value -19) on line 4
Therefore, the problem seems to be with the fastx-toolkit tools. My
file is in the Sanger FASTQ format. Galaxy does allow me to convert it to
the Illumina format without problems, but that does not help with
the adaptor clipping functionality.
Marina
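The number in the error message points at the quality encoding offset. The character '-' has ASCII code 45, and 45 - 64 = -19, which is the value reported, so the tools appear to be decoding with the Illumina offset of 64; with the Sanger offset of 33 the same character decodes to 12. The short sketch below reproduces that arithmetic on the quality line quoted above; the function name `decode_quality` is just for illustration.

```python
def decode_quality(quality_string, offset):
    """Convert a FASTQ quality string to integer scores using the given ASCII offset."""
    return [ord(ch) - offset for ch in quality_string]


qual = ">>>>>>>>>>>>>>>>>>>><>>>>->><<>><>><<"

as_phred64 = decode_quality(qual, 64)  # Illumina-style offset
as_phred33 = decode_quality(qual, 33)  # Sanger-style offset

print("lowest score with offset 64:", min(as_phred64))
print("lowest score with offset 33:", min(as_phred33))
```

FASTX-Toolkit tools are commonly run with a `-Q33` option for Sanger-encoded files; whether the installed version accepts that option is an assumption to verify against its own help output.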
11 years, 6 months
How can I require users to log in to Galaxy?
by Peter
Hi all,
We're planning to use Galaxy on two local servers. One will be
restricted to those on the institute network, and here the default
Galaxy behaviour is fine (anyone can use it without logging in).
However, the second server will be accessible from the internet,
but we only want authorised users to be able to use Galaxy on it.
I don't see anything about configuring this on the security page:
https://bitbucket.org/galaxy/galaxy-central/wiki/SecurityFeatures
Is there some documentation I've overlooked?
Thanks,
Peter
11 years, 6 months
Fwd: [galaxy-user] Read shuffler and code contributions
by Peter
Forwarding to the Galaxy Dev mailing list, which is probably
a better place to discuss changes to the Galaxy FASTQ code.
Peter
---------- Forwarded message ----------
From: Peter <peter(a)maubp.freeserve.co.uk>
Date: Thu, Dec 16, 2010 at 8:50 AM
Subject: Re: [galaxy-user] Read shuffler and code contributions
To: Florent Angly <florent.angly(a)gmail.com>
Cc: galaxy-user(a)bx.psu.edu
> Hi Peter,
>
>> Are you asking for a tool to interleave two FASTQ or FASTA files with
>> matching entries (with matching names in the same order) into one
>> file which alternates forward then reverse read?
>
> Yes, indeed, this is what I am proposing.
>
>> Would you prefer it with or without error checking?
>
> Error checking is best.
I'd agree.
>> I think the scripts in velvet are fast but will fail horribly with
>> bad input... note there is a simple Biopython script to do this
>> included with velvet already (simple version with no error checking,
>> I have written a more robust version too - it looks like I haven't
>> sent it to Daniel to include in velvet though).
>
> I rolled my own FASTQ paired read interlacer and deinterlacer today, using
> the Galaxy Python modules in lib/galaxy_utils/. I must say these modules
> made it quite convenient and efficient to implement error-checking in the
> (de)interlacing. You can find the scripts here if you're interested:
> http://bitbucket.org/fangly/galaxy-central
> I'll make the XML wrappers tomorrow and test them. Hopefully after this is
> done, my changes can be pulled into the official Galaxy repository.
For the deinterlacer, I previously offered to write something like
that for Galaxy and was told to submit it to the Tool Shed initially
(although it may be merged into the official repository at some
point). See "Divide FASTQ file into paired and unpaired reads"
on http://community.g2.bx.psu.edu/ for my tool.
I also note you've changed the return behaviour of the Galaxy
FASTQ library method get_paired_identifier - that API change
could break other parts of Galaxy or 3rd party tools.
Looking at that Galaxy lib, perhaps I can offer some of my
code for identifying Sanger read pairs and the .f .r suffixes
to enhance the class fastqJoiner (looks like it only does
Illumina /1 and /2 right now which I think is too narrow).
Peter
11 years, 6 months
BLAST+ enhancements, was: blastxml to tabular bug fix
by Peter Cock
On Wed, Nov 24, 2010 at 11:02 PM, Bossers, Alex wrote:
> Peter,
>
> a nice extra feature welcomed by myself would be to allow the
> optional inclusion of the Hit_defline in the output table. In many
> workflows we would need to blast, get the id from the table, use
> id to get human readable name and insert/use it.... which is silly
> of course since that data is available in the xml anyway.
>
> I don't know python and about hg changesets but I modified
> your python and xml file to incorporate this (see attachment).
> By default it's normal blast tabular output but optionally it can
> include the defline.
> The hit_defline needed to be split (I hope I did it in a python
> way) to eliminate multiple descriptions separated by >gi (nt
> and nr) or plain semicolons for swissprot.... maybe there
> are more but not sure...
>
> Have a look and test and maybe it will find the way in some
> form into your suite. Anyway it's very useful in this way to us.
>
> cheers
> Alex
Hi Alex,
I'm glad to see the BLAST+ wrappers being used already,
and to get positive feedback.
I had a quick look at your modifications - I think it could
be made more beautiful, but it looks like it would work
fine. I understand the aim behind your suggested change,
but I have another solution in mind.
I was already planning to write another tool for splitting a
column in a tabular file - e.g. splitting on the pipe character
could be very useful to extract the GI number from a typical
NCBI identifier string. Such a tool could also be used on the
BLAST output to do what you are asking for (splitting the hit
IDs), or to grab a particular word from formatted text (by
splitting on spaces). I'm surprised this isn't in Galaxy already
to be honest - maybe it is and I haven't found it yet ;)
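As a rough idea of what such a column-splitting tool might do, here is a minimal sketch in plain Python. The function name `split_column` and the example identifier are made up for illustration; this is not an existing Galaxy tool.

```python
def split_column(rows, column, delimiter="|"):
    """Yield each tab-separated row with the given 0-based column split in place."""
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        parts = fields[column].split(delimiter)
        # The split parts take the place of the original column.
        yield fields[:column] + parts + fields[column + 1:]


# Illustrative input only: an NCBI-style identifier in the second column.
example = ["query1\tgi|12345|ref|NP_000001.1|\t98.5\n"]
for fields in split_column(example, 1):
    print("\t".join(fields))
```

With the pipe as delimiter the GI number ends up in its own column; passing a space as the delimiter would pick out individual words from a description instead.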
I'd also like to explain that I deliberately kept the provided
XML to tabular functionality simple to start with - all it tries
to do is recreate the default tabular output, but even that
turned out to be non-trivial. I have several ideas for
extension which I will try to outline here.
The BLAST+ suite actually lets you ask for certain other
predefined columns in the tabular output. I am wondering
about offering a "full" tabular output option in the BLAST+
wrappers - this seems simpler than making the user pick
and choose which columns they want. e.g. for blastp:
The supported format specifiers are:
   qseqid     means Query Seq-id
   qgi        means Query GI
   qacc       means Query accession
   sseqid     means Subject Seq-id
   sallseqid  means All subject Seq-id(s), separated by a ';'
   sgi        means Subject GI
   sallgi     means All subject GIs
   sacc       means Subject accession
   sallacc    means All subject accessions
   qstart     means Start of alignment in query
   qend       means End of alignment in query
   sstart     means Start of alignment in subject
   send       means End of alignment in subject
   qseq       means Aligned part of query sequence
   sseq       means Aligned part of subject sequence
   evalue     means Expect value
   bitscore   means Bit score
   score      means Raw score
   length     means Alignment length
   pident     means Percentage of identical matches
   nident     means Number of identical matches
   mismatch   means Number of mismatches
   positive   means Number of positive-scoring matches
   gapopen    means Number of gap openings
   gaps       means Total number of gaps
   ppos       means Percentage of positive-scoring matches
   frames     means Query and subject frames separated by a '/'
   qframe     means Query frame
   sframe     means Subject frame
Note that calculating and recording of the above will
add computation cost and IO load - so keeping the
default std set of columns as the default in the Galaxy
wrapper makes sense to me.
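To make the "standard versus full" choice concrete, the sketch below builds a blastp command line for either column set. It is only an illustration: the function name `build_blastp_command`, the choice of extra columns and the file names are assumptions, and this is not how the Galaxy BLAST+ wrappers are actually written. It relies on the BLAST+ `-outfmt` option accepting format 6 followed by a space-separated list of the specifiers listed above.

```python
# The twelve default tabular columns.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# One possible choice of additional columns for an extended output.
EXTRA_COLUMNS = [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq",
]


def build_blastp_command(query, database, output, extended=False):
    """Return an argument list for blastp with standard or extended tabular output."""
    columns = STD_COLUMNS + (EXTRA_COLUMNS if extended else [])
    outfmt = "6 " + " ".join(columns)
    return ["blastp", "-query", query, "-db", database,
            "-outfmt", outfmt, "-out", output]


print(" ".join(build_blastp_command("query.fasta", "nr", "hits.tabular")))
```

Keeping the standard set as the default and making the extended set opt-in matches the cost concern raised above.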
Potentially the BLAST XML output can be converted
into this full tabular output too - I expect so but it may
not be so easy.
Another avenue by which to extend the BLAST+ suite
is to teach Galaxy about the BLAST ASN.1 output
format, and wrap the new blast_formatter application
for turning ASN.1 into another BLAST output format.
Regards,
Peter
11 years, 6 months