December 2010 - galaxy-dev - lists.galaxyproject.org

Galaxy Roadshow | San Diego, CA | Jan 2011
by Anton Nekrutenko 17 Dec '10

Dear Galaxy Users and Developers:

Dan Blankenberg (dan(a)bx.psu.edu) from the Galaxy Team will be in San Diego between the 14th and 20th of January. If anyone in the SD area runs local Galaxy installs and would like a bit of face-to-face time with Dan (he is one of the senior developers on the project), e-mail him directly.

Thanks!
anton

Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org

Re: [galaxy-dev] [galaxy-user] Read shuffler and code contributions
by Florent Angly 17 Dec '10

On 16/12/10 18:50, Peter wrote:
>> Hi Peter,
>>
>>> Are you asking for a tool to interleave two FASTQ or FASTA files with
>>> matching entries (with matching names in the same order) into one
>>> file which alternates forward then reverse read?
>> Yes, indeed, this is what I am proposing.
>>
>>> Would you prefer it with or without error checking?
>> Error checking is best.
> I'd agree.
>
>>> I think the scripts in velvet are fast but will fail horribly with
>>> bad input... note there is a simple Biopython script to do this
>>> included with velvet already (simple version with no error checking,
>>> I have written a more robust version too - it looks like I haven't
>>> sent it to Daniel to include in velvet though).
>> I rolled my own FASTQ paired read interlacer and deinterlacer today, using
>> the Galaxy Python modules in lib/galaxy_utils/. I must say these modules
>> made it quite convenient and efficient to implement error-checking in the
>> (de)interlacing. You can find the scripts here if you're interested:
>> http://bitbucket.org/fangly/galaxy-central
>> I'll make the XML wrappers tomorrow and test them. Hopefully after this is
>> done, my changes can be pulled into the official Galaxy repository.
> For the deinterlacer, I previously offered to write something like
> that for Galaxy and was told to submit it to the Tool Shed initially
> (although it may be merged into the official repository at some
> point). See "Divide FASTQ file into paired and unpaired reads"
> on http://community.g2.bx.psu.edu/ for my tool.
> I also note you've changed the return behaviour of the Galaxy
> FASTQ library method get_paired_identifier - that API change
> could break other parts of Galaxy or 3rd party tools.

Yes, you're right about breaking the API. I realized that and reverted my change. I am now running the Galaxy tests to make sure everything is alright.

> Looking at that Galaxy lib, perhaps I can offer some of my
> code for identifying Sanger read pairs and the .f .r suffixes
> to enhance the class fastqJoiner (looks like it only does
> Illumina /1 and /2 right now which I think is too narrow).

I am not familiar with the nomenclature for Sanger mate pairs / paired reads, but that's a good point.

Florent
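
The error-checked interlacing discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: it is not Florent's script and it does not use the Galaxy FASTQ modules. The function names (read_fastq, template_name, interleave) are hypothetical, records are assumed to span exactly four lines, and only the Illumina-style /1 and /2 suffixes are recognised.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("Malformed FASTQ record near %r" % header.strip())
        if len(seq) != len(qual):
            raise ValueError("Sequence/quality length mismatch in %r" % header.strip())
        yield header[1:].strip(), seq, qual


def template_name(name):
    """Strip an assumed Illumina-style /1 or /2 suffix from a read name."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def interleave(forward, reverse):
    """Yield forward and reverse records alternately, checking that they pair up."""
    fwd, rev = read_fastq(forward), read_fastq(reverse)
    while True:
        f, r = next(fwd, None), next(rev, None)
        if f is None and r is None:
            return
        if f is None or r is None:
            raise ValueError("Input files contain different numbers of reads")
        if template_name(f[0]) != template_name(r[0]):
            raise ValueError("Read names do not match: %r vs %r" % (f[0], r[0]))
        yield f
        yield r


if __name__ == "__main__":
    # Tiny made-up inputs, one read pair.
    forward = io.StringIO("@r1/1\nACGT\n+\nIIII\n")
    reverse = io.StringIO("@r1/2\nTTGA\n+\nIIII\n")
    for name, seq, qual in interleave(forward, reverse):
        print("@%s\n%s\n+\n%s" % (name, seq, qual))
```

Raising an exception on the first inconsistency is one possible design choice; a tool wrapper might instead prefer to write unpaired reads to a separate file.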

rpy2 integration
by henry＠mpi-cbg.de 16 Dec '10

I have a local installation of Galaxy running and I'm trying to run a custom Python script, "sequence_logo.py", from Galaxy. We have many other custom scripts installed and running perfectly well, but this is the first to import rpy2 modules. If Galaxy runs the code it gives the following error:

Traceback (most recent call last):
  File "/home/galaxy/galaxy_dist/tools/MPItools/sequence_logo.py", line 3, in <module>
    import rpy2.robjects as robjects
  File "/usr/local/lib/python2.6/dist-packages/rpy2-2.1.9_20101216-py2.6-linux-x86_64.egg/rpy2/robjects/__init__.py", line 14, in <module>
    import rpy2.rinterface as rinterface
  File "/usr/local/lib/python2.6/dist-packages/rpy2-2.1.9_20101216-py2.6-linux-x86_64.egg/rpy2/rinterface/__init__.py", line 79, in <module>
    from rpy2.rinterface.rinterface import *
ImportError: libR.so: cannot open shared object file: No such file or directory

However, if I run the script from the command line on our Galaxy machine, the import statements work and the code runs.

Originally, when I installed rpy2 via easy_install, the script ran neither from the command line nor from Galaxy. I read in the RPy help that this was because RPy could not find libR.so, and I applied the following fix:

  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH\:R_HOME/bin

As a result I can now run the script from the command line, but still not from Galaxy. The export is now in my .bashrc too.

Does anyone have any idea why the rpy2 import statements work at the command line and in the Python console, but not from within Galaxy? Any help would be greatly appreciated. The import line that fails is as follows:

  import rpy2.robjects as robjects

Thanks,
Ian Henry
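
One thing that may be worth checking is whether the process Galaxy uses to run the tool sees the same LD_LIBRARY_PATH as the interactive shell. The sketch below is only a diagnostic idea, not a confirmed fix: the helper name check_shared_library is hypothetical, and it simply reports the variable and asks the dynamic loader (via the standard ctypes module) whether libR.so can be opened. Running it once from the shell and once as a Galaxy tool would show whether the two environments differ.

```python
import ctypes
import os


def check_shared_library(name="libR.so"):
    """Report whether the dynamic loader can open `name` from this process."""
    report = {
        "library": name,
        # What this process actually inherited, which may differ from the login shell.
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
    }
    try:
        ctypes.CDLL(name)
        report["loadable"] = True
        report["error"] = ""
    except OSError as exc:
        report["loadable"] = False
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in sorted(check_shared_library().items()):
        print("%s: %s" % (key, value))
```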

Re: [galaxy-dev] [galaxy-user] Setting up a local Galaxy instance
by Kelly Vincent 16 Dec '10

Michael,

(I'm CCing galaxy-dev because this has to do with a local installation, not our public server.)

There is a wiki page discussing how to install several of the NGS tools, including Bowtie:
https://bitbucket.org/galaxy/galaxy-central/wiki/NGSLocalSetup

Note that this page is slightly out of date regarding working with the loc files. We have introduced data tables since the page was written and it hasn't yet been updated. There is a basic page on data tables:
https://bitbucket.org/galaxy/galaxy-central/wiki/DataTables

TopHat and Cufflinks are not detailed on that page, but the process should be very similar to Bowtie.

Let us know if you have further questions.

Regards,
Kelly

On Dec 16, 2010, at 4:12 PM, Weiner, Michael wrote:
> I have been asked by a research fellow to setup a local instance of
> Galaxy. In addition, he would like the latest versions of packages like
> bowtie, cufflinks, tophat, etc to be installed and available through
> that instance. Being a newbie to Galaxy, and just a systems
> administrator responsible for building this environment, I am unsure how
> exactly to go about doing this.
>
> I have an instance running, and worked my way through the fastx-toolkit
> installation/update but I cannot seem to find a similar how-to for the
> other packages. Could someone point me in the right direction please?
>
> Thank you in advance
> Michael Weiner
> UNIX Systems Administrator
> Lerner Research Institute
> Cleveland Clinic

boolean type param (for an input tickbox) does not work correctly
by Marina Gourtovaia 16 Dec '10

Hello

The boolean widget in the following tool config always works as if not ticked, while the select drop-down box in the same situation (the commented-out XML) works OK. What's wrong with my tickbox?

<tool id="bam_to_fastq" name="BAM-to-FASTQ" version="1.0.0">
  <requirements>
    <requirement type="package">picard</requirement>
  </requirements>
  <description>converts BAM format to FASTQ format</description>
  <command>
    java -jar ${GALAXY_DATA_INDEX_DIR}/shared/jars/SamToFastq.jar VALIDATION_STRINGENCY=SILENT QUIET=true INPUT=$bam_in FASTQ=$fastq1_out
    #if $sPaired == "paired":
      SECOND_END_FASTQ=$fastq2_out
    #end if
  </command>
  <inputs>
    <param name="sPaired" type="boolean" truevalue="paired" falsevalue="single" checked="yes" label="Reads are paired" />
    <param name="bam_in" type="data" format="bam" label="BAM File to Convert" />
  </inputs>
  <outputs>
    <data name="fastq1_out" format="fastqsanger" />
    <data name="fastq2_out" format="fastqsanger" >
      <filter>sPaired == "paired"</filter>
    </data>
  </outputs>
  <tests>
  </tests>
  <help>
  </help>
</tool>

Regards
Marina Gourtovaia

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
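
One possible explanation, offered here as an assumption and not confirmed anywhere in this post, is that the template sees the parameter as a wrapper object and not as a plain string, so comparing it directly with "paired" is never true unless the wrapper defines equality. The classes below are hypothetical stand-ins, not Galaxy's actual classes; they only illustrate how a boolean-style wrapper and a select-style wrapper could behave differently under the same comparison.

```python
class ValueWrapper(object):
    """Hypothetical parameter wrapper that only knows how to print itself."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return str(self.value)


class ComparableWrapper(ValueWrapper):
    """Hypothetical wrapper that additionally compares by its string value."""

    def __eq__(self, other):
        return str(self) == str(other)

    def __ne__(self, other):
        return not self.__eq__(other)


if __name__ == "__main__":
    plain = ValueWrapper("paired")
    comparable = ComparableWrapper("paired")
    print(plain == "paired")       # falls back to identity comparison
    print(str(plain) == "paired")  # compares the rendered value
    print(comparable == "paired")  # uses the wrapper's own __eq__
```

If that assumption holds, comparing the rendered string value of the parameter in the template would be the thing to try.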

Confusion logging in (via "Workflow" or via "User")
by Peter 16 Dec '10

Hi, I'm having some trouble with our local Galaxy install with logging in / logging out. As part of this I noticed an odd inconsistency which is also present on the public Galaxy instance. Via the top ribbon "Workflow" link, I get told "You must be logged in to use Galaxy workflows." and the URL goes here - as the whole page, not the middle frame: http://main.g2.bx.psu.edu/user/login?webapp=galaxy Via the top ribbon "User", "Login" I get sent to the following URL as a new window: http://main.g2.bx.psu.edu/user/login Why is there this difference? Thanks, Peter

fastqx-toolkit related problem in Galaxy?
by Marina Gourtovaia 16 Dec '10

Hello

I am using the Galaxy instance on http://main.g2.bx.psu.edu/root

For an input fastq file starting with

@IL14_1008:2:1:800:71/1
AAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAA
+
>>>>>>>>>>>>>>>>>>>><>>>>->><<>><>><<

I get the following error when running FASTQ to FASTA on this input:

'An error occurred running this job: fastq_to_fasta: Invalid quality score value (char '-' ord 45 quality value -19) on line 4'

When I try to use 'Clip adapter sequences' with this input file, I get the following error against the 'Library to clip' widget:

'History does not include a dataset of the required format / build'

When I run the command-line version of the fastqx-toolkit tools on a 64-bit Linux machine, I have similar problems:

mg8@sf-2-1-01:~/gal_data$ fastq_to_fasta -v -n -i 10000.fastq -o my.fa
fastq_to_fasta: Invalid quality score value (char '-' ord 45 quality value -19) on line 4

and

mg8@sf-2-1-01:~/gal_data$ fastx_clipper -i 10000.fastq -o clipped.fastq -a ACACTCTTTCCCTACACGACGCTCTTCCGATCT
fastx_clipper: Invalid quality score value (char '-' ord 45 quality value -19) on line 4

Therefore, the problem seems to be with the fastqx-toolkit tools. My file is in the Sanger fastq format. Galaxy does allow me to convert it to the Illumina format without problems, but that does not help with the adaptor-clipping functionality.

Marina
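
The numbers in the error message are consistent with the quality string being decoded with an ASCII offset of 64 (the Illumina convention) when the file actually uses the Sanger offset of 33: the character '-' has ordinal 45, and 45 - 64 = -19, whereas 45 - 33 = 12. The short sketch below just reproduces that arithmetic on the quality line quoted in the post; the helper name decode_qualities is made up for illustration and is not part of any toolkit.

```python
def decode_qualities(quality_string, offset):
    """Convert a FASTQ quality string to integer scores using the given ASCII offset."""
    return [ord(char) - offset for char in quality_string]


# Quality line copied from the record in the post.
QUALITY = ">>>>>>>>>>>>>>>>>>>><>>>>->><<>><>><<"

as_sanger = decode_qualities(QUALITY, 33)    # Sanger convention: offset 33
as_illumina = decode_qualities(QUALITY, 64)  # Illumina convention: offset 64

if __name__ == "__main__":
    position = QUALITY.index("-")
    print("character '-' has ordinal %d" % ord("-"))
    print("score with offset 33: %d" % as_sanger[position])
    print("score with offset 64: %d" % as_illumina[position])
```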

How can I require users to log in to Galaxy?
by Peter 16 Dec '10

Hi all, We're planning to use Galaxy on two local servers. One will be restricted to those on the institute network, and here the default Galaxy behaviour is fine (anyone can use it without logging in). However, the second server will be accessible from the internet, but we only want authorised users to be able to use Galaxy on it. I don't see anything about configuring this on the security page: https://bitbucket.org/galaxy/galaxy-central/wiki/SecurityFeatures Is there some documentation I've overlooked? Thanks, Peter

Fwd: [galaxy-user] Read shuffler and code contributions
by Peter 16 Dec '10

Forwarding to the Galaxy Dev mailing list, which is probably a better place to discuss changes to the Galaxy FASTQ code.

Peter

---------- Forwarded message ----------
From: Peter <peter(a)maubp.freeserve.co.uk>
Date: Thu, Dec 16, 2010 at 8:50 AM
Subject: Re: [galaxy-user] Read shuffler and code contributions
To: Florent Angly <florent.angly(a)gmail.com>
Cc: galaxy-user(a)bx.psu.edu

> Hi Peter,
>
>> Are you asking for a tool to interleave two FASTQ or FASTA files with
>> matching entries (with matching names in the same order) into one
>> file which alternates forward then reverse read?
>
> Yes, indeed, this is what I am proposing.
>
>> Would you prefer it with or without error checking?
>
> Error checking is best.

I'd agree.

>> I think the scripts in velvet are fast but will fail horribly with
>> bad input... note there is a simple Biopython script to do this
>> included with velvet already (simple version with no error checking,
>> I have written a more robust version too - it looks like I haven't
>> sent it to Daniel to include in velvet though).
>
> I rolled my own FASTQ paired read interlacer and deinterlacer today, using
> the Galaxy Python modules in lib/galaxy_utils/. I must say these modules
> made it quite convenient and efficient to implement error-checking in the
> (de)interlacing. You can find the scripts here if you're interested:
> http://bitbucket.org/fangly/galaxy-central
> I'll make the XML wrappers tomorrow and test them. Hopefully after this is
> done, my changes can be pulled into the official Galaxy repository.

For the deinterlacer, I previously offered to write something like that for Galaxy and was told to submit it to the Tool Shed initially (although it may be merged into the official repository at some point). See "Divide FASTQ file into paired and unpaired reads" on http://community.g2.bx.psu.edu/ for my tool.

I also note you've changed the return behaviour of the Galaxy FASTQ library method get_paired_identifier - that API change could break other parts of Galaxy or 3rd party tools.

Looking at that Galaxy lib, perhaps I can offer some of my code for identifying Sanger read pairs and the .f .r suffixes to enhance the class fastqJoiner (it looks like it only does Illumina /1 and /2 right now, which I think is too narrow).

Peter
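
To make the suffix point concrete, here is a small sketch of recognising mates under more than one naming convention. It is not the Galaxy fastqJoiner class or Peter's code: the names PAIR_SUFFIXES and is_mate are hypothetical, and the only conventions covered are the two mentioned in this message (Illumina /1 and /2, Sanger .f and .r).

```python
# (forward suffix, reverse suffix) for each naming convention considered here.
PAIR_SUFFIXES = (
    ("/1", "/2"),  # Illumina style
    (".f", ".r"),  # Sanger style, as mentioned in the message
)


def is_mate(name_a, name_b):
    """Return True if the two read names form a forward/reverse pair
    under a single convention (mixed conventions are rejected)."""
    for fwd, rev in PAIR_SUFFIXES:
        for first, second in ((name_a, name_b), (name_b, name_a)):
            if (first.endswith(fwd) and second.endswith(rev)
                    and first[:-len(fwd)] == second[:-len(rev)]):
                return True
    return False


if __name__ == "__main__":
    # The second and third names are invented examples.
    print(is_mate("IL14_1008:2:1:800:71/1", "IL14_1008:2:1:800:71/2"))
    print(is_mate("clone42.f", "clone42.r"))
    print(is_mate("clone42.f", "clone42/2"))
```

Keeping the conventions in one table means a further suffix pair could be added without touching the matching logic.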

BLAST+ enhancements, was: blastxml to tabular bug fix
by Peter Cock 15 Dec '10

On Wed, Nov 24, 2010 at 11:02 PM, Bossers, Alex wrote:
> Peter,
>
> a nice extra feature welcomed by myself would be to allow the
> optional inclusion of the Hit_defline in the output table. In many
> workflows we would need to blast, get the id from the table, use
> id to get human readable name and insert/use it.... which is silly
> of course since that data is available in the xml anyway.
>
> I don't know python and about hg changesets but I modified
> your python and xml file to incorporate this (see attachment).
> By default it's normal blast tabular output but optionally it can
> include the defline.
> The hit_defline needed to be split (I hope I did it in a python
> way) to eliminate multiple descriptions separated by >gi (nt
> and nr) or plain semicolons for swissprot.... maybe there
> are more but not sure...
>
> Have a look and test and maybe it will find the way in some
> form into your suite. Anyway it's very useful in this way to us.
>
> cheers
> Alex

Hi Alex,

I'm glad to see the BLAST+ wrappers being used already, and to get positive feedback.

I had a quick look at your modifications - I think it could be made more beautiful, but it looks like it would work fine.

I understand the aim behind your suggested change, but I have another solution in mind. I was already planning to write another tool for splitting a column in a tabular file - e.g. splitting on the pipe character could be very useful to extract the GI number from a typical NCBI identifier string. Such a tool could also be used on the BLAST output to do what you are asking for (splitting the hit IDs), or to grab a particular word from formatted text (by splitting on spaces). I'm surprised this isn't in Galaxy already to be honest - maybe it is and I haven't found it yet ;)

I'd also like to explain that I deliberately kept the provided XML to tabular functionality simple to start with - all it tried to do is recreate the default tabular output, but even that turned out to be non-trivial.

I have several ideas for extension which I will try to outline here.

The BLAST+ suite actually lets you ask for certain other predefined columns in the tabular output. I am wondering about offering a "full" tabular output option in the BLAST+ wrappers - this seems simpler than making the user pick and choose which columns they want. e.g. for blastp:

The supported format specifiers are:
  qseqid means Query Seq-id
  qgi means Query GI
  qacc means Query accesion
  sseqid means Subject Seq-id
  sallseqid means All subject Seq-id(s), separated by a ';'
  sgi means Subject GI
  sallgi means All subject GIs
  sacc means Subject accession
  sallacc means All subject accessions
  qstart means Start of alignment in query
  qend means End of alignment in query
  sstart means Start of alignment in subject
  send means End of alignment in subject
  qseq means Aligned part of query sequence
  sseq means Aligned part of subject sequence
  evalue means Expect value
  bitscore means Bit score
  score means Raw score
  length means Alignment length
  pident means Percentage of identical matches
  nident means Number of identical matches
  mismatch means Number of mismatches
  positive means Number of positive-scoring matches
  gapopen means Number of gap openings
  gaps means Total number of gaps
  ppos means Percentage of positive-scoring matches
  frames means Query and subject frames separated by a '/'
  qframe means Query frame
  sframe means Subject frame

Note that calculating and recording the above will add computation cost and IO load - so keeping the std set of columns as the default in the Galaxy wrapper makes sense to me.

Potentially the BLAST XML output can be converted into this full tabular output too - I expect so but it may not be so easy.

Another avenue by which to extend the BLAST+ suite is to teach Galaxy about the BLAST ASN.1 output format, and wrap the new blast_formatter application for turning ASN.1 into another BLAST output format.

Regards,

Peter
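
The column-splitting idea described above could look something like the sketch below. It is not an existing Galaxy tool or Peter's planned implementation: the function name extract_subfield is hypothetical, the input is assumed to be tab-separated, and the identifier in the example is invented. The function splits one column on a delimiter and appends the chosen piece as a new column, leaving the original columns untouched.

```python
def extract_subfield(line, column, part, delimiter="|", sep="\t"):
    """Split field `column` of a tabular line on `delimiter` and append
    piece number `part` (both zero-based) as a new final column."""
    fields = line.rstrip("\n").split(sep)
    pieces = fields[column].split(delimiter)
    if part >= len(pieces):
        raise ValueError("Column %d has only %d piece(s)" % (column, len(pieces)))
    return sep.join(fields + [pieces[part]])


if __name__ == "__main__":
    # Invented row: query id, NCBI-style subject id, percentage identity.
    row = "query1\tgi|12345|ref|XX_000001.1|\t98.50"
    print(extract_subfield(row, column=1, part=1))
```

Appending a column, as opposed to replacing the original one, keeps every row the same width even though identifiers may contain different numbers of pieces.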
