November 2012 - galaxy-user - lists.galaxyproject.org

Help to identify variants with clinical/phenotype associations
by Luis Santomé 15 Nov '12

Hi all, I have a dataset of potentially pathological variants, and I'd like to combine it with a dataset of variants with known clinical associations to identify those responsible for the phenotype. I would be very grateful for any suggestions. -- *J. Luis Santomé Collazo*
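
One possible way to think about this kind of combination is as a join on the variant's coordinates and alleles. The sketch below is only an illustration under stated assumptions: both datasets are assumed to be tab-separated with a header row, and the column names (`chrom`, `pos`, `ref`, `alt`, `phenotype`) and the example rows are hypothetical, not taken from the original post.

```python
import csv
import io

# Hypothetical column names; adjust to whatever the real files use.
KEY_FIELDS = ("chrom", "pos", "ref", "alt")


def read_variants(handle):
    """Index tab-separated variant rows by (chrom, pos, ref, alt)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {tuple(row[field] for field in KEY_FIELDS): row for row in reader}


def match_known(candidates, known):
    """Return (candidate_row, known_row) pairs for variants present in both."""
    return [(row, known[key]) for key, row in candidates.items() if key in known]


# Tiny made-up example data, for illustration only.
candidate_text = "chrom\tpos\tref\talt\nchr1\t100\tA\tG\nchr2\t200\tC\tT\n"
known_text = "chrom\tpos\tref\talt\tphenotype\nchr1\t100\tA\tG\texample phenotype\n"

candidates = read_variants(io.StringIO(candidate_text))
known = read_variants(io.StringIO(known_text))
matches = match_known(candidates, known)

for candidate, annotation in matches:
    print(candidate["chrom"], candidate["pos"], annotation["phenotype"])
```

Matching on the full (chromosome, position, reference, alternate) key rather than position alone avoids pairing different alleles at the same site; both files would also need to use the same genome build and chromosome naming for the keys to line up.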

Repeated Error with Tophat
by Karen Sears 15 Nov '12

Hello, I am trying to use Galaxy to investigate gene expression differences among datasets. The first time I used Tophat, everything worked fine and it returned a nice set of assembled transcripts. However, I have since tried to use Tophat on additional datasets (in the same format as the first) and have repeatedly received the following error message: "An error occurred running this job: Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error." Is this something that I am doing wrong, or a problem with the system? Thanks!

Map with Bowtie problem
by andrewto＠wp.pl 15 Nov '12

Galaxy November 14, 2012 Distribution & News Brief
by Jennifer Jackson 14 Nov '12

*Galaxy November 14, 2012 Distribution & News Brief <http://wiki.galaxyproject.org/News/2012_11_14_DistributionNewsBrief>*

* Complete News Brief <http://wiki.galaxyproject.org/DevNewsBriefs/2012_11_14>

*Highlights:*

* *NGS: Picard (beta)* have moved from the *Galaxy distribution <https://bitbucket.org/galaxy/galaxy-dist>* to the *Galaxy Main Tool Shed* <http://toolshed.g2.bx.psu.edu/>.
* The *Galaxy Project* is now using Sphinx <http://sphinx-doc.org/> (Python) to document the *galaxy-central <http://galaxy-central.readthedocs.org>* and *galaxy-dist* <http://galaxy-dist.readthedocs.org> code base.
* The *Intergalactic Utilities Commission* <http://wiki.g2.bx.psu.edu/ReviewingToolShedRepositories> will soon begin reviewing repositories in the *Galaxy Main Tool Shed <http://toolshed.g2.bx.psu.edu/>* (better repos, better tools).
* *Tool Shed <http://toolshed.g2.bx.psu.edu/>* *"best practice"* advice: a single tool or a suite of tools per repository? <http://wiki.galaxyproject.org/AToolOrASuitePerRepository>
* *Tophat <http://tophat.cbcb.umd.edu/>*, *Tophat2 <http://tophat.cbcb.umd.edu/manual.html>*, and *Cuffdiff <http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff>* *updated* to accept gene annotations in *GFF3 <http://wiki.galaxyproject.org/Learn/Datatypes#GFF3>* format.
* Multiple *enhancements* to the *API* <http://wiki.galaxyproject.org/DevNewsBriefs/2012_11_14#API> targeting user and history actions.
* Plus *updates to* *CloudLaunch <http://usegalaxy.org/cloud>*, new *Security Fixes* <http://wiki.galaxyproject.org/DevNewsBriefs/2012_11_14#Security_Fixes>, and several usability enhancements for *Datasets*, *Datatypes*, *Tools*, and *Tool Shed* functions.

http://getgalaxy.org
http://bitbucket.org/galaxy/galaxy-dist
http://galaxy-dist.readthedocs.org

new:     $ hg clone http://www.bx.psu.edu/hg/galaxy galaxy-dist
upgrade: $ hg pull -u -r 5dcbbdfe1087

*Thanks for using Galaxy!*

Jennifer Jackson <http://wiki.galaxyproject.org/JenniferJackson> & the Galaxy Team <http://wiki.galaxyproject.org/Galaxy%20Team>
http://galaxyproject.org

Downgrading toolshed tools
by Paul-Michael Agapow 14 Nov '12

Just today, we had a case where we wanted to roll back to a previous version of a tool that we'd installed from a toolshed. (No big reason, except that our latest "fix" actually broke the tool.) Is there a way to do this? I could have sworn there was, but I couldn't find it in the admin interface. -- Paul Agapow (pma(a)agapow.net)

stderr on Samtools sort
by Masaki MS 14 Nov '12

Dear all,

I'm trying to set up the samtools sort command on a local Galaxy server, but I cannot get sorted files from its result. The first time, Galaxy reported an error with this message:

    [bam_sort_core] merging from 13 files...

To resolve this problem, I used discard_stderr_wrapper.sh in my samtools_sort.xml (http://wiki.galaxyproject.org/Future/Job%20Failure%20When%20stderr).

In samtools_sort.xml:

    <command interpreter="sh"> discard_stderr_wrapper.sh samtools sort '$input' '$output' </command>

With this change the samtools sort job finishes successfully, but I cannot get the sorted file from the Galaxy interface. The sorted file exists in the results folder as "dataset_97.dat.bam", but it is not linked in Galaxy.

Do you have any suggestions for this problem?

Thanks,
Masaki MS.
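
The file name "dataset_97.dat.bam" looks consistent with how older samtools releases (the 0.1.x series) treat the second argument of `samtools sort`: as an output prefix to which ".bam" is appended, rather than as the output file itself. If that is what is happening here, Galaxy would be expecting the result at the path in `$output` while samtools writes to `$output` plus ".bam". A minimal sketch of the missing step, moving the produced file to the expected path, might look like the following; the helper name `move_sorted_bam` is hypothetical and is not part of Galaxy or samtools.

```python
import os
import shutil


def move_sorted_bam(prefix, output_path):
    """Move '<prefix>.bam' to output_path.

    Assumes the sort step was given `prefix` as its output prefix and
    therefore wrote its result to prefix + '.bam'.
    """
    produced = prefix + ".bam"
    if not os.path.exists(produced):
        raise FileNotFoundError(produced)
    shutil.move(produced, output_path)
    return output_path
```

In a wrapper script, the same idea would mean sorting to a temporary prefix and then moving the resulting ".bam" file onto the dataset path that Galaxy passed in as `$output`.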

Warning message
by Rolando Mantilla 13 Nov '12

I'm having issues with the FASTQ Groomer. First, I downloaded an SRA file created by an Ion Torrent sequencer from the NCBI site. Then I used the fastq-dump app from the NCBI site to convert the .sra file to a .fastq file. When I uploaded the data into Galaxy, it was recognized as fastq (as it should be), but when I try to run the FASTQ Groomer I get the message and warnings below. I have also already installed the blast_datatypes tool from the Tool Shed. I truly don't know what the issue is; any help would be appreciated.

An error occurred running this job:

    Groomed 12376 sanger reads into sanger reads.
    WARNING:galaxy.datatypes.registry:Error loading datatype with extension 'blastxml': 'module' object has no attribute 'BlastXml'
    WARNING:galaxy.datatypes.registry:Overriding conflicting datatype with extension 'blastxml', using datatype from /mnt/galaxyData/tmp/tmpdP_cZ7.

cuffdiff
by Vevis, Christis 13 Nov '12

Hi,

I got confused while trying to run Cuffdiff for my RNA sequencing analysis. I have five different samples that were sequenced. I used Tophat to create the BAM files and Cufflinks to create the assembled transcripts. Then I used Cuffmerge to merge them into one file, and I wanted to run Cuffdiff with that merged file. What should I choose for the "SAM or BAM file of aligned RNA-Seq" option? I have 5 options, from the 5 Tophat runs on my 5 samples. All I want in the end is an Excel table showing the number of hits from each sample (and not necessarily a comparison of them).

Regards,
Kristis Vevis, PhD Student
Cell Biology
UCL Institute of Ophthalmology
11-43 Bath Street
London EC1V 9EL, UK
020 7608 4067

Trouble Installing Galaxy on Cluster
by greg 12 Nov '12

So here is what I've installed so far. (/usr/local/galaxy is a directory that all nodes in the cluster can see, and the web server machine can also see it.)

    $ ls -l /usr/local/galaxy
    drwxrwxr-- 20 galaxy scicomp 2206 Nov 8 15:53 galaxy-dist
    -rwxrwxr-x  1 galaxy scicomp 1882 Nov 8 15:34 galaxy.fedora-init
    drwxr-xr-x  5 galaxy scicomp   67 Nov 8 10:08 galaxy_python
    -rw-r--r--  1 galaxy scicomp   80 Nov 8 15:32 job_environment_setup_file
    drwxrwxr--  2 galaxy scicomp   28 Nov 8 15:43 logs

Here are the changes I made to galaxy-dist/universe_wsgi.ini: http://pastie.org/5351347

Here is what I put in my job_environment_setup_file:

    export TEMP=/scratch/galaxy
    source /usr/local/galaxy/galaxy_python/bin/activate

The galaxy_python directory contains my virtual env, based on Python 2.7.

Finally, here are the contents of my galaxy.fedora-init file: http://pastie.org/5351396

(Note: I created a symlink in /etc/init.d/ to /usr/local/galaxy/galaxy.fedora-init and then ran chkconfig --add galaxy.fedora-init, so I can use sudo /sbin/service galaxy.fedora-init start/stop.)

Here are the results from running $ sudo /sbin/service galaxy.fedora-init start: http://pastie.org/5351308

So my first question is: why does it appear to be using Python 2.6? Is it not using my virtual env? Am I using virtual env incorrectly?

Thanks,
Greg
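
One way to narrow down a question like this is to ask the interpreter itself which executable and prefix it is running from. The sketch below is a small diagnostic, not part of Galaxy; the function name `describe_interpreter` is hypothetical. Running it with the same `python` command the init script ends up using (for example, from inside that script) would show whether the virtual env is active there. A service started through sudo may not inherit the environment of an interactive shell, so an activation done in one's own shell would not necessarily carry over.

```python
import sys


def describe_interpreter():
    """Report which Python is running and whether a virtual env is active."""
    # Older virtualenv releases set sys.real_prefix; newer virtualenv and
    # the stdlib venv module make sys.base_prefix differ from sys.prefix.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```

If the reported executable is the system Python rather than one under /usr/local/galaxy/galaxy_python, the activation is not reaching the process that starts Galaxy.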

primer contamination, miranalyzer
by Rosie Griffiths 12 Nov '12

Hi Galaxy,

I've got three problems for you:

1) I've got microRNA Illumina NGS data that I want to analyse. I put it through FastQC on Galaxy, and it showed that 71% of the reads are one overrepresented sequence:

    Sequence                                             Count     Percentage          Possible Source
    GAATTCCACCACGTTCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAG  16896622  71.06413061961005   RNA PCR Primer, Index 1 (100% over 29bp)
    CCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCTTGTAATCTC  525614    2.2106372475809497  RNA PCR Primer, Index 12 (100% over 44bp)
    CCACCACGTTCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACC  416041    1.7497930632000402  RNA PCR Primer, Index 2 (100% over 34bp)

What would be the best way to remove this contamination? Also, is it still OK to use the data despite such high contamination? I've been trying to remove the sequence with the clip adaptor tool, using the following options:

    Library to clip: 2: FASTQ Groomer on data H1
    Minimum sequence length (after clipping, sequences shorter than this length will be discarded): 15
    Enter custom clipping sequence: GAATTCCACCACGTTCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAG
    Enter non-zero value to keep the adapter sequence and x bases that follow it: 0
    Discard sequences with unknown (N) bases: No
    Output options: Output only non-clipped sequences (i.e. sequences which did not contained the adapter)

The tool reports:

    Clipped reads - discarded.
    Input: 23776583 reads.
    Output: 3091831 reads.
    discarded 1287140 too-short reads.
    discarded 18984774 adapter-only reads.
    discarded 412838 clipp

but then I'm only left with 13% of the reads.

2) After I've filtered and clipped the adapter, I want to analyse the frequency of each miR. I've been using miRanalyzer to do this, with the following workflow:

    data => groomer => clip adapter => filter FASTQ (min quality 20) => FASTQ to FASTA => collapse

The collapse file looks like this:

    >1-17285268
    GAATTCCACCACGTTCCCGTGG
    >2-522760
    CCACCACGTTCCCGTGG
    >3-101198
    TATTGCACTTGTCCCGGCCTGT
    >4-88745

I then upload the collapse file to miRanalyzer. However, the total number of reads in the miRanalyzer output is the same as the total number of sequences in the collapse file; it doesn't seem to recognise the count number. The miRanalyzer documentation says the following:

    2.1 Input formats
    miRanalyzer requires a single file containing the unique reads and their counts. The application accepts two different input formats:

    2.1.1 A tab or space separated file as in the following example (read-count format):
    GAGGTAGTAGGTTGTA 49862
    ACCCGTAGAACCGACC 15490
    ... ...
    GGAGCATCTCTCGGTC 13762

    2.1.2 A multifasta file:
    >ID1 49862
    GAGGTAGTAGGTTGTA
    >ID2 15490
    ACCCGTAGAACCGACC
    ....
    >ID 13762
    GGAGCATCTCTCGGTC

    The description field must hold the read count. If not set, it is supposed to be 1. The file must have extension 'fa', 'fasta' or 'mfa'.

Do you know how I could change my format so that it recognises the read count, e.g. by changing the '-' to a space?

3) I've recently set up a local install of Galaxy, but I encounter the following error when I try to add a file to my data library:

    Error attempting to display contents of library (New data library): (OperationalError) no such column: True u'SELECT dataset_permissions.id AS dataset_permissions_id, dataset_permissions.create_time AS dataset_permissions_create_time, dataset_permissions.update_time AS dataset_permissions_update_time, dataset_permissions.action AS dataset_permissions_action, dataset_permissions.dataset_id AS dataset_permissions_dataset_id, dataset_permissions.role_id AS dataset_permissions_role_id \nFROM dataset_permissions \nWHERE True AND dataset_permissions.action = ?' ['access'].

I've got the latest version of Galaxy and am using Chrome on OS X Mountain Lion:

    changeset: 7986:12fcd068b12e
    tag: tip
    user: Daniel Blankenberg <dan(a)bx.psu.edu>
    date: Thu Oct 18 11:22:12 2012 -0400
    summary: Do not hide failed datasets with HideDatasetAction post job action.

Any help will be greatly appreciated.

Thank you,
Rosie Griffiths
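
For the format question in 2), a small script could rewrite the collapsed headers into either of the two input formats quoted from the miRanalyzer documentation. The sketch below is an illustration only, assuming that every header has the form `>rank-count` as in the example above and that each sequence sits on a single line; the function names are hypothetical.

```python
import re

# Matches collapsed headers such as ">1-17285268" (rank, then read count).
HEADER = re.compile(r"^>(\d+)-(\d+)$")


def collapsed_to_read_count(lines):
    """Yield 'SEQUENCE<TAB>COUNT' lines (the read-count format)."""
    count = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            count = match.group(2)
        elif count is not None:
            yield f"{line}\t{count}"
            count = None


def collapsed_to_multifasta(lines):
    """Yield multifasta lines with headers rewritten as '>ID<rank> <count>'."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            yield f">ID{match.group(1)} {match.group(2)}"
        else:
            yield line


example = [
    ">1-17285268",
    "GAATTCCACCACGTTCCCGTGG",
    ">2-522760",
    "CCACCACGTTCCCGTGG",
]

for converted in collapsed_to_read_count(example):
    print(converted)
for converted in collapsed_to_multifasta(example):
    print(converted)
```

A header without a following sequence, like the final `>4-88745` line in the excerpt above, is simply skipped by the read-count conversion. For the multifasta route, the output file would also need one of the extensions the documentation lists ('fa', 'fasta' or 'mfa').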
