April 2012 - galaxy-user - lists.galaxyproject.org

input for GATK DepthOfCoverage
by Lilach Friedman 03 Apr '12


Hi Galaxy Team,

I'm trying to use GATK DepthOfCoverage with a BAM file of aligned results as input. My steps were:

1. BWA alignment
2. SAM-to-BAM
3. rmdup

When I try to give the output of step 3 (the rmdup results) as input to GATK DepthOfCoverage, an error appears: "Sequences are not currently available for the specified build."

What did I do wrong?

Many thanks!
Lilach


Blast2GO local instance Re: Table with gene count reads
by Luciano Cosme 03 Apr '12


Howdy,

Thanks Jen, I will try it tomorrow.

I installed Blast2GO from the Tool Shed in my local instance of Galaxy, and when I try to run it I get the following error:

Index file named 'blast2go.loc' is required by tool but not available.

I was logged in as admin and the installation did not give me any error. From the terminal:

galaxy.util.shed_util DEBUG 2012-03-23 17:23:37,088 Installing repository 'blast2go'
galaxy.util.shed_util DEBUG 2012-03-23 17:23:37,088 Cloning http://testtoolshed.g2.bx.psu.edu/repos/peterjc/blast2go
destination directory: blast2go
requesting all changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 6 changes to 3 files
updating to branch default
3 files updated, 0 files merged, 0 files removed, 0 files unresolved
galaxy.util.shed_util DEBUG 2012-03-23 17:23:42,798 Updating cloned repository to revision "7b53cc52e7ed"

Anyway, I was thinking of using it because most of my differentially expressed genes are unknown. I was hoping to use Blast2GO to get them at least clustered into functional groups. I am not sure whether that would be the best approach to find out what the function of these genes might be.

I also checked the list of public servers that might have this tool. Berkeley BOP is listed, but it seems that they no longer have the server, or it was down when I checked, or the link is broken (http://galaxy.berkeleybop.org/).

Thank you.
Luciano

On Fri, Mar 23, 2012 at 8:43 AM, Jennifer Jackson <jen(a)bx.psu.edu> wrote:

> Hello Luciano,
>
> There is no single tool to do this operation (although there has been some
> discussion about including one in the Tool Shed), but the same information
> can be obtained by using a combination of existing tools.
>
> First, start by converting both starting datasets to interval format.
>
> mapped reads:
> - for TopHat output, "NGS: SAM Tools -> Convert SAM to interval"
>
> features:
> - for GFF file (convert to tabular if necessary), subtract "1"
>   from the start position's value using tool "Text Manipulation ->
>   Compute"
> - cut columns chrom, new start, stop, strand, name, and score from
>   this result file using "Text Manipulation -> Cut"
> - set the data type to "interval" using the 'Edit attributes' form
>   (pencil icon)
>
> Next, use a tool in the group "Operate on Genomic Intervals" to compare
> these intervals for overlap. The tool "Cluster" with the option "Find" is
> most likely the one you will want to use.
>
> As a final step, summarize the data by feature using the tool "Join,
> Subtract and Group -> Group".
>
> Hopefully this helps,
>
> Best,
>
> Jen
> Galaxy team
>
> On 3/19/12 4:36 PM, Luciano Cosme wrote:
>
>> Hi,
>> I was wondering if there is any tool on Galaxy where I can obtain a
>> table with how many reads have been mapped to a given sample and to a
>> given gene (for example, use a TopHat output and use a GFF file to
>> obtain the table). I am using HTSeq to get it (htseq-count). There are
>> also the GenomicRanges and easyRNASeq packages in Bioconductor.
>> Thank you.
>>
>> Luciano
>>
>> ___________________________________________________________
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org. Please keep all replies on the list by
>> using "reply all" in your mail client. For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>>
>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
>>
>> http://lists.bx.psu.edu/
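For readers who want to see the logic of the interval workflow in Jen's quoted reply outside of Galaxy, here is a minimal Python sketch. It is not the implementation of any Galaxy tool; the function names `gff_to_interval` and `count_reads_per_feature`, the sample GFF lines, and the read coordinates are all made up for illustration. It assumes the feature name sits in GFF column 9 as a bare string and that reads are already 0-based, half-open intervals.

```python
from collections import Counter


def gff_to_interval(gff_line):
    """Convert one tab-separated GFF line (1-based, inclusive coordinates)
    to a 0-based, half-open interval tuple:
    (chrom, start, stop, strand, name, score)."""
    fields = gff_line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[3]) - 1  # subtract 1 from the start, as in the "Compute" step
    stop = int(fields[4])       # the end coordinate is unchanged
    score, strand, name = fields[5], fields[6], fields[8]
    return (chrom, start, stop, strand, name, score)


def count_reads_per_feature(features, reads):
    """Count reads overlapping each feature (the overlap + "Group" steps).

    A read that overlaps two features is counted once for each of them.
    """
    counts = Counter({feature[4]: 0 for feature in features})
    for read_chrom, read_start, read_stop in reads:
        for chrom, start, stop, _strand, name, _score in features:
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == read_chrom and read_start < stop and start < read_stop:
                counts[name] += 1
    return dict(counts)


# Hypothetical input data, for illustration only.
gff_lines = [
    "chr1\tsrc\tgene\t101\t200\t.\t+\t.\tgeneA",
    "chr1\tsrc\tgene\t301\t400\t.\t-\t.\tgeneB",
]
reads = [
    ("chr1", 90, 140),
    ("chr1", 150, 190),
    ("chr1", 200, 250),  # starts exactly where geneA ends, so no overlap
    ("chr1", 350, 420),
    ("chr2", 100, 150),
]

features = [gff_to_interval(line) for line in gff_lines]
counts = count_reads_per_feature(features, reads)
print(counts)
```

The nested loop is quadratic and only suitable for small examples. Counts from a simple overlap test like this one can also differ from htseq-count, which applies its own rules to reads that overlap more than one feature.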


Training on NGS data analysis
by Aarti Desai 02 Apr '12


Hi Jennifer,

We are working on developing an NGS data analysis pipeline. Does your institution have a training program where one or two people from my team could be trained in NGS data analysis, particularly de novo genome assembly?

Regards,
Aarti

Dr. Aarti Desai | Domain Specialist - Life Sciences Domain
aarti_desai(a)persistent.co.in | Cell: +91-9673009492 | Tel: +91-20-30236348
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.


Google Summer of Code 2012 - Application Deadline on Friday 6th
by Robin Haw 02 Apr '12


The Genome Informatics group <http://gmod.org/wiki/GSoC> is taking part in the Google Summer of Code (GSoC) <http://code.google.com/soc/> program. This year, the Genome Informatics group will be organizing the joint efforts of Galaxy <http://galaxy.psu.edu/>, GBrowse <http://gbrowse.org/index.html>, Generic Model Organism Database (GMOD) <http://gmod.org/>, JBrowse <http://jbrowse.org/>, Reactome <http://www.reactome.org/>, WormBase <http://www.wormbase.org/>, and PortEco <http://porteco.org/>.

GSoC is a global program that funds student programmers around the world to write code for open source projects. More information about our project ideas and the GSoC program is available on the GMOD website <http://gmod.org/wiki/GSoC>.

If you are a student interested in coding for our open source projects, check out the links above and submit your project proposal as soon as possible. The application deadline is 6th April 2012.

Robin Haw
Genome Informatics Group Admin
http://gmod.org/wiki/GSoC


April 2012 Galaxy Update
by Dave Clements 02 Apr '12


Hello all,

The April 2012 Galaxy Update <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_04> is now available.

*Galaxy Update* <http://wiki.g2.bx.psu.edu/GalaxyUpdates> is a (mostly) monthly summary of what is going on in the Galaxy community. *Galaxy Updates* complements the *Galaxy Development News Briefs* <http://wiki.g2.bx.psu.edu/DevNewsBriefs>, which accompany new Galaxy releases and focus on Galaxy code updates.

*Highlights:*

- 28 New Papers <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_04#New_Papers>
- Open Positions <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_04#Who.27s_Hiring> at six different institutions
- Upcoming Events and Deadlines <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_04#Upcoming_Events_and_Deadlin…>
- GCC2012 Update <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_04#GCC2012_Update>, including
  - Abstract submission deadline is April 16.
  - Early registration is now open.
- Tool Shed Contributions <http://wiki.g2.bx.psu.edu/GalaxyUpdates/2012_04#Tool_Shed_Contributions> (at least 15 new repositories)

If you have anything you would like to see in the May *Galaxy Update* <http://wiki.g2.bx.psu.edu/GalaxyUpdates>, please let me know.

Thanks,
Dave C.

--
http://galaxyproject.org/GCC2012 <http://galaxyproject.org/wiki/GCC2012>
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/


embl 66 genome for RNAseq analysis
by Karthik Srinivasan 02 Apr '12


Hi,

How do I run TopHat and RNA-Seq analysis using the GRCh37 - embl 66 genome? I noticed there is no input option for this genome version. Can I construct a reference genome from the following EMBL-format source sequences, ftp://ftp.ensembl.org/pub/release-66/embl/homo_sapiens/, and map my RNA-seq data against it?

Regards,
Karthik

Karthik Srinivasan | Senior Application Engineer
P: +912242554282 | M: +919987014704
Oracle Health Sciences Global Business Unit
6th Floor, Silver Metropolis, W.E. Highway, Goregaon (E) | 400063 Mumbai
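As background to the question above: aligners such as TopHat work from an index built from FASTA sequences, so EMBL-format records would first need their sequences extracted. The following is a minimal Python sketch of that extraction, assuming the standard EMBL flat-file layout (an `ID` line, an `SQ` header, sequence lines, and a `//` terminator). The function name `embl_to_fasta` and the sample record are hypothetical, and an established parser would be a safer choice for real data.

```python
def embl_to_fasta(embl_text, width=60):
    """Extract the sequences from EMBL flat-file text and return FASTA text.

    Only the ID line (for the record name) and the sequence block between
    the SQ header and the // terminator are used; annotation is ignored.
    """
    records = []
    name, parts, in_sequence = None, [], False
    for line in embl_text.splitlines():
        if line.startswith("ID"):
            # The first token after "ID" is the entry name; drop a trailing ';'.
            name = line.split()[1].rstrip(";")
        elif line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            sequence = "".join(parts).upper()
            body = "\n".join(
                sequence[i:i + width] for i in range(0, len(sequence), width)
            )
            records.append(f">{name}\n{body}")
            name, parts, in_sequence = None, [], False
        elif in_sequence:
            # Sequence lines hold space-separated blocks of bases followed by
            # a running position count; keep the letters only.
            parts.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(records) + "\n"


# Hypothetical one-record input, for illustration only.
sample = (
    "ID   TEST1; SV 1; linear; genomic DNA; STD; HUM; 24 BP.\n"
    "XX\n"
    "SQ   Sequence 24 BP; 6 A; 6 C; 6 G; 6 T; 0 other;\n"
    "     acgtacgtac gtacgtacgt acgt        24\n"
    "//\n"
)
fasta = embl_to_fasta(sample)
print(fasta)
```

This sketch reads the whole input into memory, which is fine for a small test record but not for whole chromosomes. A streaming version would be needed for files of that size.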
