November 2011 - galaxy-user - lists.galaxyproject.org

mapping tags
by Soetkin Versteyhe 29 Nov '11

Dear all,

I would like to map (e.g. with Bowtie) collapsed sequences (tags) instead of individual sequence reads. Does anyone know if this is possible in Galaxy?

Thank you in advance.

Best regards,
Soetkin Versteyhe, PhD
PostDoc
University of Copenhagen, Faculty of Health Sciences
The Novo Nordisk Foundation Center for Basic Metabolic Research, Integrative Physiology
Blegdamsvej 3B, 2200 København N, Denmark
PHONE +45 35337116
soetkin.versteyhe(a)sund.ku.dk
http://sund.ku.dk
http://metabol.ku.dk
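Whether Galaxy's Bowtie wrapper accepts collapsed tags directly is not settled in this post. To make the idea concrete, here is a minimal Python sketch of collapsing: identical reads are merged into one tag, and the number of reads it stands for is kept in the FASTA header so it can be recovered after mapping. The function names and the `>rank-count` header convention are illustrative assumptions, not the format of any particular Galaxy tool.

```python
from collections import Counter


def collapse_reads(reads):
    """Merge identical read sequences into unique tags with read counts.

    Returns (sequence, count) pairs, most abundant first; ties are broken
    alphabetically so the output is deterministic.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def tags_to_fasta(tags):
    """Render tags as FASTA with a '>rank-count' header (an illustrative choice)."""
    lines = []
    for rank, (sequence, count) in enumerate(tags, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
    # Six reads collapse to three tags: ACGT x3, TTGA x2, GGCC x1.
    print(tags_to_fasta(collapse_reads(reads)), end="")
```

Mapping the tag file instead of the raw reads aligns each distinct sequence once; per-read totals are then the sum of the counts in the headers of the mapped tags.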

Patch for better FASTQ description handling
by Florent Angly 29 Nov '11

Hi,

I have found an issue with the way FASTQ read descriptions are handled by Galaxy utilities: https://bitbucket.org/galaxy/galaxy-central/issue/665/paired-end-code-misha…

Please consider pulling my patch.

Thanks,
Florent
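The patch itself is not reproduced in this post. As background on what "description handling" involves: in a FASTQ title line, the read identifier runs from the `@` up to the first whitespace, and anything after that is an optional free-text description. Below is a minimal sketch, assuming the issue concerns pairing mates (the linked issue title mentions the paired-end code), of keeping the two parts apart. The helper names and the `/1`/`/2` suffix rule are illustrative assumptions, not Galaxy's code or the contents of the patch.

```python
import re

# Matches a trailing "/1" or "/2" mate suffix on a read identifier.
MATE_SUFFIX = re.compile(r"/[12]$")


def split_fastq_header(header):
    """Split an '@identifier description' title line into its two parts.

    The identifier ends at the first whitespace; everything after that is
    free-text description and may be empty.
    """
    header = header.rstrip("\r\n")
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ title line: {header!r}")
    parts = header[1:].split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def mate_key(header):
    """Identifier with any /1 or /2 suffix removed, for matching mates.

    Only the identifier is inspected, so text in the description cannot
    affect whether two reads are treated as mates.
    """
    identifier, _ = split_fastq_header(header)
    return MATE_SUFFIX.sub("", identifier)
```

With this split, `@read1/1 length=36` and `@read1/2 length=36` share the key `read1`, while a header such as `@HWI:1:2 1:N:0:ATCACG` keeps its description out of the key entirely.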

Postdoctoral opening in College Park
by Charles Delwiche 29 Nov '11

I have a postdoc opening in my lab that could be an excellent opportunity for members of this list. The project is extremely cool and will incorporate elements of ecology and systems biology. The position is open now, and I would like to fill it ASAP.

Charles Delwiche

Genomic basis of species diversity and ecosystem functioning

We seek a postdoctoral scholar to collaborate on a project funded by the U.S. National Science Foundation to study the evolutionary basis of species diversity and ecosystem functioning in freshwater green algae. The successful candidate will take the intellectual lead on the genomics portion of laboratory and field experiments that will (1) determine how much genetic differentiation is required for species to stably coexist, and (2) determine how mechanisms that allow for coexistence also impact community-level processes such as primary production. The candidate will also pursue his or her own research interests within the broader context of the grant proposal.

The candidate will work in Dr. Charles Delwiche's lab at the University of Maryland, College Park, and will collaborate with researchers in the labs of Drs. Bradley Cardinale (an ecologist at the University of Michigan) and Todd Oakley (a phylogeneticist at UC Santa Barbara).

The position requires a Ph.D. in the biological sciences, bioinformatics, or a related field. Experience with high-throughput sequencing, sequence analysis, algal/plant biodiversity, or RNA biology is desirable. This is a three-year position, with the initial appointment for one year and renewals contingent on successful progress in research. The starting salary will be $42,000 with full benefits.

The University of Maryland is located in College Park, a suburb in the Washington, D.C. metropolitan area, and provides a vibrant cultural and academic environment with easy access to a vast array of federal research facilities. The position is available immediately.

To apply formally, send a curriculum vitae, the names of 3 references, and a brief statement of how your research goals fit with research on algal biodiversity, systematics, and evolutionary biology to: aaalgeee(a)gmail.com

--
Charles F. Delwiche
Professor, Cell Biology and Molecular Genetics (CBMG)
0101J Biosciences Research Building #413
University of Maryland
College Park, MD 20742-4407
http://www.life.umd.edu/labs/delwiche
tel: 301-405-8286
fax: 301-314-1248

There is still only you, listening to the music of the wind and lost in the stagnant cries of the miseries lost in the eyes of others laughing at you. How can someone so different be so much the same? - Liu Sola

Re: [galaxy-user] Unable to run SICER or Find Peaks
by Daniel Blankenberg 29 Nov '11

Hi AP,

Please keep all replies on the list; this allows the community to assist and benefit from these correspondences.

SICER requires BED input. To go from BAM to BED:

1.) Convert BAM to SAM
2.) Convert SAM to Interval (Convert SAM to interval)
3.) Convert Interval to BED(6+). This can be done implicitly (by selecting the Interval dataset, which will be marked with '(as bed)' in the SICER input box) or explicitly, by clicking on the pencil icon and converting under the section "Convert to new format".

Please let us know if we can provide additional assistance.

Thanks for using Galaxy,

Dan

On Nov 29, 2011, at 1:23 PM, Anupam Paliwal wrote:

> Hi Daniel,
>
> Thanks for your kind attention and advice.
>
> I have followed this workflow: I aligned my query sequences to the
> reference genome using Bowtie; the Bowtie-aligned SAM file was run
> through Filter SAM before converting it to BAM. I then converted the
> BAM file back to SAM before running pileup on it.
>
> However, now I do have the Input format file (after pileup of SAM) but am
> unable to convert it to BAM format to be able to submit it to SICER.
>
> Please see if you can suggest how to convert the Input files back to BAM.
> I have tried changing the format directly through Edit Attributes, but it
> shows an error.
>
> AP
>
>> Hi AP,
>>
>> SICER requires BED formatted input with at least 6 columns (for strand
>> information). You can convert your BAM files into SAM and then into
>> interval and BED format. Once you have your input in the BED (6+) format,
>> you should be able to use these tools. Please let us know if we can
>> provide additional information.
>>
>> Thanks for using Galaxy,
>>
>> Dan
>>
>> On Nov 23, 2011, at 12:26 PM, Anupam Paliwal wrote:
>>
>>> Hi,
>>>
>>> I want to use SICER or Find Peaks for peak calling on Galaxy.
>>>
>>> I am using my aligned ChIP-seq tag .BAM files. However, neither tool
>>> can pick up the Bowtie-aligned, SAM-to-BAM converted files from my
>>> history.
>>>
>>> On the other hand, with MACS the same files work nicely for peak
>>> calling.
>>>
>>> Thanks,
>>>
>>> AP
>>>
>>> ___________________________________________________________
>>> The Galaxy User list should be used for the discussion of
>>> Galaxy analysis and other features on the public server
>>> at usegalaxy.org. Please keep all replies on the list by
>>> using "reply all" in your mail client. For discussion of
>>> local Galaxy instances and the Galaxy source code, please
>>> use the Galaxy Development list:
>>>
>>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>>
>>> To manage your subscriptions to this and other Galaxy lists,
>>> please use the interface at:
>>>
>>> http://lists.bx.psu.edu/
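Dan's three steps use Galaxy's own converters. To show what the BAM → SAM → BED(6+) conversion has to do, here is a minimal Python sketch for one SAM alignment line (the BAM → SAM step is assumed already done). It follows the SAM specification: POS is 1-based while BED starts are 0-based, the CIGAR operations M, D, N, = and X consume reference bases, FLAG bit 0x4 marks an unmapped read and 0x10 a reverse-strand read. Using MAPQ as the BED score column is an assumption; this is not Galaxy's converter code.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment's CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def sam_to_bed6(sam_line):
    """Convert one SAM alignment line to a BED6 line, or None if it has no coordinates.

    BED start is 0-based and SAM POS is 1-based, hence POS - 1. MAPQ is used
    as the BED score and FLAG bit 0x10 gives the strand.
    """
    if sam_line.startswith("@"):
        return None  # SAM header line
    fields = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    flag = int(flag)
    if flag & 0x4 or cigar == "*":
        return None  # unmapped read: nothing to place on the genome
    start = int(pos) - 1
    end = start + reference_length(cigar)
    strand = "-" if flag & 0x10 else "+"
    return "\t".join([rname, str(start), str(end), qname, mapq, strand])


if __name__ == "__main__":
    line = "read1\t16\tchr1\t100\t37\t36M\t*\t0\t0\t" + "A" * 36 + "\t" + "I" * 36
    print(sam_to_bed6(line))
```

The sixth column (strand) is the one SICER needs, per Dan's earlier message; a BED file with only three columns would lose it.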

Re: [galaxy-user] Names for genes in RNA-Seq analysis (Emilie Chautard)
by Emilie Chautard 29 Nov '11

Hi Olivier,

Have you tried running Cuffcompare (part of Cufflinks) on your results? According to the Cufflinks manual (http://cufflinks.cbcb.umd.edu/manual.html):

> Cufflinks includes a program that you can use to help analyze the transfrags you assemble. The program cuffcompare helps you:
> - Compare your assembled transcripts to a reference annotation
> [...]

In the Galaxy version of Cuffcompare, I think you can provide a reference annotation file using "Use Reference Annotation:", which will be compared to your Cufflinks results. It makes a union of the transcripts obtained with Cufflinks and the annotation file (both in GTF format). You can then obtain a transcript identifier for those already annotated. It also provides a class code for each transcript, which can indicate, for example, a potential isoform.

Hope this helps.

Emilie

--
Emilie Chautard, PhD
Postdoctoral Fellow
Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
Tel: 416-673-8518
Toll-free: 1-866-678-6427
www.oicr.on.ca

> Date: Thu, 20 Oct 2011 15:12:45 +0200
> From: GANDRILLON OLIVIER <olivier.gandrillon(a)univ-lyon1.fr>
> To: "galaxy-user(a)bx.psu.edu" <galaxy-user(a)bx.psu.edu>
> Subject: [galaxy-user] Names for genes in RNA-Seq analysis
>
> Hello,
>
> I am using Galaxy to analyse RNA-seq libraries made from chicken cells.
>
> I just groomed my sequences, passed them through TopHat and then Cufflinks.
>
> This worked well, and in the end I get a list of genes and their respective
> FPKM values.
>
> My only problem is that the names of the genes do not appear in the
> listing; they are simply referenced as "CUFF.1", "CUFF.2", etc.
>
> Could you please tell me how I could obtain gene names? (I went through the
> FAQ and could not find the answer.)
>
> Sincerely,
>
> Olivier
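The thread does not show Cuffcompare's output, so the sketch below is generic: it parses the attribute column (the ninth column) of a GTF file and builds a lookup from one attribute to another, for example from the original Cufflinks ID to a reference gene name. The attribute names `oId` and `gene_name`, and the sample line used to exercise it, are assumptions to check against your own Cuffcompare output.

```python
import re

# Matches key "value" pairs in a GTF attribute column.
ATTR = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')


def parse_gtf_attributes(column):
    """Parse a GTF attribute column such as 'gene_id "X"; transcript_id "Y";'."""
    return dict(ATTR.findall(column))


def map_attribute(gtf_lines, from_key, to_key):
    """Build {value of from_key: value of to_key} from the ninth GTF column.

    Comment lines, short lines, and records lacking either attribute are skipped.
    """
    mapping = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = parse_gtf_attributes(fields[8])
        if from_key in attrs and to_key in attrs:
            mapping[attrs[from_key]] = attrs[to_key]
    return mapping
```

Passing an open file handle as `gtf_lines` works too, since each line's trailing newline is stripped before splitting.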

'Draw quality score Boxplot' error
by Caroline Proux 29 Nov '11

Hi,

I have Illumina ChIP-seq data and I want to use "Draw quality score Boxplot". I ran "Quality format converter (ASCII numeric)", but "Draw quality score Boxplot" then fails with the error "An error occurred running this job:Could not find/open font when opening font "arial", .."

What is causing this?

Thank you so much,
Caroline Proux
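The font error is raised while the plot is being drawn, so it likely points to the server's plotting setup rather than to the data, and the sketch below does not address it. It only illustrates the ASCII → numeric step that precedes the plot: each quality character is decoded as its ASCII code minus an offset, 33 for Sanger/Phred+33 or 64 for Illumina 1.3 to 1.7 (Phred+64). This is a minimal sketch, not Galaxy's converter, and it ignores the older Solexa scale.

```python
def ascii_to_numeric(quality, offset=33):
    """Decode an ASCII-encoded FASTQ quality string into integer scores.

    offset=33 is Sanger / Phred+33; Illumina 1.3 to 1.7 files use offset=64.
    A negative score usually means the wrong offset was chosen.
    """
    scores = [ord(ch) - offset for ch in quality]
    if any(score < 0 for score in scores):
        raise ValueError("negative quality score: wrong offset for this file?")
    return scores


def to_numeric_line(quality, offset=33):
    """Space-separated numeric quality line, one score per base."""
    return " ".join(str(score) for score in ascii_to_numeric(quality, offset))
```

The same scores come out of `"II#"` at offset 33 and `"hhB"` at offset 64, which is why the offset has to match the file's actual encoding.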

barcode splitter and clustering analyses?
by Simon Bulman 28 Nov '11

Dear Galaxy,

We have a 454 metagenomic dataset. We have used Barcode Splitter to divide the dataset into its constituent amplicons, and we have also been using a clustering application (dnaclust) in Galaxy to subdivide the dataset by similarity.

My question is: are there Galaxy tools that allow combining, sorting and counting these two outputs? For example, can each cluster, and then each sequence within that cluster, be given an identifier, so that one can then split the output by barcode and summarise the data along the lines of "amplicon/barcode X has N sequences in cluster 1, M sequences in cluster 2, ..." and so on?

Am I making any sense? This sounds like a problem that is solvable in Excel, and indeed a UK colleague of mine has been doing just that. But is there a straightforward way to do it in Galaxy? It is not obvious to me from the Filtering or Sorting tools.

Best wishes,
Simon

The contents of this e-mail are confidential and may be subject to legal privilege. If you are not the intended recipient you must not use, disseminate, distribute or reproduce all or any part of this e-mail or attachments. If you have received this e-mail in error, please notify the sender and delete all material pertaining to this e-mail. Any opinion or views expressed in this e-mail are those of the individual sender and may not represent those of The New Zealand Institute for Plant and Food Research Limited.
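The summary Simon describes amounts to joining the two outputs on read ID and counting reads per (barcode, cluster) pair. Here is a minimal Python sketch, assuming each output has already been reduced to a read ID → label mapping; how the barcode splitter and dnaclust actually label their output is not covered here, and the sample IDs in the example are made up.

```python
from collections import Counter


def count_by_barcode_and_cluster(read_to_barcode, read_to_cluster):
    """Count reads per (barcode, cluster) pair.

    Both inputs map read ID -> label. Reads present in only one mapping are
    ignored, as in an inner join on read ID.
    """
    counts = Counter()
    for read_id, barcode in read_to_barcode.items():
        cluster = read_to_cluster.get(read_id)
        if cluster is not None:
            counts[(barcode, cluster)] += 1
    return counts


def summary_table(counts):
    """Rows of 'barcode<TAB>cluster<TAB>count', sorted by barcode then cluster."""
    return [f"{barcode}\t{cluster}\t{n}" for (barcode, cluster), n in sorted(counts.items())]


if __name__ == "__main__":
    barcodes = {"r1": "BC1", "r2": "BC1", "r3": "BC2", "r4": "BC2"}
    clusters = {"r1": "c1", "r2": "c1", "r3": "c1", "r4": "c2"}
    for row in summary_table(count_by_barcode_and_cluster(barcodes, clusters)):
        print(row)
```

Each row reads as "barcode X has N sequences in cluster Y"; pairs with zero reads are simply absent rather than listed as 0.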

An update to Galaxy CloudMan
by Enis Afgan 28 Nov '11

A new version of CloudMan for running Galaxy on the Amazon cloud has been released today. Any new cluster will automatically use this version. Existing clusters will have a link displayed at the top of the CloudMan console offering to perform an automated update. The new version brings the following updates/features:

- Added the ability to specify a path where Galaxy is installed as part of user data (using the galaxy_home key). This allows a custom Galaxy application to be installed and picked up by CloudMan instead of the default one. This works across cluster invocations as well as for shared clusters. For a complete list of user data options see http://wiki.g2.bx.psu.edu/Admin/Cloud/UserData
- Use /etc/profile instead of /etc/bash.bashrc for system-wide shell logins
- Support for the 3.0 kernel on Ubuntu 11.10 for SGE. Contributed by Brad Chapman.
- Fix for SGE install after cloud-init has run and changed /etc/hosts
- post_start_service now runs if the script exists in the cluster bucket, even if no URL was provided as part of the current user data
- Fix recognition of existing and attached file system volumes on instance reboot
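To illustrate the new galaxy_home option, here is a minimal Python sketch that writes user data as YAML `key: value` lines, assuming that is the format described on the UserData wiki page. The install path and the other keys shown are placeholders; check that page for the keys your cluster actually needs.

```python
def render_user_data(options):
    """Render a flat dict as YAML 'key: value' lines with double-quoted values.

    Quoting keeps paths and passwords intact; backslashes and double quotes
    inside values are escaped as YAML double-quoted scalars require.
    """
    lines = []
    for key, value in options.items():
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    user_data = {
        "cluster_name": "my-cluster",  # placeholder
        "password": "change-me",  # placeholder
        "galaxy_home": "/mnt/galaxy/custom-galaxy",  # hypothetical install path
    }
    print(render_user_data(user_data), end="")
```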

Question regarding: FASTQ Quality Trimmer
by Rahul Kanwar 28 Nov '11

Hello,

I am running Galaxy locally and it has been performing flawlessly! I would like more insight into this option in the FASTQ Quality Trimmer tool:

Maximum number of bases to exclude from the window during aggregation

Does it mean the number of 5' bases to exclude while doing the trimming step (i.e. the sliding window starts this many bp after the read start)? I would really appreciate it if someone could shed more light on this.

Thanks.

Regards,
Rahul
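This post does not answer the question, and the sketch below should not be read as the tool's documented behavior. It illustrates one hypothetical reading that differs from Rahul's guess: during aggregation, up to N of the lowest scores inside each window are ignored before the window's mean is compared to the threshold, so a single poor base does not by itself trigger trimming. Trimming is shown from the 3' end only, and all names are illustrative.

```python
def window_passes(scores, threshold, exclude_count):
    """Return True if the window's mean quality reaches the threshold.

    Hypothetical reading of 'exclude from the window during aggregation':
    up to exclude_count of the lowest scores are dropped (always keeping at
    least one) before the mean is taken.
    """
    ordered = sorted(scores)
    max_drop = min(exclude_count, len(ordered) - 1)
    kept = ordered[max_drop:]
    return sum(kept) / len(kept) >= threshold


def trim_3prime(scores, window, threshold, exclude_count=0):
    """Shorten the read from the 3' end until its last `window` bases pass.

    Returns the new read length (0 if no window ever passes).
    """
    end = len(scores)
    while end > 0:
        start = max(0, end - window)
        if window_passes(scores[start:end], threshold, exclude_count):
            break
        end -= 1
    return end


if __name__ == "__main__":
    quals = [30, 30, 30, 30, 5, 5]
    print(trim_3prime(quals, window=2, threshold=20, exclude_count=0))
    print(trim_3prime(quals, window=2, threshold=20, exclude_count=1))
```

Under this reading, the example read is trimmed to 4 bases with `exclude_count=0` but kept at 5 with `exclude_count=1`; running the real tool on a similar small input would show whether it behaves this way.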

Unable to run SICER or Find Peaks
by Anupam Paliwal 28 Nov '11

Hi,

I want to use SICER or Find Peaks for peak calling on Galaxy. I am using my aligned ChIP-seq tag .BAM files. However, neither tool can pick up the Bowtie-aligned, SAM-to-BAM converted files from my history. On the other hand, with MACS the same files work nicely for peak calling.

Thanks,
AP
