March 2011 - galaxy-user - lists.galaxyproject.org

Question regarding quality filtering of 454 amplicons
by Jackie Lighten 01 Jul '11

Hi,

I have a question for you guys regarding quality filtering. I have a data set of double MID-tagged 454 amplicons, from which I wish to select high-quality sequences above Q20. The 454 quality filtering system seems to work differently from the one provided for Illumina sequencing, i.e. 454 filtering extracts high-quality segments, while Illumina (FASTQ) filtering can select high-quality full reads based on certain parameters.

OK, so I know that the total length of my amplicon, including primers and barcodes, is around 260 bp. If I set the 454 quality filtering tool to extract contiguous high-quality sequence of >260, it gives me back around 45% of my raw data as meeting this criterion, i.e. all 260 bp are above Q20. I don't necessarily need this high stringency, as most bases may not be informative. But if I convert my 454 data to FASTQ format and then run the Illumina filtering tool, which also allows me to set the number of bases allowed to deviate from the Q20 criterion, I get back over 90% of my data (allowing 10 bp to deviate from Q20). I then need to go ahead and convert back to 454 format.

Can you tell me if this is OK? Will I lose or confuse information somewhere along these conversions? It seems that if I do this, my barcodes are removed, as the amplicons do not sort properly when I parse them through my barcode filtering program.

Does anyone know of a program to filter 454 data based on average sequence quality score that doesn't involve Linux and the Roche off-instrument program? (I have no experience in Linux!)

Thanks!

--
Jack Lighten, Ph.D. Candidate,
Bentzen Lab, Room 6078,
Department of Biology, Dalhousie University,
Halifax, NS, B3H 4J1 Canada
Office: (902) 494-1398
Email: Jackie.Lighten(a)Dal.Ca
Profile: www.marinebiodiversity.ca/CHONe/Members/lightenj/profile/bio
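
The whole-read filter described in this message (keep a read if no more than a set number of bases fall below Q20) can be sketched in a few lines of Python. This is only an illustration, not the Galaxy tool itself: the function names are made up for this example, and it assumes four-line FASTQ records with Sanger (Phred+33) quality encoding, which may not match a given converted file. A mean-quality helper is included because the message also asks about filtering on average score.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 assumes Sanger-style encoding (an assumption, not from the post).
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line, offset=33):
    """Average Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores) if scores else 0.0


def passes_filter(quality_line, min_q=20, max_low=10, offset=33):
    """True if at most `max_low` bases have a score below `min_q`."""
    low = sum(1 for q in phred_scores(quality_line, offset) if q < min_q)
    return low <= max_low


def filter_fastq(lines, min_q=20, max_low=10, offset=33):
    """Yield (header, sequence, quality) for four-line records that pass.

    Reads are kept or dropped whole; nothing is trimmed, so any barcode
    at the start of a read is left in place.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line
        qual = next(it, "").rstrip("\n")
        if passes_filter(qual, min_q, max_low, offset):
            yield header, seq, qual
```

With a file on disk, `filter_fastq(open("reads.fastq"))` would iterate over the passing reads; the file name here is a placeholder.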

Mapping to only 3 genes / targeted resequencing / SOLiD4 / short reads
by Jochen Seggewiß 25 Apr '11

Hi!

The situation is as follows: 10 barcoded "samples". Each sample consists of a mix of the sequences of 3 independent genes (2 alleles each). I would like to map the SOLiD4 reads only to the sequences of those 3 genes, patient by patient.

First, the 10 barcoded samples have to be separated from each other. Then, the short reads have to be mapped to the sequences of the 3 genes, which are available in FASTA format (single) or multi-FASTA format (all sequences in one file).

Is this possible using the available Galaxy tools? How?

Thank you in advance.
Jose

get wig file after tophat
by Ying Zhang 21 Apr '11

Hi,

I am using TopHat in Galaxy to analyze my paired-end RNA-seq data, and I have found that the TopHat analysis no longer produces the wig file that it used to. Do you have any idea how I can still get a wig file after the TopHat analysis?

Thanks a lot!

Best,
Ying Zhang, M.D., Ph.D.
Postdoctoral Associate
Department of Genetics, Yale University School of Medicine
300 Cedar Street, S320
New Haven, CT 06519
Tel: (203)737-2616
Fax: (203)737-2286

2011 Galaxy Community Conference, May 25-26, Lunteren, The Netherlands
by Dave Clements 11 Apr '11

Hello all,

We are pleased to announce the *2011 Galaxy Community Conference*, being held *May 25-26 in Lunteren, The Netherlands*. The meeting will feature two full days of presentations and discussion on extending Galaxy to use new tools and data sources, deploying Galaxy at your organization, and best practices for using Galaxy to further your own and your community's research.

*Link: http://galaxy.psu.edu/gcc2011/*

*Overview*

This event aims to engage a broader community of developers, data producers, tool creators, and core facility and other research hub staff to become an active part of the Galaxy community. We'll cover defining resources in the Galaxy framework, increasing their visibility and making them easier to use and integrate with other resources, how to extend Galaxy to use custom data sources and custom tools, and best practices for using Galaxy in your organization.

Additional topics include, but are not limited to:

* Talks submitted by the Galaxy community
* Integration of tools (including NGS analysis tools) and distributed job management
* Deployment of Galaxy instances on local resources and on the Cloud
* Management of large datasets with the Galaxy Library System
* Using the Galaxy LIMS functionality at NGS sequencing facilities
* Visualizing Data without leaving Galaxy
* Performing reproducible research
* Performing and sharing complex analyses with Workflows
* An "Introduction to Galaxy" session, offered on May 24, for Galaxy newcomers.

*Registration*

The conference fee is €100 on or before April 24, and €120 after that. The meeting is being held at the Conference Centre De Werelt in Lunteren, The Netherlands, which is also the conference hotel. You are encouraged to register early, as space at the hotel (and at the "Intro to Galaxy" session) is limited and is likely to fill up before the conference itself does.

*Link: http://galaxy.psu.edu/gcc2011/Register.html*

*Abstract Submission*

Abstracts are now being accepted for short oral presentations. Proposals on any topic of interest to the Galaxy community are welcome and encouraged. The abstract submission deadline is the end of February 28.

*Link: http://galaxy.psu.edu/gcc2011/Abstracts.html*

*Sponsors*

The 2011 Galaxy Community Conference is co-sponsored by the US National Science Foundation (NSF), and the Netherlands Bioinformatics Centre (NBIC). NBIC is a collaborative institute of the bioinformatics groups in the Netherlands. Together, these groups perform cutting-edge research, develop novel tools and support platforms, create an e-science infrastructure and educate the next generations of bioinformaticians.

*Links: http://www.nbic.nl/ and http://www.nsf.gov/*

We are looking forward to a great conference and hope to see you in the Netherlands!

The Galaxy and NBIC Teams

--
http://galaxy.psu.edu/gcc2011/
http://getgalaxy.org
http://usegalaxy.org/

Regarding SNPs
by Nripesh Prasad 07 Apr '11

I wish to compare SNPs in my mouse sample (SNP file generated from Partek Genomics Suite) with SNPs from the UCSC browser. How do I do that in Galaxy?

Nripesh Prasad

Re: [galaxy-user] Adding the Hydra genome to Galaxy
by Jennifer Jackson 07 Apr '11

Hello Rob,

We will add this to our to-do list for new genomes. Thanks for sending the GenBank information!

Next time, if you could send requests to galaxy-user, that would be very helpful for the team.

Best,
Jen
Galaxy team

On 1/3/11 12:37 PM, Rob Steele wrote:
> Hi Jennifer,
> Would it be possible to get the Hydra genome assembly added to Galaxy?
> It has been published and is available in GenBank under accession number
> ABRM00000000.
>
> Cheers,
> Rob
>
> Rob Steele, Ph.D.
> Professor
> D240 Medical Sciences I
> Department of Biological Chemistry
> School of Medicine
> University of California, Irvine
> Irvine, CA 92697-1700
>
> phone: 949-824-7341
> e-mail: resteele(a)uci.edu
> fax: 949-824-2688
> web: http://polyp.biochem.uci.edu/wiki/index.php/Main_Page
>

--
Jennifer Jackson
http://usegalaxy.org

Re: [galaxy-user] [Genome] UCSC browser, access from Galaxy
by Hiram Clawson 05 Apr '11

Good Morning Dr. Hunt:

Can you please clarify what upload function from Galaxy you are trying to perform? Perhaps the Galaxy user help email list could direct you to advice on this subject? (copied on this email)

The Ensembl genome browser accepts the same types of upload tracks as does the UCSC genome browser.

--Hiram

----- Original Message -----
From: "Pustulka-Hunt Elzbieta (SystemsX.ch)" <ela.hunt(a)systemsx.ch>
To: genome(a)soe.ucsc.edu
Sent: Thursday, March 31, 2011 5:19:55 AM
Subject: [Genome] USCS browser, access from Galaxy

Connection refused
Couldn't connect to 127.0.0.1 8080
Couldn't open http://127.0.0.1:8080/root/display_as?id=27&display_app=ucsc&authz_method=d…

Hi,

I have a local installation of Galaxy. Is it at all possible to upload (easily) data from Galaxy to your browser? I have no trouble uploading an extra track to Ensembl. Could you support such access?

Regards
Ela

------------------------------------------------------------------------------------------------
Dr Ela Hunt
SyBIT Deputy Project Manager
Clausiusstr. 45, CLP D 2
ETHZ, CH-8092 Zurich
+41 44 632 93 37
https://wiki.systemsx.ch/display/~ela.hunt@systemsx.ch/Dr+Ela+Hunt

Recall: Upload file size and user authorization
by Paul-Michael Agapow 05 Apr '11

Paul-Michael Agapow would like to recall the message, "[galaxy-user] Upload file size and user authorization".

storage space/processor speed
by Keith E. Giles 05 Apr '11

I am trying to perform a "join" on two sets of intervals. There are ~20,000 intervals in one dataset and about 13 million in the other. This has been running for about 3 days now, and I'm pretty certain that it's not going to work. Is there a way to know ahead of time whether enough memory is available for a given function to run? Also, how much storage space does each account have available? I know that one can access cloud space online, through Amazon for example. However, that seems to be fairly complicated and a bit out of reach for me in the short term.
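
For a join this lopsided, one common way to keep memory bounded is to index only the small dataset and stream the large one past it. The sketch below illustrates that idea in plain Python. It is not how Galaxy's join tool is implemented; the function names are hypothetical, and intervals are assumed to be (chrom, start, end) tuples with half-open coordinates.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(intervals):
    """Index the SMALL dataset: per chromosome, intervals sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        # The longest interval bounds how far left an overlap can begin.
        max_len = max(e - s for s, e in ivs)
        index[chrom] = (starts, ivs, max_len)
    return index


def join_overlaps(index, stream):
    """Stream the LARGE dataset and yield (streamed, indexed) overlapping pairs.

    Only the small index is held in memory; `stream` can be any iterable,
    such as a generator reading a file one line at a time.
    """
    for chrom, start, end in stream:
        if chrom not in index:
            continue
        starts, ivs, max_len = index[chrom]
        # Intervals starting before (start - max_len) end too early to overlap.
        lo = bisect_left(starts, start - max_len)
        # Intervals starting at or after `end` cannot overlap (half-open).
        hi = bisect_left(starts, end)
        for s, e in ivs[lo:hi]:
            if e > start:
                yield (chrom, start, end), (chrom, s, e)
```

In the situation described above, the ~20,000 intervals would go into `build_index` and the 13 million would be passed as `stream`, so memory use depends on the small set rather than the large one.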

error on indel analysis
by Evan Schwab 05 Apr '11

Hi,

I am receiving the following error when I try to run the Indel Analysis tool on a SAM file. I kept the Frequency threshold at the default 0.015.

Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/indels/indel_analysis.py", line 227, in <module>
    if __name__=="__main__": __main__()
  File "/galaxy/home/g2main/galaxy_main/tools/indels/indel_analysis.py", line 84, in __main__
    add_to_mis_matches( mis_matches[ chrom ], pos, bases )
  File "/galaxy/home/g2main/galaxy_main/tools/indels/indel_analysis.py", line 34, in add_to_mis_matches
    mis_matches[ pos + j ] = { base: 1 }
MemoryError

Prior to that I tried with a Frequency threshold of 0.0 and it ran for days on end without any results. I have also tried a Frequency threshold of 0.10, and it gave an error that said "Killed" with no explanation.

Is there an error with the tool?

Thanks
Evan

--
Evan Schwab
Research Associate
Megason Lab
Department of Systems Biology
Harvard Medical School
200 Longwood Ave
Boston, MA 02115
908-938-3779
