September 2011 - galaxy-user - lists.galaxyproject.org

Re: [galaxy-user] Simulating sequencing and removing redundant sequences
by Kevin Lam 20 Sep '11

Hi Daniel,

You would have multiple names for each sequence and that would be quite hard to display. I am sure someone thought through this. Since the sequence is the same, you can use the sequence to look back in the fastq file for read name. Although I am not sure how that would help you?

Cheers
Kevin

On 20 September 2011 13:43, Daniel Sher <dsher(a)sci.haifa.ac.il> wrote:

> Thanks Kevin. However, the collapse sequences replaces the original name
> of the sequences with a numerical code, and I need to keep the original
> names. Any other suggestions?
>
> Thanks
>
> Daniel
>
> On 20/09/2011 05:32, Kevin Lam wrote:
>
> Hi Daniel
> for 2) you may use the tools under NGS QC and manipulation
> FASTQ to FASTA converter <http://main.g2.bx.psu.edu/tool_runner?tool_id=cshl_fastq_to_fasta>
>
> followed by
>
> Collapse sequences <http://main.g2.bx.psu.edu/tool_runner?tool_id=cshl_fastx_collapser>
>
> On 19 September 2011 09:54, Kevin Lam <kevin(a)aitbiotech.com> wrote:
>
>> For 1) you may refer to Simulated Dataset of Solexa - SEQanswers <http://seqanswers.com/forums/showthread.php?t=806>
>>
>> Has anyone replied you for 2) ?
>>
>> On 18 September 2011 21:12, Daniel Sher <dsher(a)sci.haifa.ac.il> wrote:
>>
>>> Hello,
>>>
>>> I have two questions - I apologize if they are trivial..
>>>
>>> 1) I want to simulate the amount of Illumina sequencing needed to
>>> sequence and assemble a known genome. Is there a way to randomly pick
>>> sequences of a specific length from a genome (either one available online or
>>> one I upload)? Something like "pick 100bp randomly (either strand), move
>>> 400-500bp forward and pick another 100bp?"
>>>
>>> 2) Is there a way to remove redundant sequences from a FASTA file without
>>> losing the original sequence names (as happens with "collapse")?
>>>
>>> Thanks
>>>
>>> Daniel
>>>
>>> --
>>> Daniel Sher, PhD
>>> Department of Marine Biology
>>> Leon H. Charney School of Marine Sciences
>>> University of Haifa, Mt. Carmel 31905, Haifa, Israel
>>>
>>> Office +972-4-8240731
>>> Lab +972-4-8288961
>>> email: dsher(a)sci.haifa.ac.il
>>>
>>> ___________________________________________________________
>>> The Galaxy User list should be used for the discussion of
>>> Galaxy analysis and other features on the public server
>>> at usegalaxy.org. Please keep all replies on the list by
>>> using "reply all" in your mail client. For discussion of
>>> local Galaxy instances and the Galaxy source code, please
>>> use the Galaxy Development list:
>>>
>>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>>
>>> To manage your subscriptions to this and other Galaxy lists,
>>> please use the interface at:
>>>
>>> http://lists.bx.psu.edu/
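
For readers who end up doing this outside Galaxy, the two questions in this thread can be sketched in a few lines of plain Python. This is only an illustration, not a Galaxy tool: the function names (`read_fasta`, `collapse_keep_names`, `simulate_pairs`) and their parameters are made up for this example, the genome is assumed to fit in memory as a single string, and no sequencing errors or quality scores are modelled. The first part groups identical sequences while keeping every original name; the second picks a read, skips 400-500 bp, and picks a second read, on either strand.

```python
import random
from collections import OrderedDict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def collapse_keep_names(records):
    """Map each distinct sequence to the list of all names that carried it."""
    groups = OrderedDict()
    for name, seq in records:
        # Case-insensitive comparison is an assumption of this sketch.
        groups.setdefault(seq.upper(), []).append(name)
    return groups


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (ACGT only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def simulate_pairs(genome, n_pairs, read_len=100, gap_min=400, gap_max=500, rng=None):
    """Pick read pairs: one read, a gap of gap_min..gap_max bp, a second read."""
    rng = rng or random.Random()
    if len(genome) < 2 * read_len + gap_max:
        raise ValueError("genome is shorter than the largest possible fragment")
    pairs = []
    for _ in range(n_pairs):
        gap = rng.randint(gap_min, gap_max)
        span = 2 * read_len + gap
        start = rng.randint(0, len(genome) - span)
        fragment = genome[start:start + span]
        # Either strand: half of the fragments are read from the reverse strand.
        if rng.random() < 0.5:
            fragment = reverse_complement(fragment)
        pairs.append((fragment[:read_len], fragment[-read_len:]))
    return pairs
```

Writing one FASTA record per distinct sequence, with the grouped names joined into the header line, would keep the original names, at the cost of long headers, which is the display problem Kevin mentions above.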

Simulating sequencing and removing redundant sequences
by Daniel Sher 20 Sep '11

(no subject)
by raghava rao 19 Sep '11

-- Y.P.V.S.Raja Raghava Rao Research Scholar Dr.K.Sreenivasulu Lab Dept.of Animal Sciences School of Life Sciences University of Hyderabad ypvsrrr(a)gmail.com 9989733698

mailing list
by Sinnakaruppan MATHAVAN 16 Sep '11

Hi, Include me in your mailing list. Mathavan

Getting (or setting) physical file name
by Paul-Michael Agapow 15 Sep '11

So one of my colleagues has a script he wants to turn into a Galaxy tool. The twist is that the script:

1. Looks for files with a fixed name (e.g. "params.txt")
2. Accepts other file names as commandline arguments, but the actual names of those files have arguments embedded in them (e.g. "nuc_100iter_b.fasta" for nucleotide data in fasta format to be run against model b for 100 iterations.)

I know, awkward and clumsy. But hardly unique among historical bioinformatic tools.

Anyway, the challenge for me is to pick the easiest path to port this script to a tool. And it seems to be fairly awkward under the Galaxy model as I understand it. Possibilities:

1. Rewrite the script argument parsing and invocation. Obviously, there will be resistance to this and with some justification ("I thought you said this could wrap any command line program ...")
2. Write a script that calls the original script after moving and renaming files according to desired arguments. Any problems with a two-script/executable tool like this? How do I specify the interpreter for both parts of the script?
3. Use config files for the fixed name files. But configuration files seem to be given a random, not fixed, name, correct?
4. For the file names with semantic content, extract that from the dataset metadata. Of course, then it still has to be passed to the original script somehow.
5. Use <code>

Ideas, suggestions? Obviously a rewrite is the "best" solution, but in this case we might be looking for the quickest ...

----
Paul Agapow (paul-michael.agapow(a)hpa.org.uk)
Bioinformatics, Centre for Infections, Health Protection Agency
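
Possibility 2 above (a wrapper that stages files under the names the original script expects) might look roughly like the sketch below. Everything here is an assumption made for illustration: the helper names `staged_name` and `run_wrapped`, the exact name pattern (modelled on the "nuc_100iter_b.fasta" example), and the idea that the original script is run from a scratch directory with the renamed data file as its last argument. How the Galaxy tool definition would pass dataset paths and parameters to such a wrapper is not shown.

```python
import os
import shutil
import subprocess
import tempfile


def staged_name(kind, iterations, model, ext="fasta"):
    """Build a file name with embedded arguments, e.g. 'nuc_100iter_b.fasta'."""
    return f"{kind}_{iterations}iter_{model}.{ext}"


def run_wrapped(command, params_path, data_path, kind, iterations, model):
    """Stage inputs under the names the legacy script expects, then run it.

    `command` is the legacy script invocation as a list (interpreter first,
    if one is needed); the staged data file name is appended as an argument.
    Returns (exit code, captured stdout).
    """
    workdir = tempfile.mkdtemp(prefix="wrapped_tool_")
    try:
        # Fixed-name file the script looks for in its working directory.
        shutil.copy(params_path, os.path.join(workdir, "params.txt"))
        # Data file renamed so the embedded arguments can be parsed from it.
        data_name = staged_name(kind, iterations, model)
        shutil.copy(data_path, os.path.join(workdir, data_name))
        result = subprocess.run(
            list(command) + [data_name],
            cwd=workdir,
            capture_output=True,
            text=True,
        )
        return result.returncode, result.stdout
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

Copying into a scratch directory, rather than renaming in place, leaves the original input files untouched and keeps concurrent runs from colliding on "params.txt". Any output files the script writes would have to be copied out before the directory is removed, which this sketch does not do.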

Re: [galaxy-user] galaxy test-data organization
by Jennifer Jackson 15 Sep '11

> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
> http://lists.bx.psu.edu/

You will want to choose one or both of these:

http://lists.bx.psu.edu/listinfo/galaxy-user
http://lists.bx.psu.edu/listinfo/galaxy-dev

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/Support

galaxy test-data organization
by Joseph Hargitai 15 Sep '11

Hi, is there by any chance a per application breakdown of the test-data structure? best, joe

cufflinks version
by Shantanu Pavgi 15 Sep '11

What version of cufflinks is available on the Galaxy site? Is there a page where I can find the versions of all Galaxy tools?

Thanks,
Shantanu.

Merging of illumina paired end files
by Arun Khattri 15 Sep '11

I have 3 Illumina paired-end read sets from exome capture of one sample. I want to assemble these reads to the genome using tools available in Galaxy (BWA etc). My concern is the amount of data that I can analyze and when these reads should be merged. The total size of the data is over 30 GB.

Thanks,
Arun

Re: [galaxy-user] Upload of most recent genome data for Apis mellifera onto Galaxy and/or NCSC web sites?
by Anton Nekrutenko 15 Sep '11

Diana:

It is best to direct such requests to galaxy-user(a)bx.psu.edu mailing list, which I am doing. Adding this genome should be possible, but will take us some time.

Thanks,

anton

Anton Nekrutenko
http://galaxyproject.org

On Sep 12, 2011, at 1:23 PM, Diana Cox-Foster wrote:

> Hi, Anton--- I am currently doing a NGS project and want to compare the sequencing data to the Apis mellifera genome. Unfortunately, the genomes on Galaxy and the UCSC website are quite outdated. I am planning to do another sequencing project that would also benefit from having the newest version as well.
