You would have multiple names for each sequence, and that would be quite hard
to display. I am sure someone has thought this through. Since the sequence is
the same, you can use the sequence to look up the read names in the FASTQ
file. Although I am not sure how that would help you?
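If keeping every original name really is the goal, here is a minimal sketch of one way to do it outside Galaxy, assuming plain Python 3 and a FASTA file small enough to hold in memory. The function names (read_fasta, collapse_keep_names, write_fasta) and the "|" separator between names are made up for illustration; none of this is part of the Collapse tool or any other Galaxy tool.

```python
from collections import OrderedDict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def collapse_keep_names(records):
    """Group identical sequences, remembering every original name."""
    groups = OrderedDict()
    for name, seq in records:
        # Case-insensitive comparison is an assumption; drop .upper() if
        # soft-masked sequences should count as different.
        groups.setdefault(seq.upper(), []).append(name)
    return groups


def write_fasta(groups, sep="|"):
    """Render one record per unique sequence, joining all original names."""
    out = []
    for seq, names in groups.items():
        out.append(">" + sep.join(names))
        out.append(seq)
    return "\n".join(out) + "\n"


example = """>read_A
ACGT
>read_B
GGCC
>read_C
acgt
""".splitlines()

collapsed = collapse_keep_names(read_fasta(example))
print(write_fasta(collapsed), end="")
```

Joining the names on one header line keeps the output a valid FASTA file, at the cost of very long headers when a sequence is highly redundant. Writing a separate two-column table (sequence ID, original names) would be the alternative.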
On 20 September 2011 13:43, Daniel Sher <dsher(a)sci.haifa.ac.il> wrote:
> Thanks Kevin. However, the "collapse sequences" tool replaces the original
> names of the sequences with a numerical code, and I need to keep the original
> names. Any other suggestions?
> On 20/09/2011 05:32, Kevin Lam wrote:
> Hi Daniel
> for 2) you may use the tools under NGS QC and manipulation
> FASTQ to FASTA <http://main.g2.bx.psu.edu/tool_runner?tool_id=cshl_fastq_to_fasta> converter
> followed by
> On 19 September 2011 09:54, Kevin Lam <kevin(a)aitbiotech.com> wrote:
>> For 1) you may refer to Simulated Dataset of Solexa - SEQanswers <http://seqanswers.com/forums/showthread.php?t=806>
>> Has anyone replied to you about 2)?
>> On 18 September 2011 21:12, Daniel Sher <dsher(a)sci.haifa.ac.il> wrote:
>>> I have two questions - I apologize if they are trivial.
>>> 1) I want to simulate the amount of Illumina sequencing needed to
>>> sequence and assemble a known genome. Is there a way to randomly pick
>>> sequences of a specific length from a genome (either one available online or
>>> one I upload)? Something like "pick 100bp randomly (either strand), move
>>> 400-500bp forward and pick another 100bp?"
>>> 2) Is there a way to remove redundant sequences from a FASTA file without
>>> losing the original sequence names (as happens with "collapse")?
>>> Daniel Sher, PhD
>>> Department of Marine Biology
>>> Leon H. Charney School of Marine Sciences
>>> University of Haifa, Mt. Carmel 31905, Haifa, Israel
>>> Office +972-4-8240731
>>> Lab +972-4-8288961
>>> email: dsher(a)sci.haifa.ac.il
>>> The Galaxy User list should be used for the discussion of
>>> Galaxy analysis and other features on the public server
>>> at usegalaxy.org. Please keep all replies on the list by
>>> using "reply all" in your mail client. For discussion of
>>> local Galaxy instances and the Galaxy source code, please
>>> use the Galaxy Development list:
>>> To manage your subscriptions to this and other Galaxy lists,
>>> please use the interface at:
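For question 1 in the quoted thread above (pick 100bp at random from either strand, move 400-500bp forward, pick another 100bp), the sampling itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genome is a single string already in memory, the function name simulate_pairs and its parameters are hypothetical, and there is no sequencing-error model or Illumina-specific read orientation.

```python
import random

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pairs(genome, n_pairs, read_len=100, gap_min=400, gap_max=500, seed=None):
    """Sample n_pairs of reads separated by a random gap, from either strand."""
    rng = random.Random(seed)
    if len(genome) < 2 * read_len + gap_max:
        raise ValueError("genome is shorter than the largest possible fragment")
    pairs = []
    for _ in range(n_pairs):
        gap = rng.randint(gap_min, gap_max)          # inclusive on both ends
        span = 2 * read_len + gap                    # read + gap + read
        start = rng.randint(0, len(genome) - span)   # fragment fits in genome
        first = genome[start:start + read_len]
        second = genome[start + read_len + gap:start + span]
        if rng.random() < 0.5:
            # Same fragment read from the opposite strand: order flips and
            # both reads are reverse-complemented.
            first, second = reverse_complement(second), reverse_complement(first)
        pairs.append((first, second))
    return pairs


# Toy genome for demonstration only; a real run would load a FASTA file.
_rng = random.Random(0)
genome = "".join(_rng.choice("ACGT") for _ in range(2000))
pairs = simulate_pairs(genome, n_pairs=50, seed=1)
print(len(pairs), "pairs sampled; first read starts with", pairs[0][0][:20])
```

Varying n_pairs and feeding the output to an assembler would give a rough idea of how much sequencing is needed, though error-free reads will make the estimate optimistic.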
So one of my colleagues has a script he wants to turn into a Galaxy
tool. The twist is that the script:
1. Looks for files with a fixed name (e.g. "params.txt")
2. Accepts other file names as command-line arguments, but the
actual names of those files have arguments embedded in them (e.g.
"nuc_100iter_b.fasta" for nucleotide data in FASTA format to be run
against model b for 100 iterations).
I know, awkward and clumsy. But hardly unique among historical
bioinformatics tools. Anyway, the challenge for me is to pick the easiest
path to port this script to a tool. And it seems to be fairly awkward
under the Galaxy model as I understand it. Possibilities:
1. Rewrite the script argument parsing and invocation. Obviously,
there will be resistance to this and with some justification ("I thought
you said this could wrap any command line program ...")
2. Write a script that calls the original script after moving and
renaming files according to desired arguments. Any problems with a
two-script/executable tool like this? How do I specify the interpreter
for both parts of the script?
3. Use config files for the fixed-name files. But configuration
files seem to be given a random name rather than a fixed one, correct?
4. For the file names with semantic content, extract that from the
dataset metadata. Of course, then it still has to be passed to the
original script somehow.
5. Use <code>
Ideas, suggestions? Obviously a rewrite is the "best" solution, but in
this case we might be looking for the quickest ...
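Option 2 above (a wrapper that stages and renames files, then calls the original script) could look roughly like this. It is a minimal sketch, not a tested Galaxy tool: it assumes the original script is a Python script that looks for "params.txt" in its current working directory and takes the data file name as its only argument, and the function names (embedded_name, stage_inputs, run_legacy) are hypothetical.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def embedded_name(datatype, iterations, model, ext="fasta"):
    """Build a name like 'nuc_100iter_b.fasta' from explicit arguments."""
    return "%s_%diter_%s.%s" % (datatype, iterations, model, ext)


def stage_inputs(params_path, data_path, datatype, iterations, model, workdir=None):
    """Copy arbitrarily named inputs into a work dir under the expected names."""
    workdir = workdir or tempfile.mkdtemp(prefix="legacy_tool_")
    # The fixed name the original script insists on.
    shutil.copy(params_path, os.path.join(workdir, "params.txt"))
    # The data file, renamed so its name carries the run arguments.
    staged = os.path.join(workdir, embedded_name(datatype, iterations, model))
    shutil.copy(data_path, staged)
    return workdir, staged


def run_legacy(script_path, params_path, data_path, datatype, iterations, model):
    """Stage the inputs, run the original script there, return its stdout."""
    workdir, staged = stage_inputs(params_path, data_path, datatype, iterations, model)
    try:
        result = subprocess.run(
            # The wrapper picks the interpreter for the second script itself;
            # sys.executable assumes the original script is also Python.
            [sys.executable, os.path.abspath(script_path), os.path.basename(staged)],
            cwd=workdir,          # so "params.txt" is found where expected
            check=True,
            capture_output=True,  # requires Python 3.7+
            text=True,
        )
        return result.stdout
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

With this shape, only the wrapper needs to be launched by the tool definition, since the wrapper chooses the interpreter for the original script. Running in a throwaway directory also sidesteps the randomly named dataset files, because the originals are copied rather than renamed in place.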
Paul Agapow (paul-michael.agapow(a)hpa.org.uk)
Bioinformatics, Centre for Infections, Health Protection Agency
I have 3 sets of Illumina paired-end reads from exome capture of the sample. I
want to map these reads to the genome using tools available in Galaxy (BWA etc).
My concern is the amount of data that I can analyze and when these reads
should be merged. The total size of the data is over 30Gb.
It is best to direct such requests to the galaxy-user(a)bx.psu.edu mailing list, which I am now doing. Adding this genome should be possible, but it will take us some time.
On Sep 12, 2011, at 1:23 PM, Diana Cox-Foster wrote:
> Hi, Anton--- I am currently doing an NGS project and want to compare the sequencing data to the Apis mellifera genome. Unfortunately, the genomes on Galaxy and the UCSC website are quite outdated. I am planning another sequencing project that would also benefit from having the newest version.