Assembly of Paired and Unpaired sequences

16 Dec 2013

Dear All,

Greetings! I am analysing a genome of ca. 3.4 Mb, for which I currently
have paired.fa and unpaired.fa files. The reads have already been trimmed.
When I assemble them using 'ssake -f paired -g unpaired ...', it takes a
very long time. Perhaps I am running out of memory while assembling the
reads. I could use the Galaxy platform, but would like to stick with SSAKE.

A few questions:
If I concatenate these two files, can I use the combined file for BLAST
searches against my reference?
At this point, how do I know whether paired or single-end reads are
better?
How do I identify the two chromosomal sequences?

Help with these stupid questions is appreciated :)

Thank you in advance
Prash

Prashanth Suravajhala, PhD.
Homepage: http://www.bioinformatics.org/wiki/Prash
Linkedin: http://dk.linkedin.com/in/prashbio

“What counts in life is not the mere fact that we have lived. It is what
difference we have made to the lives of others that will determine the
significance of the life we lead.” — Nelson Mandela

On 15 December 2013 18:00, <galaxy-user-request@lists.bx.psu.edu> wrote:
...
Send galaxy-user mailing list submissions to
        galaxy-user@lists.bx.psu.edu
Today's Topics:
1. Re: fastqc and blast? trinity? (Peter Cock)
----------------------------------------------------------------------
Message: 1
Date: Sat, 14 Dec 2013 21:18:29 +0000
From: Peter Cock <p.j.a.cock@googlemail.com>
To: Jorge Braun <braun_bio@hotmail.com>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] fastqc and blast? trinity?
Message-ID:
        <CAKVJ-_4BKUgtb37EYF_FsAAj=YC+eT5ZuRvFgZk0pUySKdmsxg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sat, Dec 14, 2013 at 8:52 AM, Jorge Braun <braun_bio@hotmail.com> wrote:
...
Hello, of course, Jennifer is right about the first question. For
the second question, about BLAST: I wonder whether, after running
BLAST in Galaxy, I can remove sequences that may contaminate
the data. Is that possible?

The BLAST suite is not available on the public Galaxy
server at http://usegalaxy.org but is available from the
Galaxy Tool Shed if you have a local Galaxy instance:
http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/
One way to filter your FASTA file based on BLAST hits
would be to use the tabular output from BLAST with
this sequence filtering tool:
http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_id
e.g. If you want to remove transcripts which seem
to be mitochondrial, you could BLAST against a
mitochondrial database, and keep only the sequences
with no hits.
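
For anyone working outside Galaxy, the filtering step described above can
be sketched in a few lines of Python. This is only a minimal illustration,
not the code of the seq_filter_by_id tool. It assumes tab-separated BLAST
output with the query ID in the first column (as BLAST+ writes with
-outfmt 6) and takes the FASTA record ID to be the first word after '>'.
The function names ids_with_hits and filter_fasta and the sample data are
hypothetical.

```python
import io


def ids_with_hits(blast_tabular):
    """Collect query IDs (first column) from tab-separated BLAST output."""
    hits = set()
    for line in blast_tabular:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hits.add(line.split("\t")[0])
    return hits


def filter_fasta(fasta, hit_ids, keep_hits=False):
    """Yield (header, sequence) pairs from a FASTA file handle.

    By default only records WITHOUT a BLAST hit are kept;
    pass keep_hits=True to keep only the records that had a hit.
    """
    def wanted(header):
        parts = header[1:].split()
        record_id = parts[0] if parts else ""  # assumed ID convention
        return (record_id in hit_ids) == keep_hits

    header, chunks = None, []
    for line in fasta:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None and wanted(header):
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None and wanted(header):
        yield header, "".join(chunks)


# Made-up example: tx2 hits the (hypothetical) mitochondrial database.
blast = io.StringIO("tx2\tmito_ref\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-90\t350\n")
fasta = io.StringIO(">tx1 nuclear\nACGT\nACGT\n>tx2 putative mito\nGGCC\n>tx3\nTTAA\n")

for header, seq in filter_fasta(fasta, ids_with_hits(blast)):
    print(header)
    print(seq)
```

With real data, the two io.StringIO objects would be replaced by open()
calls on the BLAST tabular file and the FASTA file.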
Regards,
Peter
End of galaxy-user Digest, Vol 90, Issue 13
