Dear All

Greetings! I am analysing a genome of ca. 3.4Mb where I have paired.fa and unpaired.fa files as of now. The sequences have been trimmed before that. Now when I assemble these reads using 'ssake -f paired -g unpaired ...', it takes hell lot of time. Perhaps, I am running out of memory in analyzing the sequence reads. I could use galaxy platform, but would like to stick with ssake.

Few questions: What if I concatenate these two files, would I be able to peruse this for blasting against my reference? At this point, how do I know whether or not paired or single-end reads are better? How do I know the two chromosomal sequences?

Help appreciated for stupid questions :)

Thank you in advance
Prash

Prashanth Suravajhala, PhD.
Homepage: http://www.bioinformatics.org/wiki/Prash
Linkedin: http://dk.linkedin.com/in/prashbio

"What counts in life is not the mere fact that we have lived. It is what difference we have made to the lives of others that will determine the significance of the life we lead." - Nelson Mandela

On 15 December 2013 18:00, <galaxy-user-request@lists.bx.psu.edu> wrote:
Send galaxy-user mailing list submissions to galaxy-user@lists.bx.psu.edu
To subscribe or unsubscribe via the World Wide Web, visit http://lists.bx.psu.edu/listinfo/galaxy-user or, via email, send a message with subject or body 'help' to galaxy-user-request@lists.bx.psu.edu
You can reach the person managing the list at galaxy-user-owner@lists.bx.psu.edu
When replying, please edit your Subject line so it is more specific than "Re: Contents of galaxy-user digest..."
HEY! This is important! If you reply to a thread in a digest, please
1. Change the subject of your response from "Galaxy-user Digest Vol ..." to the original subject for the thread.
2. Strip out everything else in the digest that is not part of the thread you are responding to.

Why?
1. This will keep the subject meaningful. People will have some idea from the subject line if they should read it or not.
2. Not doing this greatly increases the number of emails that match search queries, but that aren't actually informative.
Today's Topics:
1. Re: fastqc and blast? trinity? (Peter Cock)
----------------------------------------------------------------------
Message: 1
Date: Sat, 14 Dec 2013 21:18:29 +0000
From: Peter Cock <p.j.a.cock@googlemail.com>
To: Jorge Braun <braun_bio@hotmail.com>
Cc: "galaxy-user@lists.bx.psu.edu" <galaxy-user@lists.bx.psu.edu>
Subject: Re: [galaxy-user] fastqc and blast? trinity?
Message-ID: <CAKVJ-_4BKUgtb37EYF_FsAAj=YC+eT5ZuRvFgZk0pUySKdmsxg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sat, Dec 14, 2013 at 8:52 AM, Jorge Braun <braun_bio@hotmail.com> wrote:
> Hello, of course, Jennifer is right for the first question. For the second question about blast ... I wonder if running after blast in galaxy I can remove sequences that can contaminate the data. It's possible?
The BLAST suite is not available on the public Galaxy server at http://usegalaxy.org but is available from the Galaxy Tool Shed if you have a local Galaxy instance:
http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/
One way to filter your FASTA file based on BLAST hits would be to use the tabular output from BLAST with this sequence filtering tool:
http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_id
e.g. If you want to remove transcripts which seem to be mitochondrial, you could BLAST against a mitochondrial database, and keep only the sequences with no hits.
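Outside Galaxy, the same filtering idea can be sketched in a few lines of Python. This is only an illustration, not the seq_filter_by_id tool itself: it assumes the BLAST results are in the standard tab-separated tabular format with the query ID in the first column, and the function names read_hit_ids and filter_fasta are made up for this example.

```python
def read_hit_ids(blast_tabular_lines):
    """Collect query IDs (first column) from BLAST tabular output lines."""
    hit_ids = set()
    for line in blast_tabular_lines:
        line = line.strip()
        # Skip blank lines and the '#' comment lines some tabular formats add.
        if not line or line.startswith("#"):
            continue
        hit_ids.add(line.split("\t")[0])
    return hit_ids


def filter_fasta(fasta_lines, hit_ids, keep_hits=False):
    """Yield FASTA lines, keeping records with no hits (or only hits)."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep = (record_id in hit_ids) == keep_hits
        if keep:
            yield line
```

With real files, both functions can be given open file handles, e.g. filter_fasta(open("transcripts.fasta"), read_hit_ids(open("hits.tabular"))), where the two file names are placeholders. Passing keep_hits=True inverts the filter and returns only the sequences that did have a hit.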
Regards,
Peter
------------------------------
_______________________________________________ galaxy-user mailing list galaxy-user@lists.bx.psu.edu http://lists.bx.psu.edu/listinfo/galaxy-user
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
End of galaxy-user Digest, Vol 90, Issue 13 *******************************************
Hi Prash,

You have reached the galaxy-user@bx.psu.edu mailing list, which supports the public Galaxy instance at http://usegalaxy.org. Sometimes we can help with broader questions, but for general bioinformatics help I would search, then ask, the communities at web sites such as biostars.org and seqanswers.com. The original tool author and any web sites they support are also good resources.

That said, to give some short help for your questions (but follow up with the above):

1 - Most any short read dataset can be run with BLAST, so I am not sure what you are asking. When you ask at the other sites, add more details about your goal.

2 - Running a tool such as FastQC can give you an idea about sequence quality (if that is what you mean by "better"). Some tools require paired-end data, so that could make it automatically better. If you are wondering which set is contributing in a "better" way to the assembly, then asking other users of the tool, ideally working with a similar genome, how they determine this would be a good place to start.

3 - To annotate assembly results with chromosome assignment: how to do this depends on what other data is available for your genome (genomic or transcripts/genes), or what related genomes may be available (comparative). The basic idea would be to compare against known sequences to make assignments.

There is a repository for this tool in the Galaxy Tool Shed, for use in local or cloud instances, but it sounds like you already saw that: http://usegalaxy.org/toolshed. If you had technical problems with that tool, the tool author could be contacted. Although if the tool fails on the command line, then there is likely a bigger issue as you suspect (memory or otherwise), and the wrapper would be unlikely to change that. But you could also move to a cloud instance with more resources: http://usegalaxy.org/cloud

Good luck!

Jen
Galaxy team
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
To search Galaxy mailing lists use the unified search at: http://galaxyproject.org/search/mailinglists/
-- Jennifer Hillman-Jackson http://galaxyproject.org
Thank you Jennifer. That was a big help :)

Regards
Prash
participants (2)
- Jennifer Jackson
- Prash