Re: [galaxy-user] Metagenomic filtering
Gerald,
16S is basically useless for identification to genus. Since I started sequencing 16S in 1992, I have come to realize that without sequencing the full 1540 bases it is generally misleading, and even then it is not accurate enough to nail the genus in more than half the cases. However, what is your feeling on ITS and gyrase? They seem to be far more discriminating, but those databases were decommissioned some time ago.
The desirable thing would be for Galaxy or NCBI to add a "filter conserved genes" option [i.e. any hit with a second choice greater than 3% distance]. Something like that.
If you (or others) are aware of such a thing, I'd love to hear about it.
Sincerely,
Scott
Scott Tighe
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont, Vermont Cancer Center
149 Beaumont Ave, Health Science Research Facility 303/305
Burlington, Vermont 05405
802-656-2557
On 9/18/2013 2:05 PM, Gerald Bothe wrote:
Removing model organisms may not be enough; you may have the same problem with, say, a Clostridium cluster IV anaerobe. I think a solution would be to
first: compare to a collection of genes, e.g. get all the hits for 16S rRNA genes, RNA polymerases (conserved to quite conserved), and e.g. ion channels and cell surface proteins.
then: once a read or contig is identified as belonging to a gene family, gene, or protein domain, check within that group for species identities.
Then you compare apples to apples in terms of gene conservation level.
Does anybody know a program that would do this efficiently from metagenomic data?
Gerald Bothe
From: Scott W. Tighe <Scott.Tighe@uvm.edu>
To: galaxy-user@lists.bx.psu.edu
Sent: Wednesday, September 18, 2013 10:03 AM
Subject: Re: [galaxy-user] Metagenomic filtering
Dear Galaxy
When running a HiSeq shotgun metagenomics sample from the environment against megablast and taxonomic representation, how do I filter/remove all the 16S and other conserved sequences?
The problem is that when blasting a single organism that has a fraction of conserved sequence, the results will align with E. coli 10,000 times more than with the possible target organism. This data would be wrong and misleading. For example, a 100 mg sample that was negative for E. coli using the MUG test gives thousands of hits with Galaxy.
1) Is there a "filter conserved sequences" setting?
2) Is there a "remove model organisms" setting?
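Neither setting is confirmed in this thread. As a post-hoc stand-in for a "filter conserved sequences" option, one could drop every read that has any hit annotated as rRNA or tRNA in the tabular megablast output. The sketch below assumes hits are available as (read ID, subject title) pairs; the regular expression and the function name are hypothetical and deliberately crude (the pattern would also flag protein-coding genes whose titles mention tRNA, such as tRNA synthetases).

```python
import re

# Hypothetical pattern: subject titles that look like rRNA or tRNA genes.
CONSERVED_RNA = re.compile(
    r"\b(?:rRNA|tRNA|ribosomal RNA|transfer RNA)\b", re.IGNORECASE
)


def drop_conserved_reads(hits):
    """Remove all hits for any read that has at least one rRNA/tRNA hit.

    hits: iterable of (read_id, subject_title) pairs.
    Returns a list of the remaining (read_id, subject_title) pairs.
    """
    hits = list(hits)  # allow generators; we need two passes
    flagged = {read_id for read_id, title in hits if CONSERVED_RNA.search(title)}
    return [(read_id, title) for read_id, title in hits if read_id not in flagged]


example_hits = [
    ("r1", "Escherichia coli 16S ribosomal RNA gene"),
    ("r1", "Escherichia coli K-12 genome"),
    ("r2", "Clostridium leptum DNA gyrase subunit B (gyrB) gene"),
    ("r3", "Bacillus subtilis tRNA-Ala"),
]
remaining = drop_conserved_reads(example_hits)
```

Dropping the whole read, rather than only the matching hit, is a design choice here: a read that hits an rRNA gene in one taxon is likely to hit the same conserved region in many others.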
Scott Tighe --Core Laboratory Research Staff Advanced Genome Technologies Core Deep Sequencing (MPS) Facility Vermont Cancer Center 149 Beaumont Ave University of Vermont HSRF 303 Burlington Vermont USA 05045 802-656-AGTC 802-999-6666 (cell)
Quoting Jennifer Jackson <jen@bx.psu.edu>:
> Hello Elwood,
>
> Are you still having connection issues today? Or is this resolved?
>
> Best,
>
> Jen
> Galaxy team
>
> On 9/13/13 11:36 AM, Elwood Linney wrote:
>> A message sent earlier this week by me indicated that I could not connect to Galaxy via Fetch to download data.
>>
>> A reply indicated a glitch was fixed.
>>
>> I then could connect with Fetch and I tried to transfer 4 x 16gb files and the connection disconnected about 4 times.
>>
>> Now, once again, I cannot connect with Galaxy online to transfer data.
>>
>> Is this a problem that can be solved-either at my end or at Galaxy?
>>
>> Elwood Linney
>>
>> ___________________________________________________________
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org. Please keep all replies on the list by
>> using "reply all" in your mail client. For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>>
>> http://lists.bx.psu.edu/listinfo/galaxy-dev
>>
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
>>
>> http://lists.bx.psu.edu/
>>
>> To search Galaxy mailing lists use the unified search at:
>>
>> http://galaxyproject.org/search/mailinglists/
>
> --Jennifer Hillman-Jackson
> http://galaxyproject.org
___________________________________________________________ The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
http://lists.bx.psu.edu/
To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
Scott, agreed, 16S is not accurate if you only have partial sequences. I would make the Galaxy button(s) more specific, saying "remove all rRNA and tRNA genes from bacteria/archaea/eukaryotes". That would leave the user with protein-coding regions and intergenic regions. Ideally, one would then add an option "compare to gene collection", which would then give options for a collection of gyrase etc. As the gyrase collection is no longer available, one would have to rebuild this from the sequenced genomes. That's far from perfect in terms of coverage, but at least the quality of the published genomes is generally good (rRNA gene sequences are often not very good, another problem with the rRNA approach). Currently, I don't know of such a program.
Gerald
From: Scott Tighe <scott.tighe@uvm.edu>
To: Gerald Bothe <g_bothe@yahoo.com>; galaxy-user@lists.bx.psu.edu
Sent: Thursday, September 19, 2013 10:45 AM
Subject: Re: [galaxy-user] Metagenomic filtering
Gerald
16S is basically useless for identification to genus. Since I started sequencing 16S in 1992, I have come to realize that without sequencing the full 1540 bases it is generally misleading, and even then it is not accurate enough to nail the genus in more than half the cases. However, what is your feeling on ITS and gyrase? They seem to be far more discriminating, but those databases were decommissioned some time ago.
The desirable thing would be for Galaxy or NCBI to add a "filter conserved genes" option [i.e. any hit with a second choice greater than 3% distance]. Something like that.
If you (or others) are aware of such a thing, I'd love to hear about it.
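One possible reading of the bracketed rule above, sketched in Python: keep a read's best hit only when its second-best hit is more than 3 percentage points lower in identity, so that reads matching several subjects almost equally well (as conserved genes tend to) are dropped. The function name, the tuple layout, and the use of percent identity as the distance measure are assumptions for illustration, not an existing Galaxy or NCBI feature.

```python
from collections import defaultdict


def filter_ambiguous_hits(hits, min_gap=3.0):
    """Keep the best hit per read only if it clearly beats the runner-up.

    hits: iterable of (read_id, subject_id, percent_identity).
    min_gap: required identity gap (in percentage points) between the
             best and second-best hit; 3.0 mirrors the 3% in the thread.
    Returns {read_id: best_subject_id} for reads that pass.
    Note: subject IDs stand in for taxa here; a real filter would first
    collapse hits that belong to the same taxon.
    """
    by_read = defaultdict(list)
    for read_id, subject_id, pident in hits:
        by_read[read_id].append((pident, subject_id))

    kept = {}
    for read_id, scored in by_read.items():
        scored.sort(reverse=True)  # highest identity first
        best_pident, best_subject = scored[0]
        # A read with a single hit has no competing second choice.
        if len(scored) == 1 or best_pident - scored[1][0] > min_gap:
            kept[read_id] = best_subject
    return kept


example_hits = [
    ("r1", "E_coli", 99.0),
    ("r1", "Salmonella", 98.5),  # runner-up too close: r1 is dropped
    ("r2", "Target", 97.0),
    ("r2", "Other", 90.0),       # clear gap: r2 is kept
    ("r3", "Solo", 88.0),        # only one hit: r3 is kept
]
kept = filter_ambiguous_hits(example_hits)
```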
Sincerely Scott
Scott Tighe
Senior Core Laboratory Research Staff
Advanced Genome Technologies Core
University of Vermont, Vermont Cancer Center
149 Beaumont Ave, Health Science Research Facility 303/305
Burlington, Vermont 05405
802-656-2557
On 9/18/2013 2:05 PM, Gerald Bothe wrote:
Removing model organisms may not be enough, you may have the same problem with, say, a Clostridium cluster IV anaerobe. I think a solution would be to
first: compare to a collection of genes, e.g. get all the hits for 16S rRNA genes, RNA polymerases (conserved to quite conserved), and e.g. ion channels and cell surface proteins.
then: once a read or contig is identified as belonging to a gene family, gene, or protein domain, check within that group for species identities.
Then you compare apples to apples in terms of gene conservation level.
Does anybody know a program that would do this efficiently from metagenomic data?
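The two steps above could be sketched roughly as follows, assuming each hit already carries a gene-family label and that each family has its own identity cutoff for making a species call. The family names, the cutoff values, and the function name are hypothetical; this is not an existing program.

```python
# Hypothetical per-family cutoffs: conserved families need a higher
# identity before a species call is trusted than variable families do.
FAMILY_MIN_IDENTITY = {
    "16S_rRNA": 99.0,
    "RNA_polymerase": 95.0,
    "ion_channel": 85.0,
}


def classify_reads(hits, cutoffs=FAMILY_MIN_IDENTITY, default_cutoff=97.0):
    """Two-stage call: gene family first, then species within that family.

    hits: iterable of (read_id, gene_family, species, percent_identity).
    Returns {read_id: (gene_family, species_or_None)}; species is None
    when the best within-family hit falls below that family's cutoff.
    """
    hits = list(hits)

    # Stage 1: assign each read to the gene family of its best hit.
    best_family = {}
    for read_id, family, _species, pident in hits:
        if read_id not in best_family or pident > best_family[read_id][1]:
            best_family[read_id] = (family, pident)

    # Stage 2: within that family only, pick the best-matching species.
    best_species = {}
    for read_id, family, species, pident in hits:
        if family != best_family[read_id][0]:
            continue
        if read_id not in best_species or pident > best_species[read_id][1]:
            best_species[read_id] = (species, pident)

    calls = {}
    for read_id, (family, _pident) in best_family.items():
        species, pident = best_species[read_id]
        cutoff = cutoffs.get(family, default_cutoff)
        calls[read_id] = (family, species if pident >= cutoff else None)
    return calls


example_hits = [
    ("r1", "16S_rRNA", "Escherichia coli", 98.0),
    ("r1", "16S_rRNA", "Shigella flexneri", 97.5),
    ("r2", "ion_channel", "Clostridium leptum", 90.0),
]
calls = classify_reads(example_hits)
```

With these example cutoffs, the 16S read is assigned to its family but left without a species call, while the ion-channel read is called at 90% identity, which is the "apples to apples" comparison described above.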
Gerald Bothe
From: Scott W. Tighe <Scott.Tighe@uvm.edu>
To: galaxy-user@lists.bx.psu.edu
Sent: Wednesday, September 18, 2013 10:03 AM
Subject: Re: [galaxy-user] Metagenomic filtering
Dear Galaxy
When running a HiSeq shotgun metagenomics sample from the environment against megablast and taxonomic representation, how do I filter/remove all the 16S and other conserved sequences?
The problem is that when blasting a single organism that has a fraction of conserved sequence, the results will align with E. coli 10,000 times more than with the possible target organism. This data would be wrong and misleading. For example, a 100 mg sample that was negative for E. coli using the MUG test gives thousands of hits with Galaxy.
1) Is there a "filter conserved sequences" setting?
2) Is there a "remove model organisms" setting?
Scott Tighe --Core Laboratory Research Staff Advanced Genome Technologies Core Deep Sequencing (MPS) Facility Vermont Cancer Center 149 Beaumont Ave University of Vermont HSRF 303 Burlington Vermont USA 05045 802-656-AGTC 802-999-6666 (cell)
Quoting Jennifer Jackson <jen@bx.psu.edu>:
Hello Elwood,
Are you still having connection issues today? Or is this resolved?
Best,
Jen Galaxy team
On 9/13/13 11:36 AM, Elwood Linney wrote:
A message sent earlier this week by me indicated that I could not connect to Galaxy via Fetch to download data.
A reply indicated a glitch was fixed.
I then could connect with Fetch and I tried to transfer 4 x 16gb files and the connection disconnected about 4 times.
Now, once again, I cannot connect with Galaxy online to transfer data.
Is this a problem that can be solved-either at my end or at Galaxy?
Elwood Linney
--Jennifer Hillman-Jackson http://galaxyproject.org/
participants (2)
- Gerald Bothe
- Scott Tighe