I have looked through the metagenome tools and the tutorials, and was wondering how one could pull out reads that contain specific protein domains or COGs. BLASTX is not possible (?), but megablast could get GI codes, and these could potentially be used to retrieve CDD information. I just can't see a way to do this in Galaxy. Any suggestions would be greatly appreciated.

Mike DS
Hi Mike,

To use BLASTX directly, a wrapper is available in the Tool Shed for use with a local or cloud instance of Galaxy. Please see:
http://toolshed.g2.bx.psu.edu
http://getgalaxy.org
http://usegalaxy.org

Another option is to map against the target genome, then compare the coordinates of those hits with the coordinates of known annotation that represents CCDS or alternate protein tracks of interest. UCSC, Biomart, and other sources under "Get Data" can be used to import BED/Interval data directly into Galaxy. Compare coordinates using the tools in the group "Operate on Genomic Intervals". There are other tools that compare coordinates (BEDTools, etc.), but these are a good place to start.

Several of our tutorials have examples of how to compare coordinates, including "Galaxy 101" and protocols 1 & 4 of "Using Galaxy". The tools themselves also have help directly on the tool forms.
https://main.g2.bx.psu.edu/u/aun1/p/galaxy101
https://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012

If you used an Ensembl annotation track, then the tools in the group "Genome Diversity -> KEGG and GO" might be of interest to you. The UCSC "Known Genes" track also has some extra tables (http://genome.ucsc.edu) that you may find interesting to pull in and consider, if you decide to use that as the annotation track to compare against. Most (if not all) of this data can be linked together through either coordinates or identifiers, but it is not available for all genomes, so you will have to check at the data sources.

For predictive domain analysis using conserved genomic data, the tools in "Fetch Alignments" work with MAF inputs. A BED file of hits can be used to query out data from multiple species, obtain sequence, etc. for downstream analysis. Protocol 5 in the "Using Galaxy" paper above has a walk-through of how this can be done. If the public Main server does not have the MAF data for your genome, and the MAF file is small, it is possible to use one from the history.
If it is larger, using a local or cloud Galaxy would be recommended.

Be sure to check the Tool Shed if there is a specific tool that you are looking for. If it is not there now, you could ask on the development list (galaxy-dev@bx.psu.edu) whether someone has it or whether it is in the process of being wrapped. And keep checking back, as more tools are added all the time.

Best,
Jen
Galaxy team

On 4/15/13 3:23 PM, Mike Dyall-Smith wrote:
I have looked through the metagenome tools and looked at the tutorials, and was wondering how one could pull out reads that contain specific protein domains or COGS. Blastx is not possible (?) but megablast could get GI codes, and these could potentially be used to retrieve CDD information. I just can't see the way to do this on galaxy. Any suggestions would be greatly appreciated.
Mike DS
--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
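
For readers who want to see what the coordinate comparison suggested in the reply amounts to, here is a minimal Python sketch. It is not how the Galaxy "Operate on Genomic Intervals" tools are implemented; it only illustrates the idea of reporting mapped reads whose coordinates overlap annotated features. The function names (parse_bed, overlapping_hits), the min_overlap parameter, and the example records are all hypothetical, and the intervals are assumed to be BED-style (tab-separated, 0-based start, end not included).

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-like lines into (chrom, start, end, name) tuples.

    Assumes tab-separated fields; a name is synthesized from the
    coordinates when the optional fourth column is absent.
    """
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        records.append((chrom, start, end, name))
    return records


def overlapping_hits(hits, annotations, min_overlap=1):
    """Return (hit_name, annotation_name, overlap_bp) for overlapping pairs.

    Hypothetical helper: a plain nested scan per chromosome, fine for
    small inputs but not tuned for genome-scale data.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in annotations:
        by_chrom[chrom].append((start, end, name))

    pairs = []
    for chrom, h_start, h_end, h_name in hits:
        for a_start, a_end, a_name in by_chrom.get(chrom, []):
            # Half-open intervals: overlap length is the gap between the
            # larger start and the smaller end; zero or less means none.
            overlap = min(h_end, a_end) - max(h_start, a_start)
            if overlap >= min_overlap:
                pairs.append((h_name, a_name, overlap))
    return pairs


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    hits = parse_bed(["chr1\t100\t200\tread1", "chr1\t500\t600\tread2"])
    annotations = parse_bed(["chr1\t150\t300\tdomainA"])
    for pair in overlapping_hits(hits, annotations):
        print(pair)
```

Raising min_overlap requires a read to cover more of a feature before it is reported, which is one simple way to avoid counting reads that only touch the edge of an annotated region.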