Thanks so much for the prompt reply. I don't mind using last year's GenBank, as long as I am getting accurate hits. I just have a couple more questions to confirm I am safe using the Galaxy pipeline for this...
So if I continue to work within the one-year-old database, can I trust the output as accurate matches? Specifics about my project: I have environmental samples that were sequenced for fungal ITS. I have clustered these into OTUs and chosen a representative sequence for each. If I retrieve hits for this representative sequence file in my sample, can I trust the hits as being the correct hits as of last year? I'm just worried about what that one person said who thought there were some column arrangement problems, because I'm finding that I'm getting hits from different phyla for the same sequence using default parameters in megablast...
Can I also assume, then, that I should NOT match my representative sequence file to updated GI numbers using another pipeline, and then bring the file of GI numbers to Galaxy to fetch taxonomic assignments? (I would do this because of the nice neat columns for each taxonomic level that Galaxy puts out.)
Sarah
On Mon, Apr 23, 2012 at 2:26 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hi Sarah,
Peter defined the columns (thanks) but I can provide some information about the GenBank identifiers. The megablast databases on the public server are roughly a year old and there have been updates at NCBI since that time. As I understand it, this manifests as occasional mismatches between hits at Galaxy vs GenBank when comparing certain IDs linked to updated records.
We are working to update these three databases, but there are some complicating factors around this processing specifically related to the public instance and the metagenomics workflow that have yet to be resolved. Please know that getting updated is a priority for us and we apologize for the inconvenience.
To use the most current databases, the recommendation is a local or (better) cloud instance with either the regular or BLAST+ version of the tool and a database of your choice. Instructions to get started are at: getgalaxy.org getgalaxy.org/cloud
Hopefully this explains the data mismatch. This question has come up before, but I think you are correct in that the final conclusion was never posted back to the galaxy-user list (for different reasons). So, thank you for asking so that we could send out a clear reply for everyone using the tool.
Best,
Jen
Galaxy team
On 4/23/12 9:56 AM, Sarah Hicks wrote:
I am having trouble finding information on the MegaBLAST output columns. What is each column for? I can't seem to figure this out by comparing info in the columns to NCBI directly because the GI numbers don't match the correct entries on NCBI. I've seen that others have posted about that problem, so I'm also waiting on details on that question, but for now, I'd just like to know what to make of the output...
best, Sarah
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://galaxyproject.org
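For the concern raised earlier in the thread about a single representative sequence returning hits from different phyla, one way to see how widespread the problem is would be to scan the tabular output and list every query whose hits disagree at a given taxonomic level. The following is only a rough sketch under stated assumptions: the column layout of the Galaxy output is not given in this thread, so `query_col` and `taxon_col` are hypothetical parameters to be set to match the actual file, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def queries_with_mixed_taxa(tabular_text, query_col, taxon_col):
    """Return {query_id: sorted distinct taxa} for queries whose hits
    span more than one taxon in the chosen column.

    query_col and taxon_col are zero-based column indexes. They are
    assumptions: set them to match the real output layout.
    """
    taxa_by_query = defaultdict(set)
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        # Skip rows too short to contain both columns.
        if len(row) <= max(query_col, taxon_col):
            continue
        taxa_by_query[row[query_col]].add(row[taxon_col])
    return {q: sorted(t) for q, t in taxa_by_query.items() if len(t) > 1}


# Invented example rows: query ID, hit ID, phylum (not real data).
example = (
    "OTU_1\thit_a\tAscomycota\n"
    "OTU_1\thit_b\tBasidiomycota\n"
    "OTU_2\thit_c\tAscomycota\n"
    "OTU_2\thit_d\tAscomycota\n"
)

mixed = queries_with_mixed_taxa(example, query_col=0, taxon_col=2)
print(mixed)
```

Queries reported by this check are the ones worth comparing by hand against the current NCBI records, since they are where an outdated identifier mapping would be most visible.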