I would like to know whether mapping reads to a reference genome in Galaxy can produce the sequence of the query genome.
I mapped my Illumina reads to a reference genome with both BWA and Bowtie on the public Galaxy server (https://main.g2.bx.psu.edu/root). This gave me SAM files, but I can't find out how to generate the resulting assembled genome sequence from them.
Does anyone know how to do this? Any reply would be much appreciated.
Weiping Zhang, Doctor candidate;
School of bioengineering, Jiangnan University;
Lihu Roads 1800#, WUXI, Jiangsu;
Zip Code: 214122
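For readers with the same question: a SAM file holds alignments, not an assembled sequence, so the query genome has to be derived from it by consensus calling. The following is only a rough sketch of that idea in Python, not what any Galaxy tool actually does. It assumes a single reference sequence, uses only reads whose CIGAR is a plain match run (for example 36M), and ignores base qualities and mapping quality; the function name naive_consensus is made up for this illustration. Dedicated pileup and variant-calling tools handle indels, clipping and quality properly.

```python
import re
from collections import Counter


def naive_consensus(reference, sam_lines):
    """Majority-vote consensus from ungapped SAM alignments (illustration only)."""
    piles = [Counter() for _ in reference]  # one base counter per reference position
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
        # Skip unmapped reads (flag bit 0x4) and anything with indels or clipping.
        if flag & 4 or not re.fullmatch(r"\d+M", cigar):
            continue
        for offset, base in enumerate(seq):
            idx = pos - 1 + offset  # SAM positions are 1-based
            if 0 <= idx < len(reference):
                piles[idx][base] += 1
    # Take the most common read base; keep the reference base where nothing maps.
    return "".join(
        pile.most_common(1)[0][0] if pile else ref_base
        for pile, ref_base in zip(piles, reference)
    )
```

Positions without coverage simply inherit the reference base here, which is a simplification; a real analysis would usually mask or flag them instead.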
This should be easy (but not for me so far). I want to do local BLAST searches, so I downloaded the premade nr protein BLAST database from GenBank. It is split into 10 .tar.gz files.
I've decompressed them all, and now I want to put all the file parts together. Can I simply concatenate all similar files? (e.g. all 10 parts of the .phd files). The Readme mentions use of an alias file, but I did not find this at all clear. A set of step-by-step decompression and restoration instructions would be useful. I could not find any.
Thanks for any assistance, Mike DS
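A note for anyone following the same steps: as far as I know the volumes are not meant to be concatenated. Each archive holds one numbered volume, and the alias file the Readme mentions (nr.pal for a protein database) lists those volumes so that BLAST can treat them as one database called nr. The Python sketch below only unpacks every archive into a single directory. The archive pattern and directory name in the usage comment are assumptions, and extract_volumes is a made-up helper name.

```python
import glob
import os
import tarfile


def extract_volumes(archive_glob, dest_dir):
    """Unpack every matching .tar.gz archive into the same directory."""
    os.makedirs(dest_dir, exist_ok=True)
    archives = sorted(glob.glob(archive_glob))
    for path in archives:
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(dest_dir)  # volume files land side by side
    return archives


# Hypothetical usage (file names are an assumption):
# extract_volumes("nr.*.tar.gz", "blastdb")
```

With all volume files and the alias file sitting in one directory, the database is then referred to by its base name (nr) rather than by any individual volume.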
To merge multiple datasets this way, use the tool "Text
Manipulation -> Concatenate datasets tail-to-head". It works on two
datasets at a time, so if you have more than that you may need to run
it several times, adding a new file to the master merged file with each
run.
Watch out for introducing blank lines (unintentionally) between the
files. To remove them should any be present (it doesn't harm a file if
none are there), after you have merged all the files together, use the
tool "Filter and Sort -> Select" with:
option: NOT Matching
and the expression: ^$
Once you are sure that the merged file is correct, you can permanently
delete the working files to recover disk space. "FastQC" and/or "FASTQ
Groomer" are generally both good at detecting format problems.
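For anyone who prefers to do the same merge outside Galaxy, here is a minimal Python sketch of the steps above: concatenate the files in order, drop blank lines, and run a very basic sanity check. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function name merge_fastq is made up for this example, and the check is far weaker than what FastQC or FASTQ Groomer do.

```python
def merge_fastq(paths, out_path):
    """Concatenate FASTQ files in order, dropping blank lines; return record count."""
    kept = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    if line.strip() == "":
                        # Like Select NOT Matching ^$, but also drops
                        # lines that contain only whitespace.
                        continue
                    # Re-add the newline so a file without a trailing
                    # newline does not run into the next file.
                    out.write(line.rstrip("\n") + "\n")
                    kept += 1
    if kept % 4 != 0:
        raise ValueError("line count %d is not a multiple of 4" % kept)
    return kept // 4
```

The returned record count can be compared with the sum of the counts of the individual files as a quick check that nothing was lost.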
Good luck with your project,
On 4/27/13 8:23 PM, Yona Kim wrote:
> Dear Jennifer
> I was wondering if there is a tool in galaxy that combines several txt
> files (which I got from decompressing fastq.tgz file) and produce one
> fastq file from them.
> I was searching it in google and read your previous email to somebody
> else and you mentioned about the tool "cat" which seems to be the
> right tool for me to use to combine these txt files in order to
> produce one fastq file.. but I can't find this tool..
> any advice?
> Thank you very much and I always appreciate your help very much!!
> Yona Kim
Galaxy Support and Training
Dear Galaxy team
I am so sorry for repeatedly posting the same question, but I do need some input on this.
Please let me know the best way to use the barcode splitter on paired-end MiSeq data. The data has already been split by Illumina index using MiSeq Reporter; what I want to do is split each sample further by some in-house barcodes. The barcodes are present at both the 5' and 3' ends, and they are the same at both ends.
Please let me know which of these is the best practice:
1. Join reads 1 and 2, barcode split, and then split the two reads apart again
2. Split reads 1 and 2, then join them using FASTQ joiner and split again
Basically, I want to exclude any pair in which the two same-numbered reads are not assigned to the same barcode.
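The filtering rule in the last sentence can be sketched in Python as follows. This is neither of the two Galaxy workflows above, just the underlying logic: classify each mate on its own and keep a pair only when both mates get the same barcode. It assumes each read starts with its barcode in the same orientation and that matching is exact (no mismatches allowed); the helper names and the barcodes used in the usage example are hypothetical.

```python
def assign_barcode(seq, barcodes):
    """Return the name of the barcode that seq starts with, or None."""
    for name, tag in barcodes.items():
        if seq.startswith(tag):
            return name
    return None


def split_pairs(pairs, barcodes):
    """Group (read1, read2) sequence pairs by barcode; set discordant pairs aside."""
    bins = {name: [] for name in barcodes}
    discarded = []
    for read1, read2 in pairs:
        b1 = assign_barcode(read1, barcodes)
        b2 = assign_barcode(read2, barcodes)
        if b1 is not None and b1 == b2:
            bins[b1].append((read1, read2))
        else:
            # Unassigned mate, or mates assigned to different barcodes.
            discarded.append((read1, read2))
    return bins, discarded


# Hypothetical usage:
# bins, discarded = split_pairs([("ACGTAAAA", "ACGTCCCC")], {"bc1": "ACGT"})
```

Keeping the discarded pairs in their own list makes it easy to see how many pairs fail the concordance rule before deciding whether the barcode design or the matching stringency needs another look.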
I am having problems that I think are related:
1. I have not been able to visualize (in Trackster) a custom build that
I recently added (Trackster says: "Could not load chroms for this dbkey:").
In addition, when I try to Operate on Genomic Intervals using bed files
associated with that particular build I get an error: Error executing tool:
maximum recursion depth exceeded while calling a Python object
2. Now I am trying to add a new custom build to see if there was something
wrong with the previous build, and I get an error right after I click on
"Add a new Custom Build" in the New Visualization menu (The error has
been logged to our team. If you want to contact us about this error,
please reference the following GURU MEDITATION:
I am just wondering if anyone is having similar issues? Or if this is a
I've been using "Join, Subtract and Group" to join my
transcriptome/annotation data to GO and GO Slim for some time (on the Main
Galaxy server). I just updated my GO files because I've run a new data set,
and I have been having trouble with the join: it never seems to complete
(before, it would be done in just a few minutes). It works just fine when
joining my "new data" with my "old" GO files (which of course are now out
of date), but not with the new GO files from either my collaborator or EBI
(specifically the unipro). Not sure if it's a file size limitation?
Dear Galaxy Developers,
I am trying to run Galaxy on the ARM architecture. I am able to scramble the eggs locally, but I cannot start Galaxy on my machine: "run.sh" fails with an error after some initialization steps:
Initializing datatypes_conf.xml from datatypes_conf.xml.sample
Initializing external_service_types_conf.xml from external_service_types_conf.xml.sample
Initializing migrated_tools_conf.xml from migrated_tools_conf.xml.sample
Initializing reports_wsgi.ini from reports_wsgi.ini.sample
Initializing shed_tool_conf.xml from shed_tool_conf.xml.sample
Initializing tool_conf.xml from tool_conf.xml.sample
Initializing shed_tool_data_table_conf.xml from shed_tool_data_table_conf.xml.sample
Initializing tool_data_table_conf.xml from tool_data_table_conf.xml.sample
Initializing tool_sheds_conf.xml from tool_sheds_conf.xml.sample
Initializing data_manager_conf.xml from data_manager_conf.xml.sample
Initializing shed_data_manager_conf.xml from shed_data_manager_conf.xml.sample
Initializing openid_conf.xml from openid_conf.xml.sample
Initializing tool-data/shared/ncbi/builds.txt from builds.txt.sample
Initializing tool-data/shared/ensembl/builds.txt from builds.txt.sample
Initializing tool-data/shared/ucsc/builds.txt from builds.txt.sample
Initializing tool-data/shared/ucsc/publicbuilds.txt from publicbuilds.txt.sample
Initializing tool-data/shared/igv/igv_build_sites.txt from igv_build_sites.txt.sample
Initializing tool-data/shared/rviewer/rviewer_build_sites.txt from rviewer_build_sites.txt.sample
Initializing tool-data/add_scores.loc from add_scores.loc.sample
Initializing tool-data/alignseq.loc from alignseq.loc.sample
Initializing tool-data/all_fasta.loc from all_fasta.loc.sample
Initializing tool-data/annotation_profiler_options.xml from annotation_profiler_options.xml.sample
Initializing tool-data/annotation_profiler_valid_builds.txt from annotation_profiler_valid_builds.txt.sample
Initializing tool-data/bfast_indexes.loc from bfast_indexes.loc.sample
Initializing tool-data/binned_scores.loc from binned_scores.loc.sample
Initializing tool-data/blastdb.loc from blastdb.loc.sample
Initializing tool-data/blastdb_p.loc from blastdb_p.loc.sample
Initializing tool-data/bowtie2_indices.loc from bowtie2_indices.loc.sample
Initializing tool-data/ccat_configurations.loc from ccat_configurations.loc.sample
Initializing tool-data/codingSnps.loc from codingSnps.loc.sample
Initializing tool-data/encode_datasets.loc from encode_datasets.loc.sample
Initializing tool-data/faseq.loc from faseq.loc.sample
Initializing tool-data/funDo.loc from funDo.loc.sample
Initializing tool-data/gatk_annotations.txt from gatk_annotations.txt.sample
Initializing tool-data/gatk_sorted_picard_index.loc from gatk_sorted_picard_index.loc.sample
Initializing tool-data/liftOver.loc from liftOver.loc.sample
Initializing tool-data/maf_index.loc from maf_index.loc.sample
Initializing tool-data/maf_pairwise.loc from maf_pairwise.loc.sample
Initializing tool-data/microbial_data.loc from microbial_data.loc.sample
Initializing tool-data/mosaik_index.loc from mosaik_index.loc.sample
Initializing tool-data/ngs_sim_fasta.loc from ngs_sim_fasta.loc.sample
Initializing tool-data/perm_base_index.loc from perm_base_index.loc.sample
Initializing tool-data/perm_color_index.loc from perm_color_index.loc.sample
Initializing tool-data/phastOdds.loc from phastOdds.loc.sample
Initializing tool-data/picard_index.loc from picard_index.loc.sample
Initializing tool-data/quality_scores.loc from quality_scores.loc.sample
Initializing tool-data/regions.loc from regions.loc.sample
Initializing tool-data/sam_fa_indices.loc from sam_fa_indices.loc.sample
Initializing tool-data/sequence_index_base.loc from sequence_index_base.loc.sample
Initializing tool-data/sequence_index_color.loc from sequence_index_color.loc.sample
Initializing tool-data/sift_db.loc from sift_db.loc.sample
Initializing tool-data/srma_index.loc from srma_index.loc.sample
Initializing tool-data/twobit.loc from twobit.loc.sample
Initializing static/welcome.html from welcome.html.sample
Traceback (most recent call last):
  File "/mnt/ceph/galaxy/galaxy-dist/lib/galaxy/webapps/galaxy/buildapp.py", line 36, in app_factory
    from galaxy.app import UniverseApplication
  File "/mnt/ceph/galaxy/galaxy-dist/lib/galaxy/app.py", line 17, in <module>
    from galaxy.visualization.data_providers.registry import DataProviderRegistry
  File "/mnt/ceph/galaxy/galaxy-dist/lib/galaxy/visualization/data_providers/registry.py", line 2, in <module>
    from galaxy.visualization.data_providers import genome
  File "/mnt/ceph/galaxy/galaxy-dist/lib/galaxy/visualization/data_providers/genome.py", line 13, in <module>
  File "/mnt/ceph/galaxy/galaxy-dist/eggs/numpy-1.6.0-py2.7-linux-armv7l-ucs4.egg/numpy/__init__.py", line 137, in <module>
  File "/mnt/ceph/galaxy/galaxy-dist/eggs/numpy-1.6.0-py2.7-linux-armv7l-ucs4.egg/numpy/add_newdocs.py", line 9, in <module>
    from numpy.lib import add_newdoc
  File "/mnt/ceph/galaxy/galaxy-dist/eggs/numpy-1.6.0-py2.7-linux-armv7l-ucs4.egg/numpy/lib/__init__.py", line 4, in <module>
    from type_check import *
  File "/mnt/ceph/galaxy/galaxy-dist/eggs/numpy-1.6.0-py2.7-linux-armv7l-ucs4.egg/numpy/lib/type_check.py", line 8, in <module>
    import numpy.core.numeric as _nx
  File "/mnt/ceph/galaxy/galaxy-dist/eggs/numpy-1.6.0-py2.7-linux-armv7l-ucs4.egg/numpy/core/__init__.py", line 5, in <module>
ImportError: /mnt/ceph/galaxy/galaxy-dist/eggs/numpy-1.6.0-py2.7-linux-armv7l-ucs4.egg/numpy/core/multiarray.so: Unable to run arch-specific checks
May I ask if anyone has any idea about this error? Many thanks.
I am currently creating a "bioinformatics" course for undergraduates
(biology students with no knowledge of programming). I would like to use
Galaxy as their everyday platform where they would learn the basics and use
the appropriate tools (BLAST and databases, multiple alignment,
phylogenetics, dealing with "omics" data, and so on).
Are there any resources available on using Galaxy for teaching?
Any suggestions for good textbooks? Not a Galaxy textbook of course, but a
"bioinformatics textbook" that would be a good companion to help the
students understand the basics behind the tools.