I am attempting to use Galaxy to calculate the mean sequence read
length and identify the range of read lengths for my 454 data. The
data has already been organized and sorted by species. The format of
the data is as follows:
etc...for each species
I have tried the "Summary Statistics" tool; however, it appears to work
only on numerical data, not sequence data. Is this
User name: dac330
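Summary Statistics expects a numeric column, so one workaround is to turn each read into its length first. The Python sketch below does that outside Galaxy and reports the mean and range. It assumes the reads are in FASTA format, which the post does not actually show, and the script name in the usage comment is hypothetical.

```python
"""Minimal sketch: mean, min and max read length for a FASTA file.

Assumes the 454 reads are in FASTA format (the original post did not
show the format). Multi-line sequences are concatenated per record.
"""
import sys
from statistics import mean


def read_lengths(lines):
    """Yield the length of each FASTA record in an iterable of lines."""
    length = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(lengths):
    """Return (count, mean, min, max) for an iterable of read lengths."""
    lengths = list(lengths)
    if not lengths:
        return 0, 0.0, 0, 0
    return len(lengths), mean(lengths), min(lengths), max(lengths)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python read_length_stats.py reads.fasta
    with open(sys.argv[1]) as handle:
        n, avg, lo, hi = summarize(read_lengths(handle))
    print(f"reads={n} mean={avg:.1f} min={lo} max={hi}")
```

Running it once per species file gives the per-species mean and range.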
I've several questions regarding multiplex data and its analysis.
First, I'm working with a 96-plex (dual-index) MiSeq run. Our current MiSeq data has had the adapters (including the indexes) clipped by the machine, but for our older data, where the adapters were not clipped, it would be nice to clip adapters using a file of adapter sequences rather than one at a time.
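Clipping against a whole file of adapters at once can be sketched as follows. This is an illustrative Python sketch, not the behavior of any particular Galaxy tool. It assumes 4-line FASTQ, a plain-text adapter file with one sequence per line, and clipping only at exact matches. The file and script names are hypothetical.

```python
"""Minimal sketch: clip 3' adapters listed in a file from FASTQ reads.

Assumptions (not from the original post): reads are 4-line FASTQ, the
adapter file has one adapter sequence per line, and only exact matches
are clipped. Dedicated trimmers also tolerate mismatches and partial
adapters at the read end; this sketch does not.
"""
import sys


def load_adapters(lines):
    """Return the non-empty, upper-cased adapter sequences."""
    return [s.strip().upper() for s in lines if s.strip()]


def clip(seq, qual, adapters):
    """Cut seq/qual at the left-most exact occurrence of any adapter."""
    cut = len(seq)
    upper = seq.upper()
    for adapter in adapters:
        pos = upper.find(adapter)
        if pos != -1 and pos < cut:
            cut = pos
    return seq[:cut], qual[:cut]


def clip_fastq(lines, adapters):
    """Yield clipped FASTQ lines (without newlines) from 4-line records."""
    lines = iter(lines)
    for header in lines:
        seq = next(lines).rstrip("\n")
        plus = next(lines).rstrip("\n")
        qual = next(lines).rstrip("\n")
        seq, qual = clip(seq, qual, adapters)
        yield from (header.rstrip("\n"), seq, plus, qual)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python clip_adapters.py adapters.txt reads.fastq
    with open(sys.argv[1]) as fh:
        adapters = load_adapters(fh)
    with open(sys.argv[2]) as fh:
        for line in clip_fastq(fh, adapters):
            print(line)
```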
Second, we have the MiSeq run a BWA alignment to the reference genome for each of the 96 samples. The folder containing all of them is about 5 GB in total. I noticed that on a local Galaxy instance, a user with administrator privileges can load multiple files at once. But how can a regular user load them to the main Galaxy server? I've been reading the developer thread; is this something that will be released soon-ish?
Third, is there a way to run a workflow (several Samtools and Picard analyses) on multiple files so that they all get processed?
Ann Holtz-Morris, M.S.
de Jong Laboratory
I have a VCF file and I want to filter it for nonsynonymous, deletion and
insertion sequence variations. Once I have filtered this file, I want to
compare tumor vs. normal samples and then annotate these variations. I
believe I can filter this file using SnpSift and then annotate it with
SnpEff. When I try to use SnpSift Filter, it just says "arbitrary
expression". Are there rules for how to write the expression for a
particular filter within Galaxy? If anyone has used SnpSift in Galaxy,
could you share your expertise?
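Filtering on effects such as nonsynonymous or insertion/deletion usually relies on the annotations SnpEff adds, so annotating first and filtering second is typically the order that works. In SnpSift Filter, the "arbitrary expression" would then be something like `ANN[*].EFFECT has 'missense_variant'`. Treat that exact syntax as an assumption and check it against the SnpSift documentation for the installed version. The Python sketch below spells out the same logic. It assumes a SnpEff-annotated VCF with an `ANN` INFO field, and the effect terms are an illustrative choice.

```python
"""Minimal sketch: keep VCF records whose SnpEff ANN effects match a set.

Assumes the VCF was already annotated by SnpEff with the ANN INFO field
(ANN=allele|effect|impact|...). Older SnpEff versions wrote an EFF field
with different effect names instead. The terms below are illustrative,
not a complete list of nonsynonymous/indel effects.
"""
import sys

WANTED = {
    "missense_variant",
    "inframe_insertion",
    "inframe_deletion",
    "frameshift_variant",
}


def effects(info):
    """Return the set of effect terms in the ANN entry of an INFO column."""
    found = set()
    for field in info.split(";"):
        if field.startswith("ANN="):
            for ann in field[4:].split(","):
                parts = ann.split("|")
                if len(parts) > 1:
                    # Combined effects are joined with '&', e.g. 'a&b'.
                    found.update(parts[1].split("&"))
    return found


def filter_vcf(lines, wanted=WANTED):
    """Yield header lines and records with at least one wanted effect."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 7 and effects(cols[7]) & wanted:
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical names): python filter_effects.py annotated.vcf > filtered.vcf
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(filter_vcf(fh))
```

The filtered tumor and normal VCFs can then be compared side by side.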
Having provided a name (field 4) in a UCSC BED file ( http://www.genome.ucsc.edu/FAQ/FAQformat.html#format1 ) and sought a RefSeq name using the UCSC Table Browser ( http://www.genome.ucsc.edu/cgi-bin/hgTables ), I would now like to recover which line of the BED file produced which line of the output file. However, I am told I need Galaxy to provide a workflow to do this. Can anyone explain how? For example, one line of my BED file looks like:
chr2 2723752 2723777 seqid6354405 0 -
and one line of my intersected table browser output looks like:
chr1 176432306 176811970 NM_020318 0 + 176525458 176811590 0 23 248,1835,1072,146,294,193,122,490,129,92,194,147,136,217,172,178,214,169,136,110,72,99,455, 0,92236,131353,207799,226966,228955,232567,235929,239436,243188,246812,248664,276455,276809,302495,306436,307796,326638,328176,330389,336890,377002,379209,
Clearly the first line of my BED file doesn't correspond to the first line of my intersection output. As my BED file is long, what reference can I use to unambiguously identify which line of the BED file each line of the intersection output corresponds to? How do I do this in Galaxy?
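Tracing each output line back to its source works if the BED name (column 4) is carried through the overlap step, so that every result line contains both the query name and the gene name. The Python sketch below illustrates that idea. It assumes both inputs are BED-like, 0-based and half-open, with the name in column 4. It is not the Table Browser's or Galaxy's implementation, and the file names are hypothetical.

```python
"""Minimal sketch: pair each BED interval (by its name, column 4) with
the gene records it overlaps, so every output line carries its source.

Assumptions: both files are BED-like text (tab- or space-delimited) with
0-based, half-open coordinates and the name in column 4, as in the
RefSeq line above. This is a simple scan per chromosome, fine for modest
files but slow for very large ones.
"""
import sys


def parse_bed(lines):
    """Return (chrom, start, end, name) tuples from BED-like lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        cols = line.split()
        records.append((cols[0], int(cols[1]), int(cols[2]), cols[3]))
    return records


def overlaps(query, genes):
    """Yield (query_name, gene_name) for every overlapping pair."""
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene[0], []).append(gene)
    for chrom, start, end, name in query:
        for _, g_start, g_end, g_name in by_chrom.get(chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if start < g_end and g_start < end:
                yield name, g_name


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python bed_to_genes.py my.bed refseq.bed
    with open(sys.argv[1]) as q, open(sys.argv[2]) as g:
        for q_name, g_name in overlaps(parse_bed(q), parse_bed(g)):
            print(f"{q_name}\t{g_name}")
```

Each output row then reads like `seqid6354405<TAB>NM_...`, which answers "which BED line gave this result" directly.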
PS - I tried this workflow earlier today without success, aiming to achieve a similar objective: https://usegalaxy.org/u/james/w/workflow-from-ucsc-genes-and-symbols
PPS - I also note that similar issues were raised in this discussion, with Galaxy promoted as the solution, but with no real details about how to achieve the desired results:
Bert Gold, Ph.D., FACMG
Frederick, MD 21702
I am a user from Cornell University, and your website is a great help to
me and my research. But there are two problems I cannot figure out by
myself, and I hope you can help me.
1. When I upload data via FTP, there's an option for the mouse reference
genome mm10. When I get to TopHat2, there's only mm9. Is it a problem
that I use mm10 at the beginning and mm9 at TopHat2? Or will you be
updating TopHat2?
2. I have around 50G of space missing. I have one and only one history (at
least that I can see) with 171.5G, but when I checked my preferences, I had
used 225.2G. I don't know what accounts for the missing 50G, so I don't know
how to make room for my ongoing analysis. My user name is douyadou. Could
you take a minute to check for me?
I’ve been using Galaxy for RNA-seq analysis. Many thanks for this great resource!
I have run into a problem that I hope someone might be able to help me with. When loading my .BAM files and/or .gtf files into trackster, I get an error stating: "Input error: Chromosome chrDecoy found in your input file but not in your genome file.”
My Illumina FASTQ files were QCed and aligned to hg19 using TopHat, and I ran Cufflinks with hg19 as the annotation guide. All my files have hg19 as the dbkey.
My guess is that the hg19 I used as a reference differs from the built-in one, but I am not sure how I might fix this. FWIW, I am able to load all my data into IGV, using hg19 as the genome, and visualize everything.
Any suggestions would be really appreciated!
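The error message says the input uses a sequence name that the selected build does not list. If a chromosome list for the build is available in the common two-column chrom.sizes layout, comparing it against the chromosome column of the GTF is one way to confirm this. The Python sketch below does that, assuming that layout; the file names are hypothetical.

```python
"""Minimal sketch: find (and optionally drop) GTF lines on chromosomes
that the target genome does not define, such as 'chrDecoy'.

Assumes a two-column 'chrom<TAB>length' list for the genome (the usual
chrom.sizes layout). The file names used below are hypothetical.
"""
import sys


def load_chroms(lines):
    """Return the set of chromosome names from a chrom.sizes-style file."""
    return {line.split()[0] for line in lines if line.strip()}


def split_gtf(lines, known):
    """Split GTF lines into (kept, unknown_chroms); comments are kept."""
    kept, unknown = [], set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        chrom = line.split("\t", 1)[0]
        if chrom in known:
            kept.append(line)
        else:
            unknown.add(chrom)
    return kept, unknown


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python check_chroms.py hg19.chrom.sizes transcripts.gtf > clean.gtf
    with open(sys.argv[1]) as fh:
        known = load_chroms(fh)
    with open(sys.argv[2]) as fh:
        kept, unknown = split_gtf(fh, known)
    sys.stderr.write("not in genome: " + ", ".join(sorted(unknown)) + "\n")
    sys.stdout.writelines(kept)
```

The same check applies to the sequence names in a BAM header. Rewriting a BAM is better left to samtools than to a script like this.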
I have a user who is running into some problems on the Galaxy 101
tutorial at: https://usegalaxy.org/u/aun1/p/galaxy101. Specifically, in
Safari, it prompts for a user ID and password for each image (and Galaxy
account credentials don't work). I can confirm this behavior myself. On
Chrome, there is no prompt, but no images show up. Any suggestions
would be quite welcome. Hoping to give a good first impression... ;-)
Lance Parsons - Scientific Programmer
134 Carl C. Icahn Laboratory
Lewis-Sigler Institute for Integrative Genomics
In August 2013, I successfully used the http://galaxy.nbic.nl/ server to run edgeR on my RNA-seq data and identify differentially expressed genes.
However, now that I'm trying it with another digital expression matrix, I repeatedly get only empty output.
Has anyone used the edgeR wrapper recently?
What could be the problem?
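One plausible cause is a count matrix whose layout differs from what the wrapper expects, but this is an assumption, not a diagnosis. A quick Python sanity check is sketched below. It assumes a tab-delimited matrix with a header row, gene IDs in the first column and non-negative integer counts elsewhere.

```python
"""Minimal sketch: sanity-check a tab-delimited count matrix before edgeR.

Assumption (not confirmed by the post): the matrix has a header row,
gene IDs in the first column and non-negative integer counts elsewhere.
Malformed input is one plausible cause of empty output, not a known one.
"""
import sys


def check_matrix(lines):
    """Return a list of human-readable problems found in the matrix."""
    problems = []
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if len(rows) < 2:
        return ["matrix has no data rows"]
    width = len(rows[0])
    # Line numbers below count non-blank lines, header = line 1.
    for n, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"line {n}: {len(row)} columns, header has {width}")
            continue
        for value in row[1:]:
            # isdigit() rejects decimals, negatives and stray '\r' characters.
            if not value.isdigit():
                problems.append(f"line {n}: non-integer count {value!r}")
                break
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical name): python check_counts.py counts.tsv
    with open(sys.argv[1]) as fh:
        issues = check_matrix(fh)
    print("\n".join(issues) if issues else "no obvious format problems")
```

A header that omits the gene-ID column would show up here as a column-count mismatch. Whether the edgeR wrapper accepts that layout is not stated in the post.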