In the future, please send such questions to the galaxy-user(a)bx.psu.edu mailing list. This is the most reliable way to get your queries answered.
If you carefully go through http://usegalaxy.org/galaxy101 you will see how to count occurrences of features (such as LINEs) within genomic ranges (such as ChrX:144,000,000-146,500,000).
On Apr 19, 2012, at 12:53 PM, Binnaz Yalcin wrote:
> Dear Anton,
> Can I do an enrichment analysis in Galaxy? If so, how?
> I want to know whether a region on mouse chromosome X (ChrX:144,000,000-146,500,000 bp) is enriched with LINE elements, compared to the rest of chromosome X.
> Any advice would be much appreciated.
> Binnaz YALCIN
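Once the LINE counts have been obtained (for example with the counting approach from the Galaxy 101 tutorial), the comparison itself is simple arithmetic. The following is a minimal Python sketch, not a Galaxy tool: the function name `fold_enrichment` is made up for illustration, and the counts and chromosome length in the usage lines are placeholders to be replaced with real values for the genome build in use. It compares the feature density inside the region with the density on the rest of the chromosome.

```python
def fold_enrichment(region_count, region_len, chrom_count, chrom_len):
    """Feature density inside a region divided by the density on the
    rest of the chromosome (region excluded from the background)."""
    rest_count = chrom_count - region_count
    rest_len = chrom_len - region_len
    if region_len <= 0 or rest_len <= 0 or rest_count <= 0:
        raise ValueError("region and background must both be non-empty")
    region_density = region_count / region_len
    rest_density = rest_count / rest_len
    return region_density / rest_density


# Region from the question: ChrX:144,000,000-146,500,000
REGION_LEN = 146_500_000 - 144_000_000  # 2,500,000 bp

# PLACEHOLDER values, not real data: substitute the actual LINE counts
# and the chrX length of the genome build being analysed.
CHRX_LEN = 160_000_000
LINES_IN_REGION = 5_000
LINES_ON_CHRX = 200_000

print(fold_enrichment(LINES_IN_REGION, REGION_LEN, LINES_ON_CHRX, CHRX_LEN))
```

A value above 1 means the region is denser in LINEs than the rest of the chromosome. Whether the difference is statistically meaningful needs a separate test, which this sketch does not attempt.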
> [Not sure if this is better suited to galaxy-dev or -user, so I'm sending to both].
galaxy-user is most appropriate for this question because it relates to usage of Galaxy; galaxy-dev is for local installation and tool development questions.
> My question is - can I create a Galaxy 'Published Page' from my local Galaxy instance/histories, and then transfer that page to the main Galaxy instance?
Not currently, though this is in our long-term plan.
> The reason is that I cannot make my local Galaxy instance public, as I am using a campus resource to host our galaxy. If this is possible, how can I do that? If not, any other ideas?
It is possible to move datasets and workflows relatively easily between instances, so I'd recommend that:
(a) you move your data and workflows to our public instance;
(b) rerun your analyses on the public instance to create the required;
(c) create and host the Page on our public instance.
You can be assured that we will maintain our public server over the coming years and your Page will remain available and have a stable URL.
> Also, are there any tutorials/pages on how to create Published Pages in Galaxy in the first place?
Not yet, though the idea is for the Page editor to be self-explanatory. Here's how to get started with Pages:
(a) from User menu, go to Saved Pages;
(b) create a Page;
(c) edit the Page using the Web-based editor; there are menus for inserting embedded datasets, workflows, histories, and visualizations as well as performing standard word-processing operations.
Let us know if you have problems/questions and we'll start a guide for creating Pages.
I've run FastQC successfully on several different files, but I cannot download or view the reports. In either case I receive the following message in my browser (Firefox):
An error occurred. See the error logs for more information. (Turn debug on to display exception reports here)
If I go into a history and try to view other FastQC reports that I have viewed before, I get the same error. So I don't think this is a data format or tools issue.
I searched the archives and found that this server error message has been reported before and was fixed, but no details were given.
Thanks for your help.
The numbers 87 and 93 are the actual maximum lengths of the aligned
regions on either side of the junction. If you want to examine your
paired-end data statistically, the "NGS: Picard (beta)" tool group has
several tool options.
However, examining the track at the gene/transcript level for a few well
characterized gene bounds is really the best way to understand how the
file describes the data. A browser with your tracks loaded (Trackster or
UCSC), the text data files, and the Cufflinks manual/FAQ will likely
address most of your questions or at least will be a good orientation.
The visual portion of this helps a great deal.
To address the visualization at UCSC, I can point you to their User
Guide: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html and
contact mailing list: http://genome.ucsc.edu/contacts.html
Good luck with your project. Please remember to keep questions to the
Galaxy team on our mailing lists so that our entire team and community
-------- Original Message --------
Subject: RE: [galaxy-dev] Tophat output
Date: Wed, 18 Apr 2012 15:23:43 +0000
From: Xu, Jianpeng <jianpeng.xu(a)emory.edu>
To: Jennifer Jackson <jen(a)bx.psu.edu>
In the history, I have the splice junction file. When I expand it and
click "display at UCSC main", it opens the UCSC Genome Browser. Since
this is the first time I have visualized splice junctions, can you give
me more instructions on how to view them in the UCSC Genome Browser?
On 4/18/12 7:58 AM, Xu, Jianpeng wrote:
> Thanks a lot, Jennifer. It is very useful and helpful. I got the result using Paired-end reads. The read length for both ends is 100 bp.
> chr20 199821 204701 JUNC00000001 17 - 199821 204701 255,0,0 2 87,93 0,4787
> Since the read length is 100 bp, why are 87 and 93 less than 100?
> Below is a single-end read result:
> chr11 60277777 60278396 JUNC00000001 1 + 60277777 60278396 255,0,0 2 22,28 0,591
> Can you explain a little bit more?
> From: Jennifer Jackson [jen(a)bx.psu.edu]
> Sent: Wednesday, April 18, 2012 2:56 AM
> To: Xu, Jianpeng
> Cc: galaxy-dev(a)lists.bx.psu.edu
> Subject: Re: [galaxy-dev] Tophat output
> Hello Jianpeng,
> The output files from TopHat are described on the TopHat tool form:
> --- quote ---
> Tophat produces two output files:
> junctions -- A UCSC BED track of junctions reported by TopHat. Each
> junction consists of two connected BED blocks, where each block is as
> long as the maximal overhang of any read spanning the junction. The
> score is the number of alignments spanning the junction.
> accepted_hits -- A list of read alignments in BAM format.
> Two other possible outputs, depending on the options you choose, are
> insertions and deletions, both of which are in BED format.
> BED format is described in the Galaxy wiki, which includes links to the
> UCSC BED format description (they authored the format).
> Two important rules to remember about BED format:
> rule #1: coordinate data is already reported with respect to the (+) strand
> rule #2: "start" is defined as the smallest coordinate, "end" is defined
> as the largest coordinate, due to the rule #1.
> BED files have a 0-based, fully-closed, "start" position in data files,
> but in browsers the data will display as 1-based. This means you'll need
> to add "1" to any "start" coordinate in a .bed file to locate it in a
> display application. The two will not and should not match. The "end"
> coordinate is also 0-based, but half-open. This will make it appear to
> be 1-based for casual users, so it will match between data files and
> display applications.
> Using the first data row as an example and this information, we can tell:
> chr20 199821 204701 JUNC00000001 17 - 199821 204701 255,0,0 2 87,93 0,4787
> * column 5 is 'score', or 'number of alignments spanning the
> junction'. In this case, "17" alignments.
> * column 11 is the blockSizes, or 'read maximal overhang' of the
> junctions (max alignment length). The first is 87 bases, the second 93 bases.
> * column 12 is the blockStarts, or 'overhang start' of the junctions
> (alignment start). The first is 0, the second 4787 bases. I am fairly
> certain that the first is always 0 and the second could be interpreted
> as the 'intron' length, but someone please correct me if this is wrong!
> Some calculations can be done with these numbers with respect to the
> overall position of the junction already defined in columns 1,2,3
> (chrom, start, end): chr20:199821-204701 (-) that define the location of
> the predicted splices, the flanking aligned regions, and the (presumed)
> 'intron'. This example is a bit tricky because the alignment is on the
> (-) strand, but for most uses it is enough to simply calculate backwards
> from the end coordinate to the start. (Consider the end the start, and
> the start the end). If this sounds confusing, that's because it is! When
> you visualize the data the concept will make more sense and it is
> definitely worth learning about.
> Brief explanation: The first start is 0, which literally means that it
> starts at the very beginning of the alignment (0-based), which would be
> at position chr20, base 204,701, on the (-) strand. This alignment would
> continue for 87 bases, then stop. Then the splice would be present. The
> second start is at position (204701 - 4787) = 199914 = chr20, base
> 199,914, on the (-) strand. This is where the second splice would be
> present. This alignment would continue for 93 bases. This places the end
> at (199914 - 93) = 199821 = chr20, base 199,821, on the (-) strand.
> Which is the same as the reported global junction start position, which
> we are considering our "end", because this is a (-) stranded alignment.
> And, it all adds up.
> Trackster would be a good place to start for "Visualization" (use the top
> menu bar link). The dataset can also be saved as a regular .bed file and
> loaded as a custom track into the UCSC Genome Browser (If the direct
> link is not fully configured yet).
> Hopefully this helps,
> Galaxy team
> On 4/17/12 7:20 AM, Xu, Jianpeng wrote:
>> I have installed a local Galaxy. I used TopHat to do the RNA-seq
>> alignment and got an output file: splice junctions in BED format.
>> I do not understand it clearly. What do the numbers 17, 14 ... in
>> column 5 mean? What does 87,93 mean? What does 0,4787 mean?
>> Can you explain a little? Which tool can be used to view this
>> file?
>> track name=junctions description="TopHat junctions"
>> chr20 199821 204701 JUNC00000001 17 - 199821 204701 255,0,0 2 87,93 0,4787
>> chr20 204631 205520 JUNC00000002 14 - 204631 205520 255,0,0 2 96,87 0,802
>> chr20 205428 205775 JUNC00000003 9 - 205428 205775 255,0,0 2 92,91 0,256
>> chr20 205699 205958 JUNC00000004 15 - 205699 205958 255,0,0 2 87,92 0,167
>> chr20 205929 207067 JUNC00000005 31 - 205929 207067 255,0,0 2 95,97 0,1041
>> chr20 206977 207909 JUNC00000006 19 - 206977 207909 255,0,0 2 93,97 0,835
>> chr20 207884 212679 JUNC00000007 15 - 207884 212679 255,0,0 2 87,76 0,4719
>> chr20 207910 218238 JUNC00000008 1 - 207910 218238 255,0,0 2 61,39 0,10289
>> chr20 212628 218293 JUNC00000009 28 - 212628 218293 255,0,0 2 94,94 0,5571
> Jennifer Jackson
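For readers who want to check the junction arithmetic in this thread, here is a small Python sketch that unpacks the first TopHat junction row. It follows the UCSC BED definition, in which blockStarts are offsets from chromStart on the (+) strand regardless of the strand column, so block positions are computed forward from the start coordinate rather than backwards from the end. The function names are illustrative only and are not part of TopHat or Galaxy.

```python
def junction_blocks(bed_line):
    """Return (chrom, blocks, intron) for a 12-column BED junction row.

    blocks and intron are 0-based, half-open (start, end) tuples, i.e. the
    same convention the BED file itself uses.
    """
    fields = bed_line.split()
    chrom = fields[0]
    chrom_start = int(fields[1])
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    # Each block begins at chromStart + blockStart and runs for blockSize bases.
    blocks = [(chrom_start + s, chrom_start + s + size)
              for s, size in zip(starts, sizes)]
    # The presumed intron is the gap between the two flanking blocks.
    intron = (blocks[0][1], blocks[1][0])
    return chrom, blocks, intron


def as_browser_coords(chrom, start, end):
    """Convert a 0-based, half-open BED interval to 1-based display form."""
    return f"{chrom}:{start + 1}-{end}"


row = "chr20 199821 204701 JUNC00000001 17 - 199821 204701 255,0,0 2 87,93 0,4787"
chrom, blocks, intron = junction_blocks(row)
for start, end in blocks:
    print("block ", as_browser_coords(chrom, start, end), end - start, "bases")
print("intron", as_browser_coords(chrom, *intron), intron[1] - intron[0], "bases")
```

Under this reading the 87-base block sits at the low-coordinate end of the feature, the 93-base block ends exactly at the reported end coordinate (204701), and the gap between them is 4787 - 87 = 4700 bases.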
I am working with a genome that is not well annotated. When I tried
to obtain the 3'UTRs of around 500 genes from Biomart, I obtained only a
few (around 20). What would be the easiest way to obtain the 3'UTRs using
my RNA-seq data (I have 15 samples sequenced)? I can see the reads in the
genome browser, but I need an automated process. I tried to get the
untranscribed gene from Biomart, then the CDS and cDNA, and align them to
see the 3'UTR, but I am still not able to get anything after the stop
codon. Does anyone have experience with this? I also saw that there are a
lot of genes with "wrong" annotation within the coding region.
Texas A&M Entomology
Vector Biology Research Group
After mapping RNA-Seq paired-end reads with TopHat, I can see that most of the reads fall into the right regions. However, I can still see lots of reads mapped to non-coding regions (the locations where the reads are mapped don't contain exons).
I am wondering if these "non-coding reads" will be included when Cufflinks calculates transcript/gene expression.
Dying to know your opinion.
Another question: how can I find the number of reads mapped to a certain exon?
I'm trying to download the screencasts referenced in the Galaxy ENCODE
paper (Blankenberg et al 2007). They appear to be on
screencast.g2.bx.psu.edu, but while that domain resolves, it doesn't
respond to HTTP requests. Does anyone know where they are currently
I'm trying to clip adapter sequences off the ends of my
sequencing reads. I tested the tool by purposefully adding parts of the
adapter sequence or the full adapter sequence to the 3' end. The tool is
amazing at detecting these sequences. However, I encountered some
problems. On top of detecting my adapter sequence I found that unrelated
sequences are sometimes trimmed off at the 3' end. For example the
sequence "TATCCACGTGCTC" was trimmed off even though it does not match the
adapter sequence GATCCTCGGCCGCGACC at all. Is there a way to increase the
fidelity of the tool? Any comments will be appreciated! Thank you!
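As a quick sanity check on how different the trimmed sequence really is from the adapter, the two can be compared position by position. The Python sketch below does only that. It is not the scoring used by the clipping tool, which is not shown here and may align the sequences differently, and `count_matches` is a made-up helper name.

```python
def count_matches(seq_a, seq_b):
    """Count positions where two sequences carry the same base, comparing
    from the first base and stopping at the end of the shorter sequence."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


adapter = "GATCCTCGGCCGCGACC"
trimmed = "TATCCACGTGCTC"

matches = count_matches(trimmed, adapter)
print(f"{matches} of {len(trimmed)} positions match the start of the adapter")
```

For these two sequences, 8 of the 13 positions agree with the start of the adapter. A clipper that tolerates mismatches could therefore treat the sequence as a partial adapter hit, in which case stricter mismatch or minimum-overlap settings, where the tool offers them, would be the place to look.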
Just installed Galaxy locally and ran "sh run.sh" for the first time.
The web page on 127.0.0.1:8080 gave this error:
ValueError: expected only letters, got ' en'
This was from line 762 in core.py
    if not lang.isalpha():
        raise ValueError('expected only letters, got %r' % lang)
So, the workaround is to insert a line before that check:
    lang = lang.strip()
Then it worked :)
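The effect of the workaround can be reproduced in isolation. The Python sketch below is a simplified stand-in for the check quoted above, not the actual core.py code. The function name `check_lang` and the `strip_first` switch are made up for illustration.

```python
def check_lang(lang, strip_first=True):
    """Validate a language code the way the quoted check does.

    With strip_first=True the workaround is applied: surrounding
    whitespace is removed before the letters-only test.
    """
    if strip_first:
        lang = lang.strip()  # the workaround from the message above
    if not lang.isalpha():
        raise ValueError('expected only letters, got %r' % lang)
    return lang


# Without the workaround, the leading space triggers the reported error.
try:
    check_lang(' en', strip_first=False)
except ValueError as err:
    print("without strip:", err)

# With the workaround, the same input is accepted.
print("with strip:", check_lang(' en'))
```

Stripping hides the symptom. The stray space presumably comes from whatever supplied the language value (for example a browser or environment setting), which may be worth checking separately.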
Biorenewables Group (IBERS)