I'm unable to reproduce the issue you reported. I used the BED file you sent as an attachment and fetched sequences for it. The sequences produced had the same lengths as the intervals in the BED file.
Also, I computed summary statistics for the lengths of your BED intervals: the mean and median lengths were 19790 bp and 3466 bp, respectively. Since you mention that you expect the constitutive exons to be 100-200 bp long, I suspect there is a problem in one of the steps prior to the subtraction step in your pipeline.
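In case it helps you check the interval lengths at each step of your pipeline, here is a minimal sketch of how such summary statistics can be computed from BED lines. It is not the exact procedure I ran; the function names (bed_interval_lengths, summarize) and the toy intervals are made up for illustration. It assumes standard BED columns (chrom, start, end, ...) with 0-based, half-open coordinates, so each length is end - start.

```python
import statistics


def bed_interval_lengths(lines):
    """Yield the length (end - start) of every interval in an iterable of BED lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC "track"/"browser" header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        yield end - start


def summarize(lines):
    """Return (mean, median) of the interval lengths."""
    lengths = list(bed_interval_lengths(lines))
    return statistics.mean(lengths), statistics.median(lengths)


# Toy data for illustration only (not taken from the attached file).
example = [
    "track name=example",
    "chr1\t100\t250\texonA\t0\t+",
    "chr1\t1000\t1100\texonB\t0\t-",
]
mean_len, median_len = summarize(example)
print(f"mean = {mean_len} bp, median = {median_len} bp")
```

To run it on a real file, pass an open file handle instead of the list, for example summarize(open("constitutive.bed")). Running the same check on the output of each earlier step should show where the unexpectedly long intervals first appear.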
If you can share your history with me (guru@psu.edu), I can take a look at what might be going on. Here's how to do it:
-go to the Options menu above your current history, select "Saved Histories"
-go to the pull-down menu for the problematic history and select "Share or Publish"
-share the history with me by clicking on "share with a user" and entering my email address: guru@psu.edu.
Thanks for using Galaxy,
Guru
Galaxy team.
On Mar 27, 2010, at 5:06 AM, Andigoni Malousi wrote:
Dear Galaxy member,
I'm sending you this e-mail because of a problem I have in fetching
sequences. I used Galaxy to fetch sequences corresponding to
alternatively spliced exons as well as constitutive exons.
Here are the steps I followed to do that:
First, I extracted the coordinates of the RefSeq genes for the whole human
genome:
1. I selected Get Data -> UCSC table browser
2. Genome: "Human", assembly: "Feb. 2009", Group: "Genes and Gene
Prediction Tracks", Track:"RefSeq genes".
3. Filter: (+) region: Genome
4. "get output" and "Exons plus 0 bases…"
Second, I extracted the coordinates of alternative splicing events
1. I selected "Get Data" -> "UCSC Main table browser".
2. Genome: "Human", assembly: "Feb. 2009", Group: "Genes and Gene
Prediction Tracks", Track:"Alt Events".
3. Filter: (+) region: Genome
Then, I stored the outcome of these two processes in BED format. To
extract the coordinates of the constitutive exons I used the Subtract
operation in "Operate on Genomic Intervals"
1. Subtract: Alternative splicing events From: Ref genes
2. "Intervals with no overlap"
3. Stored the constitutive exon coordinates in BED format
Finally, using the "Fetch sequences" tool, I tried to extract the genomic
sequences for the outcome of the subtraction (constitutive exons) in
FASTA format. Please see the attached files for the extracted
coordinates and sample sequences corresponding to constitutive exons.
The strange thing about the results is that, while constitutive exons
are on average 100-200 nt long, the extracted sequences are much
longer.
I was wondering whether there is something wrong with the procedure
or whether this is a bug in Galaxy that we need to report.
Thank you very much in advance,
Andigoni
Andigoni Malousi
Post-doc in Bioinformatics
Aristotle University of Thessaloniki
<GalaxyHistoryItem-3-[Subtract_on_data_2_and_data_1]-constitutive.bed><constitutive.txt>