I'm sending you this e-mail because of a problem I have with fetching
sequences. I used Galaxy to fetch sequences corresponding to
alternatively spliced exons as well as constitutive exons.
Here are the steps I followed:
First, I extracted the coordinates of the RefSeq genes from the whole
human genome:
1. I selected "Get Data" -> "UCSC Main table browser".
2. Genome: "Human", assembly: "Feb. 2009", Group: "Genes and Gene
Prediction Tracks", Track: "RefSeq genes".
3. Filter: (+) region: Genome
4. "get output" and "Exons plus 0 bases…"
Second, I extracted the coordinates of alternative splicing events:
1. I selected "Get Data" -> "UCSC Main table browser".
2. Genome: "Human", assembly: "Feb. 2009", Group: "Genes and Gene
Prediction Tracks", Track: "Alt Events".
3. Filter: (+) region: Genome
Then, I stored the output of these two steps in BED format. To
extract the coordinates of the constitutive exons, I used the Subtract
operation in "Operate on Genomic Intervals":
1. Subtract: Alternative splicing events From: Ref genes
2. "Intervals with no overlap"
3. Stored the constitutive exon coordinates in BED format
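
For reference, the subtraction described above can be sketched in a few
lines of Python. This is only an illustration of what I would expect
"Intervals with no overlap" to do, not Galaxy's actual implementation;
the function name and the (chrom, start, end) tuple layout are
assumptions made for the example, and strand is ignored.

```python
from collections import defaultdict


def subtract_no_overlap(exons, events):
    """Return the exons that overlap none of the events.

    Both inputs are lists of (chrom, start, end) tuples using BED-style
    0-based, half-open coordinates. Hypothetical helper for illustration.
    """
    events_by_chrom = defaultdict(list)
    for chrom, start, end in events:
        events_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in exons:
        # Two half-open intervals overlap when each one starts before
        # the other one ends.
        overlaps = any(
            ev_start < end and start < ev_end
            for ev_start, ev_end in events_by_chrom[chrom]
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept
```

Under this reading, an exon that is touched by any alternative splicing
event is dropped entirely, and every interval that is kept has exactly
the same length as in the input file.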
Finally, using "Fetch sequences", I tried to extract the genomic
sequences for the outcome of the subtraction (constitutive exons) in
FASTA format. Please see the attached files for the extracted
coordinates and sample sequences corresponding to constitutive exons.
The strange thing about the results is that while constitutive exons
are on average 100-200 nt long, the extracted sequences are much
longer.
I was wondering whether there is something wrong in the whole procedure
or whether this is a bug in Galaxy that we need to report.
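
One way to narrow this down would be to compare the interval lengths in
the BED file with the lengths of the fetched sequences. The following is
a minimal Python sketch, assuming a standard BED layout (chrom, start,
end in the first three columns) and plain FASTA; the function names are
made up for the example.

```python
def bed_lengths(lines):
    """Length (end - start) of every interval in BED-formatted lines."""
    lengths = []
    for line in lines:
        # Skip blank lines and the header lines UCSC may emit.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        lengths.append(int(fields[2]) - int(fields[1]))
    return lengths


def fasta_lengths(lines):
    """Length of every sequence in FASTA-formatted lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new record
        elif line and lengths:
            lengths[-1] += len(line)
    return lengths
```

If the BED lengths are already much larger than 100-200 nt, the
intervals themselves are longer than single exons and the issue lies
upstream of "Fetch sequences"; if only the FASTA lengths are large, the
fetching step is the place to look.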
Thank you very much in advance,
Andigoni
Andigoni Malousi
Post-doc in Bioinformatics
Aristotle University of Thessaloniki