Dear Galaxy member,

I am writing about a problem I have with fetching sequences. I used Galaxy to fetch the sequences of alternatively spliced exons as well as constitutive exons.
Here are the steps I followed:
First, I extracted the coordinates of the RefSeq genes from the whole human genome:
1.     I selected "Get Data" -> "UCSC Main table browser".
2.     Genome: "Human", assembly: "Feb. 2009", Group: "Genes and Gene Prediction Tracks", Track: "RefSeq genes".
3.     Filter: (+) region: Genome
4.     "get output" and "Exons plus 0 bases…"
Second, I extracted the coordinates of alternative splicing events:
1.     I selected "Get Data" -> "UCSC Main table browser".
2.     Genome: "Human", assembly: "Feb. 2009", Group: "Genes and Gene Prediction Tracks", Track: "Alt Events".
3.     Filter: (+) region: Genome

Then, I stored the outcome of these two processes in BED format. To extract the coordinates of the constitutive exons, I used the Subtract operation in "Operate on Genomic Intervals":
1. Subtract: Alternative splicing events  From: Ref genes
2. "Intervals with no overlap"
3. Stored the constitutive exon coordinates in BED format
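
To make clear what I expect from this step, here is a minimal Python sketch of how I understand "Intervals with no overlap": an exon is kept only if it overlaps no alternative splicing event on the same chromosome. The function name and the toy coordinates are hypothetical, the intervals are assumed to be BED-style (0-based, half-open), and this is not Galaxy's actual implementation.

```python
from collections import defaultdict

def subtract_no_overlap(exons, alt_events):
    """Return the exons that overlap no alt event.

    Intervals are (chrom, start, end) tuples, assumed BED-style half-open.
    """
    # Group alt events by chromosome so each exon is only compared
    # against events on its own chromosome.
    events_by_chrom = defaultdict(list)
    for chrom, start, end in alt_events:
        events_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in exons:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            start < ev_end and ev_start < end
            for ev_start, ev_end in events_by_chrom.get(chrom, [])
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept

# Hypothetical toy data, not taken from my datasets.
exons = [("chr1", 100, 250), ("chr1", 400, 520), ("chr2", 100, 250)]
alt_events = [("chr1", 200, 300)]
constitutive = subtract_no_overlap(exons, alt_events)
print(constitutive)
```

With this reading, every interval in the output is one of the original exon intervals, so the output lengths should never exceed the input exon lengths.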

Finally, using "Fetch sequences", I tried to extract the genomic sequences for the outcome of the subtraction (constitutive exons) in FASTA format. Please see the attached files for the extracted coordinates and sample sequences corresponding to constitutive exons.

The strange thing about the results is that, while constitutive exons are on average 100-200 nt long, the extracted sequences are much longer.
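
One way to quantify the discrepancy is to compare the interval lengths in the BED file with the sequence lengths in the FASTA file. Below is a minimal sketch in plain Python, assuming a tab-separated BED file with start and end in the second and third columns; the function names and the sample records are hypothetical and are not taken from the attached files.

```python
def bed_lengths(bed_text):
    """Length (end - start) of every interval in tab-separated BED text."""
    lengths = []
    for line in bed_text.splitlines():
        # Skip blank lines and the optional header/comment lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        lengths.append(int(fields[2]) - int(fields[1]))
    return lengths

def fasta_lengths(fasta_text):
    """Sequence length of every record in FASTA text."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # a header line starts a new record
        elif line and lengths:
            lengths[-1] += len(line)   # sequence may span several lines
    return lengths

# Hypothetical sample records for illustration only.
bed_sample = "chr1\t100\t250\tex1\nchr1\t400\t520\tex2\n"
fasta_sample = ">ex1\nACGTACGT\nACGT\n>ex2\nGGCC\n"
print(bed_lengths(bed_sample))
print(fasta_lengths(fasta_sample))
```

If the two lists differ for the same records, the extra length is introduced at the fetching step; if they agree, the intervals themselves are already longer than single exons.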

I was wondering whether there is something wrong in the procedure or whether this is a bug in Galaxy that should be reported.


Thank you very much in advance,
Andigoni


Andigoni Malousi
Post-doc in Bioinformatics
Aristotle University of Thessaloniki