Hi,
I have one stupid question. The region chr1 2351533 - 2351843 from UCSC (hg18) retrieves 311 bases. However, when I use Fetch Sequences in Galaxy with the same coordinates, it only retrieves 310 bases. Apparently, the first of the 311 bases is missing from the Fetch Sequences result, since both sequences end with the same bases.
Does this mean that I need to modify the coordinates first and then use Fetch Sequences to get the correct sequence? I thought UCSC and Galaxy were both 0-based?
Thanks.
Sean
Hello,
Galaxy interprets the coordinates as having a 0-based start, so the actual genome position of the first base returned is start + 1. To get all 311 bases from Fetch Sequences, subtract 1 from your start coordinate (use 2351532) and leave the end as it is. Not a stupid question - everyone has to learn this as they begin to work with data types sourced originally from UCSC and associated projects.
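To make the 310 vs. 311 difference concrete, here is a small Python sketch of the arithmetic. The function names are made up for illustration, and this is not code from Galaxy or UCSC; it only assumes that a 1-based range includes both endpoints and that a 0-based start is paired with an end that is not shifted.

```python
# Illustrative only: these helper names are hypothetical, not Galaxy or UCSC code.

def length_one_based(start, end):
    """Number of bases when start and end are both 1-based and inclusive."""
    return end - start + 1


def length_zero_based_start(start, end):
    """Number of bases when start is 0-based (BED/interval style)."""
    return end - start


start, end = 2351533, 2351843

# Read as a 1-based range, the region covers 311 bases.
print(length_one_based(start, end))

# The same numbers read with a 0-based start cover one base fewer,
# and the base that drops out is the first one.
print(length_zero_based_start(start, end))

# Shifting the start down by 1 restores the full region.
print(length_zero_based_start(start - 1, end))
```

The three printed values are 311, 310 and 311.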
At UCSC, coordinates are interpreted as 0-based or 1-based depending on which tool you are using. Tools outside of UCSC or Galaxy vary in how they treat coordinates.
In general:
- positional coordinates of format "chrA:NNN-NNNN" will be 1-based
- BED/Interval format will be 0-based
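The rule of thumb above can be sketched as a pair of conversions in Python. This is a minimal sketch under stated assumptions, not how Galaxy or UCSC implement it: the function names are hypothetical, position strings are assumed to look like "chrom:start-end" (commas in numbers are tolerated), and only the start is shifted between the two conventions.

```python
import re

# Hypothetical helpers for illustration; not part of Galaxy or the UCSC tools.


def position_to_bed(position):
    """Convert a 1-based "chrom:start-end" string to a 0-based-start
    (chrom, start, end) tuple as used in BED/interval data."""
    match = re.fullmatch(r"([^:]+):(\d+)-(\d+)", position.replace(",", ""))
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    # Only the start moves; the end stays the same.
    return chrom, start - 1, end


def bed_to_position(chrom, start, end):
    """Convert a 0-based-start interval back to a 1-based position string."""
    return f"{chrom}:{start + 1}-{end}"


bed = position_to_bed("chr1:2351533-2351843")
print(bed)                    # ('chr1', 2351532, 2351843)
print(bed_to_position(*bed))  # chr1:2351533-2351843
```

For the region in question, the BED/interval row to give Fetch Sequences would therefore be chr1, 2351532, 2351843.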
More help is in the "Convert Formats" tool descriptions (see the BED format description there). This page at UCSC has all of the details: http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
Hopefully this helps!
Best,
Jen
Galaxy team
On 4/6/11 11:15 PM, Sean wrote: