Hi Keith,

Good questions. Hopefully this information helps.

To get from BED3 to BED12, use the BED3 as a filter in the UCSC Table Browser against a gene track (UCSC Genes, RefSeq Genes, etc.) and send the output to Galaxy. Or better, use a BED6 so that you can include strand in column 6. Just enter placeholder values for name (".", column 4) and score ("0", column 5) to pad the file out to the correct format so that the UCSC Table Browser can interpret it. Interval is a Galaxy file type; for the UCSC Browser, the BED format must be intact and to spec. BED format is defined in the BED-to-GFF tool help (scroll down). If the file is BED12, the features listed are interpreted from the format.

If you want repeat information and such, then a tool like "Operate on Genomic Intervals -> Profile Annotations" may be a good choice. From the results, you could determine which ancillary tracks to pull over into Galaxy from the UCSC Table Browser (in GTF or BED format). There are choices here (multiple repeat tracks, for example). Please note that this tool is currently set up for human annotation.

When running a query in the Table Browser for certain data, the way the internal query is structured means the result will include every entry in the track with any coverage, in full (i.e. not trimmed to the original BED/coordinate filters). A BED3 would be necessary to pull in data contained only in introns, although a BED6 that includes strand might be a better choice for stranded tracks. Don't use a BED12 if you want information about the entire region (transcribed and other).

These "any coverage" results from UCSC can be trimmed down in Galaxy using tools in "Operate on Genomic Intervals" and "Join, Subtract and/or Group" (depending on the data). The process would be step-by-step the first time, but it can easily be saved as a workflow and reused without redoing each step.

If you would like more help, just let us know,

Jen
Galaxy team

On 4/7/11 11:20 AM, Keith E. Giles wrote:
Hi Jen, I actually used a BED3. I thought the script could go to HG18 for each entry and then see what was there. Does the feature information need to be in the BED file in order for it to get into the GFF file? If so, do you know of a way to map each aligned read to a certain feature?
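The BED6 padding suggested in the reply above could be sketched roughly as follows. This is only an illustration, not a Galaxy or UCSC tool: the function name `pad_bed3_to_bed6` is hypothetical, and the strand has to come from somewhere else (for example, the alignment), since a BED3 line does not carry it.

```python
def pad_bed3_to_bed6(line, strand):
    """Pad one tab-separated BED3 line out to BED6.

    Hypothetical helper: name is set to "." (column 4) and score to "0"
    (column 5), as suggested in the thread; strand must be given by the caller.
    """
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least chrom, start, end")
    chrom, start, end = fields[:3]
    return "\t".join([chrom, start, end, ".", "0", strand])


if __name__ == "__main__":
    # Example with a made-up interval on the minus strand.
    print(pad_bed3_to_bed6("chr1\t100\t200\n", "-"))
```

Each padded line then has the six columns the Table Browser expects for a stranded BED file.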
On Thu, Apr 7, 2011 at 2:16 PM, Jennifer Jackson <jen@bx.psu.edu> wrote:
Hi Keith,
Are you using a full BED12 file? Or just a BED3-6? Full BED12 should return the available features:
3. feature - The name of this type of feature. Some examples of standard feature types are "CDS", "start_codon", "stop_codon", and "exon".
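As a rough illustration of why a BED3 input has no feature type to report, here is a minimal sketch of a BED-to-GFF line conversion. It is not the Galaxy tool's code; the function name and the default `source` and `feature` labels are hypothetical placeholders. The only facts it relies on are the column layouts of the two formats and that BED starts are 0-based while GFF starts are 1-based.

```python
def bed_to_gff_line(bed_line, source="bed2gff", feature="region"):
    """Convert one tab-separated BED line (3 to 6 columns) to a GFF line.

    Hypothetical sketch: `source` and `feature` are placeholder labels,
    because BED3-6 has no column that names a feature type.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else "."
    score = fields[4] if len(fields) > 4 else "."
    strand = fields[5] if len(fields) > 5 else "."
    # BED start is 0-based; GFF start is 1-based. The end value is unchanged.
    return "\t".join(
        [chrom, source, feature, str(start + 1), str(end), score, strand, ".", name]
    )


if __name__ == "__main__":
    print(bed_to_gff_line("chr1\t99\t200"))
```

With only three input columns, column 3 of the output can hold nothing more specific than the placeholder label.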
If this is not enough information, sharing a history would help ("Options -> Share or Publish"). You can send the link to me directly.
Best,
Jen
Galaxy team
On 4/7/11 10:46 AM, Keith Giles wrote:
I am trying to use the Galaxy "BED to GFF" function. The operation worked, but instead of giving me back any feature information (e.g., exon, intron, repeat), I just received back the sequence of the interval contained within the BED file. Does anyone know what I'm doing wrong? Also, does anyone know the best way to map each read of an RNA-seq run to a given feature?
___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org