I'm continuing to run into issues with BED vs. BED12 files because Galaxy automatically converts GFF/GTF files to BED files that are not BED12. Essentially, the RSeQC tools require BED12 input, but Galaxy lets users select GTF/GFF files, which it then implicitly converts to a BED file that is not BED12, so the tools receive input they cannot use.

Is there any desire/support on the part of the Galaxy team to allow tools to use/require the BED12 format?

If not, I'll work on incorporating the conversion(s) within each tool wrapper, unless someone has an alternate suggestion.

Thanks all,

Lance

Lance Parsons
January 29, 2016 at 11:03 AM
Well, I somewhat follow you. My main goal would be to allow people to use GTF files in the RSeQC tools. My initial thought was to write a tool to convert GTF to BED12 (which I've done). However, it would be really nice to have Galaxy be able to automatically convert behind the scenes. The problem there is that it already does conversion from GTF to BED, but not to BED12. Another option would be to include the conversion as part of the tools, but that is kinda messy and doesn't help any other tools that need the same thing.
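
For readers unfamiliar with what a GTF-to-BED12 conversion involves, here is a minimal sketch. It is not the tool mentioned above and not Galaxy's converter; the function name `gtf_to_bed12` is hypothetical. It assumes exon records carry a `transcript_id` attribute, it ignores CDS records (so thickStart and thickEnd are both set to the transcript start, the usual convention for "no thick region"), and it writes a placeholder score and itemRgb of 0.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the ninth GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed12(gtf_lines):
    """Yield one BED12 line per transcript from an iterable of GTF lines."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    info = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF is 1-based and inclusive; BED is 0-based and half-open.
        exons[tid].append((int(fields[3]) - 1, int(fields[4])))
        info.setdefault(tid, (fields[0], fields[6]))

    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = info[tid]
        tx_start = blocks[0][0]
        tx_end = max(end for _, end in blocks)
        sizes = ",".join(str(end - start) for start, end in blocks)
        starts = ",".join(str(start - tx_start) for start, _ in blocks)
        yield "\t".join([
            chrom, str(tx_start), str(tx_end), tid, "0", strand,
            str(tx_start), str(tx_start),  # thickStart, thickEnd (no CDS)
            "0", str(len(blocks)), sizes, starts,
        ])
```

The exon blocks are what a plain GTF-to-BED conversion drops: each transcript becomes a single line whose last three columns (blockCount, blockSizes, blockStarts) describe its exons relative to the transcript start.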

I'm open to suggestions on how to handle this, but right now, the only option I can see is to build the conversion in as part of the tool, correct? I'm not quite sure why allowing users to specify BED12 vs BED6 vs "BED" is any worse than the way things work with other datatypes (fastq, fastqsanger, etc.), but I realize you guys have a lot of experience with users that I don't.

Lance

Daniel Blankenberg wrote:

Daniel Blankenberg
January 28, 2016 at 11:55 AM
Hi Lance,

Ah, yes, using bed12 would be problematic, as all bedstrict types are currently set to be neither uploadable nor datatype-assignable. This stems from a long history of abuse of the BED format in Galaxy (where datasets should have been generic ‘interval’ matching bed metadata of chrom,start,end of 1,2,3). I added these datatypes to force conversions of bed/interval files to real bed/6/12 files, especially as needed by external visualization tools. That way you can click on a ‘bed’ (fake) file and Galaxy will convert it to real bed and load the external service, or run the Galaxy tool on the properly formatted data.

Basically, there are restrictions in place to try to ensure that a bedstrict datatype is actually BED-conforming. I am open to loosening these restrictions, however, and allowing them to function as ‘normal’ datatypes (users would be free to shoot themselves in the foot by mis-assigning the datatype). In the meantime, you should be able to test this by flipping ‘allow_datatype_change’ in interval.py to True.
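
As a rough illustration of why flipping that one flag on bedstrict would also affect bed6 and bed12, here is a toy sketch. The classes below are stand-ins written for this example, not Galaxy's actual code in interval.py, and `assignable_extensions` is a hypothetical helper. The only facts taken from the thread are the flag name and that bed6/12 are subclasses of bedstrict.

```python
class Bed:
    file_ext = "bed"
    allow_datatype_change = True


class BedStrict(Bed):
    file_ext = "bedstrict"
    # Users may not assign this datatype by hand.
    allow_datatype_change = False


class Bed6(BedStrict):
    file_ext = "bed6"    # inherits the flag from BedStrict


class Bed12(BedStrict):
    file_ext = "bed12"   # inherits the flag from BedStrict


ALL_TYPES = [Bed, BedStrict, Bed6, Bed12]


def assignable_extensions(datatypes):
    """Return the extensions a user would be allowed to assign manually."""
    return [d.file_ext for d in datatypes if d.allow_datatype_change]


before = assignable_extensions(ALL_TYPES)
BedStrict.allow_datatype_change = True   # the suggested test change
after = assignable_extensions(ALL_TYPES)
```

Because the subclasses do not override the attribute, changing it on the parent class is enough to make all three strict types assignable in this sketch.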

Thoughts?


Thanks,

Dan





Lance Parsons
January 27, 2016 at 3:38 PM
Thanks for the info Dan. I've explored using bed12, but I have a few questions.

1. When attempting to use 'bed12' or 'Bed12' as a file type in a test, I get the following error:

    Exception: {u'message': {u'type': u'error', u'data': {u'file_type': u"An invalid option was selected for file_type, u'Bed12', please verify.", u'files_metadata': [u"An invalid option was selected for file_type, u'Bed12', please verify."]}}}

2. I see that Bed12 is in the datatypes_conf.sample file. Is there a way to add a converter for that datatype? Perhaps something like: https://wiki.galaxyproject.org/ToolShedDatatypesFeatures#Including_datatype_converters_and_display_applications? My concern is that since it already exists, I wouldn't be able to add a converter. Also, the sniffer doesn't seem to work (it just finds the files as "bed", thus my desire to specify ftype in tests).
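
To make concrete what telling a BED12 line apart from a generic "bed" line involves, here is a minimal sketch of a per-line check. It is not Galaxy's sniffer; the function name `looks_like_bed12` is hypothetical, and the rules are an assumption based on the standard BED12 column layout (12 tab-separated fields, with blockCount agreeing with blockSizes and blockStarts).

```python
def looks_like_bed12(line):
    """Return True if a single tab-separated line is plausibly BED12."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        return False
    try:
        start, end = int(fields[1]), int(fields[2])
        thick_start, thick_end = int(fields[6]), int(fields[7])
        block_count = int(fields[9])
        # Block lists may carry a trailing comma.
        sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    except ValueError:
        return False
    if fields[5] not in ("+", "-", "."):
        return False
    if not (start <= thick_start <= thick_end <= end):
        return False
    if block_count < 1:
        return False
    if len(sizes) != block_count or len(starts) != block_count:
        return False
    # The first block starts at chromStart; the last block ends at chromEnd.
    return starts[0] == 0 and start + starts[-1] + sizes[-1] == end
```

A check along these lines accepts only lines with consistent block columns, so a 3-, 4-, or 6-column BED file would not be mistaken for BED12.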

Thanks,
Lance

Daniel Blankenberg
January 25, 2016 at 9:47 AM
Hi Lance,

FWIW, there are existing bedstrict, bed12, and bed6 datatypes in Galaxy. The strict datatypes are currently usually created by implicit datatype converters and are most often used by some external display applications that need standards-conforming files. bed6/12 are subclasses of bedstrict. They can of course be consumed or created by any sort of tool. Please let us know if we can provide additional information.


Thanks for using Galaxy,

Dan



Lance Parsons
January 14, 2016 at 4:26 PM
Does anyone know of any efforts to create a BED12 datatype for Galaxy? Since some tools require BED12 and the automatic conversion from GFF to BED does not seem to generate a BED12 file, it seems it might be a worthwhile addition.

If not, what would be the best way to go about doing this? Making it part of core Galaxy (which would allow multiple tools to share the same datatype definitions) or making it part of a Tool Shed tool (which I'm not sure how to do)? BTW, I'm thinking about RSeQC at the moment, but I know other tools use/require this format.


--
Lance Parsons - Scientific Programmer
Carl C. Icahn Laboratory - Room 141 (Temporary)
Lewis-Sigler Institute for Integrative Genomics
Princeton University