Well, I somewhat follow you. My main goal would be to allow people to use GTF files in the RSeQC tools. My initial thought was to write a tool to convert GTF to BED12 (which I've done). However, it would be really nice to have Galaxy be able to automatically convert behind the scenes. The problem there is that it already does conversion from GTF to BED, but not to BED12. Another option would be to include the conversion as part of the tools, but that is kinda messy and doesn't help any other tools that need the same thing.
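
For readers following along, a GTF-to-BED12 conversion of the kind described above can be sketched roughly as follows. This is only a minimal illustration, not the converter mentioned in this message and not anything shipped with Galaxy: the function name `gtf_to_bed12` is hypothetical, only `exon` records are used, and thickStart/thickEnd are simply set to the transcript bounds because CDS records are ignored.

```python
import re
from collections import defaultdict

# Assumes attributes of the form: transcript_id "XYZ";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed12(gtf_lines):
    """Group exon records by transcript_id; return one BED12 line per transcript."""
    exons = defaultdict(list)  # transcript_id -> [(start0, end), ...]
    location = {}              # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tid = match.group(1)
        # GTF is 1-based inclusive; BED is 0-based half-open.
        exons[tid].append((int(fields[3]) - 1, int(fields[4])))
        location[tid] = (fields[0], fields[6])

    bed_lines = []
    for tid, blocks in exons.items():
        blocks.sort()
        chrom, strand = location[tid]
        start, end = blocks[0][0], blocks[-1][1]
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - start) for s, _ in blocks)
        # Simplification: thickStart/thickEnd span the whole transcript.
        bed_lines.append("\t".join([
            chrom, str(start), str(end), tid, "0", strand,
            str(start), str(end), "0", str(len(blocks)), sizes, starts,
        ]))
    return bed_lines
```

Calling `gtf_to_bed12(open("genes.gtf"))` (with a hypothetical file name) would return a list of BED12 lines, one per transcript, in the order the transcripts first appear.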

I'm open to suggestions on how to handle this, but right now, the only option I can see is to build the conversion in as part of the tool, correct? I'm not quite sure why allowing users to specify BED12 vs BED6 vs "BED" is any worse than the way things work with other datatypes (fastq, fastqsanger, etc.), but I realize you guys have a lot of experience with users that I don't.

Lance

Daniel Blankenberg wrote:
Hi Lance,

Ah, yes, using bed12 would be problematic, as all bedstrict types are currently set to be neither uploadable nor datatype-assignable. This stems from a long history of abuse of the BED format in Galaxy (where datasets should have been generic ‘interval’ with metadata matching BED's chrom, start, end in columns 1, 2, 3). I added these datatypes to force conversions of bed/interval files to Real bed/6/12 files, especially as needed by external visualization tools. That way you can click on a ‘bed’ (fake) file and Galaxy will convert it to Real bed and load the external service, or run the Galaxy tool on the properly formatted data.

Basically, there are restrictions in place to try to ensure that a bedstrict datatype is actually BED conforming. I am open to loosening these restrictions, however, and allowing them to function as ‘normal’ datatypes (users would be free to shoot themselves in the foot by mis-assigning the datatype). In the meantime, you should be able to test this by flipping ‘allow_datatype_change’ in interval.py to True.
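
To illustrate what "actually BED conforming" could mean for a 12-column file, here is a rough sketch of a per-line check based on the UCSC BED field definitions. It is an assumption-laden illustration, not Galaxy's sniffer or the bedstrict validation logic, and the function name `is_bed12` is hypothetical.

```python
def is_bed12(line):
    """Return True if a tab-separated line satisfies basic BED12 constraints."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        return False
    try:
        start, end = int(fields[1]), int(fields[2])
        thick_start, thick_end = int(fields[6]), int(fields[7])
        block_count = int(fields[9])
        # Block lists may carry a trailing comma.
        sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    except ValueError:
        return False
    if not 0 <= start <= end or fields[5] not in ("+", "-", "."):
        return False
    if not start <= thick_start <= thick_end <= end:
        return False
    if block_count < 1 or len(sizes) != block_count or len(starts) != block_count:
        return False
    # The first block must begin at chromStart and the last must end at chromEnd.
    return starts[0] == 0 and start + starts[-1] + sizes[-1] == end
```

A file in which every non-comment line passes a check like this could reasonably be treated as bed12 rather than generic ‘bed’.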

Thoughts?


Thanks,

Dan



On Jan 27, 2016, at 3:38 PM, Lance Parsons <lparsons@princeton.edu> wrote:

Thanks for the info Dan. I've explored using bed12, but I have a few questions.

1. When attempting to use 'bed12' or 'Bed12' as a file type in a test, I get the following error: 

    Exception: {u'message': {u'type': u'error', u'data': {u'file_type': u"An invalid option was selected for file_type, u'Bed12', please verify.", u'files_metadata': [u"An invalid option was selected for file_type, u'Bed12', please verify."]}}}

2. I see that Bed12 is in the datatypes_conf.sample file. Is there a way to add a converter for that datatype? Perhaps something like: https://wiki.galaxyproject.org/ToolShedDatatypesFeatures#Including_datatype_converters_and_display_applications? My concern is that since it already exists, I wouldn't be able to add a converter. Also, the sniffer doesn't seem to work (it just identifies the files as "bed", hence my desire to specify ftype in tests).

Thanks,
Lance

On January 25, 2016, at 9:47 AM, Daniel Blankenberg wrote:

Hi Lance,

FWIW, there are existing bedstrict, bed12, and bed6 datatypes in Galaxy. The strict datatypes are currently usually created by implicit datatype converters and are most often used by some external display applications that need standards-conforming files. bed6/12 are subclasses of bedstrict. They can of course be consumed or created by any sort of tool. Please let us know if we can provide additional information.


Thanks for using Galaxy,

Dan


On Jan 14, 2016, at 4:26 PM, Lance Parsons <lparsons@princeton.edu> wrote:


Does anyone know of any efforts to create a BED12 datatype for Galaxy? Since some tools require BED12 and the automatic conversion from GFF-to-BED does not seem to generate a BED12, it seems it might be a worthwhile addition.

If not, what would be the best way to go about doing this? Making it part of the core Galaxy (which would allow multiple tools to share the same datatype definitions) or making it part of a toolshed tool (which I'm not sure how to do)? BTW, I'm thinking about RSeQC at the moment, but I know other tools use/require this format.

-- 
Lance Parsons - Scientific Programmer
Carl C. Icahn Laboratory - Room 141 (Temporary)
Lewis-Sigler Institute for Integrative Genomics
Princeton University

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/



 



 