On Mon, Aug 4, 2014 at 4:28 PM, Eric Rasche <rasche.eric@yandex.ru> wrote:
Hi Peter,
On 08/04/2014 09:25 AM, Peter Cock wrote:
Hi Mert,
Most of the Galaxy tools dealing with tables of data use "tabular" format (tab separated values), not CSV (comma separated values). CSV is a horrible, horrible mess of formats, see e.g. http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-...
This annoyed me when I was first starting out with Galaxy; I really wish it were labelled TSV. The labels all read "CSV", so I gave Galaxy "CSV" data and Galaxy didn't like it, much to my confusion.
Which labels say "CSV" at the moment? (And yes, I would also have preferred "tsv" to "tabular" as the datatype name in Galaxy, that way it would match the typical file extension).
Also, most of the people (biologists) I work with use the term CSV very generically, without regard to the differences between the two.
I've seen that too - but people saying CSV when they mean TSV will unavoidably cause confusion.
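To make the CSV versus tabular difference concrete, here is a minimal sketch in Python (standard library only) that turns comma separated text into plain tab separated text. The function name csv_to_tabular is made up for illustration, and the sketch assumes the input follows the default dialect of Python's csv module and that tabular output should carry no quoting at all, so tabs and line breaks embedded in a field are replaced with a single space.

```python
import csv
import io
import re


def csv_to_tabular(csv_text):
    """Convert comma separated text to unquoted tab separated text.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    lines = []
    for row in reader:
        # Tabular data has no quoting, so embedded tabs and line breaks
        # would corrupt the column layout; replace each run with one space.
        cleaned = [re.sub(r"[\t\r\n]+", " ", field) for field in row]
        lines.append("\t".join(cleaned))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

Note that quoted commas ("Smith, J") survive as ordinary text in one column, which is exactly the case a naive replace of "," with a tab gets wrong.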
Also beware that anything other than MS Excel could be confused by quirks in the Excel format, e.g. multiple ways to record dates: http://support.microsoft.com/kb/180162
I would personally save each tab of the Excel sheet as tab separated data, and import those into Galaxy.
Would it not make sense to have an XLS <-> TSV datatype converter? I'm sure many biologists would appreciate being able to use the in-Galaxy version as opposed to having to open and re-save all of their data.
It makes sense to me to offer a tool mapping one Excel workbook to multiple tabular output files (one per sheet). How best to write this will depend on the platform and available dependencies (e.g. some of the R converters for this are Windows only IIRC).
Peter
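As a rough sketch of the "one tabular file per sheet" idea, assuming Python and the third-party openpyxl package (which reads .xlsx but not the legacy .xls format): the function names, the numbered output file naming, and the choice to render cells with str() are all assumptions for illustration, not an existing Galaxy tool. Dates in particular come out however str() formats them, which may not be what you want.

```python
import re
from pathlib import Path


def _cell_to_text(value):
    """Render one cell as tab-safe text (empty cells become '')."""
    if value is None:
        return ""
    # Tabs and line breaks inside a cell would break the column layout.
    return re.sub(r"[\t\r\n]+", " ", str(value))


def write_sheets_as_tsv(sheets, out_dir):
    """Write each (sheet_name, rows) pair to its own .tsv file.

    Hypothetical helper; returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (name, rows) in enumerate(sheets, start=1):
        # Sheet names can hold characters that are unsafe in file names;
        # the numeric prefix keeps names unique after sanitising.
        safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", name) or "sheet"
        path = out_dir / f"{index:02d}_{safe}.tsv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            for row in rows:
                handle.write("\t".join(_cell_to_text(c) for c in row) + "\n")
        written.append(path)
    return written


def read_xlsx_sheets(xlsx_path):
    """Yield (sheet_name, rows) for each worksheet, using openpyxl."""
    import openpyxl  # third-party dependency, imported only when needed

    # data_only=True returns cached formula results rather than formulas.
    workbook = openpyxl.load_workbook(xlsx_path, read_only=True, data_only=True)
    for sheet in workbook.worksheets:
        yield sheet.title, sheet.iter_rows(values_only=True)
```

Usage would be along the lines of write_sheets_as_tsv(read_xlsx_sheets("data.xlsx"), "out"), where data.xlsx is a placeholder file name. Keeping the writer separate from the reader means a different backend could be swapped in for legacy .xls files.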