Hi All,

I noticed there is a bug when you read in tab separated files and leave them as type auto.

These are then identified by
https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/tabular.py
as "CSV", because the CSV type uses the Python module "csv", which can also read tab separated files.

Fine so far, EXCEPT that CSV's set_meta method does not read the columns correctly if the file is tab separated.
def set_meta( self, dataset, **kwd ):
    ...
    reader = csv.reader(csvfile)    #line 920

The default delimiter for Python's csv module is a comma, so a tab separated file will appear to have only 1 column.

As a result, especially in Planemo, parameters of type="data_column" will not work, as the system thinks there is only one column in the data.
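
To illustrate the single-column behaviour described above, here is a minimal stand-alone sketch. It uses only Python's csv module, not the Galaxy code, and the sample data is made up for the example.

```python
import csv
import io

# Hypothetical tab separated sample: two rows, three columns.
sample = "chrom\tstart\tend\nchr1\t100\t200\n"

# Default dialect: the delimiter is a comma, so each row is returned as one field.
default_rows = list(csv.reader(io.StringIO(sample)))

# Same data, with the delimiter stated explicitly.
tab_rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

print(len(default_rows[0]))  # 1
print(len(tab_rows[0]))      # 3
```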

==

The CSV data type needs to be fixed or, to protect backward compatibility, replaced.

There are then several options for comma separated files.

1. Use the sniff method of Python's csv.Sniffer to detect the delimiter in set_meta.
This will result in a slowdown and affect backward compatibility.

2. Make CSV handle only comma separated files.
Improve the sniff( self, filename ) method (line 907) to make sure the file is comma separated.
There are various clean ways of doing this.

3. Create a new True_CSV type that sniffs only comma separated files but leave the old one for backward compatibility.
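
As a rough sketch of how delimiter detection for options 1 and 2 might look, the following uses csv.Sniffer from the standard library. The function names detect_delimiter and looks_comma_separated are hypothetical and are not part of Galaxy. Restricting the candidates to comma and tab is an assumption made for this example.

```python
import csv

def detect_delimiter(sample, candidates=",\t"):
    """Return the delimiter csv.Sniffer guesses for sample, or None if it cannot decide.

    Hypothetical helper: only the characters in candidates are considered.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot determine a delimiter.
        return None

def looks_comma_separated(sample):
    """Hypothetical check a stricter CSV sniff could use (option 2)."""
    return detect_delimiter(sample) == ","

comma_sample = "a,b,c\n1,2,3\n4,5,6\n"
tab_sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

print(looks_comma_separated(comma_sample))  # True
print(looks_comma_separated(tab_sample))    # False
```

Passing only the first few lines of the file as the sample, rather than the whole file, might limit the slowdown mentioned in option 1, though I have not measured this.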


For tab separated files

1. Option 1 above works here too.

4. Then allow the default tabular type to handle tab separated files.

5. Add a new type which extends True_CSV to sniff for tab separations and set_meta correctly with tabs.

===
I have code that works for True_CSV and the new TSV type if that is the best option.

Christian
University of Manchester

3b. Add one or more new types to handle tab separated files using Python's csv module, but informing its reader of the new delimiter or dialect.
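
A minimal sketch of what "informing the reader of the dialect" could look like, using the csv.excel_tab dialect that ships with Python. The count_columns helper and its max_rows parameter are hypothetical, for illustration only, and this is not the existing set_meta code.

```python
import csv
import io

def count_columns(handle, dialect=csv.excel_tab, max_rows=100):
    """Return the width of the widest row among the first max_rows rows.

    Hypothetical helper: a tab-aware type could pass csv.excel_tab,
    a comma-only type could pass csv.excel.
    """
    reader = csv.reader(handle, dialect=dialect)
    widest = 0
    for row_number, row in enumerate(reader):
        if row_number >= max_rows:
            break
        widest = max(widest, len(row))
    return widest

tsv = "chrom\tstart\tend\nchr1\t100\t200\n"
print(count_columns(io.StringIO(tsv)))                     # 3
print(count_columns(io.StringIO(tsv), dialect=csv.excel))  # 1
```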


regards
Christian