I think the "csv" datatype sniffer should be fixed so that it does not
accept tab separated files. To me that is a clear false positive, given
that Galaxy has a separate "tabular" format for "tsv" files.
Also surprisingly the "tabular" datatype does not seem to have a sniff
method at all:
If those are fixed, then the order of sniffing ("csv" vs "tabular")
defined in datatypes_conf.xml should not matter.
On Tue, Nov 17, 2015 at 10:56 AM, Christian Brenninkmeijer wrote:
I noticed there is a bug when you read in tab separated files and leave them
as type auto.
These are then identified as "CSV", because the CSV type uses the Python
module "csv", which can also read tab separated files.
Fine so far EXCEPT that CSV's set_meta method does not read columns
correctly if tab separated.
    def set_meta( self, dataset, **kwd ):
        ...
        reader = csv.reader(csvfile)  # line 920
The default delimiter for Python's csv module is a comma, so a tab separated
file will appear to have only 1 column.
As a result, especially in Planemo, parameters with type="data_column" will
not work, as the system thinks there is only one column in the data.
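
The single-column behaviour described above can be reproduced outside Galaxy
with a small sketch. It assumes Python 3, and the sample data and the
count_columns helper are made up for illustration; they are not Galaxy code.

```python
import csv
import io

# Hypothetical two-line, tab separated sample (not taken from Galaxy).
sample = "chrom\tstart\tend\nchr1\t100\t200\n"


def count_columns(text, **reader_kwargs):
    """Return the number of fields csv.reader sees in the first row."""
    reader = csv.reader(io.StringIO(text), **reader_kwargs)
    return len(next(reader))


# With the default delimiter (",") the whole line is a single field.
default_columns = count_columns(sample)
# Telling the reader about the tab delimiter gives the real column count.
tab_columns = count_columns(sample, delimiter="\t")

print(default_columns, tab_columns)
```

The first count is what a data_column parameter would end up seeing if the
metadata is computed with the default reader.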
The CSV data type needs to be fixed or to protect backward compatibility
There are then several options for comma separated files.
1. Use Python's csv sniff method to detect the delimiter in set_meta.
This will result in a slowdown and affect backward compatibility.
2. Make CSV handle only comma separated files.
Improve the def sniff( self, filename ): method (line 907) to make sure it
is comma separated.
There are various clean ways of doing this.
3. Create a new True_CSV type that sniffs only comma separated files but
leave the old one for backward compatibility.
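
As a rough illustration of options 2 and 3, a stricter check could look
something like the sketch below. The function name looks_comma_separated and
the max_rows limit are assumptions made for this example; this is not the
existing Galaxy sniff method, only one possible way to reject tab separated
input.

```python
import csv
import io


def looks_comma_separated(text, max_rows=10):
    """Hypothetical check: True only if the first rows split consistently on ','."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=","):
        if row:  # csv.reader yields [] for blank lines; skip them
            rows.append(row)
        if len(rows) >= max_rows:
            break
    if not rows:
        return False
    widths = {len(row) for row in rows}
    # Require more than one column and the same column count on every row.
    return len(widths) == 1 and widths.pop() > 1


print(looks_comma_separated("a,b,c\n1,2,3\n"))
print(looks_comma_separated("a\tb\tc\n1\t2\t3\n"))
```

A tab separated file whose fields happen to contain the same number of commas
on every row would still pass this check, so a real sniffer would probably
need an extra test for that case.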
For tab separated files
1. Above works here too
4. Then allow the default tabular to handle tab separated files.
5. Add a new type which extends True_CSV to sniff for tab separations and
get_meta correctly with tabs.
I have code that works for True_CSV and the new TSV type if that is the best
University of Manchester
3b. Add one or more new types to handle tab separated files using Python's
csv module, but informing the csv reader of the new delimiter or dialect.
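
A minimal sketch of what passing the delimiter or dialect could look like,
assuming Python 3. csv.Sniffer and csv.excel_tab are part of the standard csv
module; the helper names detect_dialect and column_count are invented for
this example and are not Galaxy APIs.

```python
import csv
import io


def detect_dialect(text):
    """Guess the dialect, considering only ',' and tab as delimiters.

    csv.Sniffer.sniff raises csv.Error if it cannot decide.
    """
    return csv.Sniffer().sniff(text, delimiters=",\t")


def column_count(text, dialect):
    """Count the fields in the first row using the given dialect."""
    reader = csv.reader(io.StringIO(text), dialect=dialect)
    return len(next(reader))


# Hypothetical sample data, not taken from Galaxy.
tsv_sample = "chrom\tstart\tend\nchr1\t100\t200\nchr2\t300\t400\n"
csv_sample = "a,b,c\n1,2,3\n4,5,6\n"

# Option 1: sniff the delimiter first, then read with the detected dialect.
sniffed_tsv = detect_dialect(tsv_sample)
sniffed_csv = detect_dialect(csv_sample)
print(repr(sniffed_tsv.delimiter), column_count(tsv_sample, sniffed_tsv))
print(repr(sniffed_csv.delimiter), column_count(csv_sample, sniffed_csv))

# Option 3b: a dedicated tab separated type can skip sniffing and pass the
# built-in tab dialect directly.
print(column_count(tsv_sample, csv.excel_tab))
```

Sniffing costs an extra pass over a sample of the file, which is the slowdown
mentioned in option 1; a fixed dialect per datatype avoids that.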