Hi All,

I noticed there is a bug when you read in tab-separated files and leave them as type auto.

These are then identified by https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/tabula... as "CSV", because the CSV type uses the Python module "csv", which can read tab-separated files.

Fine so far, EXCEPT that CSV's set_meta method does not read columns correctly if the file is tab separated.

    def set_meta( self, dataset, **kwd ):
        ...
        reader = csv.reader(csvfile)  # line 920

The default delimiter for Python's csv module is a comma, so a tab-separated file will appear to have only 1 column.

As a result, especially in Planemo, parameters of type="data_column" will not work, as the system thinks there is only one column in the data.

==

The CSV data type needs to be fixed or, to protect backward compatibility, replaced.

There are then several options for comma-separated files.

1. Use Python csv's sniff method to detect the delimiter in set_meta. This will result in a slowdown and affect backward compatibility.
2. Make CSV handle only comma-separated files. Improve the def sniff( self, filename ): method (line 907) to make sure the file is comma separated. There are various clean ways of doing this.
3. Create a new True_CSV type that sniffs only comma-separated files, but leave the old one for backward compatibility.

For tab-separated files:

1. Option 1 above works here too.
3b. Add one or more new types to handle tab-separated files using Python's csv, but informing Python's csv reader of the new delimiter or dialect.
4. Then allow the default tabular type to handle tab-separated files.
5. Add a new type which extends True_CSV to sniff for tab separation and get_meta correctly with tabs.

===

I have code that works for True_CSV and the new TSV type if that is the best option.

Regards,
Christian
University of Manchester
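The single-column behaviour described in the message above can be reproduced outside Galaxy with nothing but the standard csv module. The sketch below is not Galaxy code: the sample data and the helper names (count_columns, detect_delimiter, looks_comma_separated) are made up for illustration. It shows what the default reader sees on tab-separated input, and roughly what options 1, 2 and 3b would amount to.

```python
import csv
import io

# Hypothetical sample data standing in for uploaded datasets.
TSV_SAMPLE = "id\tname\tscore\n1\talpha\t0.5\n2\tbeta\t0.7\n"
CSV_SAMPLE = "id,name,score\n1,alpha,0.5\n2,beta,0.7\n"


def count_columns(text, **reader_kwargs):
    """Return the number of fields csv.reader finds in the first row."""
    reader = csv.reader(io.StringIO(text), **reader_kwargs)
    return len(next(reader))


def detect_delimiter(text, candidates=",\t"):
    """Option 1 sketch: let csv.Sniffer pick a delimiter from the candidates."""
    return csv.Sniffer().sniff(text, delimiters=candidates).delimiter


def looks_comma_separated(text, max_rows=10):
    """Option 2 sketch: accept only data whose first rows all split into the
    same number (more than one) of fields with the default comma delimiter."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        rows.append(row)
        if len(rows) >= max_rows:
            break
    widths = {len(row) for row in rows}
    return len(widths) == 1 and widths.pop() > 1


# Default (comma) delimiter: a tab-separated row collapses into one field.
default_columns = count_columns(TSV_SAMPLE)

# Option 3b: tell the reader about the delimiter explicitly.
explicit_columns = count_columns(TSV_SAMPLE, delimiter="\t")

# Option 1: sniff the delimiter first, then read with it.
sniffed = detect_delimiter(TSV_SAMPLE)
sniffed_columns = count_columns(TSV_SAMPLE, delimiter=sniffed)

print(default_columns, explicit_columns, repr(sniffed), sniffed_columns)
print(looks_comma_separated(TSV_SAMPLE), looks_comma_separated(CSV_SAMPLE))
```

A check along the lines of looks_comma_separated would reject the tab-separated sample outright, so such a file would fall through to whichever type is sniffed next. Whether that fits Galaxy's actual sniffer interface is an assumption to verify against the datatype code.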
Hi Christian,

I think the "csv" datatype sniffer should be fixed so that it does not accept tab-separated files. To me that is a clear false positive, given that Galaxy has a separate "tabular" format for "tsv" files.

Also, surprisingly, the "tabular" datatype does not seem to have a sniff method at all: https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/datatypes/tabula...

If those are fixed, then the order of sniffing ("csv" vs "tabular") defined in datatypes_conf.xml should not matter.

Peter

On Tue, Nov 17, 2015 at 10:56 AM, Christian Brenninkmeijer <christian.brenninkmeijer@manchester.ac.uk> wrote:
participants (2)
- Christian Brenninkmeijer
- Peter Cock