Hello,

When I try to upload BAM files into a Galaxy data library through system paths with the "Link to files without copying into Galaxy" option, Galaxy always complains that

"The uploaded files need grooming, so change your Copy data into Galaxy? selection to be Copy files into Galaxy instead of Link to files without copying into Galaxy so grooming can be performed."

regardless of whether the BAM files are sorted or not.

In galaxy-dist/tools/data_source/upload.py:

285:     if dataset.type in ( 'server_dir', 'path_paste' ) and link_data_only == 'link_to_files':
286:         # Never alter a file that will not be copied to Galaxy's local file store.
287:         if datatype.dataset_content_needs_grooming( output_path ):
288:             err_msg = 'The uploaded files need grooming, so change your <b>Copy data into Galaxy?</b> selection to be ' + \
289:                 '<b>Copy files into Galaxy</b> instead of <b>Link to files without copying into Galaxy</b> so grooming can be performed.'

The 'output_path' is always None when using 'link_to_files', so the grooming check always returns False.

I think line 287 needs to check the original BAM file instead, like:

287:         if datatype.dataset_content_needs_grooming( dataset.path ):

And the following code also needs to change:

314: if datatype.dataset_content_needs_grooming( output_path ):
315:         # Groom the dataset content if necessary
316:         datatype.groom_dataset_content( output_path )

A quick and dirty fix would be:

314: if not (dataset.type in ( 'server_dir', 'path_paste' ) and link_data_only == 'link_to_files') and datatype.dataset_content_needs_grooming( output_path ):
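
To make the two changes concrete, here is a minimal standalone sketch of the combined logic. It is not the real upload.py: FakeDatatype and handle_upload are hypothetical names made up for illustration, and the 'unsorted' filename test merely stands in for the real BAM sort check.

```python
# Standalone sketch of the proposed logic; NOT the actual upload.py code.
# FakeDatatype and handle_upload are hypothetical stand-ins for illustration.

class FakeDatatype(object):
    """Stand-in for a Galaxy datatype; paths containing 'unsorted' need grooming."""

    def __init__(self):
        self.groomed = []  # records every path that was groomed

    def dataset_content_needs_grooming(self, path):
        # Assumption: a missing path can never be reported as needing grooming.
        return path is not None and 'unsorted' in path

    def groom_dataset_content(self, path):
        self.groomed.append(path)


def handle_upload(datatype, dataset_type, dataset_path, output_path, link_data_only):
    """Return an error message, or None if the upload can proceed."""
    linked = (dataset_type in ('server_dir', 'path_paste')
              and link_data_only == 'link_to_files')
    if linked:
        # Linked files are never copied, so inspect the original file
        # (dataset_path) rather than output_path.
        if datatype.dataset_content_needs_grooming(dataset_path):
            return 'The uploaded files need grooming'
        return None
    # Only files copied into Galaxy's file store are ever groomed.
    if datatype.dataset_content_needs_grooming(output_path):
        datatype.groom_dataset_content(output_path)
    return None
```

With this structure a sorted linked file passes, an unsorted linked file is rejected with the error message, and grooming is attempted only on copied files, so a linked original is never altered.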

Please advise,

--

Zhibin Lu

Bioinformatics Support


Ontario Institute for Cancer Research           

MaRS Centre, South Tower

101 College Street, Suite 800

Toronto, Ontario, Canada M5G 0A3

 

Tel: 647-260-7944

Toll-free: 1-866-678-6427

www.oicr.on.ca