Hi all,
I recently tried uploading a couple of XML files to my
local Galaxy installation using the standard ‘Upload File (version 1.1.3)’
tool. For some files this produced the error: "The uploaded file contains
inappropriate HTML content".
Given that the files had been created by the same automated
code and contained the same tags etc., I couldn't understand why one would
produce this error and the other would not.
After tracking down the function check_html() in
galaxy-dist/lib/galaxy/datatypes/checkers.py, I discovered that my
<metabolite> tag had been flagged as a likely <META > tag, which
produced the warning and the failed upload.
The reason this did not happen in every case is that
check_html only reads the first 100 lines of the file, so depending on how
many samples were in my dataset, my <metabolite> tag could appear before
or after this cutoff.
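
To illustrate the cutoff effect, here is a minimal sketch in Python. It is
not the actual check_html() code: the pattern, the function names
looks_like_html() and make_xml(), and the toy file layout are all
assumptions made up for this example. It only shows how a loose '<meta'
match combined with a 100-line limit could accept one file and reject
another that uses the same tags.

```python
import re

# Hypothetical stand-in for the checker; NOT Galaxy's actual pattern.
LOOSE_META = re.compile(r"<meta", re.IGNORECASE)


def looks_like_html(lines, max_lines=100):
    """Return True if a loose '<meta' match occurs within the first max_lines lines."""
    for lineno, line in enumerate(lines):
        if lineno >= max_lines:
            break  # lines past the cutoff are never inspected
        if LOOSE_META.search(line):
            return True
    return False


def make_xml(n_samples):
    """Build a toy XML file with n_samples <sample> lines before <metabolite>."""
    lines = ["<dataset>"]
    lines += ['<sample id="%d"/>' % i for i in range(n_samples)]
    lines += ["<metabolite>glucose</metabolite>", "</dataset>"]
    return lines


small = make_xml(10)   # <metabolite> falls inside the first 100 lines
large = make_xml(200)  # <metabolite> falls after the cutoff

print(looks_like_html(small))  # True: flagged as HTML
print(looks_like_html(large))  # False: passes the check
```

Under these assumptions the smaller dataset is the one that gets rejected,
because its <metabolite> tag appears early enough to be scanned.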
I've solved the problem by changing my XML tag names, but
my questions are:
a) Why does check_html only read up to line 100?
b) Would it be possible to change the regular expressions
in check_html so that e.g. <meta ...> would be found but <metaxxx
...> would not?
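
As a rough idea of what (b) might look like, here is a small Python
sketch. The pattern and the helper name is_meta_tag() are my own
assumptions, not what check_html currently uses. It requires the
characters after '<meta' to be whitespace, '/', '>' or the end of the
line, so longer tag names that merely start with "meta" are not matched.

```python
import re

# Assumed pattern (not Galaxy's): '<meta' must be followed by whitespace,
# '/', '>' or the end of the line, so it cannot be a prefix of a longer name.
STRICT_META = re.compile(r"<meta(?=[\s/>]|$)", re.IGNORECASE)


def is_meta_tag(line):
    """Return True only when the line contains a real <meta ...> tag."""
    return STRICT_META.search(line) is not None


print(is_meta_tag('<meta http-equiv="refresh" content="0">'))  # True
print(is_meta_tag("<META >"))                                  # True
print(is_meta_tag("<metabolite>glucose</metabolite>"))         # False
print(is_meta_tag("<metaxxx ...>"))                            # False
```

A lookahead is used here rather than a plain word boundary (\b) because
XML names may contain '-' or '.', and \b would still match a tag such as
<meta-data>.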
Thanks for reading.
Rob
Dr Robert L Davidson
NERC Metabolomics Facility
School of Biosciences
University of Birmingham
Edgbaston, UK