Re: Advice sought on (markup) validator's "Help and FAQ" document

(I first replied to this using the web form, but it was apparently rejected, so I am trying again.)

Olivier Thereaux <ot@w3.org> wrote:

>Since we are soon going to start an alpha/beta test for the
>next version of the Markup Validator, I think now is a good time for a
>quick review of the validator's doc, and its cornerstone, the Help&FAQ.
>

For the Help&FAQ I would like to see more helpful, honest and accurate explanations of why the x-mac encodings aren't supported by the Validator, and of what to do when one uses such encodings.

I don't know whether the exact messages that the Validator gives lie outside the Help&FAQ issues that you asked about, but I want to comment on the responses the Validator gives when one tries to validate pages that use x-mac encodings. According to the XML spec, the names of all encodings except the IANA-registered ones should begin with 'x-'. XML processors are not required to know x- encodings; they may label them as unknown. They may even label all the IANA-registered ones as unknown (the fact that they are «OFFICIAL», as the Validator names them, is irrelevant to the XML spec), except for the encodings that all XML processors MUST know (utf-8 and utf-16). Yet the Validator seems to recognize all of the IANA charsets, even such seldom-used charset names as 'macintosh'. ('Macintosh' is used in TeX and such places, but very seldom on the WWW.)

Therefore I find the way the Validator reacts when one supplies x-mac encodings misleading and unhelpful: «Sorry! A fatal error occurred [...] detected character encoding was "x-mac-roman". [...] The error was "x-mac-roman undefined; replace by macintosh".» (If you use an x-mac encoding other than x-mac-roman, for instance x-mac-cyrillic, the response is basically the same, except that the explanation in the line which begins «The error was ...» is empty.) The recommendation to «replace with 'macintosh'» may be good advice for getting something through the Validator (a hint about using Character Encoding Override would have been fitting then), but using 'macintosh' online is advice I doubt very much. Using x-mac-roman would then be better advice, IMHO, because even using 'macintosh' would not make the page compatible with all (or any?) e.g. Windows parsers, perhaps not even with all Mac user agents. One uses the x-mac encodings when, for some reason, one only cares about Mac user agents.

Furthermore, the explanation gives the impression (or could be interpreted that way) that because one has used an x-mac encoding, the document must be invalid. It would have been more honest and helpful to say that the document may indeed be fully valid according to the XML spec, but that the Validator has chosen to support only the IANA-registered charsets.

It is likely that most if not all x-mac encoded documents, including e.g. x-mac-cyrillic ones, can be validated with Character Encoding Override using 'macintosh' as the charset. Informing users about this would therefore make for a much more helpful response.
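
To illustrate why I think the override works, here is a minimal Python sketch. It is not the Validator's code; it assumes that Python's built-in 'mac_cyrillic' and 'mac_roman' codecs are reasonable stand-ins for x-mac-cyrillic and for the charset IANA registers as 'macintosh'. The sample string and the function name are made up for the illustration. The text comes out garbled, but the ASCII markup, which is what gets validated, survives unchanged.

```python
# Sketch only: shows that reading x-mac-cyrillic bytes as 'macintosh'
# garbles the text but leaves the ASCII markup intact.
# Assumption: Python's 'mac_cyrillic' and 'mac_roman' codecs stand in
# for x-mac-cyrillic and the IANA 'macintosh' charset.

def decode_with_override(raw: bytes, override: str = "mac_roman") -> str:
    """Decode bytes using an override charset instead of the declared one."""
    return raw.decode(override)

# Hypothetical sample document fragment.
source = "<p>Привет</p>"
raw = source.encode("mac_cyrillic")

as_roman = decode_with_override(raw)

# Keep only the ASCII characters, i.e. the markup in this sample.
markup_only = "".join(ch for ch in as_roman if ord(ch) < 128)
print(markup_only)
```

Both are single-byte encodings that share the ASCII range, so the tags read the same either way; only the non-ASCII text changes.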

A simplification, and now I am certainly not talking about the Help&FAQ anymore, would be if the Validator automatically halted with a warning when someone used x-mac-roman and offered a button to retry with Character Encoding Override set to 'macintosh' instead.
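
Roughly, the behaviour I have in mind could be sketched like this in Python. The function names and the wording of the warning are hypothetical, and I am assuming that every 'x-mac-' label can be offered 'macintosh' as the override, which is only my guess from the paragraphs above and not anything the Validator does today.

```python
from typing import Optional

# Assumption: any label starting with this prefix is an x-mac encoding
# for which 'macintosh' is a workable override for validation purposes.
X_MAC_PREFIX = "x-mac-"
SUGGESTED_OVERRIDE = "macintosh"

def suggest_override(declared: str) -> Optional[str]:
    """Return an override charset to offer, or None if none applies."""
    label = declared.strip().lower()
    if label.startswith(X_MAC_PREFIX):
        return SUGGESTED_OVERRIDE
    return None

def warning_for(declared: str) -> Optional[str]:
    """Build a warning (hypothetical wording) instead of a fatal error."""
    override = suggest_override(declared)
    if override is None:
        return None
    return (
        f"The encoding '{declared}' is not supported by this validator. "
        f"The document may still be valid; retry with Character Encoding "
        f"Override set to '{override}'."
    )

print(warning_for("x-mac-roman"))
```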
-- 
leif halvard silli,
oslo

Received on Saturday, 23 April 2005 18:57:58 UTC