Re: Fallback to UTF-8 from Jukka K. Korpela on 2008-04-25 (www-validator@w3.org from April 2008)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 25 Apr 2008 11:14:36 +0300
To: <www-validator@w3.org>
Message-ID: <018301c8a6ac$67b574b0$0500000a@DOCENDO>

David Dorward wrote:

> The validator outputs both parts of the original source and its own
> error messages. So whatever it outputs, it has to do so in a fashion
> compatible with the original document.

It cannot, since the original document is a sequence of octets with no 
defined meaning, and you are just _assuming_ some encoding or class of 
encodings.

In practice, the best shot is probably to violate specifications by not 
specifying any encoding for the result page in this case. This is 
compatible with the original document in the sense of not assigning any 
meaning to the octets. It also gives the user maximal flexibility in 
manually setting the encoding.
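
As a rough sketch of what that could look like (the function name and 
the header representation are made up for illustration, and this is not 
the validator's actual code), the result page could be assembled at the 
octet level, with the Content-Type header deliberately lacking a charset 
parameter. It assumes that the validator's own markup and messages are 
pure ASCII and that the unknown encoding is ASCII-compatible, which is 
exactly the kind of assumption discussed above.

```python
def build_report(source_octets, message):
    """Return (headers, body) for a result page with no declared encoding.

    Hypothetical helper: the source octets are passed through without
    being decoded, so no meaning is assigned to them.
    """
    # Deliberately no "; charset=..." parameter.
    headers = [("Content-Type", "text/html")]

    # Escape markup-significant octets at the byte level. This only makes
    # sense if the unknown encoding is ASCII-compatible (an assumption).
    escaped = (
        source_octets.replace(b"&", b"&amp;")
        .replace(b"<", b"&lt;")
        .replace(b">", b"&gt;")
    )

    # The message is assumed to be pure ASCII; encode() raises otherwise.
    body = (
        b"<p>" + message.encode("ascii") + b"</p>\n"
        + b"<pre>" + escaped + b"</pre>\n"
    )
    return headers, body


headers, body = build_report(b"caf\xe9 <b>", "Cannot determine the encoding")
```

The octet 0xE9 reaches the browser unchanged, and the user can then pick 
an encoding manually.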

This is of course theoretically all wrong (as is the data), since even 
the octets used for markup and validator messages have no defined 
meaning then.

The alternative of using U+FFFD (REPLACEMENT CHARACTER) might be feasible, too.
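
A minimal sketch of the U+FFFD approach, assuming the fallback encoding 
is UTF-8 and using Python's standard "replace" error handler (the 
function name is hypothetical):

```python
def decode_with_replacement(octets, assumed_encoding="utf-8"):
    """Decode octets, turning each undecodable sequence into U+FFFD."""
    return octets.decode(assumed_encoding, errors="replace")


# ISO-8859-1 octets for "café au lait"; 0xE9 is not valid UTF-8 here.
sample = b"caf\xe9 au lait"
text = decode_with_replacement(sample)
print(text.count("\ufffd"))  # number of replacement characters inserted
```

The result is well-defined text that can be served as UTF-8, at the cost 
of losing the original octets wherever a replacement was made.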

> Would outputting entities for its own messages solve that
> problem?

Using entity or character references for non-ASCII characters would be 
useful in practice in this scenario.
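
For illustration, one way to do this (a sketch only, with a made-up 
function name) is Python's standard "xmlcharrefreplace" error handler, 
which turns every non-ASCII character into a decimal character 
reference:

```python
def ascii_safe(message):
    """Encode a message as ASCII, using character references elsewhere.

    Note: this does not escape "&" or "<"; the message is assumed to be
    already valid markup text.
    """
    return message.encode("ascii", errors="xmlcharrefreplace")


print(ascii_safe("Élément non fermé"))
```

The resulting octets mean the same thing in any ASCII-compatible 
encoding the user might select.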

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Received on Friday, 25 April 2008 08:15:11 UTC