RE: i18n Polyglot Markup/NCRs (7th issue) from Leif Halvard Silli on 2010-07-22 (public-html@w3.org from July 2010)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 23 Jul 2010 01:33:57 +0300
To: "Phillips, Addison" <addison@lab126.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, public-html <public-html@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <20100723013357797746.0e75e580@xn--mlform-iua.no>

Phillips, Addison, Mon, 19 Jul 2010 12:01:48 -0400:
> (personal response)
> 
>>> First of all, my comment was to Richard, who suggested that
>>> Polyglot markup should "favor" hexadecimal NCRs.
>> 
>> I think neither decimal nor hexadecimal can be preferred over the
>> other on polyglot grounds, so the publication shouldn't prefer one
>> over the other.
> 
> Polyglot itself must, of course, support both decimal and hex NCRs. 
> The comment was on specific text in the document that used a decimal 
> NCR instead of a hex NCR. It's an editorial comment, but it would be 
> best to make the change, in my opinion. If the W3C just ignores its 
> own advice in writing documents, why would document authors pay 
> attention to it? Note well: our WG's comment is not saying that 
> polyglot should favor one form over the other normatively. Only that 
> the examples should use hex instead of decimal (unless necessary to 
> the example).

It would not be bad to show both a hex and a decimal example, would it? 
Authors have different preferences with respect to NCRs. E.g. I learned 
the decimal NCR for 'å' long ago, but I have not yet learned the hex 
value. I think the text in question is general enough that both 
NCR forms should be mentioned.

I would suggest this text, where new text is _underlined_:

 ]] For entities beyond the previous list, _polyglot markup_ uses 
_numeric_ character references (NCRs). For example, polyglot markup 
uses _&#xA0; (or the decimal NCR equivalent &#160;)_ instead of 
&nbsp;. [[
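
To illustrate that the two NCR forms are interchangeable, here is a 
minimal Python sketch. The helper name ncr_forms is my own, made up 
for illustration only; it is not taken from the polyglot draft or from 
any library. It only relies on the standard html.unescape function.

```python
import html


def ncr_forms(ch: str) -> tuple[str, str]:
    """Return the (hexadecimal, decimal) NCRs for a single character.

    Hypothetical helper, for illustration only.
    """
    code_point = ord(ch)
    return f"&#x{code_point:X};", f"&#{code_point};"


# NO-BREAK SPACE (U+00A0), the character that &nbsp; stands for.
hex_nbsp, dec_nbsp = ncr_forms("\u00a0")
print(hex_nbsp, dec_nbsp)  # &#xA0; &#160;

# Both forms decode to the same character.
assert html.unescape(hex_nbsp) == html.unescape(dec_nbsp) == "\u00a0"

# LATIN SMALL LETTER A WITH RING ABOVE (U+00E5), i.e. 'å'.
print(*ncr_forms("\u00e5"))  # &#xE5; &#229;
```

Either form names the same code point, so the choice between them is a 
matter of author preference rather than of polyglotness.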

>>> A possible answer to your question is found in Sam's messages
>>> [1][2].
>>> He suggests allowing only UTF-8 as the encoding of polyglot markup.
>> 
>> That steps outside logical inferences from specs to determine
>> what's polyglot. The logical inferences lead to a conclusion that
>> polyglot documents can be constructed using UTF-8 and using UTF-16.
>> 
>> There are other reasons to prefer UTF-8 over UTF-16, but
>> polyglotness isn't one of them, so the WG shouldn't pretend that it
>> is.
> 
> I agree. Polyglot supports both encoding forms and so it really must 
> treat them somewhat equally. The choice of which encoding to use and 
> the reasons to prefer one or the other lie elsewhere. 

Inferring from HTML5, one must conclude that Polyglot Markup prefers 
UTF-8 over UTF-16. See my preceding message.

[…]
-- 
leif halvard silli

Received on Friday, 23 July 2010 14:38:34 UTC