- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Tue, 7 Jun 2011 17:43:47 +0200
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: www-international <www-international@w3.org>
Bjoern Hoehrmann, Tue, 07 Jun 2011 16:56:29 +0200:
> * Leif Halvard Silli wrote:
>> Bjoern Hoehrmann, Tue, 07 Jun 2011 06:39:34 +0200:
>>> Higher-level information overrides lower-level information, explicit
>>> information overrides fallbacks, and user agents should do what their
>>> users want them to do. So, HTTP-level Content-Type overrides
>>> document-internal information, a BOM overrides user-chosen fallbacks,
>>> and user-chosen overrides trump anything else.
>>
>> You portray the BOM as "fallback". It actually is an encoding
>> signature.
>
> If you think I wrote something that is inconsistent with facts, then
> maybe you misread what I wrote? I did not, and did not mean to,
> portray a Unicode signature as a fallback in the sense I used the word.

I meant "fallback declaration" or "backup declaration, in case HTTP has
no declaration".

> I meant fallback in the sense of a "If page lacks encoding declaration
> assume it's $encoding encoded" setting, as opposed to a "Whatever the
> page says it's encoded in, use $encoding to decode" setting.

Your priority map was easy to parse. So, in truth, what I reacted to was
only the wording.

>> "Looks like a BOM". Looks like or are exactly those bytes? Can you
>> describe a use case? When and how can an XML document/entity legally
>> start with the BOM if it is not meant to be interpreted as the BOM?
>
> Looks like as opposed to "defined as".
>
>   Content-Type: application/xml-external-parsed-entity;charset=l1
>
>   0xFE 0xFF
>
> That's a properly formed external parsed entity containing LATIN SMALL
> LETTER THORN and LATIN SMALL LETTER Y WITH DIAERESIS. If you ignore the
> charset parameter, the bytes may look like a Unicode signature, but the
> bytes are not a Unicode signature because they are not defined as such.

So what could that seemingly narrow case lead to? Firstly, since the
external entity is not UTF-8 or UTF-16 encoded, there is no guarantee
that the parser will handle it.
Thus a browser could not really be said to be breaking the XML spec if
it were unable to handle such a thing properly. Alternatively, the
parser could let the user override a setting so that it could handle
it.

Meanwhile, the external parsed entity SHOULD itself begin with a text
declaration, which in turn ought to state the encoding. In that case,
the external entity would not need to be served with encoding
information. So there are several limitations and recommendations that
have to be broken before that use case could be a real use case.

Btw, since this external parsed entity begins with those two characters
rather than with U+FEFF, XML 1.0 does not require that there be a BOM
in the "current" (as opposed to in the "external parsed") XML file. In
that regard, it is interesting to note that RFC 3023 is from 2001 and
doesn't discuss the UTF-8 BOM.

http://tools.ietf.org/html/rfc3023#page-15
--
Leif Halvard Silli
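
To make the quoted 0xFE 0xFF example concrete, here is a minimal Python
sketch of the precedence described in the quoted text: an externally
supplied charset wins, and BOM sniffing applies only when no charset is
given. The function name decode_entity and the UTF-8 default are
assumptions made for illustration; this is not what any particular XML
parser or specification mandates. The message's charset=l1 is written
here as iso-8859-1, for which l1 is a registered alias.

```python
import codecs
import unicodedata
from typing import Optional

# The two bytes from the quoted example.
DATA = bytes([0xFE, 0xFF])


def decode_entity(payload: bytes, charset: Optional[str]) -> str:
    """Hypothetical helper: an explicit charset overrides BOM sniffing."""
    if charset is not None:
        # Higher-level (e.g. HTTP) information is used as given.
        return payload.decode(charset)
    # No external charset: treat leading bytes as a Unicode signature.
    if payload.startswith(codecs.BOM_UTF8):
        return payload[len(codecs.BOM_UTF8):].decode("utf-8")
    if payload.startswith(codecs.BOM_UTF16_BE):
        return payload[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if payload.startswith(codecs.BOM_UTF16_LE):
        return payload[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # Assumed default when nothing else is known.
    return payload.decode("utf-8")


# With charset=iso-8859-1 the bytes are two ordinary characters.
as_latin1 = decode_entity(DATA, "iso-8859-1")
names = [unicodedata.name(ch) for ch in as_latin1]
# names == ['LATIN SMALL LETTER THORN',
#           'LATIN SMALL LETTER Y WITH DIAERESIS']

# Without a charset the same bytes are consumed as a UTF-16BE signature,
# leaving an empty entity.
as_sniffed = decode_entity(DATA, None)
```

The same two bytes thus yield either two Latin-1 characters or an empty
UTF-16BE entity, depending only on whether the charset parameter is
honoured.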
Received on Tuesday, 7 June 2011 15:44:20 UTC