
[Bug 12062] UTF-8 BOM should not be forbidden in Polyglot Markup

From: <bugzilla@jessica.w3.org>
Date: Thu, 03 Mar 2011 01:48:44 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1Puxem-00049L-5z@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12062

Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

--- Comment #2 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-03-03 01:48:42 UTC ---
(In reply to comment #1) Some nitty gritties. 

FIRSTLY: In the quote *below*, inside the parentheses, please change the
upper-case "I" ("If used ...") to lowercase:

]]
> Polyglot markup declares character encoding in the following ways, which may be
> used separately or in combination (If used in combination, each approach
> contains identical encoding information): 
[[

SECONDLY, in the quote *below*, you changed my suggested wording from "XML
UTF-8 encoding default" to "default XML UTF-8 encoding". I am not sure it is
really correct to say that UTF-8 is _the_ default encoding of XML. My
intention was to say that UTF-8 is _an_ encoding default - one of two, the
other being UTF-16. (In my proposal, "XML" is an adjective - think "XML-ish",
"of XML" or "XML's".)

>     ◦ By relying on the default XML UTF-8 encoding in combination with the
> use of the <meta charset="UTF-8"/> element.

It is true that XML says that if a document *DOES NOT* have an encoding
declaration (internal or external) and also does not have an encoding signature
(aka BOM), then the document *MUST* be in the UTF-8 encoding (see section
'4.3.3 Character Encoding in Entities' of XML 1.0). From that angle it seems
correct that UTF-8 has something to do with "the default".

* But I still think that my wording was better. Feel free to go back and use
it.
* Or else, I suggest using the following formulation instead, where I use the
word 'autodetection':

]] By relying on XML's autodetection of the UTF-8 encoding, in combination with
the HTML <meta charset="UTF-8"/> encoding declaration. [[

Justification for the 'autodetection' variant: 

* XML 1.0 has an entire section about 'autodetection of character encodings':
http://www.w3.org/TR/REC-xml/#sec-guessing
* in contrast, the word 'default' only occurs once in relation to encoding:
http://www.w3.org/TR/REC-xml/#charencoding
* the autodetection section relates to 'default' through the 'Other' row of the
"Without a Byte Order Mark" table. Quote:
   ]] Other: UTF-8 without an encoding declaration, or else the data stream is
   mislabeled (lacking a required encoding declaration), corrupt, fragmentary,
   or enclosed in a wrapper of some kind [[
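
To illustrate how 'default' falls out of autodetection, here is a rough Python
sketch of the first-bytes check described in the autodetection section of
XML 1.0. It is only an approximation under stated assumptions: the function
name sniff_xml_encoding and its return labels are made up for this example,
and the EBCDIC and BOM-less UCS-4 rows of the tables are left out.

```python
import codecs


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity.

    Hypothetical helper: a simplified reading of the XML 1.0
    autodetection tables, not a complete or normative implementation.
    """
    # With a Byte Order Mark. The 4-byte UCS-4 marks are tested first,
    # because FF FE 00 00 also starts with the UTF-16LE mark FF FE.
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return "UCS-4"
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"

    # Without a Byte Order Mark: "<?" encoded as UTF-16BE or UTF-16LE.
    if data.startswith((b"\x00<\x00?", b"<\x00?\x00")):
        return "UTF-16"
    # "<?xm" in an ASCII-compatible encoding: the encoding declaration
    # has to be read to find out which one.
    if data.startswith(b"<?xm"):
        return "see-declaration"

    # The 'Other' row: UTF-8 without an encoding declaration
    # (or else the stream is mislabeled, corrupt, and so on).
    return "UTF-8"
```

A polyglot document has no XML declaration, so with or without the UTF-8 BOM it
ends up as UTF-8 in this sketch: with the BOM through the signature check,
without it through the 'Other' row.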

Sidenote: In Norwegian, 'default' and 'automatic' are often synonyms.

Received on Thursday, 3 March 2011 01:48:46 GMT
