[Bug 15195] apparently incorrect note about violation of Unicode wrt stripping leading BOM

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15195

Glenn Adams <glenn@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |

--- Comment #4 from Glenn Adams <glenn@skynav.com> 2012-01-29 05:19:59 UTC ---

> Rationale: The "systems" in [3] effectively means character encodings. When
> using an explicit UTF-16LE, you're not allowed to use a BOM, so a leading BOM
> isn't a BOM, it's part of the text stream. We strip it anyway.

Are you saying that, because HTML5 8.2.2.1 step (2) allows the use of a
transport-specified encoding, and because that encoding may be UTF-16LE,
HTML5 thereby becomes a system "using an explicit UTF-16LE"?

And that, consequently, the language (in [3])

"Where the byte order is explicitly specified, such as in UTF-16BE or UTF-16LE,
then all U+FEFF characters — even at the very beginning of the text — are to be
interpreted as zero width no-break spaces."

is violated when HTML5 8.2.2.3 requires an initial U+FEFF to be ignored
(as if it were a BOM rather than a ZWNBSP)?
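
To make the distinction concrete, here is a minimal Python sketch of the two
interpretations. It uses Python's standard codecs only as an illustration of
the Unicode rule; it is not the HTML5 algorithm, and the helper
strip_leading_feff is a hypothetical name standing in for the behaviour that
8.2.2.3 appears to require.

```python
# Bytes FF FE followed by "hi" encoded as UTF-16LE.
data = b"\xff\xfe" + "hi".encode("utf-16-le")

# Unlabelled "utf-16": the leading FF FE is treated as a byte order mark
# and is consumed by the decoder.
as_utf16 = data.decode("utf-16")

# Explicit "utf-16-le": the byte order is already specified, so the
# leading U+FEFF is kept as part of the text (a ZWNBSP, per the quoted rule).
as_utf16le = data.decode("utf-16-le")


def strip_leading_feff(text: str) -> str:
    """Hypothetical helper: drop one initial U+FEFF regardless of the
    encoding label, which is the behaviour under discussion."""
    return text[1:] if text.startswith("\ufeff") else text


print(repr(as_utf16))                        # 'hi'
print(repr(as_utf16le))                      # '\ufeffhi'
print(repr(strip_leading_feff(as_utf16le)))  # 'hi'
```

The second and third results differ only by the initial U+FEFF, which is
exactly the character whose treatment the Note describes as a violation.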

If this is the case, then I believe my comment can be positively resolved by
merely adding the following sentence to the end of the Note in question:

<quote>
See [UNICODE] Section 16.8, which specifies that an initial U+FEFF be
interpreted as zero width no-break space "where the byte order is explicitly
specified".
</quote>


Received on Sunday, 29 January 2012 05:20:04 UTC