Normalization vs. encoding layers from Bjoern Hoehrmann on 2001-08-14 (www-i18n-comments@w3.org from August 2001)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 14 Aug 2001 12:13:53 +0200
To: www-i18n-comments@w3.org
Message-ID: <44thnt4d6ipuduij56rmb476mr0n95gq2q@4ax.com>

Hi,

   http://www.w3.org/TR/2001/WD-charmod-20010126 currently doesn't
mention the case of arbitrary text with mixed data formats and
therefore mixed escaping mechanisms. For example, an XHTML document
like

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title></title>
  
      <style type='text/css'>

        /* all Björn elements with blue color                   */
        /* of course there aren't any 'Björn' elements in XHTML */

        Bjo\000308rn::after { color: blue }
      </style>
  
    </head>
  
    <body><p>Bjo&#x308;rn</p></body>
  </html>
  
Normalizing the XHTML part to 'Björn'/'Bj&ouml;rn'/'Bj&#xf6;rn'/etc.
won't make this document W3C-normalized, since the embedded style
sheet still contains an escape that would, on unescaping, cause the
data to no longer be Unicode-normalized, at least as the draft
currently stands. I don't think it is a good idea to require
applications to deal with such multiple encoding layers; otherwise
applications would have to consider all data formats that might be
included and their escaping mechanisms. Take, for example, an XHTML
editor that doesn't know anything about CSS. Should it have to learn
CSS just to ensure that its output is properly normalized? This
wouldn't be feasible for most applications. I suggest adding a note
that applications only have to deal with the escape mechanisms of the
top-most encoding layer (XHTML in my example).
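
To make the problem concrete, here is a minimal Python sketch of the
two layers. It assumes NFC as a stand-in for the draft's notion of
normalized text, and the helper names (unescape_css, is_nfc) are
hypothetical, not taken from any specification or library. The CSS
escape handling is deliberately simplified to hexadecimal escapes only.

```python
import html
import re
import unicodedata

# Simplified, hypothetical CSS unescaper: handles only hex escapes of
# 1-6 digits, optionally terminated by a single whitespace character.
_CSS_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?")


def unescape_css(text: str) -> str:
    """Replace CSS hex escapes such as \\000308 with the character."""
    return _CSS_HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def is_nfc(text: str) -> bool:
    """True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# The two fragments from the example document above.
css_source = r"Bjo\000308rn::after { color: blue }"
xhtml_source = "<p>Bjo&#x308;rn</p>"

# An editor that only knows XHTML can repair the top-most layer ...
xhtml_fixed = "<p>Bj&#xf6;rn</p>"
print(is_nfc(html.unescape(xhtml_source)))  # False
print(is_nfc(html.unescape(xhtml_fixed)))   # True

# ... but the style sheet is opaque to it. The escaped form is plain
# ASCII and looks fine; only a CSS-aware tool sees the problem.
print(is_nfc(css_source))                   # True
print(is_nfc(unescape_css(css_source)))     # False
```

The last two lines are the point: the CSS fragment only shows up as
non-normalized to a tool that knows how to unescape CSS.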

regards,
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/

Received on Tuesday, 14 August 2001 06:15:02 UTC