
Re: Your comments on the Character Model [C130, C131]

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Fri, 9 May 2003 13:37:01 +0200
Message-ID: <003f01c3161f$4f4d7760$df13fea9@srx41p>
To: "Francois Yergeau" <FYergeau@alis.com>
Cc: <www-i18n-comments@w3.org>, <w3c-html-wg@w3.org>, <ishida@w3.org>

From: "Francois Yergeau" <FYergeau@alis.com>

> > With regards C132:
> > "Rationale: Steven's example is too HTML-specific, and doesn't match
> > with what we say, namely that transcoders don't resolve NCRs."
> >
> > What is HTML-specific about the example?
>
> It uses HTML (or XML) NCR syntax.

Well, it's not HTML-specific then, is it?

> > My example *does* match what you say, because the first
> > character gets transcoded (from 0xf5 to 0x0151), and the NCR
> > doesn't get transcoded so that the user agent eventually
> > gets two 0x0151 characters.
>
> There's the rub: "the user agent eventually gets..."  There is not
> necessarily a user agent involved in transcoding.  If there is one, it
> eventually gets the second 0x0151 not by transcoding, but by interpreting
> the NCR according to rules of the document language at hand (HTML or
> XML).

Exactly! That's the whole point. The problem in my experience is that people
don't understand whether the number in an NCR refers to Unicode or to the
transmission character set. In fact, it's even worse: some think that if you
indicate an encoding, the UA processes the document in that encoding. The
very day that I sent my example in, I had had someone in my office asking me
to explain it, and the example I sent was the one I had used to explain.
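
A minimal sketch of that confusion, under assumptions not stated in this thread: the document is taken to be labelled ISO-8859-2 (where byte 0xf5 is U+0151), and Python's html.unescape stands in for the NCR handling of a user agent. The NCR number is read as a Unicode code point, whatever the transmission encoding is.

```python
import html

# Hypothetical document bytes, assumed to be labelled ISO-8859-2.
# It holds the raw byte 0xf5, then two NCRs: &#xF5; and &#x151;.
raw = b"\xf5 &#xF5; &#x151;"

# Decoding interprets the raw byte in the transmission encoding.
text = raw.decode("iso8859_2")

# Resolving the NCRs interprets their numbers as Unicode code points.
resolved = html.unescape(text)

code_points = [hex(ord(c)) for c in resolved if c != " "]
print(code_points)
```

Under these assumptions the raw byte 0xf5 and the NCR &#xF5; end up as different characters (U+0151 and U+00F5), which is the distinction people tend to miss.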

> We felt it would be confusing to say "Transcoders ... do not deal with
> character escapes such as numeric character references ..." (first para of
> 3.3) and in the next breath show an example where an NCR *does* get
> resolved.

But the NCR doesn't get resolved in the example! Of course it eventually
gets resolved, and users want to know what it resolves to. If you say that
the UA gets two 0x0151 characters, it is obvious that the NCR hasn't been
transcoded.
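
The two steps can be sketched separately. This is an illustration only: the source encoding (ISO-8859-2), the target encoding (UTF-8) and the use of html.unescape as a stand-in for a user agent are all assumptions, not details taken from the example under discussion.

```python
import html

# Hypothetical source document: raw byte 0xf5 followed by the NCR &#x151;.
source_bytes = b"\xf5&#x151;"

# Step 1, transcoding (assumed ISO-8859-2 to UTF-8): the raw byte becomes
# U+0151, while the NCR is plain ASCII text and passes through untouched.
transcoded = source_bytes.decode("iso8859_2").encode("utf-8")

# Step 2, much later and only if a user agent is involved: the NCR is
# resolved by the rules of the document language, not by the transcoder.
seen_by_ua = html.unescape(transcoded.decode("utf-8"))

print(transcoded)
print([hex(ord(c)) for c in seen_by_ua])
```

After step 1 the NCR is still present as the literal characters "&#x151;"; only after step 2 does the user agent hold two U+0151 characters.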

> Oops!  Edit tracking problem here.  The examples (one simple and one
> complex) are now done and should be online shortly.

I look forward to seeing the resulting example.

Best wishes,

Steven Pemberton
Received on Friday, 9 May 2003 07:37:10 GMT
