RE: [Serial] I18N WG last call comments from Martin Duerst on 2004-05-21 (public-qt-comments@w3.org from May 2004)

From: Martin Duerst <duerst@w3.org>
Date: Fri, 21 May 2004 17:08:52 +0900
To: "Michael Kay" <mhk@mhk.me.uk>, "'Henry Zongaro'" <zongaro@ca.ibm.com>, <w3c-i18n-ig@w3.org>
Cc: <public-qt-comments@w3.org>
Message-Id: <4.2.0.58.J.20040521164559.0574be08@localhost>

Hello Michael,

The I18N WG (Core TF) has discussed your mail, and has asked
me to reply. I'm sorry for the delay.

At 17:52 04/05/06 +0100, Michael Kay wrote:

> > > > I worry that we will get many complaints from users who are misusing
> > > > these codepoints if we do this.
> >
> > How are they misusing these code points? The case we know is that
> > bytes in the range 0x80-0x9F are used e.g. in iso-8859-1 but with
> > the intent of giving them the windows-1252 semantics.
>
>This was the case I had in mind. People create documents in cp1252 and
>declare them as iso-8859-1. And it all works, because the errors cancel each
>other out. If we oblige processors to detect this situation we will be
>asking users to pay for the extra processing cost, and in return the
>application that worked before will stop working. Will they thank us?
>Because if they won't, we shouldn't do it.

Some users will be very thankful, others won't. The users that will
be thankful will be those that care about data integrity and interoperability
worldwide and in the long term. They will be able to fix a problem
in their data that they otherwise might not have found. As a result,
they will not only produce correct, valid output, but will also
make sure that their input data will work well in other circumstances,
such as searching, sorting, and any other kind of processing. Not
least, with the introduction of XML 1.1, there are also issues such
as the confusion between NEL (U+0085) and the three-dot ellipsis
(byte 0x85 in windows-1252).
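
To make the NEL/ellipsis confusion concrete, here is a minimal Python
sketch. It is only an illustration, not anything taken from the specs:
the sample bytes and the helper name find_c1 are made up for this
example. It decodes the same byte under both encodings and then looks
for characters in the C1 range U+0080..U+009F.

```python
# The same byte, 0x85, decoded under the two encodings discussed above.
raw = b"wait\x85"

as_latin1 = raw.decode("iso-8859-1")  # 0x85 -> U+0085 (NEL, a C1 control)
as_cp1252 = raw.decode("cp1252")      # 0x85 -> U+2026 (horizontal ellipsis)


def find_c1(text):
    """Return (index, code point) pairs for characters in U+0080..U+009F.

    Hypothetical helper: a non-empty result suggests windows-1252 data
    that was labelled as iso-8859-1.
    """
    return [(i, ord(ch)) for i, ch in enumerate(text) if 0x80 <= ord(ch) <= 0x9F]


print(find_c1(as_latin1))  # the mislabelled data is flagged
print(find_c1(as_cp1252))  # the correctly labelled data is not
```

A check of this kind is a single pass over the text, which is why I
call it cheap.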

There was a time when the mentality on the Web was 'everything goes',
which led to the slippery slope of bugwards compatibility. We have
learned, with great pain, that this is a dead end, and we don't want
to go there anymore. XML is the clearest example of how this can be
done better. And I sincerely hope that XSLT will not be tempted to
go down the bugwards compatibility slope.

The C1 area is forbidden in HTML exactly because it is a very easy
and cheap way to help people check and (if necessary) clean up their
data. RFC 2070 (http://www.ietf.org/rfc/rfc2070.txt) was written
almost 10 years ago. That C1 is allowed in XML is, according to
James Clark, an oversight. XML 1.1 has corrected it.

> > In some way just a detail, but: There is currently no XSLT 2.0
> > code that will stop working. XSLT 1.0 doesn't have the XHTML
> > output method.
>
>I may have lost the thread, but I thought we were discussing the HTML output
>method?

Okay, sorry. There is still no XSLT 2.0 code that will stop working,
even for the HTML output method. And because the XHTML output
method is supposed to work according to the compatibility guidelines,
it of course also should forbid producing C1 character output.
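
As a rough sketch of what "forbid producing C1 character output" could
look like in a serializer, assuming a simplified text-escaping step:
the names C1OutputError and serialize_text are hypothetical and do not
come from the Serialization draft or from any XSLT processor.

```python
class C1OutputError(ValueError):
    """Raised when the text to be serialized contains a C1 control character."""


def serialize_text(text):
    """Escape character data for HTML output, refusing U+0080..U+009F.

    Simplified illustration: a real output method does far more than
    escape these three characters.
    """
    for i, ch in enumerate(text):
        if 0x80 <= ord(ch) <= 0x9F:
            raise C1OutputError(f"C1 control U+{ord(ch):04X} at offset {i}")
    # Escape '&' first so the entities produced below are not re-escaped.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```

With this sketch, serialize_text("a<b") yields "a&lt;b", while any text
containing U+0085 raises an error instead of being written out silently.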

Regards,    Martin.

Received on Friday, 21 May 2004 04:09:36 UTC