RE: [Serial] I18N WG last call comments from Michael Kay on 2004-05-21 (public-qt-comments@w3.org from May 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Fri, 21 May 2004 11:04:17 +0100
To: "'Martin Duerst'" <duerst@w3.org>, "'Henry Zongaro'" <zongaro@ca.ibm.com>, <w3c-i18n-ig@w3.org>
Cc: <public-qt-comments@w3.org>
Message-Id: <20040521100452.6D6BEA05F5@frink.w3.org>
Thanks. There's no easy right answer on this one. It's similar to the
question of whether products should accept "c:\a\b.xml" in places where a
URI is required. Some products allow it. I've resisted, and report it as an
error. When users find that it works on one product and doesn't work on
mine, it's me they complain to. I tell them they are wrong and they should
read the specs, but I can afford to do that because they aren't (at present)
paying customers.

I would be happy with the stricter rule if we had imposed it from the start.
I'm not happy with the idea that version 2 should be stricter than version
1. That's in good measure because, for the time being, people's first
exposure to XSLT 2.0 is through my product, and when they get compatibility
or usability problems, they report it to me as "a Saxon bug".

In addition, the XSLT spec has always been pragmatic about the reality of
HTML interoperability. If the spec wasn't pragmatic in this way, then I
think XSLT implementors would have to be pragmatic, and the weaknesses of
HTML conformance would spill over into weaknesses in XSLT conformance. There
are many ways that we allow XSLT stylesheets to generate non-conformant
HTML, and I don't see that this one is particularly different from the
others. Most areas where we have tried to be strict about what we generate
(for example, in URI escaping) have led to practical problems for users.

Michael Kay


> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org] 
> Sent: 21 May 2004 08:09
> To: Michael Kay; 'Henry Zongaro'; w3c-i18n-ig@w3.org
> Cc: public-qt-comments@w3.org
> Subject: RE: [Serial] I18N WG last call comments
> 
> Hello Michael,
> 
> The I18N WG (Core TF) has discussed your mail, and has asked
> me to reply. I'm sorry for the delay.
> 
> At 17:52 04/05/06 +0100, Michael Kay wrote:
> 
> > > > > I worry that we will get many complaints from users who are misusing
> > > > > these codepoints if we do this.
> > >
> > > How are they misusing these code points? The case we know is that
> > > bytes in the range 0x80-0x9F are used e.g. in iso-8859-1 but with
> > > the intent of giving them the windows-1252 semantics.
> >
> > >This was the case I had in mind. People create documents in cp1252 and
> >declare them as iso-8859-1. And it all works, because the errors cancel each
> >other out. If we oblige processors to detect this situation we will be
> >asking users to pay for the extra processing cost, and in return the
> >application that worked before will stop working. Will they thank us?
> >Because if they won't, we shouldn't do it.
> 
> Some users will be very thankful, others won't. The users that will
> be thankful will be those that care about data integrity and interoperability
> worldwide and in the long term. They will be able to fix a problem
> in their data that they otherwise might not have found. As a result,
> they will not only produce correct, valid output, but will also
> make sure that their input data will work well in other circumstances,
> such as searching, sorting, and any kind of other processing. Not
> least, with the introduction of XML 1.1, there are also such issues
> as the confusion between NEL and the three-dot ellipsis.
> 
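
[Illustrative note, not part of the original message.] The mismatch discussed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the sample bytes and the helper name c1_controls are hypothetical and chosen only for illustration. It shows how bytes written with windows-1252 semantics turn into C1 control characters (including NEL, U+0085) when decoded as iso-8859-1, and why the two errors "cancel out" when the data is written back unchanged.

```python
# Hypothetical sample: text saved by a cp1252 editor, using bytes in 0x80-0x9F
# (0x85 = ellipsis, 0x93/0x94 = curly quotes in windows-1252).
raw = b"Wait\x85 \x93quoted\x94"

as_declared = raw.decode("iso-8859-1")  # what the (wrong) label says
as_intended = raw.decode("cp1252")      # what the author actually meant


def c1_controls(text):
    """Return (index, code point) for each C1 control (U+0080-U+009F) in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if 0x80 <= ord(ch) <= 0x9F
    ]


# Decoded as labelled, byte 0x85 becomes NEL (U+0085), not the ellipsis.
print(c1_controls(as_declared))
# Decoded as intended, no C1 controls remain.
print(c1_controls(as_intended))
# Re-encoding with the same wrong label reproduces the original bytes,
# which is why the mislabelling goes unnoticed end to end.
print(as_declared.encode("iso-8859-1") == raw)
```

A serializer that rejects C1 characters in HTML output would flag the first decoding, which is the check under discussion here.
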
> There was a time when the mentality on the Web was 'everything goes',
> which led to the slippery slope of bugwards compatibility. We have
> learned, with great pain, that this is a dead end, and we don't want
> to go there anymore. XML is the clearest example of how this can be
> done better. And I sincerely hope that XSLT will not be tempted to
> go down the bugwards compatibility slope.
> 
> The C1 area is forbidden in HTML exactly because it is a very easy
> and cheap way to help people check and (if necessary) clean up their
> data. RFC 2070 (http://www.ietf.org/rfc/rfc2070.txt) was written
> almost 10 years ago. That C1 is allowed in XML is, according to
> James Clark, an oversight. XML 1.1 has corrected it.
> 
> 
> > > In some way just a detail, but: There is currently no XSLT 2.0
> > > code that will stop working. XSLT 1.0 doesn't have the XHTML
> > > output method.
> >
> >I may have lost the thread, but I thought we were discussing the HTML output
> >method?
> 
> Okay, sorry. There is still no XSLT 2.0 code that will stop working,
> even for the HTML output method. And because the XHTML output
> method is supposed to work according to the compatibility guidelines,
> it of course also should forbid producing C1 character output.
> 
> Regards,    Martin.
> 
> 
> > > > [20] 6.4 HTML Output Method: Writing Character Data: "Certain
> > >characters,
> >
> >Michael Kay
> 
>
Received on Friday, 21 May 2004 06:04:52 UTC