RE: [Serial] I18N WG last call comments from Martin Duerst on 2004-05-24 (public-qt-comments@w3.org from May 2004)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 24 May 2004 14:35:54 +0900
To: "Michael Kay" <mhk@mhk.me.uk>, "'Henry Zongaro'" <zongaro@ca.ibm.com>, <w3c-i18n-ig@w3.org>
Cc: <public-qt-comments@w3.org>
Message-Id: <4.2.0.58.J.20040524142147.0747cc30@localhost>
Hello Michael,

At 11:04 04/05/21 +0100, Michael Kay wrote:

>Thanks. There's no easy right answer on this one. It's similar to the
>question of whether products should accept "c:\a\b.xml" in places where a
>URI is required. Some products allow it. I've resisted, and report it as an
>error. When users find that it works on one product and doesn't work on
>mine, it's me they complain to. I tell them they are wrong and they should
>read the specs, but I can afford to do that because they aren't (at present)
>paying customers.

I think rather than waiting for the customers to complain, the best
solution may be to produce an instructing error message. In this case,
such a message is quite easy to produce. In this way, you can tell them,
without having to write emails individually.

For example, an error message could read as follows:

line x, character y: Illegal C1 codepoint in HTML output.
(Hint: This is most probably due to a problem in the input, in
particular for example due to input declared to be in the iso-8859-1
character encoding  that is actually in the windows-1252 character
encoding.)


>I would be happy with the stricter rule if we had imposed it from the start.
>I'm not happy with the idea that version 2 should be stricter than version
>1. That's in good measure because, for the time being, people's first
>exposure to XSLT 2.0 is through my product, and when they get compatibility
>or usability problems, they report it to me as "a Saxon bug".

I have separately complained about the number of XSLT 1.0/2.0 compatibility
issues, so I'm definitely not unsympathetic to this point. But looking
at the various patterns of compatibility issues, this one is really
harmless: The XSLT fails with a very clear error message and a very
clear fix. Although I haven't done an in-depth analysis (I have suggested
such a thing), I strongly suspect that many other incompatibilities
are of a much more dangerous nature: The XSLT still works, but the
output is a little different in some cases, which may be detected
sooner or later, or too late.


>In addition, the XSLT spec has always been pragmatic about the reality of
>HTML interoperability.

Well, if it were only HTML interoperability, that may be another issue.
But the fact is that we know that the XML input is garbage. And I don't
think XSLT should be lenitent with garbage XML input in cases where it
is easy to detect that it's garbage.


>If the spec wasn't pragmatic in this way, then I
>think XSLT implementors would have to be pragmatic, and the weaknesses of
>HTML conformance would spill over into weaknesses in XSLT conformance.

I'm sure there will be a test suite for XSLT 2.0. Adding the right
tests would probably go a long way, and wouldn't be very difficult.
(I'd be happy to produce some.)


>There
>are many ways that we allow XSLT stylesheets to generate non-conformant
>HTML, and I don't see that this one is particularly different from the
>others.

Could you point to a list of these, or list (some of) them here?

Regards,     Martin.


>Most areas where we have tried to be strict about what we generate
>(for example, in URI escaping) have led to practical problems for users.
>
>Michael Kay
>
>
> > -----Original Message-----
> > From: Martin Duerst [mailto:duerst@w3.org]
> > Sent: 21 May 2004 08:09
> > To: Michael Kay; 'Henry Zongaro'; w3c-i18n-ig@w3.org
> > Cc: public-qt-comments@w3.org
> > Subject: RE: [Serial] I18N WG last call comments
> >
> > Hello Michael,
> >
> > The I18N WG (Core TF) has discussed your mail, and has asked
> > me to reply. I'm sorry for the delay.
> >
> > At 17:52 04/05/06 +0100, Michael Kay wrote:
> >
> > > > > > I worry that we will get many complaints from users who
> > > > are misusing
> > > > > > these codepoints if we do this.
> > > >
> > > > How are they misusing these code points? The case we know is that
> > > > bytes in the rage 0x80-0x9F are used e.g. in iso-8859-1 but with
> > > > the intent of giving them the windows-1252 semantics.
> > >
> > >This was the case I had in mind. People create documents in
> > cp1252 and
> > >declare them as iso-8859-1. And it all works, because the
> > errors cancel each
> > >other out. If we oblige processors to detect this situation
> > we will be
> > >asking users to pay for the extra processing cost, and in return the
> > >application that worked before will stop working. Will they thank us?
> > >Because if they won't, we shouldn't do it.
> >
> > Some users will be very thankful, others won't. The users that will
> > be thankful will be those that care about data integrity and
> > interoperability
> > worldwide and in the long term. They will be able to fix a problem
> > in their data that they otherwise might not have found. As a result,
> > they will not only produce correct, valid output, but will also
> > make sure that their input data will work well in other circumstances,
> > such as searching, sorting, and any kind of other processing. Not the
> > least, with the introduction of XML 1.1, there are also such issues
> > as the confusion betwen NEL and the three-dot elipsis.
> >
> > There was a time when the mentality on the Web was 'everything goes',
> > which lead to the slippery slope of bugwards compatibility. We have
> > learned, with great pain, that this is a dead end, and we don't want
> > to go there anymore. XML is the clearest example of how this can be
> > done better. And I sincerely hope that XSLT will not be tempted to
> > go down the bugwards compatibility slope.
> >
> > The C1 area is forbidden in HTML exactly because it is a very easy
> > and cheap way to help people check and (if necessary) clean up their
> > data. RFC 2070 (http://www.ietf.org/rfc/rfc2070.txt) was written
> > almost 10 years ago. That C1 is allowed in XML is, according to
> > James Clark, an oversight. XML 1.1 has corrected it.
> >
> >
> > > > In some way just a detail, but: There is currently no XSLT 2.0
> > > > code that will stop working. XSTL 1.0 doesn't have the XHTML
> > > > output method.
> > >
> > >I may have lost the thread, but I thought we were discussing
> > the HTML output
> > >method?
> >
> > Okay, sorry. There is still no XSLT 2.0 code that will stop working,
> > even for the HTML output method. And because the XHTML output
> > method is supposed to work according to the compatibility guidelines,
> > it of course also should forbid producing C1 character output.
> >
> > Regards,    Martin.
> >
> >
> > > > > [20] 6.4 HTML Output Method: Writing Character Data: "Certain
> > > >characters,
> > >
> > >Michael Kay
> >
> >
Received on Monday, 24 May 2004 01:40:51 UTC