- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 24 May 2004 14:35:54 +0900
- To: "Michael Kay" <mhk@mhk.me.uk>, "'Henry Zongaro'" <zongaro@ca.ibm.com>, <w3c-i18n-ig@w3.org>
- Cc: <public-qt-comments@w3.org>
Hello Michael,

At 11:04 04/05/21 +0100, Michael Kay wrote:

>Thanks. There's no easy right answer on this one. It's similar to the
>question of whether products should accept "c:\a\b.xml" in places where a
>URI is required. Some products allow it. I've resisted, and report it as an
>error. When users find that it works on one product and doesn't work on
>mine, it's me they complain to. I tell them they are wrong and they should
>read the specs, but I can afford to do that because they aren't (at present)
>paying customers.

I think rather than waiting for the customers to complain, the best
solution may be to produce an instructive error message. In this case,
such a message is quite easy to produce. In this way, you can tell them,
without having to write emails individually. For example, an error
message could read as follows:

    line x, character y: Illegal C1 codepoint in HTML output.
    (Hint: This is most probably due to a problem in the input,
    for example input declared to be in the iso-8859-1 character
    encoding that is actually in the windows-1252 character encoding.)

>I would be happy with the stricter rule if we had imposed it from the start.
>I'm not happy with the idea that version 2 should be stricter than version
>1. That's in good measure because, for the time being, people's first
>exposure to XSLT 2.0 is through my product, and when they get compatibility
>or usability problems, they report it to me as "a Saxon bug".

I have separately complained about the number of XSLT 1.0/2.0
compatibility issues, so I'm definitely not unsympathetic to this
point. But looking at the various patterns of compatibility issues,
this one is really harmless: The XSLT fails with a very clear
error message and a very clear fix.
Although I haven't done an in-depth analysis (I have suggested
such a thing), I strongly suspect that many other incompatibilities
are of a much more dangerous nature: The XSLT still works, but the
output is a little different in some cases, which may be detected
sooner or later, or too late.

>In addition, the XSLT spec has always been pragmatic about the reality of
>HTML interoperability.

Well, if it were only HTML interoperability, that might be another issue.
But the fact is that we know that the XML input is garbage. And I don't
think XSLT should be lenient with garbage XML input in cases where
it is easy to detect that it's garbage.

>If the spec wasn't pragmatic in this way, then I
>think XSLT implementors would have to be pragmatic, and the weaknesses of
>HTML conformance would spill over into weaknesses in XSLT conformance.

I'm sure there will be a test suite for XSLT 2.0. Adding the right
tests would probably go a long way, and wouldn't be very difficult.
(I'd be happy to produce some.)

>There
>are many ways that we allow XSLT stylesheets to generate non-conformant
>HTML, and I don't see that this one is particularly different from the
>others.

Could you point to a list of these, or list (some of) them here?

Regards,    Martin.

>Most areas where we have tried to be strict about what we generate
>(for example, in URI escaping) have led to practical problems for users.
>
>Michael Kay
>
>
> > -----Original Message-----
> > From: Martin Duerst [mailto:duerst@w3.org]
> > Sent: 21 May 2004 08:09
> > To: Michael Kay; 'Henry Zongaro'; w3c-i18n-ig@w3.org
> > Cc: public-qt-comments@w3.org
> > Subject: RE: [Serial] I18N WG last call comments
> >
> > Hello Michael,
> >
> > The I18N WG (Core TF) has discussed your mail, and has asked
> > me to reply. I'm sorry for the delay.
> >
> > At 17:52 04/05/06 +0100, Michael Kay wrote:
> >
> > > > > > I worry that we will get many complaints from users who are misusing
> > > > > > these codepoints if we do this.
> > > >
> > > > How are they misusing these code points? The case we know is that
> > > > bytes in the range 0x80-0x9F are used e.g. in iso-8859-1 but with
> > > > the intent of giving them the windows-1252 semantics.
> > >
> > >This was the case I had in mind. People create documents in cp1252 and
> > >declare them as iso-8859-1. And it all works, because the errors cancel each
> > >other out. If we oblige processors to detect this situation we will be
> > >asking users to pay for the extra processing cost, and in return the
> > >application that worked before will stop working. Will they thank us?
> > >Because if they won't, we shouldn't do it.
> >
> > Some users will be very thankful, others won't. The users that will
> > be thankful will be those that care about data integrity and
> > interoperability worldwide and in the long term. They will be able
> > to fix a problem in their data that they otherwise might not have
> > found. As a result, they will not only produce correct, valid output,
> > but will also make sure that their input data will work well in other
> > circumstances, such as searching, sorting, and any kind of other
> > processing. Not least, with the introduction of XML 1.1, there are
> > also such issues as the confusion between NEL and the three-dot
> > ellipsis.
> >
> > There was a time when the mentality on the Web was 'everything goes',
> > which led to the slippery slope of bugwards compatibility. We have
> > learned, with great pain, that this is a dead end, and we don't want
> > to go there anymore. XML is the clearest example of how this can be
> > done better. And I sincerely hope that XSLT will not be tempted to
> > go down the bugwards compatibility slope.
> >
> > The C1 area is forbidden in HTML exactly because it is a very easy
> > and cheap way to help people check and (if necessary) clean up their
> > data. RFC 2070 (http://www.ietf.org/rfc/rfc2070.txt) was written
> > almost 10 years ago.
> > That C1 is allowed in XML is, according to
> > James Clark, an oversight. XML 1.1 has corrected it.
> >
> >
> > > > In some way just a detail, but: There is currently no XSLT 2.0
> > > > code that will stop working. XSLT 1.0 doesn't have the XHTML
> > > > output method.
> > >
> > >I may have lost the thread, but I thought we were discussing the HTML output
> > >method?
> >
> > Okay, sorry. There is still no XSLT 2.0 code that will stop working,
> > even for the HTML output method. And because the XHTML output
> > method is supposed to work according to the compatibility guidelines,
> > it of course also should forbid producing C1 character output.
> >
> > Regards,    Martin.
> >
> >
> > > > > [20] 6.4 HTML Output Method: Writing Character Data: "Certain
> > > >characters,
> > >
> > >Michael Kay
> > >
> >
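
For illustration, the check discussed in this thread (flagging code points in the C1 range U+0080-U+009F and reporting them with a hint about mislabeled windows-1252 input) might be sketched in Python roughly as follows. This is only a sketch under assumptions: the names C1_HINT, find_c1 and c1_error_messages are hypothetical, the message wording follows the example suggested in the mail, and nothing here reflects how Saxon or any other XSLT processor actually implements serialization.

```python
# Hint text modeled on the error message proposed in the mail above.
C1_HINT = (
    "Hint: This is most probably due to a problem in the input, for example "
    "input declared to be in the iso-8859-1 character encoding that is "
    "actually in the windows-1252 character encoding."
)


def find_c1(text):
    """Return (line, character, code point) for each C1 control in text."""
    hits = []
    # Split on "\n" only: str.splitlines() would also break lines at
    # U+0085 (NEL), which is one of the characters we want to report.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if 0x80 <= ord(ch) <= 0x9F:
                hits.append((line_no, col, ord(ch)))
    return hits


def c1_error_messages(text):
    """Format one instructive error message per offending character."""
    return [
        f"line {line_no}, character {col}: Illegal C1 codepoint "
        f"U+{cp:04X} in HTML output. ({C1_HINT})"
        for line_no, col, cp in find_c1(text)
    ]


# Bytes authored as windows-1252: curly quotes (0x93, 0x94), ellipsis (0x85).
raw = b"<p>\x93quoted\x94</p>\n<p>wait\x85</p>"

# Decoded under the wrong label, the bytes become C1 controls ...
mislabeled = raw.decode("iso-8859-1")
# ... while the intended decoding yields ordinary punctuation.
intended = raw.decode("windows-1252")

for message in c1_error_messages(mislabeled):
    print(message)
```

With the sample bytes, the mislabeled decoding yields three reports (including U+0085, the NEL versus ellipsis case mentioned above), while the same bytes decoded as windows-1252 yield none.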
Received on Monday, 24 May 2004 01:40:51 UTC