- From: C M Sperberg-McQueen <cmsmcq@uic.edu>
- Date: Fri, 19 Jun 1998 17:55:34 -0500
- To: Chris.Newman@innosoft.com
- CC: connolly@w3.org, xml-editor@w3.org, cmsmcq@uic.edu
>Date: Fri, 19 Jun 1998 13:36:30 -0700 (PDT)
>From: Chris Newman <Chris.Newman@innosoft.com>
>> Can anyone give an example of an interoperability problem introduced
>> by the notion of processing instructions that could not occur without
>> them?
>
>Vendor A uses a PI which alters the processing of the document. Vendor
>A's product generates documents relying on that PI. Take documents from
>Vendor A to Vendor B (which ignores the PI), and the documents don't look
>the same. Since both Vendor A and Vendor B are compliant, the result is
>a legal interoperability problem.

Sorry, but I don't follow. In the case you describe, the products from
vendors A and B are not interoperating at all, successfully or
unsuccessfully; they are working, independently, on the same data.
Whenever two programs work on the same data, even if they are doing
'the same thing' (e.g. both are displaying the data), they may produce
different results; it is a fundamental assumption of W3C, as I
understand it, that products should be allowed to differentiate
themselves by producing better results than their competition. Dan
Connolly can correct me if I am wrong in this.

In your example, the difference in results is not introduced by the use
of processing instructions; we can imagine very similar scenarios with
the same bottom line, in which processing instructions do not appear at
all.

- Vendor A sells a browser that uses algorithm I for font fallbacks.
  Vendor B uses algorithm B. Take documents from one to the other; the
  documents don't look the same.

- Vendor A sells a browser that understands the XML:lang attribute and
  hyphenates the text accordingly, in order to get better justification
  of lines. Vendor B sells a monolingual program that assumes all text
  is written in French. Take documents from Vendor A to Vendor B; they
  don't look the same.

- Vendor A sells a browser with a built-in style sheet for HTML; so
  does Vendor B. Take documents from Vendor A to Vendor B; they don't
  look the same.

An information owner who wishes to ensure that all critical aspects of
document processing rely exclusively on element types, attribute
values, and position in the document tree can readily do so, and would
be wise to do so; the existence of processing instructions does not
bear on this fact.

>> >* "<![CDATA[" notation is cumbersome and creates new parser state and
>> > alternate representations.
>>
>> It's much less cumbersome than the alternative, which is to escape
>> each delimiter in the block individually.
>
>This is the same mistake which was made in html with the <PLAINTEXT>
>(I forget the exact name used) tag. That tag was obsoleted in favor of
>the <pre> tag. A similar mistake was made in an early draft of the
>text/enriched media type, but was corrected in the final version.
>
>It turns out it's easier and cleaner to have one parser state and quote
>characters appropriately than it is to have two parser states with
>different quoting conventions. Especially if the second parser state is
>infrequently used, it causes no end of bugs, complications and problems.

I'm sorry, but this seems to assume development by programmers who
don't understand how to read formal grammars and don't test their code
well.
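(To make the convenience point concrete, here is a minimal sketch; the
Python below is purely my own illustration, not anything the spec or
the WG prescribes. The same character data can be written either with
every delimiter escaped individually or inside a single CDATA section,
and a conforming parser recovers identical text from both.)

    # Sketch only: per-delimiter escaping versus a CDATA section.
    # Both forms parse to the same character data.
    import xml.etree.ElementTree as ET

    sample = "if (a < b && b > c) then emit('<done/>')"

    escaped = ("<code>"
               + sample.replace("&", "&amp;")
                       .replace("<", "&lt;")
                       .replace(">", "&gt;")
               + "</code>")
    cdata = "<code><![CDATA[" + sample + "]]></code>"

    assert ET.fromstring(escaped).text == sample
    assert ET.fromstring(cdata).text == sample

The escaped form is what an author must produce by hand when no CDATA
facility exists; sparing authors that chore is the whole point of the
construct.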
The experience of SGML users over the past decade or so has been that
CDATA marked sections are convenient in some applications; I do not
recall that CDATA sections have been any more bug-prone in SGML systems
than other parts of the spec; I can remember a number of problems with
various systems, but none of the bugs I've encountered have ever
involved CDATA sections.

There are a number of problems with the old HTML PLAINTEXT element, as
there are with the PRE element. One of them is the implausible
assumption that the segment of the data stream for which special
parsing is required should always be coterminous with an SGML element.

>Because text/enriched went through a public review process in the IETF,
>this problem was identified and eliminated before it was published. Shame
>that XML lacked a similar public review process.

Drafts of the XML spec were available to the public from November,
1995, continuously through the date XML became a Recommendation.
Comments were in fact received from the public, considered by the work
group, and in some cases acted upon.

It's not your responsibility to know the details of XML's development
process, so I don't blame you for not knowing that. But if you don't
know how XML was developed, then surely you ought to realize that you
don't know. In which case, why are you making any claims at all about
XML's development process?

>> >* Version number text is broken -- likely to leave things stuck at
>> > "1.0" just like MIME-Version.
>>
>> How? ...
>
>The XML spec says:
> Processors may signal an error if they receive documents labeled with
> versions they do not support.
>
>this is exactly what early MIME drafts said. As soon as one company
>choose to check the version number and fail if it differed, the version
>number was effectively locked in, since any new version was automatically
>incompatible with some compliant implementations of the earlier version --
>even if it was only a minor revision. To get the most out of the version
>number, you should have indicated that parsers must not signal an error if
>the minor version (the portion after the ".") mismatched.

Thank you for the clarification. I respectfully submit that the key
ingredient in making the version number useful is the willingness to
allow old software to fail gracefully when it encounters version
numbers it was not written to handle. If those responsible for MIME
chose not to allow this, they must have had other things on their mind
than making the version number useful.

The rule you suggest is a plausible one, and one I think some
developers are likely to use in deciding whether to try to parse a
document. But including it in the specification would have involved
first a commitment to future version numbers of a particular form,
which the WG was not willing to make, and would have implied, second, a
commitment on the part of the W3C, or of the WG, to develop and
distribute future versions of XML. The WG was not authorized, and the
W3C was not disposed, to make such a commitment.

>> >* Reference to UCS-2 which doesn't really exist.
>>
>> What does 'really exist' mean? ...
>
>UCS-2 is a myth; as soon as a codepoint is assigned outside the BMP, there
>is no 16-bit character set. I consider it a synonym for "UTF-16" which
>does exist and is the correct label.
>ISO definitions often don't match reality.

UCS-2 is not a synonym for UTF-16; the two encodings differ in some
crucial ways. Please do not attempt to sell me any software which
relies on synonymy of this kind.
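(To make the crucial difference concrete, here is a minimal sketch; the
Python below is my own illustration, not part of either specification.
A code point outside the Basic Multilingual Plane requires a surrogate
pair in UTF-16, while UCS-2, being a fixed-width 16-bit code, has no
representation for it at all; the two coincide only within the BMP,
which is what tempts people to treat the names as synonyms.)

    # Sketch only: UTF-16 uses surrogate pairs for code points beyond
    # the BMP; UCS-2 has no such mechanism and cannot encode them.
    ch = chr(0x10300)                    # a code point outside the BMP
    utf16 = ch.encode("utf-16-be")
    assert utf16 == b"\xd8\x00\xdf\x00"  # one character, two 16-bit units

    # Within the BMP the two encodings agree byte for byte.
    assert "\u00e9".encode("utf-16-be") == b"\x00\xe9"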
In character set matters, as in others, vendors often deviate from ISO
standards. In character sets, the deviations of commercial vendors from
ISO uniformly make the ISO standards look better and better to me as a
user and as a computing professional responsible for supporting end
users. If I have to make a choice between an ISO standard and a
specification issued by a commercial body with no commitment to open
processes, I will always choose the ISO standard, and so will any users
I have any influence on. Fortunately, the alignment between Unicode and
ISO 10646 seems to be holding.

The XML spec cites both the Unicode specification and ISO 10646 as
equal authorities for character set issues. This is the result of a
conscious decision reached after extensive discussion. If you wish the
work group to remove the reference to ISO 10646 and to UCS-2, some
argument other than standard off-the-shelf sneers at international
standardization is going to be necessary.

>> >* Byte-order mark replicates TIFF problem.
>>
>> Can someone explain this?
>
>TIFF files are permitted to be either big-endian or little-endian with a
>magic number at the beginning indicating which. Sound familiar?
>
>Well look at what happened... Some products supported both variations,
>some supported only one. ...
>If you had just said "XML in UTF-16 is always stored and transmitted in
>network byte order (big-endian)", there would be no
>interoperability problems. As it is, I predict exactly the same thing
>will happen to XML as happened to TIFF, for exactly the same reasons.

The Unicode specification already specifies fairly clearly what the
obligations of Unicode-supporting software are. There is no reason
whatever for the XML work group to re-do the work of the Unicode
consortium or to second-guess its results in this question. There is a
simple specification now; people should implement it. Making XML have
rules different from Unicode rules for the BMP is a sure recipe for
real interoperability problems.

-C. M. Sperberg-McQueen
 Senior Research Programmer, University of Illinois at Chicago
 Editor, ACH/ACL/ALLC Text Encoding Initiative
 Co-coordinator, Model Editions Partnership
 cmsmcq@uic.edu, tei@uic.edu
Received on Friday, 19 June 1998 18:56:55 UTC