- From: Rick Jelliffe <ricko@topologi.com>
- Date: Fri, 17 Jan 2003 04:00:46 +1100
- To: <www-tag@w3.org>
From: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>

> Why would you want to restrict the syntax of the documents
> you can process? (Yes, I know SOAP does this. I think SOAP is wrong,
> and this brain damage should not be encouraged to propagate into
> other domains.) I don't want to allow subsets of XML syntax to be
> defined and required. It's an interoperability disaster.

I think it is the current definition of well-formed that is the interoperability "disaster".

As Simeon and Wadler point out in
http://www.research.avayalabs.com/user/wadler/papers/xml-essence/xml-essence.pdf
one of the important properties of an external data-representation format is round-tripping. The current situation where you don't know what infoset a parser will produce when you give it a document means that at the heart of XML is a flaw which should be removed sooner rather than later. People wrongly attribute the interoperability problem to "entities" in general (often just suggesting some kind of other link whose influence on the information set is even less well defined.)

Now by "what infoset a parser will produce" I don't mean minor things like the status of CDATA sections, but very major things: whether an attribute is present, and (most significantly for downstream processing) whether that attribute provides a namespace.

Which is why I think we need to move to four kinds of XML documents and processors

  - headless (e.g. for SOAP, similar to Norm's suggestion)
  - well-formed (deprecated)
  - infoset-complete but unvalidated (e.g. for XHTML)
  - valid

To recap, the infoset-complete-but-unvalidated documents/processors would have exactly the same infoset as a valid document. However, the parser would not need to understand content models, nor test that attribute values which were enumerations matched their declarations.
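As a rough illustration of the infoset difference at stake, the sketch below models declared attribute information as a plain record and compares what two kinds of processor would report for the same start tag. It uses no real XML parser; `ElementDecl`, `well_formed_attrs`, `infoset_complete_attrs`, and the sample `doc` declarations are hypothetical names made up for this example, and the whitespace handling only approximates the normalization rules for tokenized attributes.

```python
from dataclasses import dataclass, field


@dataclass
class ElementDecl:
    """Hypothetical summary of one element's declarations (no content model)."""
    defaults: dict = field(default_factory=dict)   # attribute name -> declared default
    tokenized: set = field(default_factory=set)    # attributes of non-CDATA type


def well_formed_attrs(specified):
    """Attributes reported by a processor that never saw the declarations."""
    return dict(specified)


def infoset_complete_attrs(specified, decl):
    """Attributes reported once defaults and normalization are applied."""
    attrs = dict(decl.defaults)
    attrs.update(specified)  # a specified value always wins over a default
    for name in decl.tokenized:
        if name in attrs:
            # Approximation: trim, and collapse runs of whitespace to one space.
            attrs[name] = " ".join(attrs[name].split())
    return attrs


def default_namespace(attrs):
    """Return the default namespace declared on this element, if any."""
    return attrs.get("xmlns")


# Assumed declarations for a made-up <doc> element: a defaulted xmlns
# attribute and a tokenized 'kinds' attribute.
doc_decl = ElementDecl(
    defaults={"xmlns": "http://example.org/ns"},
    tokenized={"kinds"},
)
specified = {"kinds": "  note   draft "}

without_decls = well_formed_attrs(specified)
with_decls = infoset_complete_attrs(specified, doc_decl)

print(default_namespace(without_decls))
print(default_namespace(with_decls))
```

Under these assumptions the first processor sees an element in no namespace, while the second sees the same element in `http://example.org/ns`, which is the kind of divergence that matters for downstream processing. No content model or enumeration check is involved in either path.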
A processor would have to maintain, for each element: whether it allows PCDATA (to report whitespace correctly), what the default values for its attributes are, which attributes are IDs or IDREFs, and what tokenizing or space-normalizing is needed for an attribute value. But no DFAs.

Well-formed should be a category of minority interest to editor-application developers, not something for public usage.

Cheers
Rick Jelliffe
Received on Thursday, 16 January 2003 11:59:14 UTC