- From: Lars Marius Garshol <larsga@ontopia.net>
- Date: Mon, 13 Nov 2000 13:54:06 +0100
- To: www-xml-infoset-comments@w3.org
Properly distinguishing between lexical and logical information
===============================================================

Section 1 says "As long as the information in the information set is made available to XML applications in one way or another, the requirements of this document are satisfied."

Section 3, however, says "An XML processor conforms to the XML Information Set if it documents the information items and properties that it provides."

This seems inconsistent to the naive reader, and what it implies should probably be explained explicitly.

Personally, I would rather see the sentence "Conformance to the core is not a requirement for conformance to the Infoset." from 3.1 have the 'not' removed. In fact, I think that there would be definite value in removing all the non-core parts of the infoset, since I am unsure as to what purpose they serve. To me, the value of the infoset is precisely that it draws a firm line between necessary and unnecessary information about XML documents, and the non-core part can only muddy this distinction.

'Parent' properties
===================

In my opinion these should be removed, since they serve absolutely no purpose at all that I can see. Section 1 says:

"This specification presents the information set as a modified tree for the sake of clarity and simplicity, but there is no requirement that the XML Information Set be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces are also capable of providing information conforming to the XML Information Set."

To me this implies that if the children of nodes are made available by an API, but parent properties are not explicitly represented, that API still satisfies the infoset requirements, since parent information is implicitly available (just as in event-based interfaces). If this is the case then the parent properties serve no purpose at all and should be removed.

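To illustrate how parent information is implicitly available in an event-based interface, here is a minimal Python sketch using the standard xml.sax module. It is not taken from the Infoset draft; the names ParentTrackingHandler and element_parents are hypothetical. The handler never receives a parent property, yet it can recover one by keeping a stack of open elements.

```python
import xml.sax


class ParentTrackingHandler(xml.sax.ContentHandler):
    """Derives each element's parent from the order of SAX events alone."""

    def __init__(self):
        super().__init__()
        self._open = []      # names of elements that are currently open
        self.parents = []    # (element name, parent name or None) pairs

    def startElement(self, name, attrs):
        # The innermost open element, if any, is the parent of the new one.
        parent = self._open[-1] if self._open else None
        self.parents.append((name, parent))
        self._open.append(name)

    def endElement(self, name):
        self._open.pop()


def element_parents(xml_text):
    """Return (element, parent) pairs in document order (hypothetical helper)."""
    handler = ParentTrackingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.parents
```

For example, element_parents("<doc><a><b/></a><c/></doc>") pairs doc with None, a with doc, b with a, and c with doc, even though no event carries a parent reference.
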
This issue is also 'cosmetic': to me these properties feel far too much like an API rather than an abstract data model and ought to be removed even if the above interpretation is not correct, since IMHO it is no concern of the infoset whether APIs (or other systems) explicitly provide parent information or not.

The same concern applies to the 'owner element' property of attributes.

Minor editorial issues
======================

The term 'document order' is used, but not defined.

document.standalone can have the values 'yes', 'no' and 'not present', which IMHO fits badly with document.base URI, which can be null. It seems better to me to allow standalone to also be null. Defining and coordinating the use of the terms 'null' and 'not present' would probably be useful.

The definition of attribute.children mentions 'element content', which it probably should not. I assume this is a typo.

attribute.specified should probably be defined more precisely if it is left in.

attribute.attribute type implies that for enumerated attribute types one should not be told what the enumeration consists of. IMHO that is inconsistent and if this property is retained at all it should include that information.

It should probably be specified that the public identifier properties should hold strings normalized according to the rules of section 4.2.2 of the XML Recommendation. Ditto for the system identifier properties.

The definition of the namespace.children property seems rather strange and should probably be reformulated. (It also mentions 'element content', probably because it was copied from the attribute.children definition.)

Section 3 is immediately followed by a section 4.1, which obviously should be 3.1.

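As a hedged illustration of the public identifier point above: section 4.2.2 of the XML Recommendation says that runs of white space in a public identifier are normalized to single space characters and that leading and trailing white space is removed. A minimal Python sketch of that rule follows; the function name normalize_public_id is hypothetical, and the sketch covers public identifiers only, not system identifiers.

```python
import re

# XML white space is space, tab, carriage return and line feed.
_XML_WHITESPACE = re.compile(r"[\x20\x09\x0D\x0A]+")


def normalize_public_id(public_id):
    """Collapse white space runs to one space and trim both ends."""
    return _XML_WHITESPACE.sub(" ", public_id).strip(" ")
```

With this rule, "  -//W3C//DTD  XHTML\n 1.0//EN " and "-//W3C//DTD XHTML 1.0//EN" yield the same property value.
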
Re Appendix B, Other things that are not in the infoset:
========================================================

 - distinction between literal text and character references
 - distinction between hex and decimal character references
 - distinction between uppercase and lowercase hex character references
 - whitespace between target and data of PI
 - original character encoding of document
 - prefixes used in namespace names

...and much more, but probably this list is not intended to be complete, so I'll skip the rest.

--Lars M.

Received on Monday, 13 November 2000 07:49:33 UTC