- From: Michael Rys <mrys@microsoft.com>
- Date: Thu, 22 Feb 2001 10:11:40 -0800
- To: "'www-xml-infoset-comments@w3.org'" <www-xml-infoset-comments@w3.org>
- Cc: "W3C XML Query WG (E-mail) (E-mail)" <w3c-xml-query-wg@w3.org>
Last Call Review of the XML Information Set
===========================================

The following represents the feedback of the XML Query working group on the XML Information Set Last Call version [1].

In general, the information set needs to strike a balance between describing too much detailed but mostly irrelevant information and providing too much abstraction. The current working draft has mostly achieved a good balance. However, it is the opinion of the reviewers that it has preserved too much information in certain places.

The following issues are ordered approximately according to their perceived importance to the XQuery working group.

Issue 1: Namespace prefix
--------
According to the namespace specification [2], namespace prefixes have no semantic meaning. As such, the information set should not require that the prefix used on element or attribute information items be preserved. For namespace information items, it may be useful to preserve the prefix, so that other processors can interpret values inside an attribute or element as namespace references.

In addition, the namespace prefix property should not be empty but absent, as many systems already report today (see also section 2.15).

Finally, why should the prefix property of the attribute information item be the prefix part of the element-type name?

Scope: Section 2.2 point 3, Section 2.3 point 3, Section 2.15 point 1.

Issue 2: In-scope namespaces
--------
It would probably be better if only the newly defined namespace information items were provided on an element information item. This would guarantee locality and would allow operations that change an infoset to have only local impact. The current definition can easily be inferred from that information.

Scope: Section 2.2 point 7.

Issue 3: namespace attributes and xmlns=""
--------
Is xmlns="" represented as a namespace attribute, or is it absent? Since xmlns="" is technically not a namespace declaration but an undeclaration, this needs to be clarified.
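As a rough illustration of Issues 1 to 3, the following Python sketch uses the standard library's xml.etree.ElementTree, which reports element names in {namespace-URI}local-name form, so the prefix does not survive in the element's name. The helper in_scope_namespaces and its input format (one dict of local declarations per element, '' for the default prefix, and an empty URI standing for xmlns="") are hypothetical, chosen only to show that the in-scope namespaces can be rebuilt from local declarations alone.

```python
import xml.etree.ElementTree as ET

# The xml prefix is always bound, per Namespaces in XML.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def expanded_name(doc: str) -> str:
    """Name of the root element as ElementTree reports it: {namespace-URI}local."""
    return ET.fromstring(doc).tag


def in_scope_namespaces(local_decls_chain: list[dict[str, str]]) -> dict[str, str]:
    """Rebuild in-scope namespaces from local declarations only (hypothetical model).

    local_decls_chain holds one dict per element, from the document element down
    to the element of interest, mapping a prefix ('' = default namespace) to the
    URI declared on that element. An empty URI models xmlns="" (an undeclaration).
    """
    scope = {"xml": XML_NS}
    for decls in local_decls_chain:
        for prefix, uri in decls.items():
            if uri == "":
                scope.pop(prefix, None)  # undeclaration removes the binding
            else:
                scope[prefix] = uri  # a nearer declaration overrides an outer one
    return scope


# Issue 1: different prefixes, same expanded name.
print(expanded_name('<p:a xmlns:p="urn:example"/>'))  # {urn:example}a
print(expanded_name('<q:a xmlns:q="urn:example"/>'))  # {urn:example}a

# Issue 3: xmlns="" takes <b> out of its parent's default namespace.
root = ET.fromstring('<a xmlns="urn:example"><b xmlns=""/></a>')
print(root.tag, root[0].tag)  # {urn:example}a b

# Issue 2: in-scope namespaces of <b>, derived from the two local declarations.
print(in_scope_namespaces([{"": "urn:example"}, {"": ""}]))
# {'xml': 'http://www.w3.org/XML/1998/namespace'}
```

Storing only the local declarations keeps each element's namespace information self-contained, while the full in-scope set remains a simple walk down the ancestor chain.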
Scope: Section 2.2 point 6.

Issue 4: Character entities
--------
All single characters should be represented as character information items. Thus, the predefined entities &lt;, &gt;, &amp;, &apos;, and &quot; do not need to be represented with entity information items. They are simply used to encode the corresponding character information item and should have no special semantic standing in the infoset. Nor should any of the numeric character references. Thus, internal entity information items should not be provided for character entities.

Scope: Section 2.1 point 5, Section 2.9

Issue 5: Representation of missing information in the Infoset
--------
The specification currently uses NULL to indicate missing properties. Since the infoset can make use of a semi-structured data description, there is no need to use a storage representation that is foreign to the world of XML. The information set specification should use the absence of a property in such cases. Thus, we propose the following:

Replace the Null section with:

   Missing and absent information

   Some properties may sometimes be absent because they have no defined value or are not applicable. This will be expressed by not providing the property on the information item.

And replace all occurrences of:

   if condition(x), then this property is null

with:

   if condition(x), then this property is absent.

Scope: The section on Null in the introduction and all optional properties.

Issue 6: CDATA start and end markers
--------
CDATA sections are a purely syntactic tool that makes it easier to write character data that would otherwise need to be escaped. As such, the infoset should not preserve CDATA section boundaries. Basically, <![CDATA[AB]]><![CDATA[C]]> should be equivalent to <![CDATA[ABC]]>. This is important because a CDATA section may have to be split in two for purely syntactic reasons (whenever the content contains ]]>).
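The points in Issues 4 and 6 can be checked against an existing parser. The following Python sketch uses the standard library's xml.etree.ElementTree as one example of a conforming parser; it shows that a predefined entity, a character reference, and a CDATA section all yield the same character data, and that CDATA section boundaries (including the split forced by ]]>) are no longer visible after parsing. The helper name text_of is chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def text_of(doc: str) -> str:
    """Character data directly inside the root element, after parsing."""
    return ET.fromstring(doc).text


# Issue 4: three encodings of "<", one resulting character.
assert text_of("<a>&lt;</a>") == text_of("<a>&#60;</a>") == text_of("<a><![CDATA[<]]></a>") == "<"

# Issue 6: adjacent CDATA sections are indistinguishable from a single one.
assert text_of("<a><![CDATA[AB]]><![CDATA[C]]></a>") == text_of("<a><![CDATA[ABC]]></a>") == "ABC"

# "]]>" cannot occur inside a CDATA section, so content containing it must be
# split across two sections; the parsed text is still one contiguous string.
assert text_of("<a><![CDATA[x]]]]><![CDATA[>y]]></a>") == "x]]>y"

print("all encodings produce identical character data")
```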
Scope: Section 2.2 point 4, Section 2.16 and 2.17

Issue 7: normalized attribute value
---------
It may be more useful to provide the value together with an indicator of whether it was normalized. It is also not clear how the infoset deals with entity references in attribute values (see also Issue 11 below). Please clarify.

Scope: Section 2.3 point 4

Issue 8: Attribute types and strings
--------
Aren't entities resolved to strings at the moment? Shouldn't the default type be CDATA rather than missing? (If the answer is that it stays missing, this should be expressed by absence; see Issue 5.)

Scope: Section 2.3 point 6

Issue 9: unexpanded entity in attribute values
--------
Can entity references appear in attribute values? If so, the unexpanded entity reference information item needs to indicate that, as do the entity start and end markers (if they are preserved; see Issue 10).

Scope: Section 2.5 point 3, Section 2.13, Section 2.14

Issue 10: Entity start and end markers
--------
We consider the entity start and end markers to be more information than the infoset should preserve, assuming that resolved entities are used only for syntactic purposes. If they are preserved, a good use-case scenario should be provided in the introduction.

Scope: Section 2.13 and 2.14

Issue 11: Character information item and attribute values
---------
Why are attributes and elements treated differently with respect to character information items? Please clarify. See also Issue 7.

Scope: Section 2.3 point 4, Section 2.6 point 3.

Issue 12: Document Information Item and document type declaration info item
--------
It is not clear whether a document type declaration information item must be present whenever there is a document type declaration, or only may be present. Please clarify the wording.

Scope: Section 2.1 point 1

Issue 13: Standalone Indicator
--------
The standalone indicator should use Boolean values instead of "yes" and "no". Again, the infoset description should not use concepts that are too concrete or foreign to XML.
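Issue 13, together with the absence proposal in Issue 5, maps directly onto what a parser already reports. As a sketch, Python's xml.parsers.expat passes the standalone declaration to its XmlDeclHandler callback as 1 ("yes"), 0 ("no"), or -1 (omitted); the function below turns that into a Boolean property, or leaves the property out entirely. The function name and the use of a plain dict as the item's property set are illustrative assumptions, not part of the Infoset draft.

```python
from xml.parsers import expat


def standalone_property(doc: bytes) -> dict:
    """Return {'standalone': bool}, or {} when the property is absent.

    Hypothetical property bag: expat reports standalone as 1, 0, or -1
    (not specified); -1 and a missing XML declaration both mean "absent".
    """
    props = {}

    def on_xml_decl(version, encoding, standalone):
        if standalone != -1:
            props["standalone"] = bool(standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(doc, True)  # True: this is the final chunk of input
    return props


print(standalone_property(b'<?xml version="1.0" standalone="yes"?><a/>'))  # {'standalone': True}
print(standalone_property(b'<?xml version="1.0" standalone="no"?><a/>'))   # {'standalone': False}
print(standalone_property(b'<?xml version="1.0"?><a/>'))                   # {}
```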
Scope: Section 2.1 point 7 and RDF description

Issue 14: base URI on element information item
--------
It is not clear why this property is needed or useful. Please clarify.

Scope: Section 2.2 point 8.

Issue 15: Attribute types
---------
Given that this property exists but has no meaning in the context of the PSV Infoset, can the Infoset working group coordinate with the Schema working group to explain why there are now two type properties in the PSV Infoset?

Scope: Section 2.3 point 6

Issue 16: owner element
---------
There is no reason for a separate name. Rename the property to parent, or motivate the different property name.

Scope: Section 2.3 point 7

Issue 17: RDF Schema
---------
In addition to the RDF Schema, a non-normative description of the infoset based on XML Schema would be useful.

Scope: New appendix

Issue 18: Use cases and examples
---------
The spec could benefit from at least one example per information item section and, for some of the less obvious information items, some use cases that show why such an information item (or property) is worth preserving.

Scope: All of section 2.

Issue 19: Grammatical changes
---------
In the first paragraph of Section 2.2, change 'children' to 'descendants' in: '...and all other element information items are children of the document element, either directly or indirectly.'

In Section 2.2, item 8, change 'may be' to 'is' in: '...entity is not known, this property may be null.'

[1] http://www.w3.org/TR/2001/WD-xml-infoset-20010202
[2] http://www.w3.org/TR/REC-xml-names/

--
Program Manager, SQL Server XML Technologies
mrys@microsoft.com, rys@acm.org
We store the Web and more...
Received on Thursday, 22 February 2001 13:25:09 UTC