- From: Michael Rys <mrys@microsoft.com>
- Date: Thu, 22 Feb 2001 10:11:40 -0800
- To: "'www-xml-infoset-comments@w3.org'" <www-xml-infoset-comments@w3.org>
- Cc: "W3C XML Query WG (E-mail) (E-mail)" <w3c-xml-query-wg@w3.org>
Last Call Review of the XML Information Set
===========================================
The following represents the feedback of the XML Query working group on the
XML Information Set Last Call version [1].
In general, the information set needs to strike a balance between describing
too much detail that is mostly irrelevant and providing too much
abstraction. The current working draft has mostly achieved a good balance.
However, the reviewers believe that in certain places it preserves too much
information.
The following issues are ordered approximately by their perceived
importance to the XML Query working group.
Issue 1: Namespace prefix
--------
According to the namespace specification [2], namespace prefixes have no
semantic meaning. As such, the information set should not require that the
prefix used on element or attribute information items be preserved. For
namespace information items, it may be useful to preserve the prefix, so
that other processors can interpret values inside an attribute or element
(such as qualified names) as namespace references.
In addition, the namespace prefix property should be absent rather than
empty, as many systems already report it today (see also section 2.15).
Finally, why should it be the prefix part of the element-type in the
attribute information item?
Scope: Section 2.2 point 3, Section 2.3 point 3, Section 2.15 point 1.
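As an illustration of why the prefix carries no meaning, Python's standard
xml.etree.ElementTree module (used here only as one example of a
namespace-aware processor; the namespace URI is hypothetical) reports
elements and attributes by expanded name only, so two documents that differ
solely in their choice of prefix parse to identical results:

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in the prefix bound to the same
# (hypothetical) namespace URI.
first = ET.fromstring('<a:item xmlns:a="urn:example:ns" a:id="1"/>')
second = ET.fromstring('<b:item xmlns:b="urn:example:ns" b:id="1"/>')

# ElementTree keeps only expanded names ({namespace}local-name);
# the prefixes "a" and "b" are not retained anywhere.
print(first.tag, first.attrib)
print(first.tag == second.tag and first.attrib == second.attrib)  # True
```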
Issue 2: In-scope namespaces
--------
It would probably be better if only the newly declared namespace information
items were provided on an element information item. This would guarantee
locality, so that update operations on an infoset would have only local
impact. The full set of in-scope namespaces, as currently defined, can
easily be inferred from that information.
Scope: Section 2.2 point 7.
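The inference mentioned above can be sketched as follows. The Element class
and its declared mapping are hypothetical stand-ins for an element
information item that records only its own namespace declarations; the
in-scope namespaces are recovered by applying declarations from the root
downwards, so inner declarations override outer ones. Treating an empty
namespace name as an undeclaration is an assumption that anticipates one
possible answer to Issue 3.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The xml prefix is bound implicitly in every document (Namespaces in XML).
XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass
class Element:
    """Hypothetical element item holding only the namespaces declared on it."""
    name: str
    # prefix ("" for the default namespace) -> namespace name ("" undeclares it)
    declared: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Element"] = None


def in_scope_namespaces(elem: Element) -> Dict[str, str]:
    """Infer the in-scope namespaces of elem from local declarations only."""
    chain = []
    node = elem
    while node is not None:            # collect elem and all of its ancestors
        chain.append(node)
        node = node.parent
    scope = {"xml": XML_NS}
    for ancestor in reversed(chain):   # root first, so inner declarations win
        for prefix, uri in ancestor.declared.items():
            if uri == "":
                # An empty namespace name removes the binding (xmlns="").
                scope.pop(prefix, None)
            else:
                scope[prefix] = uri
    return scope


root = Element("doc", {"": "urn:example:a", "p": "urn:example:p"})
child = Element("item", {"": ""}, parent=root)
print(in_scope_namespaces(child))
```

Here the child's in-scope set contains only the xml and p bindings. Because
each item stores only its own declarations, adding or removing a declaration
changes a single item, and the inherited view is recomputed on demand.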
Issue 3: namespace attributes and xmlns=""
--------
Is xmlns="" represented as a namespace attribute, or is it absent? Since
xmlns="" is technically not a namespace declaration but an undeclaration,
this needs to be clarified.
Scope: Section 2.2 point 6.
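For reference, this is how one namespace-aware processor behaves: Python's
xml.etree.ElementTree (chosen only for illustration; the namespace URI is
hypothetical) treats xmlns="" as removing the default namespace, so the
child element below has no namespace name. ElementTree does not report the
undeclaration as an attribute, which is the representation question the
specification still needs to settle.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<r xmlns="urn:example:d"><c xmlns=""/></r>')
print(root.tag)        # {urn:example:d}r  (in the default namespace)
print(root[0].tag)     # c  (the default namespace was undeclared)
print(root[0].attrib)  # {}  (xmlns="" is not reported as an attribute)
```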
Issue 4: Character entities
--------
All single characters should be represented as character information items.
Thus, the predefined entities &lt;, &gt;, &amp;, &apos;, and &quot; do not
need to be represented with entity information items. They are simply used
to encode the corresponding character information item and should have no
special semantic standing in the infoset. Nor should any of the numeric
character references. Thus, the internal entity information items should
exclude information items for character entities.
Scope: Section 2.1 point 5, Section 2.9
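The equivalence can be illustrated with Python's xml.etree.ElementTree
(used only as an example parser): once parsed, the predefined entity, the
character references, and the CDATA section all yield the same single
character, and nothing in the result records which spelling was used.

```python
import xml.etree.ElementTree as ET

# Different spellings of "<" in element content: a predefined entity,
# decimal and hexadecimal character references, and a CDATA section.
less_than = ["<a>&lt;</a>", "<a>&#60;</a>", "<a>&#x3C;</a>", "<a><![CDATA[<]]></a>"]
# ">" may also appear literally in content, next to its other spellings.
greater_than = ["<a>&gt;</a>", "<a>></a>", "<a>&#62;</a>"]

print({ET.fromstring(doc).text for doc in less_than})     # {'<'}
print({ET.fromstring(doc).text for doc in greater_than})  # {'>'}
```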
Issue 5: Representation of missing information in the Infoset
--------
The specification currently uses NULL to indicate missing properties. Since
the infoset can make use of a semi-structured data description, there is no
need for a storage representation that is foreign to the world of XML.
The information set specification should instead express such cases by the
absence of the property.
We therefore propose the following:
Replace Null section with:
Missing and absent information
Some properties may sometimes be absent because they have no defined value
or are not applicable. This will be expressed by not providing the property
on the information item.
And replace all occurrences of:
if condition(x), then this property is null
with:
if condition(x), then this property is absent.
Scope: Section on Null in intro and all optional properties.
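A minimal sketch of the proposal, assuming information items are modelled
as Python dictionaries keyed by property name (a hypothetical
representation, not one the specification defines): a property without a
value is simply left out, so a consumer tests for its presence instead of
comparing against a null marker. The input is an expanded name in the
{namespace}local form that xml.etree.ElementTree uses.

```python
def element_item(expanded_name: str) -> dict:
    """Build a hypothetical element item, omitting properties that have no value."""
    if expanded_name.startswith("{"):
        namespace, local = expanded_name[1:].split("}", 1)
        return {"namespace name": namespace, "local name": local}
    # No namespace: the namespace name property is absent, not null.
    return {"local name": expanded_name}


in_ns = element_item("{urn:example:d}r")
no_ns = element_item("c")
print("namespace name" in in_ns, "namespace name" in no_ns)  # True False
```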
Issue 6: CDATA start and end markers
--------
CDATA sections are a purely syntactic device that makes it easier to write
character data that would otherwise need to be escaped with entity or
character references. As such, the infoset should not preserve CDATA section
boundaries. For example, <![CDATA[AB]]><![CDATA[C]]> should be equivalent to
<![CDATA[ABC]]>. This is important because a CDATA section may have to be
split in two for purely syntactic reasons (whenever its content contains ]]>).
Scope: Section 2.2 point 4, Section 2.16 and 2.17
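Python's xml.etree.ElementTree (used here only as an example parser)
already behaves as proposed: adjacent CDATA sections are merged into one
run of text, including the forced split needed to encode "]]>".

```python
import xml.etree.ElementTree as ET

split = ET.fromstring("<a><![CDATA[AB]]><![CDATA[C]]></a>").text
whole = ET.fromstring("<a><![CDATA[ABC]]></a>").text
# Content containing "]]>" cannot sit in a single CDATA section, so it is
# split: the first section ends after "x]]", the second starts with ">y".
forced = ET.fromstring("<a><![CDATA[x]]]]><![CDATA[>y]]></a>").text

print(split == whole)  # True
print(forced)          # x]]>y
```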
Issue 7: normalized attribute value
---------
It may be more useful to provide the value together with an indicator of
whether it was normalized. It is also not clear how the infoset deals with
entities in attribute values. See also issue 11 below. Please clarify.
Scope: Section 2.3 point 4
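The information lost by normalization can be seen with Python's
xml.etree.ElementTree (an illustration only; no DTD is involved, so the
attribute is treated as CDATA). Literal whitespace characters in an
attribute value are replaced by spaces, while the same characters written
as character references survive, so the normalized value alone no longer
shows what the source contained.

```python
import xml.etree.ElementTree as ET

# A literal newline and tab inside the attribute value...
literal = ET.fromstring('<a x="1\n2\t3"/>').get("x")
# ...versus the same characters written as character references.
referenced = ET.fromstring('<a x="1&#10;2&#9;3"/>').get("x")

print(repr(literal))     # '1 2 3'
print(repr(referenced))  # '1\n2\t3'
```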
Issue 8: Attribute types and strings
--------
Aren't entities resolved to strings (at the moment)?
Shouldn't the default type be CDATA instead of missing? (If the type remains
missing, express this through absence; see issue 5.)
Scope: Section 2.3 point 6
Issue 9: unexpanded entity in attribute values
--------
Can entities appear in attribute values? If so, the unexpanded entity
reference information item needs to indicate that. The same applies to
entity start and end markers (if they are preserved; see issue 10).
Scope: Section 2.5 point 3, Section 2.13, Section 2.14
Issue 10: Entity start and end markers
--------
We consider entity start and end markers to be too much preserved
information for the infoset, assuming that resolved entities serve only
syntactic purposes. If they are preserved, a good use-case scenario should
be provided in the introduction.
Scope: Section 2.13 and 2.14
Issue 11: Character information item and attribute values
---------
Why are attributes and elements treated differently with respect to
character information items? Please clarify. See also issue 7.
Scope: Section 2.3 point 4, Section 2.6 point 3.
Issue 12: Document Information Item and document type declaration info item
--------
It is not clear whether a document type declaration information item must
be present whenever there is a document type declaration, or only may be
present. Please clarify the wording.
Scope: Section 2.1 point 1
Issue 13: Standalone Indicator
--------
The standalone indicator should use Boolean values instead of "yes" and
"no". Again, the infoset description should not use concepts that are too
concrete or foreign to XML.
Scope: Section 2.1 point 7 and RDF description
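A minimal sketch of a Boolean standalone property, combined with the
absence convention proposed in issue 5, using the XmlDeclHandler callback
of Python's xml.parsers.expat module. The dictionary keys are hypothetical
property names chosen for illustration; expat reports the standalone clause
as 1, 0, or -1 when it is omitted, which maps directly onto True, False,
or an absent property.

```python
import xml.parsers.expat


def declaration_properties(xml_text: str) -> dict:
    """Collect hypothetical version/standalone properties from the XML declaration.

    Properties without a value are omitted rather than set to null.
    """
    props = {}

    def on_xml_decl(version, encoding, standalone):
        props["version"] = version
        # expat passes 1 for standalone="yes", 0 for "no", -1 if omitted.
        if standalone != -1:
            props["standalone"] = bool(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_text, True)
    return props


print(declaration_properties('<?xml version="1.0" standalone="yes"?><d/>'))
print(declaration_properties('<?xml version="1.0"?><d/>'))
```

The first call yields a True standalone value; in the second, the property
is simply not present.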
Issue 14: base URI on element information item
--------
It is not clear why this property is needed or useful. Please clarify.
Scope: Section 2.2 point 8.
Issue 15: Attribute types
---------
Given that this property exists but has no meaning in the context of the PSV
Infoset, could the Infoset working group coordinate with the Schema working
group to explain why there are now two type properties in the PSV Infoset?
Scope: Section 2.3 point 6
Issue 16: owner element
---------
There is no reason for a separate name. Rename it to parent, or motivate
the different property name.
Scope: Section 2.3 point 7
Issue 17: RDF Schema
---------
In addition to RDF schema, an XML Schema based non-normative description of
the infoset would be useful.
Scope: New appendix
Issue 18: Use cases and examples
---------
The spec would benefit from at least one example per information item
section and, for some of the less obvious information items, use cases that
show why such an information item (or property) is worth preserving.
Scope: All of section 2.
Issue 19: Grammatical changes
---------
In the first paragraph of Section 2.2, change 'children' to 'descendants'
in:
'...and all other element information items are children of the
document element, either directly or indirectly.'
In Section 2.2, item 8, change 'may be' to 'is' in:
'...entity is not known, this property may be null. '
[1] http://www.w3.org/TR/2001/WD-xml-infoset-20010202
[2] http://www.w3.org/TR/REC-xml-names/
--
Program Manager, SQL Server XML Technologies
mrys@microsoft.com, rys@acm.org
We store the Web and more...
Received on Thursday, 22 February 2001 13:25:09 UTC