Reviews of DOM 3 Core and Load&Save from François Yergeau on 2003-08-06 (www-dom@w3.org from July to September 2003)

From: François Yergeau <francois@yergeau.com>
Date: Wed, 06 Aug 2003 13:35:49 -0400
To: www-dom@w3.org
Message-id: <3F313C75.9050603@yergeau.com>
Sorry about the delay, these are the comments from the XML Core WG about 
the DOM Level 3 Core and Load&Save Last Call specs.


Core (http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/)
====

Document interface, "xmlStandalone" and "xmlVersion" attributes: both
descriptions say that there is a NOT_SUPPORTED_ERR if this document
*does* support the "XML" feature. Shouldn't that be does *not*
support?

+++++++++++++++++++++++++++++++++++++++++
Document interface, "xmlStandalone" attribute: this is said to be from
the XML declaration, but is boolean whereas the XML declaration can
specify 3 values: "yes", "no" and not specified.  Either the datatype
should be changed or (my preference) it should be specified that this is
true when the XML declaration says "yes", false otherwise.

Appendix C.1.1 says that this comes from the [standalone] property, but
does not address the case where the property has no value.

+++++++++++++++++++++++++++++++++++++++++
Node interface, nodeName attribute: the description doesn't say that
this is supposed to be the qualified name of the node.  Nor do the
descriptions of Element.tagName and Attr.name.  One has to determine
that from the description of Node.prefix!

+++++++++++++++++++++++++++++++++++++++++
Attr interface, "value" attribute: the spec should clearly specify that
when retrieving the value of an attribute that contains a reference to
an entity for which no definition is available, the processor
will treat this entity reference as an empty reference (see the reply to
C15 in
http://lists.w3.org/Archives/Member/w3c-dom-wg/2003AprJun/0075.html).
Same comment for Element.getAttribute() and Element.getAttributeNS().

+++++++++++++++++++++++++++++++++++++++++
Entity interface: the 4th paragraph starts "XML does not mandate that a
non-validating XML processor read and process entity declarations made
in the external subset or declared in external parameter entities."  The
last occurrence of "external" is superfluous and somewhat misleading,
since non-validating processors are not obligated to read even
*internal* parameter entities.  This latter point is pretty obscure and
under-documented in the XML spec, but see the last sentence in
http://www.w3.org/TR/2000/REC-xml-20001006#sec-rmd, as well as published
erratum E8 at http://www.w3.org/XML/xml-V10-2e-errata#E8.

+++++++++++++++++++++++++++++++++++++++++
Appendix B.1

void Element.normalizeNamespaces()
{
      ...

      // Fixup element's namespace
      //
      if ( Element's namespaceURI != null )
      {
        ...
      }
      else
      {
        // Element has no namespace URI:
        if ( Element's localName is null )
        {
          ...
        }
        else
        {
          // Element has no namespace URI
          // Element has no pseudo-prefix
          if ( default Namespace in scope is "no namespace" )
          {
            ==> do nothing, we're fine as we stand
          }
          else
          {
            if ( there's a conflicting local default namespace declaration
                 already present )
            {
              ==> change its value to use this empty namespace.

            }
            else
            {
              ==> Set the default namespace to "no namespace" by creating or
              changing a local declaration attribute: xmlns="".

            }

There seems to be useless redundancy in the last "if".  Paraphrasing:
"If there's a declaration, change it, else create one or change an
existing one". Either drop the "if" and keep the "else" part or keep the
"if" and drop "or changing" from the "else" part.

+++++++++++++++++++++++++++++++++++++++++
In appendix B.1.1, the first sentence (after the initial Note) is really
hard to parse.  Comments in the algorithms say things like "if the
prefix/namespace pair is within scope...", using this wording would be
clearer.  It's not an element that's within scope, it's really the
prefix/namespaceURI pair.  Some rewording/tightening needed.

+++++++++++++++++++++++++++++++++++++++++
In appendix C.1.1, Document.actualEncoding should be set to [character
encoding scheme], which is "The name of the character encoding scheme in
which the document entity is expressed." in the infoset spec, not
necessarily the value of the encoding pseudo-att of the XML declaration.
Document.xmlEncoding probably should not be set.

In appendix C.1.2, [character encoding scheme] should be set from
Document.actualEncoding.

+++++++++++++++++++++++++++++++++++++++++
In appendix C.1.1, Document.xmlVersion should be "The [version] property
or 1.0 if the latter has no value."

Same, mutatis mutandis, for xmlStandalone (false if no value).

+++++++++++++++++++++++++++++++++++++++++
In appendix C.1.2, [notations] should be "Document.doctype.notations",
in order to point to the correct DocumentType object.

+++++++++++++++++++++++++++++++++++++++++
In appendix C.2.2, the statement "Element nodes with no namespace URI
(Node.namespaceURI equals to null) cannot be represented using the
Infoset." is counterfactual.  The infoset supports these, provided names
do not contain colons.

+++++++++++++++++++++++++++++++++++++++++
In appendix C.3.1, Node.previousSibling and Node.nextSibling should be
set to null.

+++++++++++++++++++++++++++++++++++++++++
In appendix C.6.1, it should be clarified that CDATASection nodes cannot
occur from an infoset mapping, since the infoset doesn't contain CDATA
section boundaries.



Load&Save (http://www.w3.org/TR/2003/WD-DOM-Level-3-LS-20030619/)
=========

In several places (1.2.3, 1.2.4, DOMInput, DOMOutput), it is said that
UTF-16 is defined in [Unicode] and Amendment 1 of [ISO/IEC 10646].  That
last part is obsolete, UTF-16 was defined in Amd 1 of 10646:1993, but
integrated in an Appendix of 10646:2000.  Just say "...in [Unicode] and
in [ISO/IEC 10646]".

+++++++++++++++++++++++++++++++++++++++++
In interface DOMImplementationLS, the createDOMOutput() method is
missing from the IDL definition and from the Methods details.

+++++++++++++++++++++++++++++++++++++++++
In interface DOMImplementationLS, method createDOMInput(), it says
"Create a new empty input source."  "Empty" is not defined.  Does it
mean that all attributes are null?

This comment will also probably apply to createDOMOutput() when the
latter is added (see previous comment).

+++++++++++++++++++++++++++++++++++++++++
In interface DOMParser, 1st bullet after 3rd para, it is wrong to claim
that CDATA sections are structure.  It also seems wrong to set
expectations that CDATA sections will show up after parsing when in fact
parsers are not required to report them.

+++++++++++++++++++++++++++++++++++++++++
In interface DOMParser, 2nd bullet after 3rd para, there is an instance
of "datatype-normalization", within quotes and as a link to Core, and
just below an instance of "data-type-normalization" without quotes, in
fixed-width type, not linked and with an apparently extraneous hyphen.
Aren't these the same?  The spec certainly makes a nice effort to make
them appear different :-)

+++++++++++++++++++++++++++++++++++++++++
In interface DOMParser, in the description of the
"unbound-namespace-in-entity" warning, how can an unbound prefix be
found in an entity *declaration*?  Perhaps you mean in an entity's
replacement text?

Also typo "encounterd".

+++++++++++++++++++++++++++++++++++++++++
In interface DOMParser, in the description of the "namespaces"
parameter, shouldn't there also be a reference to [XML Namespaces 1.1]?

+++++++++++++++++++++++++++++++++++++++++
In interface DOMInput, it says "The DOMParser will use the DOMInput
object to determine how to read data. The DOMParser will look at the
different inputs specified in the DOMInput in the following order to
know which one to read from, the first one through which data is
available will be used: "

It is not clear how the DOMParser does that, i.e. how it determines if
data is available.  Is there an expectation that, say,
DOMInput.characterStream will be null if data is not available there?
What about stringData?  Null or empty?  Is this binding-specific?

Same comment for DOMOutput.

+++++++++++++++++++++++++++++++++++++++++
In interface DOMSerializer, the statement "For all other types of nodes
the serialized form is not specified, but should be something useful to
a human for debugging or diagnostic purposes." seems a bit weak.  It
should be possible to specify more, especially for Element nodes.

+++++++++++++++++++++++++++++++++++++++++
In interface DOMSerializer, method writeURI(), it would be desirable to
specify more how to write to a URI, at least for very common schemes
such as HTTP(S) and mailto.

In HTTP, it would seem desirable to actually be able to choose which
verb (POST or PUT) is used.  POST is supposed to be used when posting
forms, which XForms does with XML data.  PUT is supposed to be used for
uploading data, here an XML document.  The DOM user should be able to
specify which to use, perhaps using an additional parameter to the method.

The spec should also specify to include a Content-Type header with a
media type (which? need a parameter to the method?) and a charset
parameter.

Same comment for DOMOutput when the systemID ends up being used.

+++++++++++++++++++++++++++++++++++++++++
In interface DOMOutput, the descriptions of encoding and systemID seem
to have been more or less copy-pasted from DOMInput, not fully taking
into account the fact that output is involved, not input.  Setting
encoding indicates an intention, not a knowledge of the encoding of some
existing data.

+++++++++++++++++++++++++++++++++++++++++
In interface DOMOutput, is it not possible to specify the behavior when
the systemID is a relative URI?  Wouldn't "relative to baseURI of
Document" work?

+++++++++++++++++++++++++++++++++++++++++


-- 
François Yergeau
Received on Wednesday, 6 August 2003 13:36:25 UTC