Re: XML namespaces on the Web

Lachlan Hunt scripsit:

> The current XML5 proposal focusses entirely on the parsing issue,
> leaving the definition of what's considered to be a conforming,
> well-formed XML document to XML 1.0.  So, in this sense, it is fully
> compatible with XML 1.0, and any conforming XML 1.0 parser will also be
> a conforming XML5 parser, as the algorithm allows for either aborting or
> applying the defined recovery procedure upon encountering a fatal error.

This turns out not to be the case: the algorithm doesn't come close to
XML 1.0 conformance.  For example, it accepts

        <root less="<">
        </root>

without reporting a parse error, but this is not well-formed XML because
it violates the "No < in Attribute Values" well-formedness constraint.
To be an XML parser, it has to accept what an XML parser accepts, reject
what an XML parser MUST reject, and report what an XML parser MUST report.
(In practice, to be useful, XML parsers have to report many things that
XML 1.x does not REQUIRE them to report, beginning with element names.)

In addition, there are inputs like

        this is not XML

on which the XML5 algorithm fails to create a proper DOM, as there is
no root element.
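
For anyone who wants to check the two inputs above against an XML 1.0
parser, here is a minimal sketch using Python's standard
xml.etree.ElementTree (which sits on top of expat).  The helper name
is_well_formed and the constant names are mine, purely for illustration;
the sketch says nothing about how an XML5 implementation behaves.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if expat (via ElementTree) parses text without a fatal error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The two inputs discussed above, plus an escaped variant for contrast.
LESS_IN_ATTRIBUTE = '<root less="<">\n</root>'   # raw "<" in an attribute value
NO_ROOT_ELEMENT = 'this is not XML'              # no root element at all
ESCAPED_ATTRIBUTE = '<root less="&lt;">\n</root>'

for label, text in [
    ("raw < in attribute", LESS_IN_ATTRIBUTE),
    ("no root element", NO_ROOT_ELEMENT),
    ("escaped &lt; in attribute", ESCAPED_ATTRIBUTE),
]:
    print(label, "->", "accepted" if is_well_formed(text) else "fatal error")
```

Only the escaped variant should be accepted; the other two are fatal
errors for an XML 1.0 parser.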

> However, there have also been some suggestions to extend the list
> of pre-defined entity references to all of those defined in HTML5
> (which includes the XHTML and MathML sets).  If this were done, then
> conforming XML 1.0 parsers would need to be updated to recognise these
> entities in order to become conforming XML5 parsers.

That would barely scratch the surface of the complications a parser
would have to handle in order to conform to both XML 1.0 and XML5 even
where they do not actually contradict one another.  (For example, the
XML5 algorithm ignores the DTD internal subset, but XML 1.0 parsers MUST
NOT do so until and unless they see a parameter-entity reference they
do not recognize.)
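
To make the internal-subset point concrete, here is a small sketch, again
with Python's xml.etree.ElementTree over expat.  The entity name
"greeting" and the variable names are hypothetical; the point is only
that an XML 1.0 parser reads the declaration in the internal subset and
expands the reference, so a parser that skips the subset cannot produce
the same result.

```python
import xml.etree.ElementTree as ET

# A general entity declared in the DTD internal subset and used in content.
WITH_SUBSET = (
    '<!DOCTYPE root [\n'
    '  <!ENTITY greeting "hello">\n'
    ']>\n'
    '<root>&greeting;</root>'
)

# The same reference with no declaration anywhere.
WITHOUT_SUBSET = '<root>&greeting;</root>'

# The declaration is processed, so the reference is replaced by its value.
expanded_text = ET.fromstring(WITH_SUBSET).text

# Without the declaration, the reference is a fatal error.
try:
    ET.fromstring(WITHOUT_SUBSET)
    undeclared_is_error = False
except ET.ParseError:
    undeclared_is_error = True

print(repr(expanded_text), undeclared_is_error)
```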

-- 
MEET US AT POINT ORANGE AT MIDNIGHT BRING YOUR DUCK OR PREPARE TO FACE WUGGUMS
John Cowan      cowan@ccil.org      http://www.ccil.org/~cowan

Received on Wednesday, 18 November 2009 21:56:12 UTC