
Re: [xml-dev] When to check entity WFness according to 4.3.2

From: Jeff Rafter <lists@jeffrafter.com>
Date: Tue, 26 Oct 2004 09:21:44 -0700
Message-ID: <417E7998.7050305@jeffrafter.com>
To: Richard Tobin <richard@inf.ed.ac.uk>
CC: xml-dev@lists.xml.org, xml-editor@w3.org

 > The requirement is that "Each of the parsed entities which is
 > referenced directly or indirectly within the document is well-formed."
 > An entity is certainly *declared* in the prolog, but it is not
 > *referenced* there.

It can be referenced there, in a default attribute value declaration, but
that is really not the point. I have to agree with Karl, and must admit to
being very frustrated by the opacity of the recommendation in this area. I
have spent three days retrofitting a parser to perform a check that was
not necessary, and I am sure I am not the only person to have done so.

 > The requirement is that "Each of the parsed entities which is
 > referenced directly or indirectly within the document is well-formed."

Is *far* more clear than

   "4.3.2 The document entity is well-formed if it matches the
    production labeled <document>."

Which links to:

   "[1] document ::= prolog element Misc*"

If at that point you return to your reading in 4.3.2 you encounter:

   "An internal general parsed entity is well-formed if its
    replacement text matches the production labeled content."

By then you have already missed the boat. You forgot to check what the 
implications of a well-formed *textual object* are. That definition 
happens to include the <document> production, but it is not referenced 
anywhere in section 4.

I agree that there needs to be some sort of erratum here to clarify 
things. At the very least, I would love to see "well-formed" in "An 
internal general parsed entity is well-formed" turned into a link that 
points to http://www.w3.org/TR/REC-xml/#sec-well-formed.

But a clarification sentence of something like:

   "Internal general parsed entities should only be checked for
    well-formedness if they are <included> or <included in literal>"

would help immensely. For most implementers this is a confusing area, and 
it could be clarified quite easily.
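
To make the distinction concrete, here is a minimal sketch using the expat
bindings in Python's standard library. It only shows how one existing
parser behaves; it is not a statement of what the recommendation requires.
The two sample documents and the helper name parses_ok are made up for
illustration. The entity's replacement text, "<a>", does not match the
content production because the start-tag is never closed.

```python
from xml.parsers.expat import ParserCreate, ExpatError

# Hypothetical sample documents. In both, entity "e" has the replacement
# text "<a>", which does not match the `content` production.
DECLARED_ONLY = '<!DOCTYPE r [<!ENTITY e "<a>">]><r/>'       # never referenced
REFERENCED = '<!DOCTYPE r [<!ENTITY e "<a>">]><r>&e;</r>'    # included in content


def parses_ok(text):
    """Return True if expat accepts the document, False if it rejects it."""
    parser = ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except ExpatError:
        return False
    return True


print("declared only:", parses_ok(DECLARED_ONLY))
print("referenced:   ", parses_ok(REFERENCED))
```

With expat I would expect the first document to be accepted and the second
to be rejected, since the replacement text is only examined once the entity
is actually included. Other parsers may be stricter and check at
declaration time, which is exactly the ambiguity described above.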

Thanks
Jeff Rafter
Received on Tuesday, 26 October 2004 16:22:15 UTC
