Re: 'Leading White Space' Topic

Daniel,

Thank you for taking the time to understand my question and explain what
was wrong with my interpretation. I believed the authors used entity and
element interchangeably, possibly due to different authors at different
times. I feel much better now that I've been shown that the
recommendation and implementations are consistent and well written.

It is clear to me now that white space at the top level of the document
entity means white space before and/or after the document element, which
always comes after the prolog, if it exists.

Thanks again,

Steve Fogoros

>>> Daniel Veillard <veillard@redhat.com> 09/17/09 3:52 PM >>>
On Thu, Sep 17, 2009 at 03:32:56PM -0500, Steve Fogoros wrote:
> I so much want to agree, and I wish the recommendation to be concise on
> this. I'm reading XML 1.0, Fifth Edition, Section 2.4. Here is a
> cut/paste of the first paragraph:
>  
> Text consists of intermingled character data and markup. [Definition:
> Markup takes the form of start-tags, end-tags, empty-element tags,
> entity references, character references, comments, CDATA section
> delimiters, document type declarations, processing instructions, XML
> declarations, text declarations, and any white space that is at the top
> level of the document entity (that is, outside the document element and
> not inside any other markup).]

  This is about white space at the top level of the document entity, as
in not within the subtree of the top-level element, i.e. the Misc
production used after the top-level element in [1] and as part of the
[22] prolog.

> It says that '... any white space that is at the top level of the
> document entity (that is, outside the document element and ...' is
> markup and is allowed.

  Yes, those spaces exist; the fact that you're assuming they can come
before the prolog is where you get this wrong.

> Production [1] defines the document element as document ::= prolog
> element Misc*
>
> I understand this to mean that 'any white space' outside the document
> element includes any white space before the prolog. How could this be
> interpreted any other way?

  By the obvious fact that the document element is production [39]
and that the prolog allows white space as part of its Misc derivation,
but only in certain places.
  The Backus-Naur grammar is normative; it defines what an XML
document can be. And leading spaces can be consumed by such
a grammar only if there is no XMLDecl.

  There is no confusion about this in implementations; there is actually
a test in the test suite making sure that implementations reject this:

  ./sun/not-wf/sgml02.xml

The case is clear and unambiguous, with the exception of a leading Byte
Order Mark allowed by some encodings. In any case, a parser accepting
leading space before the XMLDecl is just not conformant, i.e. not an XML
parser.
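
The behaviour is easy to check against a conforming parser. What follows
is only a minimal sketch, assuming Python's standard
xml.etree.ElementTree module (which is backed by expat) as the parser
under test; the helper name is_well_formed and the sample documents are
made up for illustration and are not taken from the test suite. It shows
that white space before the XMLDecl is rejected, while white space
consumed by Misc (no XMLDecl present, or after the document element) is
accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, for illustration only.
with_decl = '<?xml version="1.0"?><doc/>'

results = {
    "XMLDecl at the very start": is_well_formed(with_decl),
    "space before XMLDecl": is_well_formed(" " + with_decl),
    "space before element, no XMLDecl": is_well_formed(" <doc/>"),
    "space after the document element": is_well_formed(with_decl + "\n "),
}

for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the second case should be rejected: once a space has been read, the
XMLDecl alternative of prolog [22] is no longer available, and what
follows cannot be parsed as a processing instruction either, because
production [17] reserves the target "xml".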

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/





Received on Friday, 18 September 2009 01:39:54 UTC