Re: RS/RE: basic questions from Joe English on 1996-09-24 (w3c-sgml-wg@w3.org from September 1996)

From: Joe English <jenglish@crl.com>
Date: Tue, 24 Sep 1996 12:23:10 -0700
To: w3c-sgml-wg@w3.org
Message-Id: <199609241923.AA13430@mail.crl.com>

Robert Streich <streich@slb.com> wrote:

> [...]
> Even the whitespace in element content wouldn't cause the
> application any problems. It's just going to ignore any extra whitespace
> anyway unless the stylesheet tells it that it is significant.
>
> Even Joe's example [below]
> isn't significant. The stylesheet will determine whether or not <b> starts
> a new line.

I think you're correct that _most_ applications
will be able to treat all syntactic whitespace the
same -- browsers and indexers for instance -- but
there may be some circumstances where the distinction
between SEPCHARs in element and mixed content is
still important.

One of the potential advantages of XML over HTML is the
ability to create hyperlinks to any part of any document
with HyTime or HyTime-like locators.  (I suppose this is
also possible in theory for HTML, but since the vast
majority of Web pages are not valid SGML it wouldn't be
very useful unless all parsers had identical error-recovery
strategies.)

For this to work, it's essential that all XML parsers can construct
equivalent groves.  For instance, in:

> 	<a>
> 	<b>blah</b>
> 	<b>blah</b>
> 	</a>

the tree location address of the second 'B' element can change
depending on whether 'A' has mixed content or element content.

Then again, this might not be an issue either.  TEI-style 'XPTR's
could be used instead, and they wouldn't break.

--Joe English

  jenglish@crl.com

Received on Tuesday, 24 September 1996 15:22:56 UTC