Another pseudo-element gotcha from James Clark on 1997-04-12 (w3c-sgml-wg@w3.org from April 1997)

From: James Clark <jjc@jclark.com>
Date: Sat, 12 Apr 1997 10:04:36 +0700
To: w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19970412030436.00eb3bbc@jclark.com>

Consider the following

<A ID="A">
<B>...</B>
C
</A>

ID(A)CHILD(1) selects the B element, right?  Wrong!  All white-space in
mixed content is data. So the newline after the A start-tag is a
pseudo-element.  So the B element is ID(A)CHILD(2)!  Also in something like

<A ID="A">
<B>..</B>
</A>

B will be ID(A)CHILD(2) if parsed without a DTD, and ID(A)CHILD(1) if parsed
with a DTD.  Of course we could say white-space characters produced
pseudo-elements even in content, but apart from being incompatible with TEI,
it makes the numbers so unintuitive as to be virtually useless.

If we want to get addressing in characater content, we don't need
pseudo-elements to do it.  We can simple search or count for characters that
occur directly within an element.

Here's a summary of my suggestions on xpointers from my previous message on
the subject:

- Drop *CDATA
- Drop * as an element name, or, failing that, say it does not take account
of characters
- Drop ALL as an instance, or, failing that, allow it only as the last step
in a xpointer
- Change the rule on attribute value matching to say it does case folding
according to the declared value of the attribute, regardles of whether the
attribute value is a literal
- Add a TREE keyword which is like DESCENDANT but includes the location source

James

Received on Friday, 11 April 1997 23:17:35 UTC