[Prev][Next][Index][Thread]

Another pseudo-element gotcha



Consider the following

<A ID="A">
<B>...</B>
C
</A>

ID(A)CHILD(1) selects the B element, right?  Wrong!  All white-space in
mixed content is data. So the newline after the A start-tag is a
pseudo-element.  So the B element is ID(A)CHILD(2)!  Also in something like

<A ID="A">
<B>..</B>
</A>

B will be ID(A)CHILD(2) if parsed without a DTD, and ID(A)CHILD(1) if parsed
with a DTD.  Of course we could say white-space characters produced
pseudo-elements even in content, but apart from being incompatible with TEI,
it makes the numbers so unintuitive as to be virtually useless.

If we want to get addressing in characater content, we don't need
pseudo-elements to do it.  We can simple search or count for characters that
occur directly within an element.

Here's a summary of my suggestions on xpointers from my previous message on
the subject:

- Drop *CDATA
- Drop * as an element name, or, failing that, say it does not take account
of characters
- Drop ALL as an instance, or, failing that, allow it only as the last step
in a xpointer
- Change the rule on attribute value matching to say it does case folding
according to the declared value of the attribute, regardles of whether the
attribute value is a literal
- Add a TREE keyword which is like DESCENDANT but includes the location source

James