Comments on xpointers from James Clark on 1997-04-07 (w3c-sgml-wg@w3.org from April 1997)

From: James Clark <jjc@jclark.com>
Date: Mon, 07 Apr 1997 10:53:40 +0700
To: w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19970407035340.00a6faf0@jclark.com>
The spec often talks about elements, but it's not clear when this
includes pseudo-elements.  For example, the spec says:

"The type may be specified by Name or using the values "*CDATA" or
"*". If the type is specified as "*", any element type is matched;
this means that the following are synonymous

CHILD(2,*)
CHILD(2)"

From a later example, it appears that CHILD(2) does count
pseudo-elements.  But on the other hand saying that "any element type
is matched" implies that pseudo-elements are not matched.  When the
spec means to include pseudo-elements I think it needs to say so
explicitly.

I think it is at least as common to want to specify the n-th element
as it is to want to specify the n-th element or pseudo-element.  If we
have something that matches the n-th element or pseudo-element, I
think we also need something that matches the n-th element.  One
possibility would be to say that CHILD(2) matches any element or
pseudo-element, but CHILD(2, *) matches any element.

The spec needs to make clear what a pseudo-element is.  For example,
is a sequence of characters with a PI in the middle one pseudo-element
or two?

I wonder whether pseudo-elements are really necessary?  Being able to
address pseudo-elements doesn't seem to me be very useful unless
you've also got the ability to then address within the pseudo element,
which we do not have.  It would be a significant simplification to say
that the only thing that's being addressed is elements.

Have we thought whether the * notation for element names is really
necessary?

As far as I can see, there's no way to ask for example for the first
element in the document with attribute FOO equal to BAR.  DESCENDANTS
doesn't do it, because it will not work when the document element is
the first such element.  I think we need another keyword which is like
DESCENDANTS except that it includes the location source.  This is the
subtree function in SDQL.  I would suggest either TREE or SUBTREE as
the keyword.

What happens when a location term other than the last matches selects
more than one element?  For example, is CHILD(ALL)(1) legal and if so
what does it mean? Or how about DESCENDANTS(ALL),DESCENDANTS(ALL)? Do
we really want the ALL selector at all?  It is a significant
additional complication.

Does an attribute value that matches *implied also match *?

The spec says "If specified with quotation marks, the attribute-value
parameter is case-sensitive; otherwise not."  This seems to me a very
bad idea.  First of all, if the declared value of the attribute is
case insensitive, then it makes no sense to do the comparison case
sensitively.  Why not simply say that the comparison is case sensitive
just in case the declared value is?  If the answer is that it must be
possible to do case insensitive matching on attributes declared as
CDATA, then this is a totally inadequate mechanism for achieving it,
because it doesn't handle the case where there are non name characters
in the value.  The only thing that this would be good for is doing
case insensitive matching on attributes whose values can consist only
of name characters; but if the value can consist only of name
characters, then the attribute could be declared as a NMTOKEN, and the
comparison would be case insensitve just because of that. It's also
totally at variance with SGML: whether an attribute is quoted or not
has no bearing on its case sensitivity.  If you really need this, I
would suggest using something like FOO #FOLD "abc def" (or maybe
#NAMECASE) to select elements whose FOO attribute matches "abc def"
case insensitively even if FOO is declared as CDATA.

James
Received on Tuesday, 8 April 1997 13:39:25 UTC