Re: rationales for TEI extended-pointer keywords from James Clark on 1997-06-11 (w3c-sgml-wg@w3.org from June 1997)

From: James Clark <jjc@jclark.com>
Date: Wed, 11 Jun 1997 09:16:44 +0700
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <2.2.32.19970611021644.00b28508@jclark.com>

At 19:00 10/06/97 CDT, Michael Sperberg-McQueen wrote:
>Some time ago, James Clark asked why the full array of tree-walking
>keywords from the TEI extended-pointer notation was needed for XML
>linking, and I was asked by the ERB to provide some account of why I
>thought they were desirable.

Thanks, you've mostly convinced me.

>(Note to those who don't want
>to have to look this up:  descendant finds elements at any level of
>containment; PRECEDING and FOLLOWING search left and right across the
>entire tree -- unlike NEXT and PREVIOUS they are *not* limited to the
>siblings of their location source.  Yes, I know, there's no way to
>remember which is PREVIOUS and which is PRECEDING -- better names would
>be welcome.)

This is my major concern about having both PRECEDING/FOLLOWING and
NEXT/PREVIOUS. It's going to be very hard for people to remember.  This is
especially the case since the TEI meanings for PRECEDING/FOLLOWING are the
opposite of the SDQL meanings:

TEI         SDQL
------------------
NEXT        FOLLOW
PREVIOUS    PRECED
PRECEDING   BEFORE
FOLLOWING   AFTER (currently used only in prose not in a function name)

I notice that the current draft actually defines the TEI terms using the
SDQL terms (was this intentional?):

 PREVIOUS selects preceding sibling elements of the location source.
 NEXT selects following sibling elements of the location source.
 PRECEDING selects elements which appear before the location source.
 FOLLOWING selects elements which appear after the location source.

This suggests to me that the TEI choice of terms is not the best.  NEXT to
me suggests the immediately next sibling, not any following sibling; and FOLLOWING to me
doesn't suggest looking at elements that aren't following siblings.
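
To make the sibling-only versus whole-tree distinction concrete, here is a
minimal Python sketch of the four keywords as defined in the draft text quoted
above. It is an illustration, not the spec's algorithm: the sample document,
the axes() helper and the choice to leave ancestors out of PRECEDING and
descendants out of FOLLOWING are all assumptions that the quoted definitions
do not settle, and results are listed in document order rather than counted
outward from the location source.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; <f/> is used as the location source.
DOC = "<doc><a><b/><c/></a><d><e/><f/><g/></d><h/></doc>"


def axes(root, source):
    """Return the four keyword sets for `source`, each in document order."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    order = list(root.iter())  # every element in document order, root first
    pos = order.index(source)

    # Collect the ancestors of the source (assumed excluded from PRECEDING).
    ancestors = set()
    node = source
    while node in parent_of:
        node = parent_of[node]
        ancestors.add(node)

    # Collect the descendants of the source (assumed excluded from FOLLOWING).
    descendants = set(source.iter()) - {source}

    siblings = list(parent_of[source]) if source in parent_of else [source]
    i = siblings.index(source)
    return {
        "PREVIOUS": siblings[:i],        # preceding siblings only
        "NEXT": siblings[i + 1:],        # following siblings only
        "PRECEDING": [n for n in order[:pos] if n not in ancestors],
        "FOLLOWING": [n for n in order[pos + 1:] if n not in descendants],
    }


root = ET.fromstring(DOC)
source = root.find(".//f")
result = {key: [n.tag for n in nodes] for key, nodes in axes(root, source).items()}
for key, tags in result.items():
    print(key, tags)
```

With <f/> as the location source, PREVIOUS and NEXT stay inside <d>, while
PRECEDING and FOLLOWING also reach elements under <a> and the trailing <h/>.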

My concrete suggestion would be to drop the terms PRECEDING/FOLLOWING, since
SDQL and TEI give them opposite meanings, and use PREVIOUS/NEXT/AFTER/BEFORE.

>I hope this helps clarify why I want a full set of tree-traversal
>keywords.

It does.  A couple of other issues:

- Why do we need to allow * for the element type name?  Why can't we simply
require that all steps include the element type name or *CDATA?  This
eliminates the confusion over whether * counts pseudo-elements or not.   It
makes my life harder as an implementor to have to support both typed and
untyped counts: making typed and untyped counting efficient requires
different data structures.

- Why do we need * for the attribute name?

- Why do we need * for the attribute value?

- Why do we need *IMPLIED for the attribute value?  This is only going to
work in the presence of the DTD.

While I have your attention, I have a couple of other open issues from
earlier messages which I think need considering:

>As far as I can see, there's no way to ask for example for the first
>element in the document with attribute FOO equal to BAR.  DESCENDANTS
>doesn't do it, because it will not work when the document element is
>the first such element.  I think we need another keyword which is like
>DESCENDANTS except that it includes the location source.  This is the
>subtree function in SDQL.  I would suggest either TREE or SUBTREE as
>the keyword.
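
The gap described in the quoted paragraph can be shown with a small Python
sketch. The function names descendants() and subtree() and the sample
document are hypothetical; subtree() stands in for the proposed TREE/SUBTREE
keyword, that is, DESCENDANTS plus the location source itself.

```python
import xml.etree.ElementTree as ET


def descendants(source):
    """Elements strictly below `source`, in document order."""
    it = source.iter()
    next(it)  # Element.iter() yields the element itself first; skip it
    return it


def subtree(source):
    """`source` followed by all of its descendants, in document order."""
    return source.iter()


def first_with_attr(nodes, name, value):
    """First element in `nodes` whose attribute `name` equals `value`."""
    return next((n for n in nodes if n.get(name) == value), None)


# The document element is itself the first element with FOO="BAR".
root = ET.fromstring('<doc FOO="BAR"><p FOO="BAR"/><p/></doc>')
via_descendants = first_with_attr(descendants(root), "FOO", "BAR")
via_subtree = first_with_attr(subtree(root), "FOO", "BAR")
print(via_descendants.tag, via_subtree.tag)  # prints "p doc"
```

Searching only the descendants skips the document element and returns the
<p> instead; the subtree search returns the document element as intended.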

>The spec says "If specified with quotation marks, the attribute-value
>parameter is case-sensitive; otherwise not."  This seems to me a very
>bad idea.  First of all, if the declared value of the attribute is
>case insensitive, then it makes no sense to do the comparison case
>sensitively.  Why not simply say that the comparison is case sensitive
>just in case the declared value is?  If the answer is that it must be
>possible to do case insensitive matching on attributes declared as
>CDATA, then this is a totally inadequate mechanism for achieving it,
>because it doesn't handle the case where there are non name characters
>in the value.  The only thing that this would be good for is doing
>case insensitive matching on attributes whose values can consist only
>of name characters; but if the value can consist only of name
>characters, then the attribute could be declared as a NMTOKEN, and the
>comparison would be case insensitive just because of that. It's also
>totally at variance with SGML: whether an attribute is quoted or not
>has no bearing on its case sensitivity.  If you really need this, I
>would suggest using something like FOO #FOLD "abc def" (or maybe
>#NAMECASE) to select elements whose FOO attribute matches "abc def"
>case insensitively even if FOO is declared as CDATA.

As a footnote to this, I would say that the current rule makes sense when
you're working without a DTD.  So I think the rule should be something like:
the matching is case insensitive if

- you're working with a DTD, and the attribute is declared as a case
insensitive type

- the attribute value is unquoted in the pointer (should this apply if the
attribute is explicitly declared to be CDATA?), or

- (maybe) there's a #FOLD keyword present.
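
The rule above can be sketched as a small Python predicate. Everything named
here is an assumption made for illustration: the set of declared types
treated as case insensitive (which in SGML really depends on the SGML
declaration), the function names, and the decision to let an unquoted value
fold case even for CDATA, which is exactly the question left open in the
second bullet.

```python
# Assumed for illustration: declared types whose values are case-folded.
FOLDED_TYPES = {"NMTOKEN", "NMTOKENS", "NAME", "NAMES", "ID", "IDREF", "IDREFS"}


def fold_case(declared_type, quoted, fold_keyword=False):
    """Decide whether an attribute-value match should ignore case.

    declared_type -- declared value from the DTD, or None when there is no DTD
    quoted        -- True if the value was written in quotes in the pointer
    fold_keyword  -- True if a (hypothetical) #FOLD keyword is present
    """
    if declared_type is not None and declared_type.upper() in FOLDED_TYPES:
        return True   # working with a DTD, case-insensitive declared type
    if not quoted:
        return True   # unquoted in the pointer (applied to CDATA too here)
    return fold_keyword  # the "(maybe)" case


def matches(actual, wanted, declared_type=None, quoted=True, fold_keyword=False):
    """Compare an attribute value against the value given in the pointer."""
    if fold_case(declared_type, quoted, fold_keyword):
        return actual.upper() == wanted.upper()
    return actual == wanted


# CDATA with a quoted value: case sensitive unless #FOLD is given.
print(matches("ABC DEF", "abc def", declared_type="CDATA"))
print(matches("ABC DEF", "abc def", declared_type="CDATA", fold_keyword=True))
# NMTOKEN: case insensitive even when quoted.
print(matches("Bar", "bar", declared_type="NMTOKEN"))
```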

James
Received on Tuesday, 10 June 1997 22:34:28 UTC