Re: Link-3: Sets, Singletons, and Determinism

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Sun, 18 May 1997 14:29:20 GMT
Message-Id: <6802@ursus.demon.co.uk>
To: w3c-sgml-wg@w3.org
In message <3.0.32.19970518111704.00b18a30@pop.intergate.bc.ca> Tim Bray writes:
> Although XML-link currently doesn't address this at all, the spec probably 
> follows the TEI principle of determinism; that is to say, you always get 
> exactly one location as the result of an xpointer (or in the case of spans, 
> two).

My spec (970406) has production [12] with the keyword 'ALL' and the
statement that it selects *all* the candidate locations.  Has this
changed?
(The rest of this posting assumes it hasn't :-).
The spec reads (5.3.1)
'The result of evaluating a location term [always is] an element'.  This 
is not compatible with the statement under production [12] which uses
location*S*.
I assume that those locations are then treated as *elements* and that the 
two words are synonyms in this version of the spec.

My understanding of spans (or at least the '..' syntax) is that while
it may return two *locations* it only returns one *element*.  Here is how
JUMBO might interpret it:

<MOL ID="H2O2">
<ATOM>H</ATOM><ATOM>O</ATOM><ATOM>O</ATOM><ATOM>H</ATOM>
</MOL>

'ID(H2O2)CHILD(1,ATOM)..CHILD(3,ATOM)'

would return
<ATOM>H</ATOM><ATOM>O</ATOM><ATOM>O</ATOM>
which might be valid, but is a set of elements from the original document
rather than a single element.
If this is what is required, direct querying would be preferable.
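
To make the sibling-span reading concrete, here is a minimal Java sketch.
It assumes that both endpoints of '..' are children of the same parent and
that the span means every sibling from the first endpoint to the second,
inclusive.  The names SpanSketch and span are hypothetical and chosen for
illustration only (this is not JUMBO code), and each ATOM is reduced to its
text content.

```java
import java.util.List;

// Hypothetical sketch: a span between two sibling children, read as
// "every sibling from the first endpoint to the second, inclusive".
class SpanSketch {

    // Returns the children at positions from..to (1-based, inclusive).
    // An invalid range gives an empty list rather than null or an exception.
    static List<String> span(List<String> children, int from, int to) {
        if (from < 1 || to > children.size() || from > to) {
            return List.of();
        }
        return children.subList(from - 1, to);
    }

    public static void main(String[] args) {
        // Text content of the four ATOM children of <MOL ID="H2O2">.
        List<String> atoms = List.of("H", "O", "O", "H");

        // CHILD(1,ATOM)..CHILD(3,ATOM)
        List<String> result = span(atoms, 1, 3);
        System.out.println(result.size() + " elements: " + result);
    }
}
```

The result is a list of three elements, not one, which is the point at
issue: under this reading a span cannot be said to return a single element.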

'ID(H2O2)..CHILD(3,ATOM)'

is more problematic.  It could return

(a) an error, since it could be interpreted as not being well formed.

(b) a WF element created by unstacking the unbalanced closing tags:
<MOL ID="H2O2">
<ATOM>H</ATOM><ATOM>O</ATOM><ATOM>O</ATOM>
</MOL>
This is dangerous as it creates a new apparently viable element (in the
above case it's a chemical transformation, which clashes with the 'ID').

(c) the elements above *preceded* by the complete MOL element (which 
would contain these ATOMs anyway, with the fourth child).  The result 
would be 4 elements (which would be grotesque).

For this reason I have not implemented '..': the syntax is not clear
and is potentially dangerous.

> 
> If we are going to allow spans, and thus an xpointer to return N
> locations, where N>1, should we consider saying that all xpointers
> return sets of objects, and sometimes the size of the set is 1?  This
> would open up a whole bunch of interesting apps.

This is what JUMBO does at present.  (Note that the size of the set could
be 0 if the search fails.)

I was introduced to this idea by CoST, which I found very useful indeed,
and when it resurfaced in XML-LINK I assumed it was generic SGML practice.
So JUMBO contains a NodeSet (read ElementSet) which holds the result of
a TEIXptr.  I'd be very sorry to see this disappear.

> 
> On the other hand, it would make xpointers smell even more like queries
> and less like addresses, which makes me at least nervous.  We also
> have to be careful if we are going to (see a later message) allow
> sub-element addressing; then we'd have to say that either that is
> a set of one pseudo-element, or that xpointers can return either
> sets of elements or spans of characters.  Tricky either way... but

Yes - the return value from a TEI pointer needs to be clearly defined.
Also the semantics - if I have a query like:
CHILD(1,MOL)
then I use (internally) a routine like:
	Element e = Tree.TEIsearchFirstElement("CHILD(1,MOL)");
	// e can be null or a valid Element
whilst for ALL it would be:
	ElementSet es = Tree.TEIsearch("CHILD(ALL,MOL)");
and es (which is never null) could have 0 to any number of Elements.
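
The difference between the two return conventions can be sketched as
follows.  This is only an illustration under simplifying assumptions: the
class SearchSketch, its tiny Element type and the methods searchFirstChild
and searchAllChildren are hypothetical stand-ins, not the JUMBO routines
named above, and only the CHILD keyword is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two return conventions for CHILD(n,name)
// and CHILD(ALL,name).
class SearchSketch {

    // A bare-bones element: a name and an ordered list of children.
    static class Element {
        final String name;
        final List<Element> children = new ArrayList<>();

        Element(String name) { this.name = name; }

        // Appends a child and returns this element so calls can be chained.
        Element add(Element child) { children.add(child); return this; }
    }

    // CHILD(n,name): the n-th child (1-based) with that name, or null.
    static Element searchFirstChild(Element parent, int n, String name) {
        int seen = 0;
        for (Element c : parent.children) {
            if (c.name.equals(name) && ++seen == n) {
                return c;
            }
        }
        return null; // a failed search is reported as null
    }

    // CHILD(ALL,name): every child with that name; possibly empty, never null.
    static List<Element> searchAllChildren(Element parent, String name) {
        List<Element> hits = new ArrayList<>();
        for (Element c : parent.children) {
            if (c.name.equals(name)) {
                hits.add(c);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Element root = new Element("ROOT")
            .add(new Element("MOL"))
            .add(new Element("ARRAY"))
            .add(new Element("MOL"));

        System.out.println(searchFirstChild(root, 1, "MOL") != null);    // single element found
        System.out.println(searchFirstChild(root, 3, "MOL") == null);    // failed search: null
        System.out.println(searchAllChildren(root, "MOL").size());       // set of matches
        System.out.println(searchAllChildren(root, "SEQUENCE").size());  // failed search: empty set
    }
}
```

The caller of the single-element form has to test for null, whereas the
set form lets a failed search fall out naturally as a set of size 0.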

For the sub-element addressing I have defined a concept of Subaddressable
(as a Java interface).  Not all my Elements are Subaddressable; examples
of those that are include ARRAY, SEQUENCE (a protein sequence) and MOL
(for the atoms).  A Subaddressable must implement things like:
	- highlight subaddressed locations
	- allow (mouse) selection of subaddressed locations
	- provide a callback for mouse click on a subaddressable
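
As a rough illustration of those three requirements, an interface of this
kind might look like the sketch below.  The method names highlight, select
and onClick, the Mol class and its click method are all assumptions made
for this example; they are not the actual JUMBO interface, and the mouse
handling is simulated by a plain method call rather than a GUI event.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntConsumer;

// Hypothetical sketch of a Subaddressable-style interface and one
// implementation.  Locations are 1-based positions inside the element.
class SubaddressSketch {

    interface Subaddressable {
        void highlight(int location);        // highlight a subaddressed location
        void select(int location);           // mark a location as selected
        void onClick(IntConsumer callback);  // register a click callback
    }

    // A molecule whose subaddressable locations are its atoms.
    static class Mol implements Subaddressable {
        final List<String> atoms;
        final Set<Integer> highlighted = new TreeSet<>();
        int selected = 0; // 0 means nothing is selected
        private IntConsumer callback = location -> { };

        Mol(List<String> atoms) { this.atoms = atoms; }

        private void check(int location) {
            if (location < 1 || location > atoms.size()) {
                throw new IndexOutOfBoundsException("no atom " + location);
            }
        }

        public void highlight(int location) { check(location); highlighted.add(location); }

        public void select(int location) { check(location); selected = location; }

        public void onClick(IntConsumer callback) { this.callback = callback; }

        // Stand-in for a real mouse event: selects the atom, then notifies.
        void click(int location) {
            select(location);
            callback.accept(location);
        }
    }

    public static void main(String[] args) {
        Mol mol = new Mol(List.of("H", "O", "O", "H"));
        mol.onClick(i -> System.out.println("clicked atom " + i + ": " + mol.atoms.get(i - 1)));
        mol.highlight(2);
        mol.click(3);
    }
}
```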

See later.

> returning sets of elements is a seductive idea.

Yes - and I have been seduced.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Sunday, 18 May 1997 12:22:17 UTC
