Report on ERB work last week from Steven J. DeRose on 1997-06-11 (w3c-sgml-wg@w3.org from June 1997)

From: Steven J. DeRose <sjd@eps.inso.com>
Date: Wed, 11 Jun 1997 13:11:39 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19970611171139.00c4b8e8@pop>
(sorry this took a while to get through -- phoneline probs while out of country)

The ERB met last week with Bosak, Bray, Clark, Connolly, DeRose, Hollander,
Kimber, Magliery, Maler, Paoli, Sperberg-McQueen and Wood present.

We set as agenda resolving several summary questions under discussion.

Link decisions: 1. syntax

None of the 9 syntactic options in question for distinguishing "HERE" as the
EPN keyword from "HERE" as an ID in a URL found enthusiastic support, though
several seemed acceptable. James introduced a tenth suggestion, namely to
require an empty parameter list after those EPN keywords that do not already
require them. For example:

   HERE    is an ID
   HERE()  is an extended pointer

This met with immediate enthusiasm for several reasons, including that it
increases the consistency of EPN itself and appears to be easy to
teach/learn/implement/document. This option was unanimously approved.
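
As a rough illustration of why this is easy to implement, the rule can be sketched as a small classifier: a bare name is an ID, and a name followed by a (possibly empty) parameter list is an EPN keyword. The function name `classify` and the keyword set below are assumptions made for the example, not part of the ERB decision.

```python
import re

# Hypothetical keyword set, for illustration only; the actual EPN keyword
# list is not given in this report.
EPN_KEYWORDS = {"HERE", "ROOT", "CHILD"}

# A name, optionally followed by a parenthesized parameter list.
_TERM = re.compile(r"^\s*([A-Za-z][\w.-]*)\s*(?:\((.*)\))?\s*$")


def classify(term):
    """Return ("id", name) or ("keyword", name, params) for one locator term."""
    match = _TERM.match(term)
    if match is None:
        raise ValueError(f"unparseable term: {term!r}")
    name, params = match.group(1), match.group(2)
    if params is None:
        # No parameter list at all: the name is treated as an ID.
        return ("id", name)
    if name not in EPN_KEYWORDS:
        raise ValueError(f"unknown keyword: {name!r}")
    # A parameter list, even an empty one, marks an EPN keyword.
    return ("keyword", name, params.strip())


print(classify("HERE"))       # bare name, read as an ID
print(classify("HERE()"))     # empty parameter list, read as a keyword
print(classify("CHILD (3)"))  # keyword with a parameter
```

The point of the sketch is only that the two readings of "HERE" no longer need any lookahead or escaping convention: the presence of the parentheses decides.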


Link decisions: 2. Pseudo-elements

Discussion involved several aspects of the problem of whether to count
pseudo-elements.

<NOTE TYPE="terminological">
A pseudo-element is a portion of #PCDATA content uninterrupted by markup.
A "real" element, in contrast, is one that has a GI.
"Subelement addressing" involves addresses like "the third word".
</NOTE>

Whitespace relates to both pseudo-element and sub-element addressing. We
tabled the pseudo-element issue to discuss how the SGML TC's changes regarding
whitespace relate to it. The result was RE deleta est, as reported
already by Michael.

Returning to the pseudo-element question, we noted that the removal of
ambiguity about the *presence* of whitespace removes ambiguity in *how* to
count pseudo-elements (though not about *whether* to).

The great cost of not counting pseudo-elements is that then you cannot
address them. It was pointed out that *if* you do still allow sub-element
addressing (such as character offsets into #PCDATA), you can get at
pseudo-elements that way, but that character counting across markup
boundaries is itself complex and relatively fragile. It also imposes a
subtle incompatibility with HyTime and with TEI pointers (and not just for
CHILD, but for several other keywords including complex cases such as
PRECEDING and DESCENDANT).

After much discussion the ERB is leaning toward a proposal under which both
options are available to the user, distinguished by the GI parameter. This
has not been voted on, but seems at this time to be the best compromise. Thus:

   CHILD (3)       locates the 3rd real subelement
   CHILD (3 *)     locates the 3rd real subelement
   CHILD (3 !)     locates the 3rd real or pseudo subelement

(the particular reserved value to flag the last case is to be determined;
"!" is merely for illustration).
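
To make the three cases concrete, here is a minimal sketch of the counting rule, assuming element content is modelled as a list in which nested `Element` objects are real subelements and plain strings are pseudo-elements. The `Element` class and the `child` function are hypothetical names for this example, "!" is the same placeholder as above, and treating any other GI value as a filter on element type is an assumption borrowed from TEI-style pointers.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    gi: str
    # Content items: Element = real subelement, str = pseudo-element.
    content: list = field(default_factory=list)


def child(parent, n, gi="*"):
    """Return the n-th (1-based) child under the given counting rule, or None."""
    if gi == "!":
        # Placeholder flag: count real and pseudo subelements alike.
        candidates = list(parent.content)
    elif gi == "*":
        # Default: count real subelements only.
        candidates = [c for c in parent.content if isinstance(c, Element)]
    else:
        # Assumption: a specific GI counts only real subelements of that type.
        candidates = [c for c in parent.content
                      if isinstance(c, Element) and c.gi == gi]
    if not 1 <= n <= len(candidates):
        return None
    return candidates[n - 1]


em, b, i = Element("EM", ["bold"]), Element("B"), Element("I")
p = Element("P", ["Some ", em, " text ", b, i])

print(child(p, 3).gi)          # third real subelement
print(repr(child(p, 3, "!")))  # third real-or-pseudo subelement
```

In this sample, CHILD (3) and CHILD (3 *) both land on the I element, while CHILD (3 !) lands on the text between EM and B.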

It was also suggested that pseudo-elements consisting *only* of whitespace
not be counted. This may enhance intuitiveness and compatibility with SGML
systems that do not yet support the TC.
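
A small sketch of what skipping whitespace-only pseudo-elements would mean for counting, assuming content is given as a list in which strings are pseudo-elements and anything else stands for a real element. The function name `countable_children` and the tuple stand-ins are hypothetical.

```python
def countable_children(content, skip_blank_text=True):
    """Return the content items that take part in counting.

    Strings are pseudo-elements; any other item stands for a real element.
    With skip_blank_text set, strings made up only of whitespace are ignored.
    """
    return [
        item for item in content
        if not (skip_blank_text and isinstance(item, str) and item.strip() == "")
    ]


# Three P elements, a line break between the first two, and some trailing text.
content = [("P",), "\n  ", ("P",), "tail text", ("P",)]

print(countable_children(content)[2])                         # third counted item
print(countable_children(content, skip_blank_text=False)[2])  # third raw item
```

With the suggestion applied, the indentation between the first two P elements no longer shifts the count, so the third item is the trailing text; without it, the third item is the second P.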

This proposal will be presented to the WG for discussion.


Link decisions: 3. Sets & singletons

Discussion here centered on our relationship to the DOM work, since both
require an explicit definition of what the document structure representation
is before we can give a complete formal specification of what information is
in fact referenced by a locator, particularly in the more complex cases,
where the destination is not a single element.

Lauren will be coordinating this liaison effort, and will seek to present a first
cut proposal for a DOM/XML data schema (or grove plan) by July 1. Michael
and James will be contributing to this effort.

As for locating spans, there are complexities because a span is not
generally representable as a set, list, or tree of elements. The span from
the 2nd to the 4th P within SEC ID=SEC3 can be; but the span from the last
word of one P through the 4th word of the next P is not. Neither including
nor excluding the P's involved, or their common ancestor, fully represents
the link: All those elements are *partly* included in the resource.
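
The difficulty can be illustrated with a minimal sketch, assuming each element is reduced to a half-open character extent in the document text. This flat-offset model, the function name `classify_elements`, and the sample numbers are all assumptions made for the example; they are not the representation the ERB or the DOM effort has in mind.

```python
def classify_elements(extents, span):
    """Label each element as "full", "partial", or "outside" relative to a span.

    extents: dict mapping an element label to (start, end) character offsets.
    span:    (start, end) character offsets of the located range.
    All ranges are half-open: start is included, end is not.
    """
    s, e = span
    result = {}
    for name, (start, end) in extents.items():
        if s <= start and end <= e:
            result[name] = "full"       # element lies wholly inside the span
        elif start < e and s < end:
            result[name] = "partial"    # element overlaps the span boundary
        else:
            result[name] = "outside"
    return result


# A SEC holding four P elements of 20 characters each (made-up numbers).
extents = {"SEC": (0, 80), "P1": (0, 20), "P2": (20, 40),
           "P3": (40, 60), "P4": (60, 80)}

# From the 2nd through the 4th P: a clean list of whole elements.
print(classify_elements(extents, (20, 80)))

# From late in P2 to early in P3: no element is wholly included.
print(classify_elements(extents, (35, 47)))
```

In the first case the span is exactly P2, P3 and P4. In the second, P2, P3 and SEC all come out as "partial", which is the situation no set, list, or tree of whole elements can express.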

The end proposal was to include spans in the location syntax, specified as a
start/end pair, with the meaning defined in the same manner as in TEI: as a
reference to the included range. At the same time, we will acknowledge that
the precise details are not yet specified, and that we expect that to be
accomplished via the DOM effort, with which we are working.

This was approved, with James and Dave dissenting.



Steven J. DeRose, Ph.D., Chief Scientist
Inso Electronic Publishing Solutions
   (formerly EBT)
Received on Wednesday, 11 June 1997 13:15:10 UTC