addressing into char content with xml-link

In discussions with others over the last couple of days, I've come
to the conclusion that we should consider adding to xml-link the capability
to address into data character content (aka dataloc).

The requirement I see is that users will expect an interface that
allows them to highlight some text in one document, highlight some
text in a second document, and make a link from one to the other.
If the target is a three word phrase in the middle of a very long
paragraph element, making the entire paragraph the target is unacceptable.
(Imagine an application in which a reviewer of a document is pointing
out misspelled words: a link that targets the entire paragraph tells the
author nothing about which word is meant.)

I understand the difficulties in counting, and I understand the desire
to avoid specifying a grove plan in the XML spec, but I think we need
to try something.

Considering the 970331 lang spec and the 970406 link spec, what follows
is a concrete suggestion to start things off (numbers in brackets are
production numbers in the indicated spec).

In xml-link[13], add to the or group that defines "Element" something
like "*CHAR" or "*ATOM" to indicate that the Instance indication [12] 
is referring to data content atoms such as characters.  (I see no reason
to worry about what it means to have Attr and Val on *CHAR since we didn't
worry about it on *CDATA.)  The meaning of the Instance indication when
applied to *CHAR would be the obvious one (select the Nth unit), except
for the specifics of what to count as a unit.  In that regard, I'd suggest
the following (production numbers below all refer to xml-lang).

Each occurrence of each of the following shall be counted as one unit
for the purposes of the *CHAR addressing:

comment [17]
PI [18]
CDStart [20]
CDEnd [22]
CharRef [59]
EntityRef [61] 
STag [31]
ETag [34]
EmptyElement [37]
Char [2]

Note that Char != byte, but if we can expect the XML processor to know what
Char is when it's parsing an XML file, I figure we can expect it to know
what a Char is when it's addressing into an XML file.
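
To make the counting rule concrete, here is a rough sketch in Python of how
a processor might enumerate units.  It is not taken from either spec.  The
names units() and nth_unit() are made up for illustration, the regular
expressions are deliberately looser than the real xml-lang productions (for
example, a ">" inside an attribute value would confuse them), and I am
assuming that text between CDStart and CDEnd is counted Char by Char.
Python strings are sequences of characters rather than bytes, which matches
the Char != byte point above.

```python
import re

# Simplified stand-ins for the xml-lang productions (assumption: the real
# productions are stricter). Alternatives are tried in order; Char is last.
_UNIT = re.compile(
    r"""
    (?P<comment><!--.*?-->)
  | (?P<PI><\?.*?\?>)
  | (?P<CDStart><!\[CDATA\[)
  | (?P<CharRef>&\#(?:[0-9]+|x[0-9a-fA-F]+);)
  | (?P<EntityRef>&[A-Za-z_:][\w.\-:]*;)
  | (?P<EmptyElement><[A-Za-z_:][^<>]*/>)
  | (?P<ETag></[A-Za-z_:][^<>]*>)
  | (?P<STag><[A-Za-z_:][^<>]*>)
  | (?P<Char>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def units(text):
    """Yield (kind, lexeme) pairs, one pair per countable unit."""
    pos = 0
    in_cdata = False
    while pos < len(text):
        if in_cdata:
            # Inside a CDATA section nothing is markup except the CDEnd.
            if text.startswith("]]>", pos):
                yield ("CDEnd", "]]>")
                pos += 3
                in_cdata = False
            else:
                yield ("Char", text[pos])
                pos += 1
            continue
        m = _UNIT.match(text, pos)  # always matches: Char accepts anything
        kind = m.lastgroup
        yield (kind, m.group())
        if kind == "CDStart":
            in_cdata = True
        pos = m.end()


def nth_unit(text, n):
    """Return the n-th unit (counting from 1), as *CHAR addressing would."""
    for i, unit in enumerate(units(text), start=1):
        if i == n:
            return unit
    raise IndexError("fewer than %d units in content" % n)
```

With this sketch, nth_unit("A <b>bold</b>", 3) yields the STag "<b>" as a
single unit, and the "b" of "bold" is unit 4.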

Received on Thursday, 10 April 1997 23:08:14 UTC