Subelement addressing

The proposal described herein is mostly due to James Clark.  

Background: lots of people want subelement addressing in XML-link, for what
seem like good reasons.  But there are problems.  Character counting will
inevitably get scrambled by those invisible characters at line ends, which
vary in number.  Counting space-delimited tokens is OK, but not suitable
for lots of languages, and as we've seen in the recent postings from our
Japanese colleagues, spaces are not that straightforward a concept.
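
To make the line-end problem concrete, here is a small Python sketch.  The
helper name char_offset and the sample strings are made up for illustration;
the point is only that the same visible text gives different character
counts depending on whether lines end in LF or CR+LF.

```python
def char_offset(text: str, target: str) -> int:
    """Return the 0-based character offset of the first match of target."""
    return text.index(target)


# The same two visible lines, stored with different line-end conventions.
unix_text = "first line\nsecond line"    # LF only
dos_text = "first line\r\nsecond line"   # CR + LF

# The word "second" sits at a different raw offset in each copy, so a
# locator that simply counts characters cannot be trusted across systems.
unix_offset = char_offset(unix_text, "second")
dos_offset = char_offset(dos_text, "second")
print(unix_offset, dos_offset)
```

The two offsets differ by exactly the one extra CR character.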

The idea is to introduce a new XML-link locator keyword STRING
that would work as follows:

STRING(7, 8, "LITERAL")

which would denote a location at the 8th character following the beginning
of the 7th match to the literal LITERAL.  This would be an exact
byte-for-byte match: no case-folding, record-end magic, regexp magic,
or combining-character normalization.  That means it would still have the
potential to fail puzzlingly in some (particularly combining-character)
situations.  Nonetheless, it gets you token-counting if you're willing to
make sure your tokens are separated by a pattern that you know about.  It
allows quite a lot, and should be very easy to implement.
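
As a rough sketch of how simple the resolution could be, here is one way to
do it in Python.  Nothing here is part of the proposal: the function name
resolve_string_locator is hypothetical, and two details the description
above leaves open are filled in by assumption.  First, matches are counted
left to right without overlapping.  Second, an offset of k means k characters
past the start of the match, so 0 is the start of the match itself.

```python
from typing import Optional


def resolve_string_locator(text: str, occurrence: int, offset: int,
                           literal: str) -> Optional[int]:
    """Resolve STRING(occurrence, offset, literal) against text.

    Returns a 0-based index into text, or None if the locator fails.
    Matching is exact: no case-folding, no regexps, no normalization.
    """
    if occurrence < 1 or offset < 0 or not literal:
        return None
    start = -1
    search_from = 0
    for _ in range(occurrence):
        start = text.find(literal, search_from)
        if start == -1:
            return None  # fewer than `occurrence` matches in the text
        # Assumption: matches do not overlap, so resume after this one.
        search_from = start + len(literal)
    position = start + offset
    if position > len(text):
        return None  # the offset runs past the end of the text
    return position


# Token counting with a known separator: the location just after the 2nd "|".
tokens = "one|two|three|four"
pos = resolve_string_locator(tokens, 2, 1, "|")
print(tokens[pos:])

# The combining-character failure: a precomposed e-acute in the literal does
# not match "e" followed by a combining acute accent in the text.
print(resolve_string_locator("cafe\u0301", 1, 0, "caf\u00e9"))
```

The second call returns None even though both strings display the same way,
which is the kind of puzzling failure mentioned above.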

However, such a keyword would have to appear at the *end* of a 
chain of locator keywords.

Comments?  The ERB kind of likes this. -Tim

Received on Tuesday, 17 June 1997 19:12:07 UTC