- From: Tim Bray <tbray@textuality.com>
- Date: Tue, 17 Jun 1997 16:10:12 -0700
- To: w3c-sgml-wg@w3.org
The proposal described herein is mostly due to James Clark. Background: lots of people want subelement addressing in XML-link for what seem like good reasons. But there are problems. Character counting is going to inevitably get scrambled by those invisible characters at line ends which vary in their number. Processing of space-delimited tokens is OK too, but not suitable for lots of languages, and as we've seen in the recent postings from our Japanese colleagues, spaces are not that straightforward a concept. The idea is to introduce a new XML-link locator keyword STRING that would work as follows: STRING(7, 8, "LITERAL") which would denote a location at the 8th character following the beginning of the 7th match to the literal LITERAL. This would be an exact byte-for-byte matching - no case-folding, record-end magic, regexp magic, or combiner normalization. Which means it would still have the potential to fail puzzlingly in some (particuarly combining character) situations. Nonetheless, it gets you token-counting if you're willing to make sure your tokens are separated by a pattern that you know about. It allows quite a lot, and should be very easy to implement. However, such a keyword would have to appear at the *end* of a chain of locator keywords. Comments? The ERB kind of likes this. -Tim
Received on Tuesday, 17 June 1997 19:12:07 UTC