Re: Subelement addressing from Gavin Nicol on 1997-06-18 (w3c-sgml-wg@w3.org from June 1997)

From: Gavin Nicol <gtn@eps.inso.com>
Date: Wed, 18 Jun 1997 08:36:24 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <199706181236.IAA28838@nathaniel.eps.inso.com>

>The idea is to introduce a new XML-link locator keyword STRING
>that would work as follows:
>
>STRING(7, 8, "LITERAL")
>
>which would denote a location at the 8th character following the beginning
>of the 7th match to the literal LITERAL.  This would be an exact 
>byte-for-byte matching - no case-folding, record-end magic, regexp magic,
>or combiner normalization.  Which means it would still have the potential
>to fail puzzlingly in some (particularly combining character) situations.

What would happen if the document being linked to and the link itself
used different encodings? Also, the note about combining characters is
an important one: one could write a link to a document with a literal
that *appears* to be exactly the same but, in fact, is not, causing
the link to fail.
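
To make both failure modes concrete, here is a minimal sketch in
Python. The function resolve_string is hypothetical and is only one
reading of the proposal: it assumes matches do not overlap, that the
count is 1-based, and that the offset is counted in characters from
the start of the match. None of that is settled by the quoted text.

```python
def resolve_string(text, n, offset, literal):
    """Return the index `offset` characters past the start of the n-th
    exact occurrence of `literal` in `text`, or None if it cannot be found.

    Hypothetical helper: one possible reading of STRING(n, offset, literal).
    """
    if n < 1 or offset < 0 or not literal:
        return None
    pos, start = -1, 0
    for _ in range(n):
        pos = text.find(literal, start)   # exact match, no folding of any kind
        if pos == -1:
            return None
        start = pos + len(literal)        # assumption: matches do not overlap
    target = pos + offset
    return target if target <= len(text) else None


doc = "caf\u00e9 au lait"     # document uses precomposed U+00E9
composed = "caf\u00e9"        # literal typed with the precomposed character
decomposed = "cafe\u0301"     # looks identical: "e" + combining acute accent

print(resolve_string(doc, 1, 0, composed))    # 0
print(resolve_string(doc, 1, 0, decomposed))  # None

# Byte-for-byte comparison across encodings fails as well.
link_bytes = composed.encode("utf-8")
doc_bytes = doc.encode("latin-1")
print(link_bytes in doc_bytes)                # False
```

The two literals render identically, yet only one of them resolves,
and the byte-level comparison fails even for the literal that matches
at the character level.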

The two major issues with sub-element addressing are:

  1) Whitespace normalisation
  2) Character normalisation (i.e. what is a character).

I think we have almost got (1) taken care of, and (2) is mostly taken
care of by the fact that we use ISO 10646. Additional rules for
normalisation will be required for (2) to work well, but I think we
can formulate them reasonably well by taking the Unicode normalisation
recommendations as a starting point.
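
As a rough illustration of what normalising both sides before
matching could look like, here is a sketch in Python. It uses the NFC
form from the standard unicodedata module as a stand-in for whatever
character rules would be adopted, and collapses runs of whitespace to
a single space as a stand-in for the whitespace rules. Both choices
are assumptions made for the example, not something agreed here, and
str.split() treats more characters as whitespace than XML does.

```python
import unicodedata


def normalise(text):
    """Hypothetical normalisation applied to both document and literal."""
    # Character normalisation: compose base + combining sequences (NFC).
    composed = unicodedata.normalize("NFC", text)
    # Whitespace normalisation: collapse every run of whitespace to one space.
    return " ".join(composed.split())


doc = "le  caf\u00e9\n au lait"   # precomposed accent, irregular whitespace
literal = "cafe\u0301 au"         # decomposed accent, single space

print(literal in doc)                        # False
print(normalise(literal) in normalise(doc))  # True
```

Note that offsets computed against the normalised text no longer map
directly onto the original document, so a real scheme would also have
to say which form the character counts refer to.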

Received on Wednesday, 18 June 1997 08:37:06 UTC