- From: Tim Bray <tbray@textuality.com>
- Date: Sun, 16 Mar 1997 18:58:24 -0800
- To: w3c-sgml-wg@w3.org
More on addressing. On March 15, the ERB agreed that:

1. Contrary to our decision of last time, we will support subelement addressing by a simple search operator. We will make it clear that bit-for-bit matching without respect to words or tokens is compliant behavior; if implementations wish to compete on the basis of case-folding or other fancy search optimization, that's fine.

2. Locators shall consist of a URL, optionally followed by a '#'.

3. The '#' may be followed by the string "<tei>", in which case the remainder of the locator is to be treated as a TEI extended pointer. Michael Sperberg-McQueen has an action item to figure out the required changes to TEI xptr syntax to fit them into a URL.

Note: with respect to our previous concerns on internationalization, we investigated and it appears that both Netscape and MSIE are trying to do the right thing; while there remain bugs in this area, our policy seems to be reasonable.

On another subject, we agonized further over the fact that current implementations of '#' in URLs always fetch the whole document and then navigate to the fragment in the client. For SGML, this is probably often unreasonable. Too bad - this behavior is not carved in stone; early implementations that stupidly try to fetch the entire OED or Physician's Desk Reference, just to pull out a fragment, will not succeed in the marketplace.

CONUNDRUM:

4. If the '#' is followed only by a string, then.... what?

This should be an IDREF, right? Maybe. And if it is, how do you know how to find ID attributes in an XML document out at the far end of a URL? Can you be sure of finding the appropriate declaration in the internal DTD subset? Can you be sure of finding the external subset?

On the Web, in the URL "http://foo.bar.com/baz.html#sec1.2", the "sec1.2" should correspond to a <A NAME='sec1.2'. It is not, in the HTML DTD, an ID attribute. They want to use more characters than SGML ID allows, and they don't want to enforce uniqueness.
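
For illustration only, the locator shape in points 2 and 3 could be taken apart roughly as in the Python sketch below. This is not anything the ERB has specified: the function name split_locator and the kind labels "none", "tei" and "name" are made up for this example, and what a bare string after the '#' actually means is exactly the open question in point 4.

```python
# Hypothetical sketch: split a locator into its URL part and its fragment part.
# The "<tei>" marker comes from point 3; everything else is an assumption.
TEI_MARKER = "<tei>"


def split_locator(locator):
    """Return (url, kind, fragment) for a locator string.

    kind is one of:
      "none" - no '#', or a '#' with nothing after it
      "tei"  - the '#' is followed by "<tei>"; fragment is the text after the marker
      "name" - the '#' is followed by a plain string (meaning still undecided)
    """
    # Split at the first '#'; anything after it is the fragment.
    url, _, fragment = locator.partition("#")
    if not fragment:
        return url, "none", ""
    if fragment.startswith(TEI_MARKER):
        return url, "tei", fragment[len(TEI_MARKER):]
    return url, "name", fragment


print(split_locator("http://foo.bar.com/baz.html#sec1.2"))
```

The function only classifies the fragment. It does not try to interpret the TEI pointer text or to resolve a plain name against a document.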
If there is more than one matching NAME=, few browsers will do anything reasonable, but it's not an error. In fact, the semantics of #-fragments in HTML are easily expressed in a simple TEI xptr query saying "find the first A element whose NAME attribute has the value whatever".

We could duplicate that in XML, but it feels limiting.

We could duplicate it but, in the linking element, provide other attributes to say what the element type and attribute name you're trying to match are. But then you're duplicating something you could do with a "#<tei>" string.

Or, we could say that it *is* an IDREF, and by default look for an attribute named 'ID' with the indicated value, and also, if it's possible, look in the internal subset or the whole DTD to find out what attributes are IDs. This would be weaker than HTML in the allowed values (SGML NAME) and requirement for only one match. Big deal?

What we want is to have a simple behavior that makes sense, specified simply. No surprise that it's hard to be simple. Input and inspiration from the WG are solicited.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592
Received on Sunday, 16 March 1997 21:59:32 UTC