Lengthy follow-up on XPointer and XML Schema from Henry S. Thompson on 2002-07-18 (www-xml-schema-comments@w3.org from July to September 2002)

From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
Date: 18 Jul 2002 15:02:46 +0100
To: www-xml-schema-comments@w3.org
Message-ID: <f5b3cuh5e2h.fsf@cogsci.ed.ac.uk>
Eric van der Vlist <vdv@dyomedea.com> writes:

> On Wed, 2002-07-17 at 18:30, Henry S. Thompson wrote:
> 
> > We could do that, but it would be wrong (in my view).  Wrong because
> > it violates locality -- a barename link with name XYZZY is to what the
> > _target_ establishes as its XYZZY ID, not the source.  
> 
> Can you clarify what you are calling the source and what you are calling
> the target?

Sorry not to have been clearer; let me try to be as precise as I can.

   *source*---An XML document containing a remote absolute http-scheme
   URI reference (call this *ref*) which includes a (shortform) fragment
   identifier XYZZY (call this the *idref*)

   *user agent*---The application/machine which issues the GET request for
   *ref*

   *server*---The application handling the GET request at the machine
   identified by domain name part of *ref*

   *target document*---The XML (i.e. the *server* believes it is of
   mime type text/xml or application/xml or . . ., given any accept
   header parameterisations sent along with the GET for *ref*)
   document identified by *ref*, ignoring the fragment identifier part
   thereof, as returned by the *server* in reply to the GET request
   for *ref*

   *TDI*---The representation of the infoset of the *target document*
   constructed by the *user agent*

   *intended target*---The element information item in *TDI* intended
   by the author of *source* as the referent of *ref* (including *idref*)

   *actual target*---The element information item in *TDI* identified
   by the *user agent* as the referent of *ref* by interpreting
   *idref* as a shortform xpointer

   *supplementary resources*---Resources involved in the construction
   by the *user agent* of the *TDI*.  These may be identified by
   absolute or relative URI references.  Other things being equal,
   *ref* will serve as the base URI for relative URI refs.  What these
   are depends on the *target document* (obviously), on the *user
   agent*'s choice of processing done to construct the *TDI* --
   minimal non-validating parsing, full validating parsing, or complete
   non-validating parsing (i.e. parsing which processes all referenced
   parameter entities), each with or without schema validity
   assessment -- and on the environment in which the *user agent*
   operates.
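
As a rough illustration of the definitions above, here is how a
*user agent* might split *ref* into the part sent in the GET and the
*idref* it keeps for itself, and then resolve a relative reference to
a *supplementary resource*.  The example.org URIs and the file name
target.dtd are assumptions made up for this sketch.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical *ref* as it might appear in the *source*.
ref = "http://example.org/docs/target.xml#XYZZY"

# The fragment identifier is not part of the GET request; the
# *user agent* keeps it and interprets it after retrieval.
document_uri, idref = urldefrag(ref)
print(document_uri)  # http://example.org/docs/target.xml
print(idref)         # XYZZY

# A relative reference found in the *target document* (for example a
# DTD system identifier) is resolved against the document's URI.
dtd_uri = urljoin(document_uri, "target.dtd")
print(dtd_uri)       # http://example.org/docs/target.dtd
```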

So my basic argument is that since what counts as an ID, and therefore
what determines the *actual target*, depends crucially on the
*supplementary resources*, and therefore on the *user agent* and its
environment (that is, user parameterisation/policy specifications,
catalogs, caches, proxies, etc.), the *source* and *target document*
necessarily underdetermine the *actual target*.
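
A minimal sketch of that underdetermination, under stated assumptions:
the document text, the function resolve_barename and the sets of ID
declarations are all invented for illustration.  The declarations
stand in for whatever a *user agent* learned from the *supplementary
resources* it happened to process.  The same *target document* and the
same *idref* give different *actual targets*.

```python
import xml.etree.ElementTree as ET

# Hypothetical *target document*; nothing in it says which attribute is an ID.
TARGET_DOCUMENT = """
<doc>
  <section id="intro" name="XYZZY"/>
  <section id="XYZZY" name="other"/>
</doc>
""".strip()


def resolve_barename(xml_text, idref, id_attributes):
    """Return the first element with an ID-typed attribute equal to idref.

    id_attributes is a set of (element name, attribute name) pairs that
    the user agent treats as being of type ID.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for attr_name, value in element.attrib.items():
            if (element.tag, attr_name) in id_attributes and value == idref:
                return element
    return None


# Agent A processed a DTD declaring section/@id as an ID.
target_a = resolve_barename(TARGET_DOCUMENT, "XYZZY", {("section", "id")})
# Agent B used a different schema declaring section/@name as an ID.
target_b = resolve_barename(TARGET_DOCUMENT, "XYZZY", {("section", "name")})
# Agent C obtained no declarations at all.
target_c = resolve_barename(TARGET_DOCUMENT, "XYZZY", set())

print(target_a.attrib)  # {'id': 'XYZZY', 'name': 'other'}
print(target_b.attrib)  # {'id': 'intro', 'name': 'XYZZY'}
print(target_c)         # None
```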

<skip/>

> Not really. When I say that I want to access to anchor "boo" per the
> (X)HTML naming system, the rules are set by the server.

Um, you just went to some lengths to argue it was the *user agent*,
not the *server*, which interprets fragIDs -- why change now?  The
only things the *server* contributes are the resource as such and its
mime type.

> > The _user_ does that by setting up the processing environment, in
> > either case.
> 
> What do you mean?

I hope the clarifications above now make this clear.  *User agents*
typically enable a wide range of user control over their behaviour,
and questions such as whether or not to validate, whether or not to
chase parameter entity references, whether or not to use a proxy, may
all be under user control.  The proxy point is particularly important
-- if I am running without network access, the presence or absence of
a *supplementary resource* such as a DTD in my cache may well
determine whether my reference goes through or not.
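
The offline case can be sketched as follows.  Everything here is
hypothetical (the AgentEnvironment settings, get_supplementary, the
DTD URI and its content); it models no real *user agent*, only the
point that the cache contents decide whether the ID declarations are
available at all.

```python
from dataclasses import dataclass, field


@dataclass
class AgentEnvironment:
    """Hypothetical user-controlled settings of a user agent."""
    network_access: bool = True
    cache: dict = field(default_factory=dict)  # URI -> cached resource text


def get_supplementary(uri, env, fetch_remote):
    """Return the resource text, or None if this environment cannot obtain it."""
    if uri in env.cache:
        return env.cache[uri]
    if env.network_access:
        return fetch_remote(uri)
    return None


DTD_URI = "http://example.org/docs/target.dtd"   # assumed URI
DTD_TEXT = "<!ATTLIST section id ID #IMPLIED>"   # assumed content


def fake_fetch(uri):
    # Stand-in for a real HTTP GET, so the sketch runs without a network.
    return DTD_TEXT


offline_empty = AgentEnvironment(network_access=False)
offline_cached = AgentEnvironment(network_access=False,
                                  cache={DTD_URI: DTD_TEXT})

# Without the DTD there are no ID declarations, so the reference fails.
print(get_supplementary(DTD_URI, offline_empty, fake_fetch))   # None
# With the DTD in the cache, the same reference can go through.
print(get_supplementary(DTD_URI, offline_cached, fake_fetch))
```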

So, bottom line: should we _also_ consider providing some _author_
input into the control of *supplementary resource* determination?  If
so, where should it go and whose (i.e. which W3C REC's) job is it to
say how this works?

My answer: Yes, but not in the fragID and it's not the XPointer REC's
job.  These questions are clearly the responsibility of the XML
Processing Model REC (forthcoming, I hope), in my opinion.  Note of
course there are typically at least _two_ authors involved, which is
another reason why putting it in the fragID is a bad idea.

Final note:  the 99.99% case, for both DTDs and Schemas, is that all
sensible *user agents* will do the same thing, and it will be what
people expect, namely:

  1a) If there's a DOCTYPE, process as much of it as you can get
      access to looking for ID declarations, and use them during
      parsing to identify possible anchors;

  2a) If there's an xsi:schemaLocation attribute, use it to get a
      schema doc and schema validity assess using it;

  2b) Otherwise if the doc elt is in a namespace and there's a
      schema doc accessible via the namespace URI, ditto.

People will choose their *user agents* just as they do now, namely on a
combination of ubiquity and functionality.  Let's hope the market
decides XPointer functionality is useful and we get *user agents* that
do all three of the above.
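
The three rules above can be sketched as a planning step that only
decides where to look, without fetching anything.  This is an
illustration under assumptions, not a description of any existing
*user agent*: the function name, the crude DOCTYPE check, and the
restriction to the document element's xsi:schemaLocation are all
simplifications of my own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def plan_supplementary_resources(xml_text, base_uri):
    """Return a list of (rule, detail) pairs saying where to look for IDs."""
    plan = []
    # 1a) Crude textual check for a DOCTYPE; a real parser would report it.
    if "<!DOCTYPE" in xml_text:
        plan.append(("1a", "process DOCTYPE for ID declarations"))
    root = ET.fromstring(xml_text)
    schema_location = root.get("{%s}schemaLocation" % XSI)
    if schema_location:
        # 2a) The value is a list of (namespace, location) pairs.
        tokens = schema_location.split()
        for location in tokens[1::2]:
            plan.append(("2a", urljoin(base_uri, location)))
    elif root.tag.startswith("{"):
        # 2b) Fall back to the namespace URI of the document element.
        namespace = root.tag[1:].split("}")[0]
        plan.append(("2b", namespace))
    return plan


BASE = "http://example.org/docs/target.xml"  # assumed URI of the target

with_doctype = '<!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'
with_hint = ('<p:doc xmlns:p="http://example.org/ns" '
             'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
             'xsi:schemaLocation="http://example.org/ns doc.xsd"/>')
namespace_only = '<doc xmlns="http://example.org/ns"/>'

print(plan_supplementary_resources(with_doctype, BASE))
print(plan_supplementary_resources(with_hint, BASE))
print(plan_supplementary_resources(namespace_only, BASE))
```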

ht
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Thursday, 18 July 2002 10:04:29 UTC