Re: ERB call on addressing from Peter Murray-Rust on 1997-03-28 (w3c-sgml-wg@w3.org from March 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Fri, 28 Mar 1997 11:59:18 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <5200@ursus.demon.co.uk>

The ERB's proposal and the clarification in MS-M's posting meet my needs
very well but I am somewhat confused by some other interpretations and
concerned about the requirement for server participation.  I hope that
the simple comments below will represent how a large number of non-rocket
XML-users would like the system to behave (at least in the first instance).

One assumption seems to be that an XML file will always be mounted
on a server, and that the server will be required to perform some XML-specific
actions on it.  This is not a requirement for much of what we wish to do - 
at the simplest level we wish to hold data in structured form.  XML-LANG 
gives an application the chance to navigate this data in much better ways 
than previously.  

These documents are not always held on a server.  In fact 5000+ CD-ROMs with 
'XML' are being distributed in a few days by a major chemical publisher, along
with an early version of JUMBO.  Most of these CD-ROMs will never be mounted
on a server and I hope that that doesn't disqualify the files from being
called XML. (In fact we expect people to view them using Java-enabled 
Netsplorer, and the JUMBO classes are available on the CD-ROM.)  The great
advantage (to my mind) is that the *browser* provides much of the functionality
that the server might otherwise provide.  The URL for the documents has the 
protocol 'file:' (although of course they could also be mounted on a server
with protocol http:).  If we are saying that XML documents can ONLY have a 
protocol 'http:', then I have a problem.

Our current model using MIME is often that the server's role is simply to
stamp the file with the appropriate MIME type and that it is completely up
to the browser/client what action it takes.  So, for example, I could 
reasonably ask a webmaster to mount a set of CML files and to set 
*.cml to stamp them with text/xml.  Anything further than that requires
active participation of the webmaster who may not have the time/funding etc.
(Note: we set a high value on cooperative, free-of-charge services like
mounting local mirrors of (static) resources.  It is unreasonable to expect
webmasters to maintain dynamic resources for free.)
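
To make the 'stamping' concrete, here is a minimal sketch of the only work
asked of the server: mapping a file extension to a MIME type.  It uses
Python's standard mimetypes module as a stand-in for whatever mapping table a
real web server keeps; the '.cml' extension and the 'text/xml' type are the
ones suggested above, and the function name mime_stamp and the file name are
made up for illustration.

```python
import mimetypes

# Assumption: ".cml" is the extension used for CML files and "text/xml"
# is the type the webmaster agrees to stamp them with.
mimetypes.add_type("text/xml", ".cml")


def mime_stamp(filename):
    """Return the MIME type a static server would attach to this file."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


print(mime_stamp("caffeine.cml"))
```

Nothing here is XML-specific; everything beyond the stamp is left to the
client.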

If I ask a webmaster to set up something of the sort:
http://www.foo.ac.uk/bar/blort?impenetrability
this implies some real-life negotiation between me and the W/M
and probably a CGI script in the first instance.  The '?' implies work for 
the W/M.

If I ask for an address of the form:
http://www.foo.ac.uk/bar/blort.html#impenetrability
the webmaster's responsibility is simply to locate a given file (blort.html),
stamp it with text/html and send it back.  It's up to my browser what it does.
(Indeed, different *browsers* may do different things if there are zero or
multiple occurrences of 'impenetrability' in the file.)
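
The difference between the two addresses can be sketched in a few lines.
This is only an illustration of the customary division of labour, using
Python's standard urllib.parse; the function name split_request is
hypothetical.  The query after '?' is part of what the server receives,
while the fragment after '#' never leaves the client.

```python
from urllib.parse import urlsplit, urlunsplit


def split_request(url):
    """Return (what is sent to the server, what stays with the client)."""
    parts = urlsplit(url)
    # The query travels with the request; the fragment is dropped from it.
    to_server = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, parts.query, "")
    )
    return to_server, parts.fragment


for url in (
    "http://www.foo.ac.uk/bar/blort?impenetrability",
    "http://www.foo.ac.uk/bar/blort.html#impenetrability",
):
    print(split_request(url))
```

In the first case the client keeps nothing back, so any interpretation of
'impenetrability' falls to the server; in the second the server only ever
sees a request for blort.html.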

In the first instance I am sure that we shall be doing client-side navigation 
within documents.  Anything else requires writing server-side code to do it.
(Since XML is not yet a de facto standard, it will take a year or two for it to
be accepted and for support beyond MIME stamps to become available.)

At present the only client-side navigation is through html:HREF.  XML gives 
potentially much greater power and CML is constructed on the basis that 
navigation systems are essential.  (The DTD is very flexible, and the
precise architecture of any document is unlikely to be known beforehand, since
it could be converted from any of 20 different legacy types.) A typical
question is 'retrieve all the molecules from this document'.
A year ago I was using CoST, but have now added TEI syntax
to my application (JUMBO).  In TEI this is simply:
DESCENDANT (ALL MOL)
So whilst I assume that
file:/foo/bar.xml#DESCENDANT(ALL,MOL)
is outside the XML remit, I shall provide that functionality locally.
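
As a rough sketch of what 'providing that functionality locally' amounts to
(this is not JUMBO's code, which is in Java; it is Python's standard
ElementTree, and the sample document with its element names and IDs is
invented for illustration), the client parses the whole file and collects
every descendant element of type MOL:

```python
import xml.etree.ElementTree as ET


def descendants(root, element_type):
    """All descendants of root with the given element type, in document order."""
    # Element.iter() also yields root itself, so exclude it explicitly.
    return [el for el in root.iter(element_type) if el is not root]


# Hypothetical document: element names and IDs are made up for illustration.
SAMPLE = """<CML>
  <MOL ID="m1"/>
  <LIST><MOL ID="m2"><MOL ID="m3"/></MOL></LIST>
</CML>"""

molecules = descendants(ET.fromstring(SAMPLE), "MOL")
print([m.get("ID") for m in molecules])
```

No server is involved at any point, so the same code works whether the
document came from 'file:' or 'http:'.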

The key point of this is that the client may not know in advance what 
facilities a server has for XML.  (This can arise when a set of 'static'
documents is mirrored somewhere.  The relative addresses are all correct,
but the server may lack some functionality).  One option has to be that the 
client can retrieve the whole document and process it itself.
  
In message <199703280148.UAA06240@www10.w3.org> Michael Sperberg-McQueen writes:
> On Thu, 27 Mar 1997 19:39:05 -0500 Gavin Nicol said:
> > ... Here, you are actually asking for a standard
> >tree query and transformation language to be supported by
> >all servers.
> 
> This view seems to come up frequently (sometimes in the formulation
> "server owners think the syntax of the query segment belongs to
> them, so we can't specify it"), and it makes no sense to me, so let
> me ask the dumb question.  What do you mean?
> 
> If we specify a way of translating TEI extended pointer notation
> into a URL, either into the query segment, or into the URL proper,
> in what way are we saying this is something to be supported by all
> servers?
> 
> Why aren't we saying "here is a language which, if the server
> supports it, you can use, and which, if your users want it, your
> server can be made to support"?

This is my view, though not all webmasters can be 'made' to do things :-)

> 
> Suppose we were to say (this is not a proposal, though I wouldn't
> mind if it made enough sense to become one -- actually, forms a,
> b, and c below *are* the forms the ERB proposes to define, if I
> understand our decision right).
> 
> 1 An XML-Link locator can include a TEI Extended Pointer in any of
> the following ways:
> 
>   a.  in the query section:
>       http://www.uic.edu/x/y/z.xml?/tei/id(p23)child(1,emph)
>   b.  in the fragment identifier the same way
>       http://www.uic.edu/x/y/z.xml#/tei/id(p23)child(1,emph)
>   c.  in the 'indeterminate form' this way
>       http://www.uic.edu/x/y/z.xml/tei/id(p23)child(1,emph)
>   d.  in the URL-proper form this way
>       http://www.uic.edu/x/y/z.xml/teiq/id(p23)/child(1,emph)

I don't see the difference between c and d unless tei and/or teiq are
reserved words in a URL.  Could this be expanded, please?
(BTW - if you are suggesting commas as a replacement for spaces in the Xptr,
that's fine by me.)
> 
> 2 The query form and the fragment-identifier form are handled in the
> customary way; the other two forms require special knowledge on the
> part of the client and/or server, and negotiations outside the scope
> of this spec:
> 
>   * the client sends the query form (a) to the server in
> its entirety and gets back exactly what was pointed at(1);

This seems straightforward.  The client does not know what magic the server
uses to locate the fragment/resource, but the result would have been the
same as if the client had had the document locally and applied the Xptr.
(We assume that conceptually there is a document 'z.xml')

>   * for locators of form b, the client strips off the fragment
> identifier, sends the part before the '#' to the server, and uses
> the rest to navigate in the document sent back

Precisely.

>   * when the indeterminate form c is used, the client and the server
> negotiate using some method outside the scope of this standard
> to decide exactly what the server returns and what the client must
> do to it by way of navigation afterwards

This could handle the indeterminate form above: either the server
returns the fragment or it returns the whole document with some indication
that it's up to the client...

>   * when the URL-proper form d is used, something else happens ...
> 
> (1) a clever client could analyse the query and send part of it to
> the server, retaining the rest to guide local navigation after
> the document / document fragment is received.  Be careful, implementors:
> don't leave yourself holding a query beginning ANCESTOR ...
                                       ^^^^^^^^^^^^^^^^^^
(? or even *containing* ANCESTOR or PRECEDING?  I had interpreted TEI to 
mean that it is possible to end up higher up the tree than where you start.)
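
One way a 'clever client' might choose where to cut the pointer is sketched
below.  It is a guess at a workable rule, not anything the ERB has proposed:
everything up to and including the last step that could climb out of the
current subtree is sent to the server, and only the remainder is kept for
local navigation.  The set of 'escaping' keywords, the regular expression
and the name split_pointer are all my own assumptions.

```python
import re

# Assumption: these keywords can move outside the subtree of the current
# node; the list is illustrative and not taken from any specification.
ESCAPING = {"ANCESTOR", "PRECEDING", "FOLLOWING", "PREVIOUS", "NEXT"}

# Assumption: a step looks like keyword(arguments) with no nested parentheses.
STEP = re.compile(r"([A-Za-z]+)\s*\(([^()]*)\)")


def split_pointer(pointer):
    """Split a pointer into (server steps, client steps)."""
    matches = list(STEP.finditer(pointer))
    steps = [m.group(0) for m in matches]
    cut = 0
    for i, m in enumerate(matches):
        if m.group(1).upper() in ESCAPING:
            cut = i + 1  # the server must resolve up to and including this step
    # Assumption: the server always resolves at least the first step.
    if steps:
        cut = max(cut, 1)
    return steps[:cut], steps[cut:]


print(split_pointer("id(p23)child(1,emph)"))
print(split_pointer("id(p23)child(1,emph)ancestor(1,div)child(2,p)"))
```

With this rule the client is never left holding ANCESTOR or PRECEDING at
any position in its part of the pointer, not merely at the beginning.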
 
This is very attractive and I am sure we would use it a lot.

> Seems to be my day for asking ignorant questions.

I'm in good company then... :-)

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Friday, 28 March 1997 07:41:46 UTC