
Re: Addressing....

From: David G. Durand <dgd@cs.bu.edu>
Date: Wed, 8 Jan 1997 13:12:02 -0500
Message-Id: <v02130500aef97c3c4872@[]>
To: w3c-sgml-wg@www10.w3.org
I spent quite a while rethinking everything to see if I am missing
something, and I am still not convinced by Gavin's arguments. First I will
try a coherent restatement of what I think I'm saying, in case the
blow-by-blow has obscured things, and then I'll answer Gavin's points.

I'd also like to say that I'm in fundamental sympathy with Gavin's
approach, but I don't believe that it fits in with the standards
environment we are operating in.

   1. We need to address sub-parts of XML documents. I don't think there is
any disagreement here.

   2. URLs are intended to be _entirely_ opaque to clients (except for
fragment identifiers -- "#-strings"). I don't agree that this is a
necessary decision, but I have been told in other groups that the W3C will
categorically oppose any standard that sanctions such "URL-hacking". If a
standard format for version identification in URLs is not acceptable, I
don't see why fragment identification is so fundamental that it would be
treated any differently.

   3. Fragment IDs are interpreted _any way_ the client wants -- this means
that if we want to do URL hacking, the fragment ID is available. So a
client _is_ permitted to try constructing a URL based on the fragment ID.

   4. Special support on servers (even CGI scripts) is frequently harder to
deploy than special client support. Some sites are so large that running
even the simplest CGI scripts is subject to extreme scrutiny.

   5. If a server supports special URLs such as Gavin wants, it need not
ever generate a fragment ID, but can simply generate a special URL. Since
such a server has to parse the whole document anyway, this is even pretty
easy to do. So a server that is smart is not forced to act dumb just
because fragment IDs are (even a mandatory) part of XML -- it can simply
use its own superior addressing features.

   6. A client will need to be able to address subdocument parts anyway --
If linking to a 1-sentence anchor in a 2K document, it's _much_ easier to
do local navigation than to bounce packets off a specialized server. So
fragment IDs (or some logical equivalent) will still be needed even if they
are sometimes very sub-optimal (the 10MB document is an example).
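Points 3 and 5 amount to a simple rewriting step. As a rough sketch (the
`?id=` query convention here is purely illustrative -- nothing obliges a
server to support it), a client that knows a particular server understands
query-style sub-resource addresses could translate the fragment part of a
URL like so:

```python
# Illustrative sketch of "URL-hacking via the fragment ID" (points 3, 5).
# The ?id= query form is a made-up convention for this example only.

def rewrite_fragment(url: str) -> str:
    """Turn http://host/doc.xml#sec3 into http://host/doc.xml?id=sec3,
    leaving the URL untouched when there is no fragment to translate."""
    base, sep, fragment = url.partition("#")
    if not sep:
        return url  # no fragment identifier: nothing to translate
    return f"{base}?id={fragment}"

print(rewrite_fragment("http://example.org/doc.xml#sec3"))
# -> http://example.org/doc.xml?id=sec3
```

A client that does not recognize the server simply keeps the original URL
and resolves the fragment locally, as point 6 describes.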

At 2:05 AM 1/8/97, Gavin Nicol wrote:
>David Durand manipulated electrons to produce:
>>    If we want to put sub-resource (ie fragment of URL return value)
>>addressing into XML, the only way that can be put into a URL is via the #
>>string. This is nice because we have a way to point at IDs, (or arbitrary
>>attribute values), or arbitrary XML-dependent substructures.
>Quite correct, though the emphasis is on _sub-resource_ addressing,
>which is different to _resource_ addressing.


>>The kinds of feature that we are talking about (like TEI location
>>ladders) will not be useful if they depend on special servers.
>That is quite open to debate.

We can debate it all you want, but pragmatically, specialized servers are
harder to deploy than specialized clients... I agree that in many cases
(though not all) this is technically superior. But I just don't believe
that such deployment will happen.

>>I don't think XML addressing formats should require the use of a
>>special server,  which Gavin's proposal would require.
>My proposal would require, at a minimum, an XML processor capable of
>parsing a well-formed instance, creating a tree from it, and then
>traversing/querying the tree. This could easily be done as a CGI
>script, and I think that writing the software required to do this
>would add very little to the cost of implementing an XML processor. I
>could certainly write it in 2 weeks, from scratch, in C/C++/Java.

>I object strenuously to *requiring* that an entity be retrieved in its
>entirety in order to transclude a single element.

I believe that this is not a requirement, see points 3, 5.

>Points to remember:
>  1) You are talking about special code in the client, which would be
>     easily comparable to the complexity of the code in a server.

Should be exactly the same, but we are putting the work on the client
coder, who is presumably more committed to XML than a server coder, for
whom XML is just another data format.

I just read Paul Prescod's reply, so I will leave most of the rest of the
points to his excellent commentary.

>>On the other hand, a client could recognize that a particular
>>#-string could be resolved by a particular server if it wanted, and
>>translate the URL.
>I do not object to fragment specifiers, but this argument is
>specious. You could just as easily say that a client could recognise
>that it could retrieve the entire entity, and then walk its own parse
>tree based on the URLs I propose.

This latter solution is explicitly denigrated by the W3C. I agree that it
is technically feasible, but it violates an opaqueness condition
stringently held by the HTTP and URL standards people. For a host of
_almost purely nontechnical reasons_ I think that clients are the place for
us to concentrate our efforts.

>Again, I do not object to fragment specifier use, but I do object
>to it being the only thing we can use.

Special server-side URLs are always available, because a server can serve
up anything it wants, under any URL it wants. But I don't think we can
pretend to be able to _enforce_ server-based solutions on the web.

>It does not scale. Worse, it
>would preclude using XML with servers such as DynaWeb/DynaBase that
>generate content dynamically, and may not even have the entity
>structure left for you to address.

    As long as you address a well-formed fragment, this should not be a
problem. Your server can certainly be smart enough to translate address
formats, if it is already parsing the whole document.

>I seriously hope your objection to "special servers" doesn't mean that
>you think my motivation lies in the fact that I wrote DynaWeb, and
>wish to promote it... my motivation lies in trying to avoid a solution
>that doesn't scale well, and doesn't easily permit use of servers that
>do not have XML files laying around on them (like RDB, etc).

  No, but I do think that life on the high end may make it harder to
appreciate the limitations imposed by most server admins. We have a lot
of work to do just convincing browser manufacturers to use XML -- if we
don't need to sign up for more salesmanship, then we shouldn't.

  -- David

I am not a number. I am an undefined character.
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________