Re: Addressing....

At 10:57 PM 1/8/97 (quite some time ago), Gavin Nicol wrote:
I've been thinking about this issue, and I have some new thoughts in
addition to the same old arguments...

>>   6. A client will need to be able to address subdocument parts anyway --
>>If linking to a 1-sentence anchor in a 2K document, it's _much_ easier to
>>do local navigation than to bounce packets off a specialized server. So
>>fragment IDs (or some logical equivalent) will still be needed even if they
>>are sometimes very sub-optimal (the 10MB document is an example).

>This is the scalability problem. It gets worse when a server
>interfaces to databases that *generate* XML, but don't manipulate
>it. They have a harder time with faking entity boundaries.

I think this is a purple herring nailed to the ceiling: since XML documents
don't need DTDs, a smart server can send any XML element as if it were a
document, without doing anything with entity boundaries. It can just
generate several (DTD-less) XML documents where a typical XML-at-root
application would have subelements of a large document. I'm not sure how a
URL addressing mechanism makes it easier to generate document-formed data
from other forms of data. What kind of database are you thinking of, where
the addressing format would matter in this way?
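
To make that concrete, here is a rough sketch, in Python, of the
smart-server trick. The file name and the id-attribute convention are my
inventions for illustration, not anything either of us has specified:

    # A sketch of a smart server shipping any subelement as its own
    # DTD-less, well-formed XML document. The file name and the
    # id-attribute convention are invented for illustration.
    import xml.etree.ElementTree as ET

    def element_as_document(doc_path, element_id):
        """Serialize one subelement as a standalone XML document."""
        tree = ET.parse(doc_path)
        for elem in tree.iter():
            if elem.get("id") == element_id:
                # Well-formedness needs no DTD, so the fragment is a
                # complete document in its own right.
                return ('<?xml version="1.0"?>\n'
                        + ET.tostring(elem, encoding="unicode"))
        raise KeyError("no element with id %r" % element_id)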

>>We can debate it all you want, but pragmatically, specialized servers are
>>harder to deploy than specialized clients... I agree that in many cases
>>(though not all) this is technically superior. But I just don't believe
>>that such deployment will happen.
>
>I believe that this is not always true, though I guess that if you
>have a browser that can handle XML, it would be safe to assume that it
>can also support fragment IDs that are XML-specific. It is less safe
>to assume that all servers that send out XML also support entities.

Entities _can_ be implemented on a normal filesystem-style HTTP server,
simply by using URLs in the current manner. I can't really imagine how
avoiding entities makes a server's life easier or harder. And as I pointed
out above, you don't need to generate entities anyway.
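
For example (the URLs and file layout here are invented), the entity
declarations in a document are just more URLs, and a stock server serves
them like any other files:

    # A sketch of the point above: external entities on a stock
    # filesystem-style HTTP server are just more files, addressed
    # with URLs in the current manner -- no special server code.
    BOOK = """<?xml version="1.0"?>
    <!DOCTYPE book [
      <!ENTITY chap1 SYSTEM "http://example.com/book/chap1.xml">
      <!ENTITY chap2 SYSTEM "http://example.com/book/chap2.xml">
    ]>
    <book>&chap1;&chap2;</book>
    """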

>>>I object strenuously to *requiring* that an entity be retrieved in its
>>>entirety in order to transclude a single element.
>>
>>I believe that this is not a requirement, see points 3, 5.

I don't have time to go back to the original list of points, but I remember
the context well enough...

>Yes, but then you assume server-side support for these cases... the
>thing you are disagreeing with. Remove server-side support, and the
>model becomes that of retrieving entities.

I pointed out that you can use dynamic server-side support _if you want_.
This means that your requirement is not catered to, but neither is it
precluded.

If you don't want to implement dynamic server support, you have to think
carefully about your entities and author a document that won't run into
this problem.

>My major concern here is that if the only *standard* way of addressing
>individual elements is client-side, then people will only use that,
>building up momentum against finer-grained object addressing.
>It is better to establish momentum toward superior solutions as
>early as possible in order to avoid pain further down the road
>(as distinct from *requiring* superior solutions from the get-go).

I don't believe that it's worth the effort of defining complex fragment
transport mechanisms to encourage people to do something that they can do
even if we punt on that work. If the world starts beating down our doors on
this, we can revisit it at any time.

>>Should be exactly the same, but we are putting the work on the client
>>coder, who is presumably more committed to XML than a server coder, for
>>whom XML is just another data format.
>
>I think that the initial *publishers* of XML content will generally
>have at least partial control over the site, as XML is going to be a
>tiny niche for some time, and so will require publishers dedicated to
>its deployment. Fragment specifiers are fine for smaller documents,
>but I think many initial XML publishers will be dealing with large
>documents.

This may be a good reason to encourage client implementers to implement
lazy entity retrieval. Or, those publishers can use smart server
technology, since we have not prevented them from doing so (we simply
haven't standardized _how_ to do it).
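
By lazy entity retrieval I mean roughly the following sketch (the URL
plumbing is invented): the client holds a placeholder and fetches nothing
until the user actually navigates into the entity.

    # A sketch of client-side lazy entity retrieval: nothing is
    # fetched until the entity is actually needed, then cached.
    from urllib.request import urlopen

    class LazyEntity:
        """External entity fetched on first access, then cached."""
        def __init__(self, url):
            self.url = url
            self._content = None

        def content(self):
            if self._content is None:       # fetch only on first use
                with urlopen(self.url) as response:
                    self._content = response.read().decode("utf-8")
            return self._content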

Since a server can work with naive clients, I still don't see that defining
the data format of XML requires developing new server protocols and URL
conventions. This seems like a place where a simple CGI implementation of
some variety of XML "cooking" might be a great donation to the world.
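
Something as small as this sketch would do; the ?id= query convention and
the file name are my own invention, not a proposed standard:

    #!/usr/bin/env python3
    # A sketch of a CGI "XML cooking" script: given ?id=..., return
    # just that element as a small standalone document.
    import os
    import xml.etree.ElementTree as ET
    from urllib.parse import parse_qs

    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    wanted = params.get("id", [""])[0]
    tree = ET.parse("document.xml")     # hypothetical source document

    for elem in tree.iter():
        if elem.get("id") == wanted:
            print("Content-Type: text/xml")
            print()
            print('<?xml version="1.0"?>')
            print(ET.tostring(elem, encoding="unicode"))
            break
    else:
        print("Status: 404 Not Found")
        print()

Dropped behind any stock HTTP server's cgi-bin, that is the whole "smart
server".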

>Sure, but if we focus on fragment specifiers to the exclusion of
>server-side solutions, eventually we will have nothing but fragment
>specifiers because no one will use or support anything different.

Come on. Clients need never support anything different. If you are right,
and scaling requires smarter servers, then the smarter servers will be
written. Actually, I suspect that DynaWeb will have XML generation real soon
now, right?

>>    As long as you address a well-formed fragment, this should not be a
>>problem. Your server can certainly be smart enough to translate address
>>formats, if it is already parsing the whole document.
>
>The problem is addressing this well-formed fragment you mention...

I don't understand. You can simply create a pseudo-URL for every element of
the document (as you proposed), convert any fragment specifiers in the
source to the appropriate "cooked URL", and send that out. This could be a
separate pass, or might even be part of the pretty-print operation on the
internal structure you generated when parsing the original fragment ID.

No problem.
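
For instance, a sketch of that rewriting pass (the cooked-URL shape and the
bare href attribute are invented for illustration):

    # A sketch of the rewriting pass: local fragment specifiers become
    # per-element "cooked" URLs that a smart server can resolve.
    import xml.etree.ElementTree as ET

    COOKED = "http://example.com/cook?id=%s"   # assumed convention

    def cook_links(tree):
        """Rewrite href="#foo" references into cooked per-element URLs."""
        for elem in tree.iter():
            href = elem.get("href")
            if href and href.startswith("#"):
                elem.set("href", COOKED % href[1:])
        return tree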

>>No, but I do think that life on the high end may make it more difficult to
>>appreciate the limitations imposed by most server admins. We have a lot
>>of work to do just convincing browser manufacturers to use XML -- if we
>>don't need to sign up for more salesmanship, then we shouldn't.
>
>Well, if we really want to improve the state of the WWW, in the long
>run, servers will have to change.
>
>BTW. When you refer to "high-end", I think you falsely make the
>problem seem hard. It's trivial.

It's still more sophisticated than 90% of all the servers out there, and
still undeployed. So it's not high-end because it's amazingly hard, but
it's still a significant cut above present-day practice.


And from the beginning:

>>If a standard format for version identification in URLs is not
>>acceptable, I don't see why fragment identification is so fundamental
>>that it would be acceptable.
>
>Well, I'm on the same list as you are, and also cannot understand why
>a standard version syntax is unacceptable. The scheme I propose can
>handle versioning too...

What I am saying is that I have raised this same point repeatedly, on list
and face-to-face, and the reaction has always been that clients are not
supposed to make assumptions about how URLs are formed. No one denies that
it would work if you did it universally, but there is a strong commitment
to that rule, which seems unlikely to change, since it is always presented
as a fundamental architectural principle of URLs. We don't have to agree
with every W3C architecture decision, but we still have to work within that
framework where we can't successfully change it in time to do what we need.

   -- David

I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
