Re: Addressing.... from Paul Prescod on 1997-01-08 (w3c-sgml-wg@w3.org from January 1997)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Wed, 8 Jan 1997 11:02:46 -0500 (EST)
To: gtn@ebt.com (Gavin Nicol)
Cc: w3c-sgml-wg@www10.w3.org
Message-Id: <199701081602.LAA19004@calum.csclub.uwaterloo.ca>
> David Durand manipulated electrons to produce:
> My proposal would require, at a minimum, an XML processor capable of
> parsing a well-formed instance, creating a tree from it, and then
> traversing/querying the tree. This could easily be done as a CGI
> script, and I think that writing the software required to do this
> would add very little to the cost of implementing an XML processor. I
> could certainly write it in 2 weeks, from scratch, in C/C++/Java.

Writing it is easy. Getting it installed on the Web servers of the world
is a bigger hassle. Right now we must encourage browser writers to embed
XML and authors to use it. We can perhaps cut out the browser vendors if we 
ship XML viewers as applets. Throwing system administrators into the mix 
turns it into a big chicken and egg problem. Netcom, Compuserve or MindSpring
won't install a CGI unless many users ask for it. Many users won't ask for 
it unless they have seen it on the web before and think it is neat.

It also takes us out of the language design business into the protocol 
design business. I think that "SGML people" should be in the protocol design
business, but not necessarily *this* group of SGML people right now.

> I object strenuously to *requiring* that an entity be retrieved in its
> entirety in order to transclude a single element. Points to remember:
> 
>   1) You are talking about special code in the client, which would be
>      easily comparable to the complexity of the code in a server.

Sure, but it isn't the code complexity that is the problem: it is the politics
of getting it installed.

>   2) Any instance/entity that is small enough to be transmitted across
>      the internet, will not incur a great parse/traversal overhead on
>      the server: certainly no greater than that required on the client
>      side. 

But most clients are Pentium 100s spinning their wheels.

>   3) With most relative addressing schemes, only a given entity needs
>      to be parsed, not the entire document.

I guess that's true now that we've got rid of exceptions. What a relief!! Note,
though, that we are discussing adding in features that would have "document
scope" such as defaulted attributes based on the first occurrence.

>   4) With the scheme I proposed, each URL is unique, and so can be
>      cached effectively by caching proxies.

It could work the other way: retrieving an entire entity (and caching it) may
often be faster if the user is often going to want many elements from that
entity. (for instance footnotes)
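
To illustrate the caching trade-off, here is a minimal sketch of a client that
fetches an entity once and then serves many elements from it. Everything in it
is an assumption made for illustration: the `EntityCache` class, the `fetch`
callable, and the use of an `id` attribute to pick out elements are
hypothetical and not part of any proposal in this thread.

```python
import xml.etree.ElementTree as ET


class EntityCache:
    """Hypothetical client-side cache keyed by entity URL."""

    def __init__(self, fetch):
        # fetch is an assumed callable: URL -> XML text of the whole entity.
        self._fetch = fetch
        self._trees = {}
        self.fetch_count = 0  # number of network retrievals performed

    def element(self, url, element_id):
        """Return the element with the given id, fetching the entity at most once."""
        if url not in self._trees:
            self.fetch_count += 1
            self._trees[url] = ET.fromstring(self._fetch(url))
        for el in self._trees[url].iter():
            if el.get("id") == element_id:
                return el
        return None
```

With per-element URLs, each footnote would be a separate retrieval; here the
second and later footnotes from the same entity cost no further network
traffic, at the price of transferring the whole entity up front.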

>   5) The only scaleable solution is to do it server-side, or as
>      distributed objects (basically the same thing, just different
>      protocol and resolution mechanism).

That's debatable. I tend to agree that this is the only elegant scalable 
solution. Requiring authors or servers to break up their documents into small
entities is an inelegant scalable solution. Negotiating the "entity resolution
service" when available is probably the best compromise.

>   6) The mechanism I propose is easily applicable to domains other
>      than XML.

Agreed. That's why I think that maybe we are not the group to do it.

> I do not object to fragment specifiers, but this argument is
> specious. You could just as easily say that a client could recognise
> that it could retrieve the entire entity, and then walk its own parse
> tree based on the URLs I propose.

I don't think the client is allowed to play around with the section of the
URL preceding the "#" that much.

Anyhow, there are other reasons for making the special server *optional*.
The #-translated system that was proposed scales from a simple smart-client
system to a smart-client, smart-server system.
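
A rough sketch of the simple smart-client end of that range: the client strips
the part after the "#", retrieves the whole entity, and walks its own parse
tree. The fragment syntax used here (slash-separated, 1-based child positions
such as "2/1") is a made-up stand-in, not the syntax actually proposed, and
the function names are hypothetical.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET


def split_address(address):
    """Split an address into the URL sent to the server and the local fragment."""
    url, fragment = urldefrag(address)
    return url, fragment


def resolve_fragment(root, fragment):
    """Walk the parsed entity using an assumed child-position path like '2/1'."""
    node = root
    for step in filter(None, fragment.split("/")):
        children = list(node)
        index = int(step) - 1  # path steps are 1-based in this sketch
        if index < 0 or index >= len(children):
            return None  # the path points outside the tree
        node = children[index]
    return node
```

Only `split_address` touches the URL; the server sees the part before the "#"
and needs no special software. A smart server, where one exists, could be
handed the same fragment and run the equivalent of `resolve_fragment` itself.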

> Again, I do not object to fragment specifier use, but I do object
> to it being the only thing we can use. It does not scale. Worse, it
> would preclude using XML with servers such as DynaWeb/DynaBase that
> generate content dynamically, and may not even have the entity
> structure left for you to address. In this case, we would be forced to
> send however many MB the source is in its entirety, or *fake* an entity
> structure (easily done for DynaWeb, *much* harder for other types of
> databases). 

Could you please elaborate on why this is so difficult? If the server can
serve elements separately, then couldn't it make "entity wrappers" for 
every element?

> I seriously hope your objection to "special servers" doesn't mean that
> you think my motivation lies in the fact that I wrote DynaWeb, and
> wish to promote it... my motivation lies in trying to avoid a solution
> that doesn't scale well, and doesn't easily permit use of servers that
> do not have XML files lying around on them (like RDB, etc).

Your motivation does not come from having written DynaWeb, but your point
of view comes from having implemented something high tech and wanting that
functionality to be widely available. We all want that, but I don't believe
it is likely in the short term. Tying XML to it could be very dangerous,
in my opinion. Even if XML survived the chicken and egg problem, many
users would not have access to its functionality because they could not 
get their ISP to install the CGIs or special server.

 Paul Prescod
Received on Wednesday, 8 January 1997 11:05:47 UTC