Re: Addressing.... from Gavin Nicol on 1997-01-13 (w3c-sgml-wg@w3.org from January 1997)

From: Gavin Nicol <gtn@ebt.com>
Date: Mon, 13 Jan 1997 13:56:24 -0500
To: dgd@cs.bu.edu
CC: w3c-sgml-wg@www10.w3.org
Message-Id: <199701131856.NAA16871@nathaniel.ebt>
>>This is the scalability problem. It gets worse when a server
>>interfaces to databases that *generate* XML, but don't manipulate
>>it. They have a harder time with faking entity boundaries.
>
>I think this is a purple herring nailed to the ceiling: Since XML documents
>don't need DTDs, a smart server can send any XML element as if it were a
>document without doing anything with entity boundaries. 

The problem is deciding how to break things up. When I said "faking
entity boundaries" that includes faking document entity boundaries, or
chunking, or whatever else you want to call it. What you said above
doesn't change the real problem (and BTW, what you describe above is
pretty much what DynaWeb does).

>What kind of database are you thinking of, where the addressing
>format would matter in this way? 

An RDBMS for example: you can refer to the entire database (1,200,000
records of 245 fields each) as:

  http://foo.com/foo       - Whole shebang
  http://foo.com/foo/1     - Record #1
  http://foo.com/foo/1/123 - Record #1, field #123

or how about a document database:

  http://bigdoc.com/bigdoc        - 120MB SGML Document
  http://bigdoc.com/bigdoc/chap=1 - First chapter

It doesn't necessarily have to be a database either: it could be a
bento storage object, an attributed filesystem, a versioned
filesystem, or anything else that *could* be used to generate an XML
document. 
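
A minimal sketch of how a server might resolve the RDBMS-style URLs
above, in Python. The tiny in-memory table, the name resolve(), and
the 1-based record and field numbering are all assumptions made for
illustration; a real server would issue a query against the database
instead.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the database: 3 records of 4 fields each.
TABLE = [[f"r{r}f{f}" for f in range(1, 5)] for r in range(1, 4)]

def resolve(url, table=TABLE):
    """Map /foo, /foo/<record> and /foo/<record>/<field> onto the table."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if not parts or parts[0] != "foo" or len(parts) > 3:
        raise LookupError(url)
    target = table
    for segment in parts[1:]:
        n = int(segment)              # assumed 1-based numbering
        if n < 1 or n > len(target):
            raise LookupError(url)
        target = target[n - 1]        # descend one level per path segment
    return target

print(resolve("http://foo.com/foo/1"))    # one record (a list of fields)
print(resolve("http://foo.com/foo/1/2"))  # one field of that record
```

The client only ever sees the path; how the server maps a segment to
a record or field stays private to the server.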

The real point is that you have a simple, fairly intuitive, addressing
syntax that allows you to point to objects in a hierarchy by typed
occurrence. Resolving that on the server is more scalable than doing it
on the client, or we'd be sending across entire filesystems and
walking directory paths inside a client. 
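
To make "typed occurrence" concrete, here is a rough Python sketch
that resolves steps such as chap=1 against a parsed document on the
server side. The sample document, the select() helper, and the idea
of chaining several type=n steps with "/" are assumptions for
illustration; only the single chap=1 form appears in the examples
above.

```python
import xml.etree.ElementTree as ET

# Hypothetical small document standing in for the 120MB one.
DOC = ET.fromstring(
    "<bigdoc>"
    "<chap><title>One</title><sec>1.1</sec><sec>1.2</sec></chap>"
    "<chap><title>Two</title><sec>2.1</sec></chap>"
    "</bigdoc>"
)

def select(root, path):
    """Follow steps of the form type=n: the n-th child element of that type."""
    node = root
    for step in path.split("/"):
        if not step:
            continue
        tag, _, count = step.partition("=")
        n = int(count)                 # assumed 1-based occurrence count
        matches = node.findall(tag)    # direct children with this tag only
        if n < 1 or n > len(matches):
            raise LookupError(step)
        node = matches[n - 1]
    return node

# Only the selected chunk would be serialised and sent to the client.
print(ET.tostring(select(DOC, "chap=1"), encoding="unicode"))
```

The server walks the tree and ships one chunk; the client never needs
the whole document to follow the address.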

I don't care for the term "entitise" (even though I coined it), though
the original discussion was in the context of *large* SGML/XML
documents. A better word would be "chunking"... breaking an object
into its components in such a way that the objects are addressable.

The main difference is in where you see object boundaries. I see them
at the smallest container level... 

>>Yes, but then you assume server-side support for these cases... the
>>thing you are disagreeing with. Remove server-side support, and the
>>model becomes that of retrieving entities.
>I pointed out that you can use dynamic server-side support _if you want_.
>This means that your requirement is not catered-to, but neither is it
>obviated.

Right, then you and I have been arguing over nothing, because I have
been arguing against your promoting fragment specifiers to the
exclusion of server-side resolution. If we both agree that each has a
place, and that neither deserves preference, we have no argument.

>>My major concern here is that if the only *standard* way of addressing
>>individual elements is client-side, then people will only use that,
>>building up momentum against finer granularity object addressing.
>>It is better to establish a momentum toward superior solutions as
>>early as possible in order to avoid pain further down the road
>>(different to *requiring* superior solutions from the get-go).
>
>I don't believe that it's worth the effort of defining complex fragment
>transport mechanisms to encourage people to do something that they can do
>even if we punt on that work. If the world starts beating down our doors on
>this we can revisit it at any time.

Well *I* have been involved in efforts that took precisely that path,
and many/most have caused more pain than the pain they tried to avoid.
As my father used to say "An ounce of prevention is worth a pound of
cure". 

>>I think that the initial *publishers* of XML content will generally
>>have at least partial control over the site, as XML is going to be a
>>tiny niche for some time, and so will require publishers dedicated to
>>its deployment. Fragment specifiers are fine for smaller documents,
>>but I think many initial XML publishers will be dealing with large
>>documents.
>
>This may be a good reason to encourage client implementers to implement
>lazy entity retrieval. Or, those publishers can use smart server
>technology, since we have not prevented them from doing so (we simply
>haven't standardized _how_ to do it).

You can't have lazy entity retrieval unless you have an engine capable
of generating entity boundaries (or chunking, in other words). I am not
for standardising the *mechanism* either, just the addressing scheme
to be used (which is a simple extension of existing hierarchical,
supposedly opaque, relative URLs (oxymoron in the last two)).

>>BTW. When you refer to "high-end", I think you falsely make the
>>problem seem hard. It's trivial.
>
>It's still more sophisticated than 90% of all the servers out there, and
>still undeployed. So it's not high-end because it's amazingly hard, but
>it's still a significant cut above present-day practice.

Depends. A lot of web servers now have scripting built in. I consider
that far more complex than what I am proposing....

>What I am saying is that I have raised this same point repeatedly, on list,
>and face-to-face, and the reaction has always been that clients are not
>supposed to make assumptions about how URLs are formed. No one denies that
>it would work, if you did it universally, but there is a strong commitment
>to this as an architectural principle, that seems unlikely to change, since
>it is always presented as a fundamental architectural principle of
>URLs. 

I see this as a fundamental lie. URL structure is always interpreted
by clients and/or servers. Some people like LM et al. waste a lot of
people's time on issues of conceptual purity where none lies.
Received on Monday, 13 January 1997 14:11:18 UTC