- From: Gavin Nicol <gtn@ebt.com>
- Date: Mon, 13 Jan 1997 13:56:24 -0500
- To: dgd@cs.bu.edu
- CC: w3c-sgml-wg@www10.w3.org
>> This is the scalability problem. It gets worse when a server
>> interfaces to databases that *generate* XML, but don't manipulate
>> it. They have a harder time with faking entity boundaries.
>
> I think this is a purple herring nailed to the ceiling: since XML documents
> don't need DTDs, a smart server can send any XML element as if it were a
> document without doing anything with entity boundaries.

The problem is deciding how to break things up. When I said "faking entity boundaries", that includes faking document entity boundaries, chunking, or whatever else you want to call it. What you said above doesn't change the real problem (and BTW, what you describe above is pretty much what DynaWeb does).

> What kind of database are you thinking of, where the addressing
> format would matter in this way?

An RDBMS, for example: you can refer to the entire database (1,200,000 records of 245 fields each) as:

- `http://foo.com/foo`: the whole shebang
- `http://foo.com/foo/1`: record #1
- `http://foo.com/foo/1/123`: record #1, field #123

Or how about a document database:

- `http://bigdoc.com/bigdoc`: a 120MB SGML document
- `http://bigdoc.com/bigdoc/chap=1`: the first chapter

It doesn't necessarily have to be a database either: it could be a Bento storage object, an attributed filesystem, a versioned filesystem, or anything else that *could* be used to generate an XML document. The real point is that you have a simple, fairly intuitive addressing syntax that allows you to point to objects in a hierarchy by typed occurrence. Resolving that on the server is more scalable than doing it on the client; otherwise we'd be sending entire filesystems across the wire and walking directory paths inside the client.

I don't care for the term "entitise" (even though I coined it), though the original discussion was in the context of *large* SGML/XML documents. A better word would be "chunking": breaking an object into its components in such a way that the objects are addressable.
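
As a rough illustration of resolving a typed-occurrence address on the server, here is a minimal Python sketch. It assumes a segment syntax that the discussion above never defines precisely: `type=n` selects the n-th child element named `type`, and a bare `n` selects the n-th child element of any type, mirroring the `chap=1` and `/1/123` examples. The names `resolve_path` and `chunk` are hypothetical. Only the addressed element is serialized, so the client never receives the rest of the document. A real server would more likely walk an index or a database than an in-memory tree; ElementTree just keeps the sketch self-contained.

```python
import copy
import xml.etree.ElementTree as ET


def resolve_path(root: ET.Element, path: str) -> ET.Element:
    """Resolve a hypothetical typed-occurrence path such as "chap=1/sect=2".

    Assumed segment syntax (illustrative only):
      "type=n" -> the n-th child element named `type` (1-based)
      "n"      -> the n-th child element of any type (1-based)
    Raises ValueError for a non-numeric index, LookupError if nothing matches.
    """
    node = root
    for segment in filter(None, path.split("/")):  # skip empty segments
        if "=" in segment:
            tag, _, index = segment.partition("=")
            candidates = [child for child in node if child.tag == tag]
        else:
            index = segment
            candidates = list(node)
        n = int(index)
        if not 1 <= n <= len(candidates):
            raise LookupError(f"no match for segment {segment!r}")
        node = candidates[n - 1]
    return node


def chunk(root: ET.Element, path: str) -> str:
    """Serialize only the addressed element, as if it were a whole document."""
    node = copy.copy(resolve_path(root, path))
    node.tail = None  # trailing text belongs to the parent, not to the chunk
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    doc = ET.fromstring(
        "<bigdoc><title>Big</title>"
        "<chap><title>One</title></chap>"
        "<chap><title>Two</title></chap></bigdoc>"
    )
    print(chunk(doc, "chap=2"))  # <chap><title>Two</title></chap>
```

Under this assumed syntax, `resolve_path(doc, "2/1")` picks the second child of any type (the first `chap`) and then its first child, the `title` containing "One".
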
The main difference is in where you see object boundaries. I see them at the smallest container level...

>> Yes, but then you assume server-side support for these cases... the
>> thing you are disagreeing with. Remove server-side support, and the
>> model becomes that of retrieving entities.
>
> I pointed out that you can use dynamic server-side support _if you want_.
> This means that your requirement is not catered-to, but neither is it
> obviated.

Right, then you and I have been arguing over nothing, because I have been arguing against your promoting fragment specifiers to the exclusion of server-side resolution. If we both agree that each has a place, and that neither deserves preference, we have no argument.

>> My major concern here is that if the only *standard* way of addressing
>> individual elements is client-side, then people will only use that,
>> building up momentum against finer granularity object addressing.
>> It is better to establish a momentum toward superior solutions as
>> early as possible in order to avoid pain further down the road
>> (different to *requiring* superior solutions from the get-go).
>
> I don't believe that it's worth the effort of defining complex fragment
> transport mechanisms to encourage people to do something that they can do
> even if we punt on that work. If the world starts beating down our doors on
> this we can revisit it at any time.

Well, *I* have been involved in efforts that took precisely that path, and many, perhaps most, of them caused more pain than the pain they tried to avoid. As my father used to say, "An ounce of prevention is worth a pound of cure."

>> I think that the initial *publishers* of XML content will generally
>> have at least partial control over the site, as XML is going to be a
>> tiny niche for some time, and so will require publishers dedicated to
>> its deployment. Fragment specifiers are fine for smaller documents,
>> but I think many initial XML publishers will be dealing with large
>> documents.
>
> This may be a good reason to encourage client implementers to implement
> lazy entity retrieval. Or, those publishers can use smart server
> technology, since we have not prevented them from doing so (we simply
> haven't standardized _how_ to do it).

You can't have lazy entity retrieval unless you have an engine capable of generating entity boundaries (or chunks, in other words). I am not for standardising the *mechanism* either, just the addressing scheme to be used (which is a simple extension of existing hierarchical, supposedly opaque, relative URLs; the last two are an oxymoron).

>> BTW, when you refer to "high-end", I think you falsely make the
>> problem seem hard. It's trivial.
>
> It's still more sophisticated than 90% of all the servers out there, and
> still undeployed. So it's not high-end because it's amazingly hard, but
> it's still a significant cut above present-day practice.

Depends. A lot of web servers now have scripting built in. I consider that far more complex than what I am proposing...

> What I am saying is that I have raised this same point repeatedly, on list,
> and face-to-face, and the reaction has always been that clients are not
> supposed to make assumptions about how URLs are formed. No one denies that
> it would work, if you did it universally, but there is a strong commitment
> to this as an architectural principle, that seems unlikely to change, since
> it is always presented as a fundamental architectural principle of
> URLs.

I see this as a fundamental lie. URL structure is always interpreted by clients and/or servers. Some people, like LM et al., waste a lot of people's time on issues of conceptual purity where none lies.
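
To make the point about hierarchical relative URLs concrete, here is a minimal Python sketch, assuming the same hypothetical `type=n` segment syntax as the `chap=1` example URLs. On the client side, ordinary relative-URL resolution (`urllib.parse.urljoin`) already handles such addresses with no special knowledge. On the server side, the hypothetical `parse_address` function splits the path below an assumed mount point into (element type, occurrence) steps that a resolver could then walk. Neither the function nor the mount-point convention comes from any existing server.

```python
from typing import List, Optional, Tuple
from urllib.parse import unquote, urljoin, urlsplit


def parse_address(url: str, mount: str) -> List[Tuple[Optional[str], int]]:
    """Split the part of `url` below `mount` into (element type, occurrence) steps.

    Hypothetical syntax: "chap=1" -> ("chap", 1); a bare "123" -> (None, 123),
    meaning "the 123rd child of any type". An empty remainder means the whole object.
    """
    path = urlsplit(url).path
    base = mount.rstrip("/")
    if not (path == base or path.startswith(base + "/")):
        raise ValueError(f"{url!r} is not under {mount!r}")
    steps = []
    for segment in filter(None, path[len(base):].split("/")):
        tag, sep, index = unquote(segment).rpartition("=")
        steps.append((tag if sep else None, int(index)))
    return steps


if __name__ == "__main__":
    base = "http://bigdoc.com/bigdoc/"
    # Client side: standard relative resolution, nothing XML-specific.
    print(urljoin(base, "chap=1"))                 # http://bigdoc.com/bigdoc/chap=1
    print(urljoin(base + "chap=1/", "../chap=2"))  # http://bigdoc.com/bigdoc/chap=2
    # Server side: interpret the structure of the path.
    print(parse_address("http://foo.com/foo/1/123", "/foo"))  # [(None, 1), (None, 123)]
```

The trailing slash on the base URL matters: without it, `urljoin` replaces the last path segment instead of descending into it, which is ordinary relative-URL behaviour rather than anything specific to this scheme.
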
Received on Monday, 13 January 1997 14:11:18 UTC