Re: http URIs as names and scalability

On Monday, August 2, 2004, 8:33:07 PM, Dan wrote:

DC> I think we're slated to discuss some comments regarding

DC> "A URI owner SHOULD provide representations of the
DC> identified resource."
DC>   --

DC> I discovered a thread from a while ago that has some useful
DC> input, regarding scalability

Note - for DTD below read 'external DTD subset' if you want to be

Well, the SGML specification is very clear - a DTD must be available for
processing. It says nothing about how fresh that DTD is and doesn't
really concern networks.

The XML specification is also very clear - for validating processors
reading a DTD is required, and for non-validating ones it's optional.
Again, it says nothing about how fresh the DTD has to be. However, if the
URI of the DTD is an HTTP one, the HTTP spec tells you all about cache
timings and expiry dates and checking back after one tenth of the time
from the last modify date to now and all that stuff.
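
To make the "one tenth" rule concrete, here is a rough sketch of that
heuristic in Python. The function names and the 0.1 default are my own
illustration of a common cache setting, not anything the HTTP spec
mandates, and it only applies when the server sent no explicit expiry.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_hdr: str, last_modified_hdr: str,
                        fraction: float = 0.1) -> timedelta:
    """Freshness lifetime guessed from the Date and Last-Modified headers.

    Assumption: no Expires or max-age was supplied, so the cache falls
    back to a fraction of the interval since the last modification.
    """
    fetched = parsedate_to_datetime(date_hdr)
    modified = parsedate_to_datetime(last_modified_hdr)
    # Guard against a Last-Modified that is later than Date.
    interval = max(fetched - modified, timedelta(0))
    return interval * fraction


def is_fresh(date_hdr: str, last_modified_hdr: str, now: datetime) -> bool:
    """True while the cached copy may be reused without asking the server."""
    fetched = parsedate_to_datetime(date_hdr)
    return now - fetched < heuristic_freshness(date_hdr, last_modified_hdr)
```

So a DTD last modified ten days before it was fetched would be reused
for roughly a day before the cache checks back.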

Historically, both SGML and XML use catalog files to do a mapping from
FPIs to cached, local copies of the DTD. As long as resolution is seen
as a two step process - see if the FPI is known, then see if the URI can
be retrieved - the fact that this 'caching' strategy does not follow
the HTTP rules is not an architectural problem. It does mean though that
the FPI is the primary identifier and the URI is a secondary, fallback

For namespace names, fetching the resource is "even more optional" and
there is no particular caching strategy deployed.

DC> It's certainly relevant to quality of implementation,
DC> if not webarch; our web servers are spending a lot of their
DC> time serving up DTDs.

I agree this is a quality of implementation issue. I have occasionally
come across SVG implementations that would not run offline because they
used a validating parser but had neither OASIS catalog support nor a
persistent HTTP cache, and thus fetched the DTD every single time.

DC> I wonder if it's a new issue.

The only architectural issue I can see is where the DTD is served with
Pragma: no-cache or a very recent Last-Modified date, and where the
catalog never updates. That seems to be an evolvability aspect of the
OASIS catalog spec, though (presumably there should be an option to
refresh the DTDs and to store the URI from which they were obtained and
the metadata (etags, last modify, fetch date, cache policy) so that the
catalog can update itself if required). Although people might also want
to lock to particular versions of a DTD, so it ties into the whole
versioning issue.
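
As a sketch of what such a self-updating catalog entry might carry,
assuming nothing about the actual OASIS catalog format: CatalogEntry,
its field names and the pinned flag are all hypothetical. The stored
validators are turned into an ordinary HTTP conditional request, and a
pinned entry is simply never refreshed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CatalogEntry:
    public_id: str                       # the FPI used as the lookup key
    source_uri: str                      # where the DTD was obtained
    local_path: str                      # cached local copy
    etag: Optional[str] = None           # ETag seen on the last fetch
    last_modified: Optional[str] = None  # Last-Modified seen on the last fetch
    fetched_at: Optional[str] = None     # when the copy was retrieved
    cache_policy: Optional[str] = None   # e.g. the Cache-Control value seen
    pinned: bool = False                 # locked to this version of the DTD


def revalidation_headers(entry: CatalogEntry) -> Optional[Dict[str, str]]:
    """Headers for a conditional GET, or None if the entry must not refresh."""
    if entry.pinned:
        return None
    headers: Dict[str, str] = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers
```

A 304 response to such a request would mean the local copy is still
good; a 200 would replace both the file and the stored metadata.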

DC> It's a little like

DC> but different too.

 Chris Lilley          
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group

Received on Tuesday, 3 August 2004 13:28:50 UTC