- From: Arjun Ray <aray@q2.net>
- Date: Sun, 20 Feb 2000 10:54:13 -0500 (EST)
- To: W3C HTML <www-html@w3.org>
On Fri, 18 Feb 2000, Dan Connolly wrote:

> Arjun Ray wrote:
> > In terms of the great Undead Debate - Names vs Addresses - PUBLIC
> > is basically a "name" and SYSTEM is basically an "address".

ISO 8879 doesn't really explain PUBLIC and SYSTEM. That was left to
the _SGML Handbook_, I suppose:) The discussion of 10.1.6 "External
Identifier" (p.377 ff) has this:

: A _system identifier_ is system-specific information that enables
: the entity manager of an SGML system to locate the file or the
: memory location or the pointer within a file where the entity can be
: found. [...] It should also be noted that a system identifier could
: be an invocation of a program that controls access to an entity that
: is being identified. [...]

: A _public identifier_ is a name that is intended to be meaningful
: across systems and different user environments. [...]

Taking these as descriptions of canonical semantics, I think the case
for "Cool URLs" being PUBLIC identifiers is pretty clear-cut. If we
think that there's already too much definitional baggage in or usage
overloading of PUBLIC and SYSTEM, we might be better off with a new
name altogether, say EXTERN, but I'd still maintain that EXTERN would
be more PUBLIC than SYSTEM:)

[Another limited analogy: it seems that #include <stdio.h> is kinda
PUBLIC-ish, while #include "foo.h" is kinda SYSTEM-ish; both forms are
useful, but never simultaneously for the same entity; which would one
want for a clean-compile-anywhere out of the box?:)]

> The only issue is the constraints on the syntax of what goes
> inside PUBLIC "..." vs. the unconstrained SYSTEM "..." syntax. I
> completely agree that we only need one and that it's awkward that
> XML 1.0 picked the SYSTEM "..." as the required slot, but (if I
> understand correctly) that was the only slot where URIs fit,

Yes. The Great Triage had a very tight deadline (Nov 1), so rather
than rethink the whole external identifier mess, everything got
included in the maximally expanded form.
There was no inherent reason to rule out the SGML practice of "implied
SYSTEM" (the *superior* and perhaps only proper use of the SYSTEM
keyword, IMHO), but if it was a question of accommodating URLs at
short notice, the system literal tagging along mandatorily seemed
convenient.

(I'm pretty sure that a lot of discussion was suppressed voluntarily
because it was believed that there would be a 1.1 spec in relatively
short order. Since that never materialized, we're stuck with a number
of decisions that, had a more generous time limit been allowed, might
just have come out quite differently.)

> and I'd rather have a misnomer in the syntax than trade in
> the benefits of the widely deployed Uniform Identifier syntax.

It's a tough call. The point, I think, is not to lose sight of the
goal. Right now, we have a how-to-get-there-from-here problem, but
bagging PUBLIC to keep SYSTEM is not the way to go: if anything, SGML
interoperability would say that a proper catalog mechanism obviates
SYSTEM (except perhaps as a minimization), so that's the one that
should be obsoleted over the longer haul.

I think the syntax deployment issue has a large element of perception
involved. That is, the compactness is a convenience, but the syntax
per se isn't, necessarily. Most of the time, the software already
"knows" that it's looking at a URI and proceeds to use it; we're not
at the stage where URIs can be plucked from undifferentiated streams
of text. In fact, determining whether a candidate string *could* be a
URI is a very non-trivial exercise, which doesn't rate to get any
easier over time:

  http://www.deja.com/=dnc/getdoc.xp?AN=513160002
  http://www.deja.com/=dnc/getdoc.xp?AN=513219055

So, I'd say that deployment of the syntax "as a whole" is much more
fragmented than many of us would like to believe. In that sense, it's
important (IMHO) not to fall into the KTWSFN [1] mentality which says
that the old way of grokking has gotta be the only way to grok.
On the contrary, since URIs *do* have internal syntax, isn't it kind
of strange to inhibit or prohibit a *syntactic formalism* like
SGML/XML if it could be convenient?

[1] Keeping The Web Safe For Netploder

> > A win/win:)
>
> Hmm... well... it doesn't seem that the "+//IDN ..." syntax
> accommodates URIs that aren't DNS-based; e.g. uuid:23io423oi423oi4
> or oid:12.424.54.34.23.23.45.24 or even mid:l2k3j42lkj3@foo.com
> or tel:+1-444-555-1212 or futurescheme:whatever-goes-here .

(a) No surprise: IDN is "internet domain name":)
(b) This is an "issue" in only one situation: the *minimized*
    external identifier of a doctype declaration.

(e.g. is the 'file:' scheme really entitled to the 'U' in 'URI'?)

> The "http:" in "http://www.w3.org/" serves not only as a clue
> to what network protocol to try, but also to dispatch between
> completely orthogonal naming schemes.
>
> In a way, the registry of URI schemes (http:, ftp:, ...) is
> analogous to the registry of registered owner identifiers (+//xyz).

Very reasonable, but that may not be the proper way to look at it,
SGML/XML-wise. URI schemes are actually *notations*! That is, it may
*not* be advantageous to "forget" that entities can have notations
attached to them. (This is actually an exploitation of the normal
"data content" semantic, when it's known that the app will be handling
it anyway.) It's similar in spirit to the convenience offered by many
browsers, where you don't have to type in the 'http://' of a URL if
the software already "knows what you mean".

Unfortunately, a full exploitation of the idea needs data attributes,
which XML doesn't have. But there's probably room for a specialized
XML app dealing with URI schemes specifically (if only to reintroduce
ideas that got left out when FSIs bit the dust, such as local
character encodings and the like.)

> [...] Hmm... in practice, I wonder if it would be easier to get
> ISO to relax the syntactic constraints on PUBLIC identifiers than
> to deploy a convention of mapping +//IDN foo.org/... to
> http://foo.org/... or ftp://foo.org/... or whatever.

ISO9070 becomes a constraint only in relation to FORMAL YES. Why
couldn't YES be expanded to more than one convention? (An issue for
WG4, I suppose.)

> Probably ISO isn't the rate-limiting-factor; probably, the
> consequences of putting PUBLIC "http://..." into deployed SGML
> systems is the deployment constraint. But I haven't done much
> testing.

The only problem I see offhand is that catalog support these days can
also assume partial matches of PUBLIC ids. If PUBLIC ids were kept
opaque up to the application level, there shouldn't be an issue at
all. (*Really* technically, FORMAL NO should warn off assumptions, but
I'm not confident about this either.)

Arjun
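
A rough sketch of what "keeping PUBLIC ids opaque" could look like in a
catalog lookup, with SYSTEM used only as a fallback. This is an
illustration under stated assumptions, not how any particular SGML or
XML entity manager works: the `CATALOG` table, its two entries, the
storage paths, and the `resolve` function are all hypothetical, and no
whitespace normalization of the identifiers is attempted.

```python
from typing import Dict, Optional

# Hypothetical catalog: opaque PUBLIC identifier -> local storage location.
# Both entries are invented for illustration only.
CATALOG: Dict[str, str] = {
    "-//Example//DTD Sample 1.0//EN": "/usr/local/sgml/sample.dtd",
    "http://example.org/dtd/sample": "/usr/local/sgml/sample.dtd",
}


def resolve(public_id: Optional[str],
            system_id: Optional[str],
            catalog: Dict[str, str] = CATALOG) -> Optional[str]:
    """Pick a location for an external identifier.

    The PUBLIC id is compared as a whole string: no partial or prefix
    matching, and no parsing of any internal syntax. The SYSTEM id is
    used only when the catalog has no entry for the PUBLIC id.
    """
    if public_id is not None and public_id in catalog:
        return catalog[public_id]
    return system_id


if __name__ == "__main__":
    # A URL-shaped PUBLIC id is looked up exactly like a formal one.
    print(resolve("http://example.org/dtd/sample", None))
    # A near miss is not a match, so the SYSTEM id is returned instead.
    print(resolve("http://example.org/dtd/", "fallback.dtd"))
```

Because the lookup never inspects the string, a PUBLIC "http://..."
value and a formal public identifier are treated alike; whether deployed
catalog software behaves this way is exactly the open question above.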
Received on Sunday, 20 February 2000 10:28:37 UTC