- From: Dominique Hazaël-Massieux <dom@w3.org>
- Date: Tue, 17 Aug 2004 15:35:01 +0200
- To: Nick Kew <nick@webthing.com>
- Cc: www-validator@w3.org
- Message-Id: <1092749701.4811.155.camel@stratustier>
On Tue 17/08/2004 at 13:42, Nick Kew wrote:
> There are also serious drawbacks to that.  URIs are used by W3C for
> two different and mutually-incompatible purposes:
>
> (1) As addresses that become meaningful only when dereferenced
>     (e.g. HTTP).
> (2) As unique identifiers that are NOT dereferenced (e.g. RDF).

Hmm... I don't see how they are mutually incompatible; a URI is an identifier; depending on the URI scheme, that identifier may or may not be dereferenceable; in some URI schemes, there is an authoritative representation of the URI that can be obtained following a well-defined protocol. For instance, http: URIs can be dereferenced through the HTTP protocol, which also defines a caching mechanism.

> This leads to a lot of confusion: take for example Annotea, which
> treats URLs as unique (the RDF sense) yet requires them to be
> dereferenced (the HTTP sense), and thus fails spectacularly to deal
> with dynamic, negotiated or updated contents.

Hmm... We're drifting a long way off the initial discussion :) To reply briefly, Annotea is indeed better used on stable resources than on changing ones, but stable doesn't mean static; also, I think Annotea now deals well with content negotiation, using the Content-Location header as it should. But I guess this would be better discussed on www-annotations :)

> The SGML semantics work better because they don't have that ambiguity.
> PUBLIC identifiers are not dereferenced - SYSTEM ones are.
> That's what XML inherits.

SYSTEM identifiers may be dereferenced, but need not be. As such, they are probably more interesting than public ones, with which you can't do anything if you don't already know them.

> > > That depends on how it would be determined whether FPI and SI "differ".
> > > For example, my document is
> > >
> > > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> > >           "/dtd/xhtml11">
>
> That usage clearly assumes a URI that will be dereferenced - NOT the
> identifier usage.

Not at all; it assumes that the URI will be made absolute and then compared to what the Validator already knows; if it doesn't know this system identifier, the Validator is in a situation where the Public and System Identifiers may be in conflict.

> The fact that it appears (or appeared) in the XHTML spec at W3C only
> serves to illustrate that URIs confuse.

I guess I disagree; the fact is that to compare URIs reliably, you first need to go through a well-defined process to make them absolute [ less well-defined, but in the process of being defined, is the question of URI canonicalization; but that's obviously an edge case in this context ]. The fact that people may have blindly copied the doctype from the W3C Web server, where the System Identifier was relative, shows that:
- W3C should have avoided doing so
- people don't see the System Identifier as a URI
But I don't think there is anything intrinsically wrong...

> [ aside: XML Namespace is another source of confusion - is it not
> suggested somewhere that dereferencing an xmlns URI should lead to
> a schema for the namespace? ]

That's an issue the TAG is working on:
http://www.w3.org/2001/tag/issues.html#namespaceDocument-8

Again, an XML Namespace is first and foremost an identifier; since this identifier is a URI, when the chosen URI scheme is dereferenceable, it may provide useful information to Web agents and make it possible to deploy discovery mechanisms.

> > The Validator would notice that the System ID URI is not the one it
> > associates by default to the FPI; depending on the feasibility of the
> > different approaches, it could:
>
> But both XML and SGML when using a SYSTEM FPI simply dereference it.
> In that instance, they load whatever they find at "/dtd/xhtml11"

Not necessarily loading; if they have prior knowledge of what this system identifier is, they can load it from a cache, a catalogue, etc.

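To illustrate the two points above (making the system identifier absolute before comparing it, and resolving it from a catalogue instead of fetching it), here is a minimal Python sketch. It is not how the Validator is implemented; the catalogue contents, the local file path, the example.org base URI and the function names are all assumptions made for the example.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin

# Hypothetical local catalogue: absolute system identifier -> local DTD copy.
# The local path is an assumption made for this example.
CATALOGUE = {
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd": "catalogue/xhtml11.dtd",
}


def absolutize(system_id: str, base_uri: str) -> str:
    """Resolve a possibly relative system identifier against the document's URI."""
    return urljoin(base_uri, system_id)


def locate_dtd(system_id: str, base_uri: str) -> Tuple[str, Optional[str]]:
    """Return the absolute system identifier and the local copy, if one is known.

    A result of None for the local copy means the identifier is unknown and
    would have to be dereferenced (or reported) by the caller.
    """
    absolute = absolutize(system_id, base_uri)
    return absolute, CATALOGUE.get(absolute)


if __name__ == "__main__":
    base = "http://example.org/docs/page.html"  # assumed document location
    # Relative identifier from the thread: resolves to the end-user's server.
    print(locate_dtd("/dtd/xhtml11", base))
    # Well-known identifier: served from the catalogue without any network access.
    print(locate_dtd("http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd", base))
```

With this base URI, "/dtd/xhtml11" becomes http://example.org/dtd/xhtml11, which the catalogue does not know, whereas the usual W3C identifier is found locally.
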
"""Attempts to retrieve the resource identified by a URI MAY be redirected at the parser level (for example, in an entity resolver) or below (at the protocol level, for example, via an HTTP Location: header)""" http://www.w3.org/TR/REC-xml/#dt-sysid > A warning would be fair enough in principle. But since "/dtd/xhtml11" > is a perfectly valid relative URL, it should be looking for a DTD > on the end-user's webserver if it's to prefer SYSTEM to PUBLIC FPI. > That's a huge overhead - particularly with modular DTDs. I agree that this may > > Given that custom System IDs probably aren't that frequent anyway, I > > think at least starting with 1 could be a benefit for the user. > > Typos are not infrequent. Neither are those that follow the erroneous > examples that were in the XHTML specs at W3C. Agreed; that's why I think the Validator should report this type of errors; my point about "custom System IDs" was that precisely when the System identifier differs from the "official" one, it's more likely to be an error than intended. > But by design, system IDs are allowed to be arbitrary. I guess we could > indeed flag up a warning in the special case of a recognised PUBLIC > identifier with unrecognised SYSTEM ID. But that's not the same as > preferring the latter as a matter of course. That's indeed different, but that's a good first step :) Thanks all for your patience, Dom -- Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/ W3C/ERCIM mailto:dom@w3.org
Received on Tuesday, 17 August 2004 13:35:10 UTC