- From: Nick Kew <nick@webthing.com>
- Date: Tue, 17 Aug 2004 12:42:46 +0100 (BST)
- To: Dominique Hazaël-Massieux <dom@w3.org>
- Cc: www-validator@w3.org
On Tue, 17 Aug 2004, Dominique Hazaël-Massieux wrote:

> On Fri 06/08/2004 at 05:26, Bjoern Hoehrmann wrote:
> > >I think DanC's point was that since URIs are preferred to FPIs in the
> > >Web Architecture,
> >
> > They are not as far as I can tell.
>
> The WebArch document has
> "There are substantial benefits to participating in the existing network
> of URIs ... there are substantial costs to creating a new identification
> system that has the same properties as URIs."
> http://www.w3.org/TR/2004/WD-webarch-20040705/#uri-benefits

There are also serious drawbacks to that. URIs are used by W3C for
two different and mutually-incompatible purposes:
 (1) As addresses that become meaningful only when dereferenced
     (e.g. HTTP).
 (2) As unique identifiers that are NOT dereferenced (e.g. RDF).

This leads to a lot of confusion: take for example Annotea, which
treats URLs as unique (the RDF sense) yet requires them to be
dereferenced (the HTTP sense), and thus fails spectacularly to deal
with dynamic, negotiated or updated contents.

The SGML semantics work better because they don't have that ambiguity.
PUBLIC identifiers are not dereferenced - SYSTEM ones are. That's what
XML inherits.

> > If they are, the proper place to
> > discuss this would be the XML Core Working Group so they can write
> > this important bit of information into the XML 1.0 Recommendation.
> > Until that happens, SIs are not preferred to FPIs in any relevant way.
>
> Note that indeed, SIs are not preferred to FPIs according to any
> relevant spec; I think the point is "if you develop something with the
> Web in mind, try and use URIs in preference to another identification
> system". Since the Validator is definitely developed with the Web in
> mind, DanC was suggesting to investigate the benefits one could get of
> using URIs.

But the very next example demonstrates a problem with that.

> > That depends on how it would be determined whether FPI and SI "differ".
> > For example, my document is
> >
> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> >     "/dtd/xhtml11">

That usage clearly assumes a URI that will be dereferenced - NOT the
identifier usage. The fact that it appears (or appeared) in the XHTML
spec at W3C only serves to illustrate that URIs confuse.

[ aside: XML Namespace is another source of confusion - is it not
  suggested somewhere that dereferencing an xmlns URI should lead
  to a schema for the namespace? ]

> > [...]
> >
> > It's like that so I can ssh to the server and run `xmlvalid` on the
> > entire file tree without need for external resources or a catalog
> > system. What would the Validator do exactly?
>
> The Validator would notice that the System ID URI is not the one it
> associates by default to the FPI; depending on the feasibility of the
> different approaches, it could:

But both XML and SGML, when given a SYSTEM identifier, simply
dereference it. In that instance, they load whatever they find at
"/dtd/xhtml11" (if that is meaningful - which it may be). To do
otherwise would break the specs and every implementation.

> 1. simply emit a warning saying that it doesn't know whether the System
> ID matches the FPI, and lists the "officials" System IDs bound to the
> FPI
> 2. download and cache the DTD, and "compare" it to the official DTD -
> I've no idea how feasible it is to compare DTDs though - emitting an
> error if they don't match, and validating using the downloaded DTD
> 3. download and cache the DTD, validate the document with the downloaded
> DTD and emit the warning as in 1.

A warning would be fair enough in principle. But since "/dtd/xhtml11"
is a perfectly valid relative URL, it should be looking for a DTD on
the end-user's webserver if it's to prefer SYSTEM to PUBLIC FPI.
That's a huge overhead - particularly with modular DTDs.

> Given that custom System IDs probably aren't that frequent anyway, I
> think at least starting with 1 could be a benefit for the user.

Typos are not infrequent. Neither are those that follow the erroneous
examples that were in the XHTML specs at W3C.

> > If /dtd/xhtml11 is
> > http://www.w3.org/TR/2001/REC-xhtml11-20010531/DTD/xhtml11-flat.dtd
> > it would seem inappropriate to fetch additional 150KB document from
> > my server any time someone validates one of my documents
>
> (Note that it wouldn't need to be each time someone validates the
> document; that's what caching is for)

Caching isn't implemented. Perhaps that should go on the agenda
for qa-dev?

> > , as it would
> > seem inappropriate to suggest that there is anything wrong with
> > my document.
>
> It depends on how wrong this is suggested to be; I don't think a simple
> warning that the System ID is different would be inappropriate.

But by design, system IDs are allowed to be arbitrary. I guess we
could indeed flag up a warning in the special case of a recognised
PUBLIC identifier with unrecognised SYSTEM ID. But that's not the
same as preferring the latter as a matter of course.

-- 
Nick Kew
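
[ The special case mentioned above (a recognised PUBLIC identifier with
an unrecognised SYSTEM ID) might be sketched roughly as below. This is
only an illustration in Python, not the Validator's code: the table
KNOWN_SYSTEM_IDS, the function check_doctype and the simplified regular
expression are all assumptions made for the example, and the table is
not a complete catalogue. Nothing is fetched or dereferenced; the check
only produces a warning string. ]

```python
import re

# Illustrative table (an assumption, not the Validator's real catalogue):
# FPI -> set of system identifiers treated as "recognised" for it.
KNOWN_SYSTEM_IDS = {
    "-//W3C//DTD XHTML 1.1//EN": {
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
        "http://www.w3.org/TR/2001/REC-xhtml11-20010531/DTD/xhtml11-flat.dtd",
    },
}

# Simplified pattern: only handles the PUBLIC form with both literals present.
DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+\S+\s+PUBLIC\s+(["'])(?P<fpi>.*?)\1\s+(["'])(?P<sysid>.*?)\3""",
    re.DOTALL,
)


def check_doctype(document):
    """Return a warning string, or None if there is nothing to flag."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None  # no PUBLIC doctype with a system identifier
    fpi, sysid = match.group("fpi"), match.group("sysid")
    known = KNOWN_SYSTEM_IDS.get(fpi)
    if known is None or sysid in known:
        return None  # unrecognised FPI, or a recognised pairing
    return (
        f"PUBLIC identifier {fpi!r} is recognised, but SYSTEM identifier "
        f"{sysid!r} is not one of: {', '.join(sorted(known))}"
    )


if __name__ == "__main__":
    doc = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"\n    "/dtd/xhtml11">'
    print(check_doctype(doc))
```

[ Note that the sketch leaves an unrecognised PUBLIC identifier alone,
since system IDs are allowed to be arbitrary; it warns only when the
FPI is one it knows and the SYSTEM ID is not. ]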
Received on Tuesday, 17 August 2004 11:43:19 UTC