Re: suggest validator prefer URI to FPI from Dominique Hazaël-Massieux on 2004-08-17 (www-validator@w3.org from August 2004)

From: Dominique Hazaël-Massieux <dom@w3.org>
Date: Tue, 17 Aug 2004 15:35:01 +0200
To: Nick Kew <nick@webthing.com>
Cc: www-validator@w3.org
Message-Id: <1092749701.4811.155.camel@stratustier>
On Tue 17/08/2004 at 13:42, Nick Kew wrote:
> There are also serious drawbacks to that.  URIs are used by W3C for
> two different and mutually-incompatible purposes:
> 
>  (1) As addresses that become meaningful only when dereferenced
>      (e.g. HTTP).
>  (2) As unique identifiers that are NOT dereferenced (e.g. RDF).

Hmm... I don't see how they are mutually incompatible. A URI is an
identifier; depending on the URI scheme, that identifier may or may
not be dereferenceable. In some URI schemes, there is an authoritative
representation of the identified resource that can be obtained by
following a well-defined protocol. For instance, http: URIs can be
dereferenced through the HTTP protocol, which also defines a caching
mechanism.

> This leads to a lot of confusion: take for example Annotea, which
> treats URLs as unique (the RDF sense) yet requires them to be
> dereferenced (the HTTP sense), and thus fails spectacularly to deal
> with dynamic, negotiated or updated contents.

Hmm... We're drifting a long way off the initial discussion :) To reply
briefly, Annotea is indeed better used on stable resources than on
changing ones - but stable doesn't mean static; also, I think
Annotea now deals well with content negotiation, using the
Content-Location header as it should. But I guess this should rather be
discussed on www-annotations :)

> The SGML semantics work better because they don't have that ambiguity.
> PUBLIC identifiers are not dereferenced - SYSTEM ones are.
> That's what XML inherits.

SYSTEM identifiers may be dereferenced, but need not be. As such, they
are probably more interesting than public ones, with which you can't do
anything if you don't know them.


> > > That depends on how it would be determined whether FPI and SI "differ".
> > > For example, my document is
> > >
> > >   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> > >     "/dtd/xhtml11">
> 
> That usage clearly assumes a URI that will be dereferenced - NOT the
> identifier usage.

Not at all; it assumes that the URI will be made absolute and then
compared with what the Validator already knows. If it doesn't know this
system identifier, the Validator is in a situation where the public and
system identifiers may be in conflict.
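
A minimal sketch of that "make absolute, then compare" step, in Python.
The document URI and the one-entry table are assumptions for illustration
only; they are not how the Validator actually stores its known doctypes.

```python
from urllib.parse import urljoin

# Assumption: the system identifier associated by default with this FPI.
KNOWN_SYSTEM_IDS = {
    "-//W3C//DTD XHTML 1.1//EN": "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
}


def absolute_system_id(document_uri, system_id):
    """Resolve a possibly relative system identifier against the document URI."""
    return urljoin(document_uri, system_id)


def matches_known_system_id(document_uri, public_id, system_id):
    """True if the absolutized system identifier is the one known for the FPI."""
    known = KNOWN_SYSTEM_IDS.get(public_id)
    if known is None:
        return False
    return absolute_system_id(document_uri, system_id) == known


# Hypothetical document location, for illustration only.
doc = "http://example.org/docs/page.html"
print(absolute_system_id(doc, "/dtd/xhtml11"))
print(matches_known_system_id(doc, "-//W3C//DTD XHTML 1.1//EN", "/dtd/xhtml11"))
```

Nothing is fetched here: the relative reference is only resolved and
compared as a string.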

> The fact that it appears (or appeared) in the XHTML spec at W3C only
> serves to illustrate that URIs confuse.

I guess I disagree; the fact is that to compare URIs reliably, you
first need to go through a well-defined process to make them absolute
[ less well-defined, but being defined, is the question of URI
canonicalization; but that's obviously an edge case in this context ].
The fact that people may have blindly copied the doctype from the W3C
Web server, where the System Identifier was relative, shows that:
- W3C should have avoided doing so
- people don't see the System Identifier as a URI
But I don't think there is anything intrinsically wrong...

> [ aside: XML Namespace is another source of confusion - is it not
> suggested somewhere that dereferencing an xmlns URI should lead to
> a schema for the namespace? ]

That's an issue the TAG is working on:
http://www.w3.org/2001/tag/issues.html#namespaceDocument-8

Again, an XML Namespace is first and foremost an identifier; since
this identifier is a URI, when the chosen URI scheme is
dereferenceable it may provide useful information to Web agents and
make it possible to deploy discovery mechanisms.

> > The Validator would notice that the System ID URI is not the one it
> > associates by default to the FPI; depending on the feasibility of the
> > different approaches, it could:
> 
> But both XML and SGML when using a SYSTEM FPI simply dereference it.
> In that instance, they load whatever they find at "/dtd/xhtml11"

Not necessarily; if they have prior knowledge of what this system
identifier is, they can load it from a cache, a catalogue, etc.
"""Attempts to retrieve the resource identified by a URI MAY be
redirected at the parser level (for example, in an entity resolver) or
below (at the protocol level, for example, via an HTTP Location:
header)"""
http://www.w3.org/TR/REC-xml/#dt-sysid
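
As an illustration of redirection "at the parser level", here is a small
Python sketch of an entity resolver backed by a catalogue. The class name
and the local path are hypothetical; only the `EntityResolver` interface
from `xml.sax.handler` is real.

```python
from xml.sax.handler import EntityResolver


class CatalogueResolver(EntityResolver):
    """Map known system identifiers to local copies before using the network."""

    def __init__(self, catalogue):
        # catalogue: dict of absolute system identifier -> local file path
        self.catalogue = catalogue

    def resolveEntity(self, publicId, systemId):
        # Return the local copy when this identifier is known; otherwise
        # return it unchanged so the parser retrieves it as usual.
        return self.catalogue.get(systemId, systemId)


# Hypothetical local path, for illustration only.
resolver = CatalogueResolver({
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd": "/usr/share/dtd/xhtml11.dtd",
})
```

A SAX parser would pick this up through `parser.setEntityResolver(resolver)`.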

> A warning would be fair enough in principle.  But since "/dtd/xhtml11"
> is a perfectly valid relative URL, it should be looking for a DTD
> on the end-user's webserver if it's to prefer SYSTEM to PUBLIC FPI.
> That's a huge overhead - particularly with modular DTDs.

I agree that this may

> > Given that custom System IDs probably aren't that frequent anyway, I
> > think at least starting with 1 could be a benefit for the user.
> 
> Typos are not infrequent.  Neither are those that follow the erroneous
> examples that were in the XHTML specs at W3C.

Agreed; that's why I think the Validator should report this type of
error. My point about "custom System IDs" was precisely that when the
System Identifier differs from the "official" one, it's more likely to
be an error than intentional.
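
A rough sketch of what such a report could look like, in Python. The
function name, the message wording and the one-entry table are
assumptions for illustration, not the Validator's actual behaviour.

```python
# Assumption: FPIs the checker recognises, with their "official" system IDs.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.1//EN": "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
}


def check_doctype(public_id, system_id):
    """Return a warning string, or None when there is nothing to report.

    system_id is expected to be absolute already.
    """
    expected = KNOWN_DOCTYPES.get(public_id)
    if expected is None:
        # Unrecognised FPI: system IDs are allowed to be arbitrary.
        return None
    if system_id == expected:
        return None
    return (
        f'Warning: system identifier "{system_id}" is not the one usually '
        f'associated with "{public_id}" (expected "{expected}")'
    )


print(check_doctype("-//W3C//DTD XHTML 1.1//EN", "http://example.org/dtd/xhtml11"))
```

This only flags the mismatch; it does not decide which of the two
identifiers should be used to find the DTD.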

> But by design, system IDs are allowed to be arbitrary.  I guess we could
> indeed flag up a warning in the special case of a recognised PUBLIC
> identifier with unrecognised SYSTEM ID.  But that's not the same as
> preferring the latter as a matter of course.

That's indeed different, but that's a good first step :)
Thanks all for your patience,

Dom
-- 
Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C/ERCIM
mailto:dom@w3.org
Received on Tuesday, 17 August 2004 13:35:10 UTC