Re: suggest validator prefer URI to FPI

From: Nick Kew <nick@webthing.com>
Date: Tue, 17 Aug 2004 12:42:46 +0100 (BST)
To: Dominique Hazaël-Massieux <dom@w3.org>
Cc: www-validator@w3.org
Message-ID: <Pine.LNX.4.53.0408171214030.1945@hugin.webthing.com>

On Tue, 17 Aug 2004, Dominique Hazaël-Massieux wrote:

> On Fri 06/08/2004 at 05:26, Bjoern Hoehrmann wrote:
> > >I think DanC's point was that since URIs are preferred to FPIs in the
> > >Web Architecture,
> >
> > They are not as far as I can tell.
>
> The WebArch document has
> "There are substantial benefits to participating in the existing network
> of URIs ... there are substantial costs to creating a new identification
> system that has the same properties as URIs."
> http://www.w3.org/TR/2004/WD-webarch-20040705/#uri-benefits

There are also serious drawbacks to that.  URIs are used by W3C for
two different and mutually-incompatible purposes:

 (1) As addresses that become meaningful only when dereferenced
     (e.g. HTTP).
 (2) As unique identifiers that are NOT dereferenced (e.g. RDF).

This leads to a lot of confusion: take for example Annotea, which
treats URLs as unique (the RDF sense) yet requires them to be
dereferenced (the HTTP sense), and thus fails spectacularly to deal
with dynamic, negotiated or updated contents.

The SGML semantics work better because they don't have that ambiguity.
PUBLIC identifiers are not dereferenced - SYSTEM ones are.
That's what XML inherits.
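
To make the PUBLIC/SYSTEM split concrete, here is a minimal Python sketch
that pulls the two identifiers out of a DOCTYPE declaration.  The regular
expression and the split_doctype helper are illustrative assumptions, not
code from the Validator or from any SGML/XML parser; the sketch handles
only double-quoted literals and ignores internal subsets.

```python
import re

# Simplified pattern (an assumption for illustration): double-quoted
# literals only, no internal subset, no comments before the DOCTYPE.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[^\s>]+)\s+'
    r'(?:PUBLIC\s+"(?P<public>[^"]*)"\s+"(?P<sys_a>[^"]*)"'
    r'|SYSTEM\s+"(?P<sys_b>[^"]*)")'
)


def split_doctype(text):
    """Return (public_id, system_id), or None if no DOCTYPE is found.

    public_id is None when the declaration uses SYSTEM only.
    """
    m = DOCTYPE_RE.search(text)
    if m is None:
        return None
    if m.group("public") is not None:
        return m.group("public"), m.group("sys_a")
    return None, m.group("sys_b")


doc = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"\n    "/dtd/xhtml11">'
public_id, system_id = split_doctype(doc)
print(public_id)   # the name-like identifier, not dereferenced
print(system_id)   # the address-like identifier, dereferenced
```

The first value is the one a parser would look up (for instance in a
catalog); the second is the one it would fetch.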

> >  If they are, the proper place to
> > discuss this would be the XML Core Working Group so they can write
> > this important bit of information into the XML 1.0 Recommendation.
> > Until that happens, SIs are not preferred to FPIs in any relevant way.
>
> Note that indeed, SIs are not preferred to FPIs according to any
> relevant spec; I think the point is "if you develop something with the
> Web in mind, try and use URIs in preference to another identification
> system". Since the Validator is definitely developed with the Web in
> mind, DanC was suggesting to investigate the benefits one could get of
> using URIs.

But the very next example demonstrates a problem with that.

> > That depends on how it would be determined whether FPI and SI "differ".
> > For example, my document is
> >
> >   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> >     "/dtd/xhtml11">

That usage clearly assumes a URI that will be dereferenced - NOT the
identifier usage.

The fact that it appears (or appeared) in the XHTML spec at W3C only
serves to illustrate that URIs confuse.

[ aside: XML Namespace is another source of confusion - is it not
suggested somewhere that dereferencing an xmlns URI should lead to
a schema for the namespace? ]

> > [...]
> >
> > It's like that so I can ssh to the server and run `xmlvalid` on the
> > entire file tree without need for external resources or a catalog
> > system. What would the Validator do exactly?
>
> The Validator would notice that the System ID URI is not the one it
> associates by default to the FPI; depending on the feasibility of the
> different approaches, it could:

But both XML and SGML, when given a SYSTEM identifier, simply dereference it.
In that instance, they load whatever they find at "/dtd/xhtml11"
(if that is meaningful - which it may be).  To do otherwise would
break the specs and every implementation.

> 1. simply emit a warning saying that it doesn't know whether the System
> ID matches the FPI, and lists the "official" System IDs bound to the
> FPI
> 2. download and cache the DTD, and "compare" it to the official DTD -
> I've no idea how feasible it is to compare DTDs though - emitting an
> error if they don't match, and validating using the downloaded DTD
> 3. download and cache the DTD, validate the document with the downloaded
> DTD and emit the warning as in 1.

A warning would be fair enough in principle.  But since "/dtd/xhtml11"
is a perfectly valid relative URL, it should be looking for a DTD
on the end-user's webserver if it's to prefer the SYSTEM identifier to
the PUBLIC FPI.
That's a huge overhead - particularly with modular DTDs.
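
As a rough illustration of where that lookup would land, the sketch below
resolves the relative system identifier against a document address using
Python's standard urllib.parse.urljoin.  The document URL is a made-up
example.org address, assumed purely for illustration.

```python
from urllib.parse import urljoin

# Hypothetical address of the document being validated (not from this thread).
document_url = "http://example.org/articles/page.html"

# The system identifier from the DOCTYPE quoted above.
system_id = "/dtd/xhtml11"

# A relative reference resolves against the document's own address,
# so the DTD request goes to the author's server.
dtd_url = urljoin(document_url, system_id)
print(dtd_url)  # http://example.org/dtd/xhtml11
```

With a modular DTD, each module referenced by a relative system
identifier would resolve, and be fetched, the same way.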

> Given that custom System IDs probably aren't that frequent anyway, I
> think at least starting with 1 could be a benefit for the user.

Typos are not infrequent.  Neither are documents that follow the erroneous
examples that were in the XHTML specs at W3C.

> >  If /dtd/xhtml11 is
> > http://www.w3.org/TR/2001/REC-xhtml11-20010531/DTD/xhtml11-flat.dtd
> > it would seem inappropriate to fetch additional 150KB document from
> > my server any time someone validates one of my documents
>
> (Note that it wouldn't need to be each time someone validates the
> document; that's what caching is for)

Caching isn't implemented.  Perhaps that should go on the agenda for
qa-dev?

> > , as it would
> > seem inappropriate to suggest that there is anything wrong with
> > my document.
>
> It depends on how wrong this is suggested to be; I don't think a simple
> warning that the System ID is different would be inappropriate.

But by design, system IDs are allowed to be arbitrary.  I guess we could
indeed flag up a warning in the special case of a recognised PUBLIC
identifier with unrecognised SYSTEM ID.  But that's not the same as
preferring the latter as a matter of course.
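
A minimal sketch of that special case, in Python.  The lookup table, the
function name and the warning wording are all hypothetical; this is not
how the Validator is implemented, only one way the check could be framed.
Note that the PUBLIC identifier still selects the DTD; the SYSTEM
identifier is only compared, never preferred.

```python
from typing import Optional

# Illustrative table only (an assumption): maps a recognised PUBLIC
# identifier to the SYSTEM identifiers usually seen with it.
KNOWN_SYSTEM_IDS = {
    "-//W3C//DTD XHTML 1.1//EN": {
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
    },
}


def system_id_warning(public_id: Optional[str],
                      system_id: Optional[str]) -> Optional[str]:
    """Return a warning string, or None if there is nothing to flag."""
    expected = KNOWN_SYSTEM_IDS.get(public_id)
    if expected is None:
        # Unrecognised (or absent) PUBLIC identifier: system IDs are
        # allowed to be arbitrary, so there is nothing to compare against.
        return None
    if system_id in expected:
        return None
    listed = ", ".join(sorted(expected))
    return (f'System ID "{system_id}" is not one usually associated with '
            f'"{public_id}"; known System IDs: {listed}')


print(system_id_warning("-//W3C//DTD XHTML 1.1//EN", "/dtd/xhtml11"))
```

A document with an unrecognised PUBLIC identifier, or with none, passes
silently, which keeps arbitrary system IDs legal.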

-- 
Nick Kew
Received on Tuesday, 17 August 2004 11:43:19 UTC
