Re: In defence of 404 ...

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Mon, 07 Aug 2006 12:15:52 +0200
Message-ID: <44D712D8.3050407@mondeca.com>
To: Dan Connolly <connolly@w3.org>
CC: semantic-web@w3.org, Eric van der Vlist <vdv@dyomedea.com>, Franck Cotton <franck.cotton@insee.fr>

Dan

Out of your exchanges, Eric will certainly come out with a smart and 
elegant solution for serving something else than 404 pages for the 
various URIs in the namespace http://rdf.insee.fr/geo/, and (almost) 
everybody and the TAG will be happy with it. Nevertheless, I would like 
to go further in this discussion.

Dan Connolly wrote:
> On Fri, 2006-08-04 at 23:53 +0200, Bernard Vatant wrote:
> [...]
>   
>> And actually, this should be the general situation in SW publication :
>> there is no authoritative, definitive, complete, description of a
>> resource, packaged in one file, with a single access point.
>>     
>
> Yes, there are authoritative descriptions in the Semantic Web.
>   
Yes, there are. But if you look at them, those are exceptions rather
than the general rule. When I write the *general situation*, I mean
exactly that: most resources can't have a *single* authoritative
description, for various reasons. More below.
> Perhaps not complete definitions, but for a URI such
> as  http://www.w3.org/2000/01/rdf-schema#subClassOf
> any document you get back by doing an http GET
> of http://www.w3.org/2000/01/rdf-schema
> is authoritative. 
Indeed. That looks like an exceptional case, because the semantics of
rdfs:subClassOf are cast in stone in the W3C specification and everyone
doing a GET of http://www.w3.org/2000/01/rdf-schema will get the same
authoritative description in saecula saeculorum. Or so you think. Suppose
W3C is disbanded some time in the future (no human organisation is
forever), and the Web is still there anyway, and the domain w3.org is
for sale, and some entity buys it, does not care about whatever has
been published before, and puts any silly document at
http://www.w3.org/2000/01/rdf-schema. Or some hacker takes control
of the W3C servers and publishes bin Laden's declarations at this page.
Or, more simply, the servers are down, and you get 404. How will
applications know, in any of those cases, that the content is not
authoritative any more? I guess applications using RDFS will not
stop running in any of those cases, because they have cached or
built in the semantics of
http://www.w3.org/2000/01/rdf-schema#subClassOf. When I use Protégé
off-line, it does not stop running because it can't check the semantics
of rdfs:subClassOf at runtime, right?
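
To make the caching point concrete, here is a minimal sketch in Python of
an application that prefers the live document but falls back to a local
copy. The names describe, fetch and cache are assumptions made up for
this illustration; fetch stands for any function that dereferences a URI
and raises OSError when that fails. Note that the fallback only covers
the "servers are down" case: if the domain changes hands and serves a
silly document, this code has no way to notice.

```python
from typing import Callable, Dict, Optional

RDFS_DOC = "http://www.w3.org/2000/01/rdf-schema"


def describe(doc_uri: str,
             fetch: Callable[[str], str],
             cache: Dict[str, str]) -> Optional[str]:
    """Return a description of doc_uri, preferring the live copy.

    fetch is a hypothetical callable that dereferences the URI and
    raises OSError on any failure (404, server down, ...).
    """
    try:
        body = fetch(doc_uri)
    except OSError:
        # Live copy unavailable: fall back to the cached or built-in copy.
        return cache.get(doc_uri)
    cache[doc_uri] = body  # refresh the local copy with what was served
    return body
```
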
> That's how the web works: we all agree
> that if you lease/buy a domain name, you get to say what
> the URIs starting with http:// and that domain mean, and
> we agree that if you run a web server and serve up
> documents there, they are authoritative w.r.t. the meanings
> of those URIs.
>   
As long as you have control, yes, but anything can happen. Euclid's
Elements were not made obsolete when they burned with the Library of
Alexandria, thanks to many cached copies.
> Anybody else is free to say things about rdfs:subClassOf,
> but the document that W3C serves up at
> http://www.w3.org/2000/01/rdf-schema says that rdfs:subClassOf
> is an rdf:Property, and if some other document 
> says that it's not an rdf:Property, that other document should
> be considered in error.
>
>   
As I will show with the INSEE example, the same publisher, under the
same namespace, can make different descriptions, with different and
even conflicting semantics, of the same entity (identified by the same
URI). The RDFS specification is a poor counter-example because it
defines logical and mathematical entities, which are governed by
time-independent axioms, and does not pretend to represent real-world
entities like cities, products or people, whose main characteristic is
to be both permanent and changing, like the Ship of Theseus.
>>  So, the best an URI can do, when its referent is not an accessible
>> thing, and that its main purpose is identifying the resource in
>> distributed descriptions, if one wants to make sense of it through
>> http protocol - since it's an http URI after all - is to give access to
>> some information like : "Sorry, what you try to access by this URI is
>> not an accessible resource. But its description can be found in RDF
>> files X, Y, Z, ...".
>>     
>
> That's not the best we can do.
> If you use URIs of the form DOC#TERM for non-information
> resources, then the information resource DOC can
> say things like { <#TERM> rdf:type geo:City }.
>
>   
I think you are missing the most important point here. It's not yet
another hash-vs-slash discussion. Be it one document or a fragment, the
point is that I can't have a *single consistent* description of the
resource.
>>  And the more I think about it, the more I think that the 404 page
>> that you get through http://rdf.insee.fr/geo/COM_80078 is close to
>> that. Agreed, the current message displayed on the page is suboptimal,
>> independently of the fact that it is in French, but replace it by the
>> quote I suggest above, and it makes much more sense than any fragment
>> identifier. 
>>     
>
> Really? It doesn't appeal to me at all.
>
>   
Well, the point here is not to be sexy, but to conform to what happens
in the real world. Inaccessible means inaccessible. Full stop.
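
For illustration only, here is a minimal Python sketch of the kind of 404
answer I have in mind: the status stays 404, but the body points to the
RDF files where descriptions can be found. The function name
not_found_response and the idea of passing the list of files as a
parameter are assumptions of this sketch, not a description of what the
INSEE server does.

```python
from typing import List, Tuple


def not_found_response(uri: str,
                       description_files: List[str]) -> Tuple[int, str]:
    """Build a (status, body) pair for a URI whose referent cannot be served."""
    lines = [
        f"Sorry, what you try to access by {uri} is not an accessible resource.",
        "Its description can be found in the following RDF files:",
    ]
    # One indented line per file that describes the resource.
    lines += [f"  {name}" for name in description_files]
    return 404, "\n".join(lines)
```
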
>> Maybe in contradiction with what I wrote in a previous message, where
>> I suggested that maybe we could have kept the # namespace for the
>> ontology, I think now that this argument holds for ontology elements
>> as well. Granted, we have now published a single ontology file
>> containing a description of e.g., http://rdf.insee.fr/geo/Commune. But
>> next year we can have another version, or another ontology defining
>> the same entity, with the same URI, at another level of detail, and
>> which the publisher would not like to see merged with the previous
>> one.
>>     
>
> Hmm... I can't imagine why not. Care to elaborate?
>   
You know we have something in Europe called History. It means that
things change over time. I heard you have something of the like on your
side of the ocean, is that correct? Take for example the class
http://rdf.insee.fr/geo/Region. See
http://fr.wikipedia.org/wiki/R%C3%A9gion_fran%C3%A7aise to see that
the concept of "Région" in its current definition, as an administrative
territory, was defined by a long and difficult process (including the
resignation of Charles de Gaulle from the Presidency in 1969) whose
formal completion is quite recent (1982). If INSEE had published a geo
ontology in the 60's, this class would not have been included, nor the
subdivision of regions into departments. But departments were there,
most of them were defined by Napoleon about 200 years ago, some of them
have changed names over time (Seine-Inférieure became Seine-Maritime),
some have been split, like Corsica which was split into two departments
in 1976, etc ...

So, a description which was authoritative in 1960 may no longer be so
in 2006. That's why INSEE will publish RDF files with a time stamp, and
had it published an ontology and instances back in 1960, some entities
would have different - and when I write different I mean not mutually
consistent - descriptions. So which description is authoritative, 1960's
or 2006's? Both, because I can be interested in information of today, or
be dealing with 1960 documents. And, as Eric pointed out, I don't want
to have new URIs in each new publication for the entities that are
permanent.
BTW, what would you recommend to capture the information that a triple
which was valid in 1960 is no longer valid in 2006? Is there a way to put
a validity time span on an RDF description, apart from reification?
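
To show what time-stamped publication amounts to in practice, here is a
minimal Python sketch that keeps each dated publication as its own set of
triples and selects one by year, without merging them. Everything in it
is hypothetical: the PUBLICATIONS table, the names geo:Departement and
geo:subdivision, and the triples themselves are illustrative and not
taken from the INSEE files. It is not an answer to the question above,
only a picture of the behaviour I would like a vocabulary to capture.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical time-stamped publications, keyed by publication year.
# The triples are illustrative only, not the actual INSEE data.
PUBLICATIONS: Dict[int, Set[Triple]] = {
    1960: {
        ("geo:Departement", "rdf:type", "rdfs:Class"),
    },
    2006: {
        ("geo:Departement", "rdf:type", "rdfs:Class"),
        ("geo:Region", "rdf:type", "rdfs:Class"),
        ("geo:Region", "geo:subdivision", "geo:Departement"),
    },
}


def description_as_of(year: int) -> Optional[Set[Triple]]:
    """Return the most recent publication dated at or before `year`."""
    eligible = [y for y in PUBLICATIONS if y <= year]
    # The publications are never merged: exactly one is selected.
    return PUBLICATIONS[max(eligible)] if eligible else None
```

The same URI (here geo:Departement) appears in both publications, and the
caller decides which date is relevant instead of treating one of them as
the only authoritative description.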
>>  There again, packaging considerations naturally lead to define
>> several files containing partial descriptions of the same resource.
>>     
>
> It seems very unnatural to me to use anything other than a single
> static file for the case of an ontology with just a few dozen terms.
> Maybe a handful of content-negotiated static files. But not more than
> that.
>
>   
Just a few dozen terms, yes. But with semantics not so "static" as you 
would like them to be. geo:Region is not rdfs:subClassOf - The real 
world apologizes for being so messy, changing and unstable :-) .

Bernard.
Received on Monday, 7 August 2006 10:16:20 UTC
