- From: Bernard Vatant <bernard.vatant@mondeca.com>
- Date: Mon, 07 Aug 2006 12:15:52 +0200
- To: Dan Connolly <connolly@w3.org>
- CC: semantic-web@w3.org, Eric van der Vlist <vdv@dyomedea.com>, Franck Cotton <franck.cotton@insee.fr>
Dan
Out of your exchanges, Eric will certainly come up with a smart and
elegant solution for serving something other than 404 pages for the
various URIs in the namespace http://rdf.insee.fr/geo/, and (almost)
everybody and the TAG will be happy with it. Nevertheless, I would like
to go further in this discussion.
Dan Connolly wrote:
> On Fri, 2006-08-04 at 23:53 +0200, Bernard Vatant wrote:
> [...]
>
>> And actually, this should be the general situation in SW publication :
>> there is no authoritative, definitive, complete, description of a
>> resource, packaged in one file, with a single access point.
>>
>
> Yes, there are authoritative descriptions in the Semantic Web.
>
Yes, there are. But if you look at them, those are exceptions rather
than the general rule. When I write the *general situation*, it means
what it means: most resources can't have a *single* authoritative
description, for various reasons. More below.
> Perhaps not complete definitions, but for a URI such
> as http://www.w3.org/2000/01/rdf-schema#subClassOf
> any document you get back by doing an http GET
> of http://www.w3.org/2000/01/rdf-schema
> is authoritative.
Indeed. That looks like an exceptional case, because the semantics of
rdfs:subClassOf are cast in stone in the W3C specification and everyone
doing a GET of http://www.w3.org/2000/01/rdf-schema will get the same
authoritative description in saecula saeculorum. Or so you think. Suppose
W3C is disbanded some time in the future (no human organisation is
forever), and the Web is still there anyway, and the domain w3.org is
for sale, and some entity buys it, does not care about whatever has
been published before, and puts any silly document at
http://www.w3.org/2000/01/rdf-schema. Or some hacker takes control
of the W3C servers and publishes Bin Laden declarations at this page. Or,
more simply, the servers are down, and you get a 404. How will
applications know, in any of those cases, that the content is not
authoritative any more? I guess applications using RDFS will not
stop running in any of those cases, because they have cached or
built in the semantics of
http://www.w3.org/2000/01/rdf-schema#subClassOf. When I use Protégé
off-line, it does not stop running because it can't check the semantics
of rdfs:subClassOf at runtime, right?
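A minimal sketch of that behaviour, in Python: the application ships
with its own copy of the semantics and only goes to the network for
terms it does not already know. The BUILT_IN table, the describe
function and the fetch callback are hypothetical names chosen for
illustration; they are not part of Protégé or of any RDF library.

```python
from typing import Callable, Optional

RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical built-in copy of what the application already knows.
BUILT_IN = {RDFS_SUBCLASSOF: "rdf:Property"}


def describe(uri: str, fetch: Callable[[str], str]) -> Optional[str]:
    """Return the known type of a term, preferring the local copy."""
    if uri in BUILT_IN:
        # No GET at runtime: whatever the server says today is ignored.
        return BUILT_IN[uri]
    try:
        return fetch(uri)
    except OSError:
        # Server unreachable and no local copy: nothing to say.
        return None


def server_down(uri: str) -> str:
    """Stand-in for an HTTP GET against an unreachable server."""
    raise OSError("cannot reach " + uri)


print(describe(RDFS_SUBCLASSOF, server_down))
print(describe("http://rdf.insee.fr/geo/Commune", server_down))
```

With the server down, or serving something silly, describe still
answers for rdfs:subClassOf from the built-in copy, and has nothing to
say about a term it never cached.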
> That's how the web works: we all agree
> that if you lease/buy a domain name, you get to say what
> the URIs starting with http:// and that domain mean, and
> we agree that if you run a web server and serve up
> documents there, they are authoritative w.r.t. the meanings
> of those URIs.
>
As long as you have control, yes, but anything can happen. Euclid's
Elements were not made obsolete when they burned with the Library of
Alexandria, thanks to many cached copies.
> Anybody else is free to say things about rdfs:subClassOf,
> but the document that W3C serves up at
> http://www.w3.org/2000/01/rdf-schema says that rdfs:subClassOf
> is an rdf:Property, and if some other document
> says that it's not an rdf:Property, that other document should
> be considered in error.
>
>
As I will show in the INSEE example, it can happen that the same
publisher, under the same namespace, makes different descriptions, with
different and even conflicting semantics, of the same entity (defined by
the same URI). The RDFS specification is a bad counter-example because
it defines entities like logical and mathematical entities, which are
defined by time-independent axioms, and does not pretend to represent
real-world entities like cities, products or people, whose main
characteristic is to be both permanent and changing, like the Ship of
Theseus.
>> So, the best an URI can do, when its referent is not an accessible
>> thing, and that its main purpose is identifying the resource in
>> distributed descriptions, if one wants to make sense of it through
>> http protocol - since it's an http URI after all - is to give access
>> to some information like: "Sorry, what you try to access by this URI is
>> not an accessible resource. But its description can be found in RDF
>> files X, Y, Z, ...".
>>
>
> That's not the best we can do.
> If you use URIs of the form DOC#TERM for non-information
> resources, then the information resource DOC can
> say things like { <#TERM> rdf:type geo:City }.
>
>
I think you miss the most important point here. It's not yet another
hash-vs-slash discussion. Be it one document or a fragment, the point is
that I can't have a *single consistent* description of the resource.
>> And the more I think about it, the more I think that the 404 page
>> that you get through http://rdf.insee.fr/geo/COM_80078 is close to
>> that. Agreed, the current message displayed on the page is suboptimal,
>> independently of the fact that it is in French, but replace it by the
>> quote I suggest above, and it makes much more sense than any fragment
>> identifier.
>>
>
> Really? It doesn't appeal to me at all.
>
>
Well, the point is not to be sexy here, but to conform to what happens
in the real world. Inaccessible means inaccessible. Full stop.
>> Maybe in contradiction with what I wrote in a previous message, where
>> I suggested that maybe we could have kept the # namespace for the
>> ontology, I think now that this argument holds for ontology elements
>> as well. Granted, we have now published a single ontology file
>> containing a description of e.g., http://rdf.insee.fr/geo/Commune. But
>> next year we can have another version, or another ontology defining
>> the same entity, with the same URI, at another level of detail, and
>> which the publisher would not like to see merged with the previous
>> one.
>>
>
> Hmm... I can't imagine why not. Care to elaborate?
>
You know we have something in Europe called History. It means that things
change over time. I heard you have something like it on your side of
the ocean, is that correct? Take for example the class
http://rdf.insee.fr/geo/Region. See
http://fr.wikipedia.org/wiki/R%C3%A9gion_fran%C3%A7aise to see that
the concept of "Région" in its current definition, as an administrative
territory, was defined by a long and difficult process (including the
resignation of Charles de Gaulle from the Presidency in 1969) whose
formal achievement is quite recent (1982). If INSEE had published a geo
ontology in the 60s, this class would not have been included, nor the
subdivision of regions into departments. But departments were there: most
of them were defined by Napoleon about 200 years ago, some of them have
changed names over time (Seine-Inférieure became Seine-Maritime), some
have been split, like Corsica, which was split into two departments
in 1976, etc.
So, a description which was authoritative in 1960 may no longer be so
in 2006. That's why INSEE will publish RDF files with a time stamp, and
had it published an ontology and instances back in 1960, some entities
would have different - and when I write different I mean not mutually
consistent - descriptions. So which description is authoritative, 1960's
or 2006's? Both, because I can be interested in information of today, or
be dealing with 1960 documents. And, as Eric pointed out, I don't want
to have new URIs, in each new publication, for the entities that are
permanent.
BTW, what would you recommend to capture the information that a triple
which was valid in 1960 is no longer so in 2006? Is there a way to put a
validity time span on an RDF description, apart from reification?
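To make the requirement concrete, here is a rough sketch in plain
Python rather than RDF. It is not an answer to the question above and
not what INSEE publishes: each statement simply carries its own
validity interval, and a "description" is whatever holds on a given
day. The identifiers geo:Corse and ex:departmentCount, the exact day in
1976 and the 1900 lower bound are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class DatedStatement:
    """A triple plus the interval during which it is considered valid."""
    triple: Triple
    valid_from: date
    valid_until: Optional[date] = None  # None: still valid today

    def holds_on(self, day: date) -> bool:
        # Half-open interval: valid_from <= day < valid_until.
        if day < self.valid_from:
            return False
        return self.valid_until is None or day < self.valid_until


def snapshot(statements: Iterable[DatedStatement], day: date) -> Set[Triple]:
    """Return the triples that hold on the given day."""
    return {s.triple for s in statements if s.holds_on(day)}


# Same subject URI throughout; only the statements about it change.
# All dates below are placeholders except the year of the split (1976).
SPLIT = date(1976, 1, 1)
STATEMENTS = [
    DatedStatement(("geo:Corse", "ex:departmentCount", "1"),
                   valid_from=date(1900, 1, 1), valid_until=SPLIT),
    DatedStatement(("geo:Corse", "ex:departmentCount", "2"),
                   valid_from=SPLIT),
]

print(snapshot(STATEMENTS, date(1960, 1, 1)))
print(snapshot(STATEMENTS, date(2006, 8, 7)))
```

The two snapshots are not mutually consistent, yet both are
authoritative for their date, and the URI of the entity never changes.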
>> There again, packaging considerations naturally lead to define
>> several files containing partial descriptions of the same resource.
>>
>
> It seems very unnatural to me to use anything other than a single
> static file for the case of an ontology with just a few dozen terms.
> Maybe a handful of content-negotiated static files. But not more than
> that.
>
>
Just a few dozen terms, yes. But with semantics not so "static" as you
would like them to be. geo:Region is not rdfs:subClassOf - The real
world apologizes for being so messy, changing and unstable :-) .
Bernard.
Received on Monday, 7 August 2006 10:16:20 UTC