Re: INSEE releases OWL ontology and RDF data for geographical entities from Eric van der Vlist on 2006-08-04 (public-xg-geo@w3.org from August 2006)

From: Eric van der Vlist <vdv@dyomedea.com>
Date: Fri, 04 Aug 2006 22:43:18 +0200
To: Dan Connolly <connolly@w3.org>
Cc: Bernard Vatant <bernard.vatant@mondeca.com>, semantic-web@w3.org, public-xg-geo@w3.org, Franck Cotton <franck.cotton@insee.fr>
Message-Id: <1154724198.16762.44.camel@localhost>

Dan,

On Friday, 04 August 2006 at 09:32 -0500, Dan Connolly wrote:
> On Fri, 2006-08-04 at 09:28 +0200, Eric van der Vlist wrote: 
> > Hi,
> 
> Hi Eric,
> 
> > On Thursday, 03 August 2006 at 23:26 +0200, Bernard Vatant wrote:
> > > 
> > > Dan
> > > > did you consider using # rather than /? i.e.
> > > >   http://rdf.insee.fr/geo#code_commune
> > > > rather than
> > > >   http://rdf.insee.fr/geo/code_commune
> > > > especially for ontologies, it's a lot easier to manage.
> > > >   
> > > We did consider. Actually my first version of the ontology used a #
> > > namespace. Eric (in cc)  was the one who suggested a / namespace,
> > > especially for the data and somehow convinced the rest of us. That was
> > > six months ago, but if I remember correctly, the idea was that at some
> > > point, each instance URI would be (should be, hopefully will be)
> > > associated with, and give access to, a separate resource, which is not
> > > the case now. 
> > 
> > Yes, that was the first comment I made on your first proposal at the
> > end of January.
> > 
> > The idea was that to identify a city, http://rdf.insee.fr/geo/COM_80078
> > is better than http://rdf.insee.fr/geo#COM_80078.
> 
> You might also consider http://rdf.insee.fr/geo/COM_80078#city for
> the city itself and http://rdf.insee.fr/geo/COM_80078 for a document
> about the city.
> 
> If the cities come in natural chunks, perhaps
> http://rdf.insee.fr/geo/COM_800#city78
> for the city and http://rdf.insee.fr/geo/COM_800 for a document about
> the cities in some region.

You mean that we should use the same URI both to identify geographical
entities and to locate the fragment where they are defined?

We have rejected this idea for a number of reasons. I think that the
most important of these reasons is that it would assume that the entity
is described at only one location in only one RDF document and that's
not true in our case.

If you take an entity such as a city, this entity can span two
higher-level entities, and its description is then split between the
different higher-level entities to which it belongs.

Even when a city belongs to only one higher-level entity, important
pieces of its description can be found in the descriptions of the
different layers of higher-level entities, and the description of an
entity such as a department is spread over four different documents.

We also think that splitting entities into RDF documents is a packaging
issue that may evolve over time and shouldn't impact the URIs
identifying the entities.

Furthermore, we believe that hard-coding the links between entity
identifiers and RDF documents would make the version management of these
documents more complex. We have included a year in the URIs for the RDF
documents so that we can easily publish new versions and keep the
previous one (an "old" version carries valid information about the
ontology for a specific date and we think that it should remain online).
And of course, we wouldn't want the URIs identifying the entities to
change over time.
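
To illustrate the separation, here is a minimal Python sketch. It is not
our actual publishing code: the packaging table, the file names, the
years, and the http://example.org/ document URLs are all hypothetical.
The only point is that the entity URI never depends on the year or on
how entities are packaged into documents.

```python
# Hypothetical sketch: entity identifiers stay stable while the
# packaging of descriptions into year-versioned documents may change.
ENTITY_BASE = "http://rdf.insee.fr/geo/"

# Hypothetical packaging table: year -> entity code -> document names.
PACKAGING = {
    2005: {"COM_80078": ["communes-80.rdf"]},
    2006: {"COM_80078": ["communes-80.rdf", "cantons-80.rdf"]},
}


def entity_uri(code):
    """Identifier of the entity; independent of year and packaging."""
    return ENTITY_BASE + code


def describing_documents(code, year):
    """Documents describing the entity in a given yearly release."""
    names = PACKAGING.get(year, {}).get(code, [])
    # example.org stands in for wherever the documents really live.
    return [f"http://example.org/{year}/{name}" for name in names]
```

Repackaging the 2006 release only changes what describing_documents()
returns; entity_uri("COM_80078") is the same for every year.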

> >  Of course, these URIs 
> > are only identifiers but who knows, we might want some day to publish
> > some kind of documentation (like we do in RDDL to document namespaces)
> > at these URIs. 
> 
> "only identifiers"? sigh. I got the impression you wanted to publish
> information about them in the Semantic Web.

This is semantic information that conforms to the W3C recommendations
and is published on the World Wide Web. Isn't that sufficient to be part
of the Semantic Web?

> > If we do so, the first URI makes each city a standalone entity while the
> > second one means that they need to be fragments in a huge document which
> > can cause a lot of issues (we don't know which media types we might want
> > to publish and the definition of fragments is inconsistent between media
> > types
> 
> It's within your control to choose media types where the definition
> of fragments is consistent. The easiest way is to just use one
> media type: application/rdf+xml .

What we have in mind for these URIs isn't necessarily limited to RDF but
could include XHTML documentation or other kinds of resources. Both RDF
and XHTML can be published at the same location using content
negotiation... What I meant by being inconsistent between media types is
that if you use content negotiation, you need to make sure that each
representation defines the same fragments, which is a further
complication.

BTW, if we ever serve RDF at these addresses, I guess that it would be
some kind of placeholder with seeAlso attributes pointing to the
different documents in which an entity is described rather than the
actual definition of the entity.
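
As a rough idea of what such a placeholder could look like, here is a
small Python sketch that builds one with the standard library. Nothing
like this is deployed; the function name and the example.org document
URLs are assumptions made for illustration, and rdfs:seeAlso is used as
the linking property.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def placeholder(entity_uri, document_uris):
    """Build an RDF/XML placeholder that only points elsewhere.

    Hypothetical helper: it emits one rdf:Description about the entity,
    with one rdfs:seeAlso per document describing that entity.
    """
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": entity_uri}
    )
    for doc in document_uris:
        ET.SubElement(desc, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": doc})
    return ET.tostring(root, encoding="unicode")


# Illustrative document URLs only; not the real INSEE layout.
xml_text = placeholder(
    "http://rdf.insee.fr/geo/COM_80078",
    ["http://example.org/2006/communes-80.rdf",
     "http://example.org/2006/cantons-80.rdf"],
)
```

The placeholder carries no description of its own, so the documents can
be repackaged without touching the entity URI.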
> 
> >  (some of them don't even support fragments), the document might
> > grow very large, ...). 
> > 
> > Now, the thing that we've not considered is to have a namespace URI
> > different from the RDF base.
> > 
> > > Agreed, we could have kept the # namespace for the ontology at least.
> > 
> > Dan, can you elaborate why that makes ontologies a lot easier to manage?
> 
> Because with a # namespace, publishing the ontology just involves
> sticking one static file on a web server. (the URI looks nicer
> if the web server can handle leaving the .rdf or .owl off, but
> that's not completely essential).
> 
> And then to look up http://rdf.insee.fr/geo#code_commune , a consumer
> just GETs http://rdf.insee.fr/geo as usual; then when they want
> to look up another term such as http://rdf.insee.fr/geo#subdivision,
> they can save a round trip because they already have it.
> 
> Using a / namespace has a higher cost for the producer (redirects)
> and for the consumer (one GET per term rather than one GET
> for the ontology).

That's true only if you assume that these identifiers are also used as
locations...
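
For the record, here is what the cost argument looks like under that
assumption, as a small Python sketch. A client strips the fragment
before sending an HTTP request, so terms in a # namespace share one
document while terms in a / namespace each need their own request. The
function name is hypothetical, and the / form of "subdivision" is
written by analogy with your example.

```python
from urllib.parse import urldefrag


def documents_to_fetch(term_uris):
    """Distinct URLs a client would GET to dereference the given terms."""
    # The fragment is never sent to the server, so drop it first.
    return sorted({urldefrag(uri).url for uri in term_uris})


hash_terms = [
    "http://rdf.insee.fr/geo#code_commune",
    "http://rdf.insee.fr/geo#subdivision",
]
slash_terms = [
    "http://rdf.insee.fr/geo/code_commune",
    "http://rdf.insee.fr/geo/subdivision",  # assumed by analogy
]

hash_docs = documents_to_fetch(hash_terms)    # one shared document
slash_docs = documents_to_fetch(slash_terms)  # one document per term
```

If the identifiers are never dereferenced, both lists are irrelevant and
the two forms cost the same.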

I know that this is a highly controversial debate, but I have always
thought that the big advantage of RDF over XML vocabularies such as
XLink is that it differentiates the two notions, and I wouldn't want to
lose this benefit!

Thanks for your clarifications!

Eric

-- 
GPG-PGP: 2A528005
Have you ever thought about unit testing XSLT templates?
                                                     http://xsltunit.org
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------
Received on Friday, 4 August 2006 20:43:40 UTC