Re: Geonames enters the Semantic Web from Bernard vatant on 2006-10-17 (semantic-web@w3.org from October 2006)

From: Bernard vatant <bernard.vatant@mondeca.com>
Date: Tue, 17 Oct 2006 16:41:35 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Semantic Web <semantic-web@w3.org>, Marc <marc@geonames.org>
Message-ID: <4534EB9F.9020505@mondeca.com>
Hi Richard
> You have to either do "the hash thing" or "the 303 thing" (see [1]), 
> combining both like this won't work because you'd have to set up a 
> document http://ws.geonames.org/rdf which contains the descriptions of 
> *all* your concepts.
Thank you. I thought I had understood it all at last, but now I'm 
completely lost again. :-D
You mean that if we do the hash thing, we don't need redirection? So what is
supposed to happen then with
"http://ws.geonames.org/rdf#geonameId=3014258" in an HTTP GET?
I'm lost for the moment, sorry. I guess I have to go and re-read all 
this prose I've tried desperately to make sense of ...
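A minimal Python sketch of the answer to that question, using only the standard library: an HTTP client removes the fragment (everything after `#`) before it sends a GET, so every hash URI of the form `http://ws.geonames.org/rdf#geonameId=...` is fetched as the one document `http://ws.geonames.org/rdf`, and the fragment is only looked up client-side in whatever comes back. That is why the hash approach would need a single document describing *all* the concepts. The helper name `request_target` is hypothetical.

```python
from urllib.parse import urldefrag

def request_target(uri: str) -> tuple[str, str]:
    """Return (document URI actually requested by an HTTP GET, fragment kept client-side)."""
    # The fragment identifier is never sent to the server: the client
    # fetches the document, then resolves the fragment locally.
    result = urldefrag(uri)
    return result.url, result.fragment

for concept in (
    "http://ws.geonames.org/rdf#geonameId=3014258",
    "http://ws.geonames.org/rdf#geonameId=3017382",
):
    doc, frag = request_target(concept)
    # Both concepts end up requesting the very same document.
    print(f"GET {doc}  (then look for #{frag} inside it)")
```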

> I think this is all you'd have to do to get the TAG stamp of approval.
<rant>
Please folks, if you want people to jump happily onto the SW train,
stop saying in specifications and recommendations: you can do it this way,
or that way, or otherwise, "it does not matter". This is the worst path
toward standards adoption, because people get mixed up and end up doing
it "this way" *and* "that way", and, well, it matters a lot at the end of
the day. Think about the mess people are in because of so many damned
serializations of RDF, in and out of XML. Find two OWL editors on the market
today that are able to exchange RDF/XML files and make a lossless round
trip. So don't give two dozen recipes. Say: if you want to be
conformant, do that and only that.
See the presentation by Tommie Usdin at Extreme Markup 2002, which was
perhaps the most excellent one I have ever attended. If you were not
there, too bad; you can at least read the proceedings:
http://www.idealliance.org/papers/extreme/proceedings/html/2002/Usdin01/EML2002Usdin01.html

Quote (emphasis mine):

"Ideally, it seems to me that a tightly specified document type can be 
thought of as one in which all knowledgeable encoders would create the 
same XML files given identical content. There is one and only one way to 
tag any particular content. Variations in tagging the same content are 
due to errors. Thus, any differences in correct XML for two instances of 
the document type are meaningful. Any time a specification allows more 
than one way to encode the same content, and there is no documented 
difference in the meaning of the two encodings, there is looseness in 
the specification of the document type. Two people with the same 
understanding of the content might produce different, correct, XML 
documents that have no difference in meaning. (The syntax may differ but
the semantics do not.)
And this is not only not a good thing, it is a very bad thing. Users 
keep demanding flexibility and extensibility, and in one sense giving 
them syntactical choices is giving them flexibility. *But it is not 
giving them useful, usable flexibility; it is giving them headaches.* 
They want flexibility in the content they create and communicate; they 
want rich semantics. They are not interested in flexible syntax!"

</rant>

>
> The other thing -- RDF backlinks -- is about creating *linked data*,
> as TimBL calls it [2]. Linked data is important to make RDF data 
> consumable by Semantic Web browsers and crawlers and dynamic-dataset 
> query engines, but it's an area of active experimentation and there 
> are few clearly defined standards or best practices.
>
>> OK. But since (2) is actually the result of a query on a database
>> through a web service, we could add several parameters, as in other
>> Geonames web services: whether or not to include the related features
>> in the description (parentFeature, childFeature, nearbyFeature ...),
>> a limit on the number of related features, the languages of attributes,
>> whatever.
>
> Adding parameters to the query interface is certainly a good thing. 
> But remember that RDF data is intended for *machine* consumption, and 
> a machine (such as an RDF browser or crawler) will have no simple way 
> of finding out what parameters are available or what parameters would 
> be sensible in a given situation. Thus I think it doesn't really make 
> the data more useful on the Semantic Web.
Hmm. I have to chew on that one. Some "machines" may know very well the
web service they call and what to do with the results. There are not
only random crawlers and browsers; there are also integrated
applications that call an external database or service, knowing pretty
well how to call it and what to do with the results. That is what is
called Service Oriented Architecture, no?
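To illustrate the kind of client meant here, one that knows the service and its parameters in advance, a minimal sketch that builds the parameterized URIs (3) and (4) quoted further down in this thread. The parameter names come from those URIs; as noted in the thread, the service does not actually process them yet, and `description_url` is a hypothetical helper, not a Geonames API.

```python
from urllib.parse import urlencode

# Base URI of the Geonames RDF service, as given in this thread.
BASE = "http://ws.geonames.org/rdf"

def description_url(geoname_id: int, **options) -> str:
    """Build a description URL with optional query parameters.

    Option names such as childFeatures or maxNearbyFeatures are the
    hypothetical ones discussed in this thread, not a documented API.
    """
    params = {"geonameId": geoname_id}
    for key, value in options.items():
        # Serialize booleans as lowercase "true"/"false", as in the example URIs.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE}?{urlencode(params)}"

print(description_url(3014258, childFeatures=True, maxChildrenFeatures=50))
# http://ws.geonames.org/rdf?geonameId=3014258&childFeatures=true&maxChildrenFeatures=50
```

A client written this way only works because its author read the service documentation; a generic crawler would have no way to guess these parameters, which is the point made in the quoted reply above.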
>> Why so? Because, for example, a "complete" description at 
>> http://ws.geonames.org/rdf?geonameId=3017382 would contain each 
>> description of each feature in France - which is a lot.
>
> Hm ... why?
>
> First, there's no need to include the *description* of the features in 
> France. *Linking* to the features using a geonames:childPlace property 
> is all you need; a processor that wants more information can follow 
> the link and fetch the description.
This is already a lot.
>
> Second, I believe the Geonames data is hierarchical, so you would only 
> need to include links to features on the next-lower level (the 
> régions?). A processor that wants all levels can just follow the link.
Makes sense
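Following the link level by level amounts to a breadth-first traversal of the hierarchy. A minimal sketch, assuming a toy in-memory dictionary standing in for dereferenced description documents; the `geonames:childPlace` name comes from Richard's example, while the concept names and the two departments listed under Alsace are illustrative data only.

```python
from collections import deque

# Toy stand-in for dereferencing concept URIs: each description document
# lists only the *direct* children of its concept (hypothetical data).
DESCRIPTIONS = {
    "franceConcept": [
        ("franceConcept", "geonames:childPlace", "alsaceConcept"),
        ("franceConcept", "geonames:childPlace", "aquitaineConcept"),
    ],
    "alsaceConcept": [
        ("alsaceConcept", "geonames:childPlace", "basRhinConcept"),
        ("alsaceConcept", "geonames:childPlace", "hautRhinConcept"),
    ],
}

def fetch_description(concept):
    """Stand-in for an HTTP GET plus RDF parsing; unknown concepts yield no triples."""
    return DESCRIPTIONS.get(concept, [])

def all_descendants(root):
    """Follow childPlace links breadth-first, one hierarchy level per hop."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for subject, predicate, obj in fetch_description(current):
            if subject == current and predicate == "geonames:childPlace" and obj not in seen:
                seen.add(obj)
                order.append(obj)
                queue.append(obj)
    return order

print(all_descendants("franceConcept"))
# ['alsaceConcept', 'aquitaineConcept', 'basRhinConcept', 'hautRhinConcept']
```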
>
> Third, I want to echo one of the points from Tim's comment: You might
> consider putting the lists of the childPlaces and nearbyPlaces of
> France into documents separate from the main description of France, 
> and point to the lists using rdfs:seeAlso or a subproperty thereof 
> (e.g. geonames:childPlaceList). Example (imagine the appropriate URIs 
> between the angle brackets):
>
> <franceDocument> would contain:
>
>     <franceConcept> geonames:name "France" .
>     <franceConcept> geonames:parentPlace <europeConcept> .
>     <franceConcept> geonames:childPlaceList <franceChildplacesDocument> .
>     <franceConcept> geonames:neighbourPlaceList <franceNeighboursDocument> .
>     ...
>
> <franceChildplacesDocument> would contain:
>
>     <franceConcept> geonames:childPlace <alsaceConcept> .
>     <franceConcept> geonames:childPlace <aquitaineConcept> .
>     ...
>
> Be sure to note the distinction between documents and concepts. Some 
> properties point to concepts, others to documents. But those pointing 
> to other documents provide additional information about the current 
> concept, and are just there to partition the data into more manageable 
> chunks.
We've been discussing something along those lines with Marc today.
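A minimal sketch of the partitioning described above, assuming triples are plain Python tuples and using the hypothetical `geonames:childPlace` and `geonames:childPlaceList` names from Richard's example: the server side splits a concept's description into a main document plus a child-places document, and a client that wants the full picture follows the `childPlaceList` link and merges the two.

```python
# Hypothetical property names taken from Richard's example, not a published vocabulary.
CHILD = "geonames:childPlace"
CHILD_LIST = "geonames:childPlaceList"

def partition(concept, triples, child_doc):
    """Split a concept's triples into (main document, child-places document)."""
    main = [t for t in triples if t[1] != CHILD]
    children = [t for t in triples if t[1] == CHILD]
    if children:
        # The main document only points at the separate list, seeAlso-style.
        main.append((concept, CHILD_LIST, child_doc))
    return main, children

def merged_view(main_doc, fetch):
    """Client side: read the main document, then follow any childPlaceList links."""
    triples = list(main_doc)
    for _subject, predicate, target in main_doc:
        if predicate == CHILD_LIST:
            triples.extend(fetch(target))
    return triples

france = [
    ("franceConcept", "geonames:name", '"France"'),
    ("franceConcept", "geonames:parentPlace", "europeConcept"),
    ("franceConcept", CHILD, "alsaceConcept"),
    ("franceConcept", CHILD, "aquitaineConcept"),
]
main_doc, child_doc = partition("franceConcept", france, "franceChildplacesDocument")
web = {"franceChildplacesDocument": child_doc}  # toy stand-in for HTTP GET
full = merged_view(main_doc, lambda uri: web.get(uri, []))
print(len(main_doc), len(child_doc), len(full))  # 3 2 5
```

Note that the `childPlaceList` triple points to a document, while every `childPlace` triple points to a concept, which is the distinction Richard asks to keep in mind.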
>
>> Maybe some user would like that kind of description to feed a data 
>> base, others would like only the direct children of type A.ADM1 etc.
>
> Note that SPARQL provides a better and standardized solution to that 
> kind of problem. AFAIK you already provide a database dump of the 
> Geonames data; third parties could use this to set up a SPARQL server 
> (e.g. using D2R Server [3]). Or, if your data is linked up properly, 
> use a dynamic-dataset SPARQL engine (e.g. using the SemWebClient 
> library [4]).
Sure enough. But in fact the whole point of the exercise (from my
viewpoint) was to figure out to what extent a "classical" database (that
is, one not built on RDF) like Geonames, which already has a bunch of web
services, could also provide RDF data without installing a proper
RDF server. If afterwards someone (maybe Geonames themselves) wants to set
up everything you describe, all the best. But the original objective was a smooth,
effortless, costless migration.
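For comparison, the SPARQL route suggested in the quoted reply would express "only the direct children of type A.ADM1" as a query that any SPARQL endpoint could answer. A minimal Python sketch that only builds the query text; the namespace URI, the `geonames:featureCode` property and the example concept URI are assumptions for illustration, and `geonames:childPlace` follows Richard's example rather than a published schema.

```python
# Assumed namespace and property names, for illustration only.
GEONAMES_NS = "http://www.geonames.org/ontology#"

def children_of_type_query(concept_uri: str, feature_code: str, limit: int = 50) -> str:
    """Build a SPARQL SELECT for the direct children of a place with a given feature code."""
    return (
        f"PREFIX geonames: <{GEONAMES_NS}>\n"
        "SELECT ?child WHERE {\n"
        f"  <{concept_uri}> geonames:childPlace ?child .\n"
        f"  ?child geonames:featureCode <{GEONAMES_NS}{feature_code}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )

print(children_of_type_query("http://example.org/franceConcept", "A.ADM1"))
```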
>> So we are likely to deliver various RDF descriptions of the *same*
>> feature (1) at various URIs such as
>> (3) http://ws.geonames.org/rdf?geonameId=3014258&childFeatures=true&maxChildrenFeatures=50
>> (4) http://ws.geonames.org/rdf?geonameId=3014258&nearbyFeatures=true&maxNearbyFeatures=5
>
> As I said, generic Semantic Web clients will have a hard time
> automatically discovering the alternate documents or deciding which one
> is appropriate. It's nice for tools that use *just* the Geonames data
> though. (But then it's not really Semantic Web ;-)
That is quite another issue. There is (IMO) no such thing as *THE
REAL* Semantic Web, only applications more or less able to leverage
common semantics in order to exchange data with each other. There is no
"phase transition" when you put your stuff in RDF. Well, I will not
expand on that here ...

Bernard
>
> Keep it up,
> Richard
>
> [1] http://dowhatimean.net/2006/10/fixing-ambiguous-concept-uris
> [2] http://www.w3.org/DesignIssues/LinkedData.html
> [3] http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/
> [4] http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/
>
>
>
>> Note that such hypothetical URIs actually work right now, but the 
>> added parameters are not really processed and they are equivalent to 
>> (2).
>>
>> And I guess, as in other Geonames web services, (2) would yield a
>> description with default values for parameters used in (3) and (4) 
>> and the like. In that case, the description of (1) yielded by (2) 
>> through 303 redirects will be neither "complete", nor "canonical", 
>> nor "authoritative" ... but just a "default" description.
>>
>> Is this correct TAG-wise? This time I'm asking first ... :-)
>>
>>
>> Bernard Vatant wrote:
>>>
>>> Hello all
>>>
>>> I'm very pleased to announce that, through a quick and efficient
>>> collaboration with Marc (in cc), the 6 million (and growing)
>>> geographical features in the Geonames database [1] are now
>>> described by an OWL ontology [2], and the RDF description of each
>>> instance, including names, type and, of course, geolocation elements, is
>>> now available through a Geonames web service, adding to an already
>>> impressive pack of services [3].
>>> The ontology is very simple and leverages elements of the wgs84_pos
>>> vocabulary [4]. The feature types are described using a simple SKOS
>>> vocabulary, which has been embedded in the OWL ontology.
>>>
>>> Add to that the fact that, thanks to the Google Maps API, Geonames features
>>> can be created and edited through a wiki-like interface [5], and this is as
>>> Web 2.0 as it gets.
>>>
>>> Comments welcome, either here or in the Geonames forum [6]
>>>
>>> Bernard
>>>
>>>
>>> [1] http://www.geonames.org
>>> [2] http://www.geonames.org/ontology/
>>> [3] http://www.geonames.org/export/
>>> [4] http://www.w3.org/2003/01/geo/#vocabulary
>>> [5] http://www.geonames.org/recent-changes/
>>> [6] http://forum.geonames.org/gforum/posts/list/156.page
>>>
>>
>> -- 
>> Bernard Vatant
>> Knowledge Engineering
>> ----------------------------------------------------
>> Mondeca
>> 3, cité Nollez 75018 Paris France
>> Web: www.mondeca.com
>> ----------------------------------------------------
>> Tel. +33 (0) 871 488 459 Mail: bernard.vatant@mondeca.com
>> Wikipedia: universimmedia <http://en.wikipedia.org/wiki/User:Universimmedia>
>>
>>
>>
>
>
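The "default description" setup discussed in the quoted message relies on "the 303 thing": a GET on the concept URI (1) is answered with 303 See Other pointing at the description document (2). A minimal WSGI sketch using only the standard library; the `/feature/<geonameId>` path for concept URIs is a hypothetical layout, while the redirect target uses the `http://ws.geonames.org/rdf?geonameId=` form from this thread.

```python
import re
from wsgiref.util import setup_testing_defaults

DOC_BASE = "http://ws.geonames.org/rdf?geonameId="
# Hypothetical concept URI layout: /feature/<geonameId>
CONCEPT_PATH = re.compile(r"^/feature/(\d+)$")

def app(environ, start_response):
    """Redirect concept URIs to their RDF description document with 303 See Other."""
    match = CONCEPT_PATH.match(environ.get("PATH_INFO", ""))
    if match is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown resource\n"]
    # 303: the concept is not a document, but here is a document describing it.
    start_response("303 See Other", [("Location", DOC_BASE + match.group(1))])
    return [b""]

def simulate_get(path):
    """Call the WSGI app directly, without a server, and capture status and headers."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    app(environ, start_response)
    return captured["status"], captured["headers"]

print(simulate_get("/feature/3014258"))
# ('303 See Other', {'Location': 'http://ws.geonames.org/rdf?geonameId=3014258'})
```

Serving the document (2) itself, with or without the optional parameters, would be a separate handler; the redirect only tells clients that the concept and its description are different resources.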

Received on Tuesday, 17 October 2006 14:42:54 UTC