Countries Re: [GeoNames] GeoNames RDF dataset improvements

Hi Alexander and all

[cc to LOD list since there is a parallel thread on "what is a country"]

I'll try to sum up clearly the current status of "countries" in both the
Geonames ontology and the RDF service output (leaving the dump aside) and
how it will (should) be changed in future releases.

Let's take the example of Argentina, ISO code = AR.

Currently Geonames has way too many URIs to describe this country.
[1] The feature id=3865483 : http://sws.geonames.org/3865483/
[2] An anchor in the countries page : http://www.geonames.org/countries/#AR
[3] An HTML description linked from the above
http://www.geonames.org/countries/AR/argentina.html

Each of those URIs provides some description, but only [1] should be used in
the RDF output as the value of the "inCountry" attribute. OTOH only [2]
provides a clear list of countries based on the existence of an ISO code, but
this URI is not linked-data friendly at all. Yet URIs such as [2] are
currently used as values of the "inCountry" object property, put on each
feature inside the country's territory.

Now how do you figure out, when looking only at a feature's RDF description
such as the one provided at [1], whether it has a match in the list at [2]?
Maybe the feature code would help. We find
<featureCode rdf:resource="http://www.geonames.org/ontology#A.PCLI"/>
which means that Argentina is an "independent political entity". Isn't that
the same thing as a country? Well, most of the time, yes, but not all the
time. The list of countries as per ISO code at [2] has 248 entries, while the
number of features with code PCLI is 192 ... which leaves us with 56 countries
with either no matching feature or a code different from PCLI. Is every PCLI a
country? I won't swear it is, but let's assume this for a moment.

Now http://download.geonames.org/export/dump/countryInfo.txt gives you more
info on the 248 countries, including ... the matching geonames id. But not
the feature code. Still, every country has its matching feature. Good news.
But so far, we still have no way to infer from the description at [1] that
Argentina is indeed a country, and we're left with at least 56 countries
which are not "independent political entities". What a world ... no wonder
we have so many wars ...
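
To make the lookup concrete, here is a minimal sketch of how one could use
countryInfo.txt to decide whether a feature URI such as [1] matches a country.
It assumes the file is tab-separated, that comment lines start with "#", and
that the ISO code and the geonames id sit in the columns named below; those
column positions and the function names are my assumptions for illustration,
so check them against the actual file before relying on this.

```python
ISO_COL = 0          # assumed position of the ISO 2-letter code
GEONAME_ID_COL = 16  # assumed position of the matching geonames id


def country_feature_uris(lines):
    """Map each ISO code to the URI of its matching feature."""
    uris = {}
    for line in lines:
        # Skip blank lines and the commented header (assumed to start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Ignore rows that are too short or carry no geonames id.
        if len(fields) <= GEONAME_ID_COL or not fields[GEONAME_ID_COL]:
            continue
        uris[fields[ISO_COL]] = (
            f"http://sws.geonames.org/{fields[GEONAME_ID_COL]}/"
        )
    return uris


def is_country_feature(feature_uri, uris):
    """True if the feature URI is the match of some ISO country."""
    return feature_uri in uris.values()


# Hypothetical one-row sample carrying only the two columns used here.
sample_fields = [""] * 19
sample_fields[ISO_COL] = "AR"
sample_fields[GEONAME_ID_COL] = "3865483"
sample = ["# header comment", "\t".join(sample_fields)]

uris = country_feature_uris(sample)
```

With the real file in place of the sample, is_country_feature(
"http://sws.geonames.org/3865483/", uris) would answer the question that the
RDF description at [1] cannot answer by itself.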

Actually the description at [1] does not say that [1] *is* a country, but it
says it is *in* the country defined by [2] ...
<inCountry rdf:resource="http://www.geonames.org/countries/#AR"/>

I don't think we can clear up this mess, because the world is messy. So what
I propose is the following:

- Deprecate the ObjectProperty "inCountry" altogether.

- Replace it with a "countryCode" property giving the ISO 2-letter code. The
transformation is completely straightforward, e.g.,

<inCountry rdf:resource="http://www.geonames.org/countries/#AR"/>
will be replaced by
<countryCode>AR</countryCode>

and can indeed be used the same way.

- For each feature matching a country, whatever its feature code, put an
rdfs:seeAlso link to the URI at [2] in its description:
<rdfs:seeAlso rdf:resource="http://www.geonames.org/countries/#AR"/>
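
As a rough sketch of how mechanical the proposed change is, the rewrite can be
done with a single pattern over the serialized RDF/XML. This assumes the
"inCountry" element always appears in exactly the form shown above; the
function names are hypothetical, and a real migration would more likely go
through an RDF library than a regular expression.

```python
import re

# Matches the current inCountry element and captures the ISO 2-letter code
# from the fragment of the countries-page URI.
IN_COUNTRY = re.compile(
    r'<inCountry\s+rdf:resource='
    r'"http://www\.geonames\.org/countries/#([A-Z]{2})"\s*/>'
)


def to_country_code(rdf_xml: str) -> str:
    """Replace every inCountry element with the proposed countryCode literal."""
    return IN_COUNTRY.sub(r"<countryCode>\1</countryCode>", rdf_xml)


def see_also_for(iso_code: str) -> str:
    """Build the rdfs:seeAlso element proposed for features matching a country."""
    return (
        '<rdfs:seeAlso rdf:resource='
        f'"http://www.geonames.org/countries/#{iso_code}"/>'
    )


old = '<inCountry rdf:resource="http://www.geonames.org/countries/#AR"/>'
new = to_country_code(old)
```

Here new holds the countryCode element for AR, and see_also_for("AR") gives
the link to add on the feature at [1].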

And we're done.

Bernard

Received on Thursday, 29 April 2010 14:40:56 UTC