Re: homonym URIs (Re: What if an URI also is a URL) from Chris Bizer on 2007-06-12 (semantic-web@w3.org from June 2007)

From: Chris Bizer <chris@bizer.de>
Date: Tue, 12 Jun 2007 10:29:10 +0200
To: "Pat Hayes" <phayes@ihmc.us>, "Sandro Hawke" <sandro@w3.org>
Cc: <semantic-web@w3.org>, "Linking Open Data" <linking-open-data@simile.mit.edu>
Message-ID: <003301c7accb$c06d0fe0$c4e84d57@named4gc1asnuj>

Hi Sandro and Pat,

> My advice here is, I confess, not widely followed.  But I hear more and
> more people converging on the idea that this is both practical and
> likely to be sufficiently effective.

Sandro: Just to back up your claim that more and more people are converging, 
here are some hard facts:

Within the W3C SWEO Linking Open Data project, people are collaborating to 
publish and interlink huge amounts of RDF data on the Web according to Tim's 
Linked Data principles
http://www.w3.org/DesignIssues/LinkedData.html

Currently, this collaborative effort has "specified the meaning" (if you 
want to see it this way) of maybe 10 million URIs covering topics like 
geographic locations, books, publications, music, .... The descriptions 
altogether amount to a dataset of about one billion RDF triples.

Any of these 10 million URIs can be looked up over HTTP to retrieve a 
description of its meaning.

Some example URIs from DBpedia (http://dbpedia.org/docs/) which forms part 
of the Linking Open Data project:

URI denoting the concept of Berlin as a town in Germany:

http://dbpedia.org/resource/Berlin

RDF description of Berlin, which you get by dereferencing the URI above 
with the MIME type application/rdf+xml:

http://dbpedia.org/data/Berlin

Human-readable HTML description of Berlin, which you get by 
dereferencing the URI above with the MIME type text/html:

http://dbpedia.org/page/Berlin

As you can see, the meaning of the term is pretty clearly defined by putting 
it into several SKOS categories, making several rdf:type statements about it 
and describing it in 10 different natural languages.
All other 1 600 000 DBpedia terms are described in a similar way.
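
For anyone who wants to try the dereferencing of the Berlin URI from a 
script rather than a browser, here is a minimal Python sketch. The function 
names build_request and dereference are my own, and it assumes the server 
still content-negotiates and redirects the way described above, which may 
not hold today. Only the request construction runs when the snippet is 
loaded; nothing touches the network until dereference() is called.

```python
import urllib.request
from typing import Tuple

# The DBpedia resource URI from the example above.
RESOURCE_URI = "http://dbpedia.org/resource/Berlin"


def build_request(uri: str, mime_type: str) -> urllib.request.Request:
    """Prepare a GET request asking for one representation of `uri`.

    Hypothetical helper: the Accept header carries the MIME type that
    the client would like to receive.
    """
    return urllib.request.Request(uri, headers={"Accept": mime_type})


def dereference(uri: str, mime_type: str) -> Tuple[str, bytes]:
    """Fetch `uri` and return (final URL after redirects, response body).

    urllib follows HTTP redirects (including 303) by default, so the
    final URL shows which document the server chose to send.
    Assumption: the server is reachable and still behaves as in 2007.
    """
    request = build_request(uri, mime_type)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.geturl(), response.read()


# Two requests for the same URI that differ only in the Accept header.
rdf_request = build_request(RESOURCE_URI, "application/rdf+xml")
html_request = build_request(RESOURCE_URI, "text/html")
```

Calling dereference(RESOURCE_URI, "application/rdf+xml") should then hand 
back the RDF document and its URL, and the text/html variant the HTML page, 
provided the server behaves as described.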

An overview of the other 8 million concepts with dereferenceable URIs that 
were created in the project is given in 
http://linkeddata.org/documents/eswc2007-poster-linking-open-data.pdf
and on the project website 
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

> So when users paste that URI into their browser, they get the official
> documentation about it.

This behavior can be demonstrated with Semantic Web browsers like Tabulator, 
DISCO or the OpenLink Data browser.

Just click on a link below to start exploring the meaning of terms using 
DISCO.

The WWW 2006 conference
http://www4.wiwiss.fu-berlin.de/rdf_browser/?browse_uri=http%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Fdblp%2Fresource%2Frecord%2Fconf%2Fwww%2F2006

The Tetris computer game
http://www4.wiwiss.fu-berlin.de/rdf_browser/?browse_uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FTetris

Tim Berners-Lee
http://www4.wiwiss.fu-berlin.de/rdf_browser/?browse_uri=http%3A%2F%2Fwww.w3.org%2FPeople%2FBerners-Lee%2Fcard%23i
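
If you want to build such a DISCO link for a URI of your own, the links 
above suggest the pattern: the resource URI is percent-encoded and passed 
in the browse_uri parameter. A small Python sketch, inferred only from 
those three links (disco_link is a made-up helper name, not part of DISCO):

```python
from urllib.parse import quote

# Base address of the DISCO browser, taken from the links above.
DISCO_BASE = "http://www4.wiwiss.fu-berlin.de/rdf_browser/"


def disco_link(resource_uri: str) -> str:
    """Return a DISCO link for `resource_uri` (hypothetical helper).

    safe="" makes quote() encode ':', '/' and '#' as well, so a hash
    URI keeps its fragment instead of losing it as the fragment of
    the browser link itself.
    """
    return DISCO_BASE + "?browse_uri=" + quote(resource_uri, safe="")


tetris_link = disco_link("http://dbpedia.org/resource/Tetris")
timbl_link = disco_link("http://www.w3.org/People/Berners-Lee/card#i")
```

The encoding of '#' as %23 matters for hash URIs like Tim's: left 
unencoded, the "#i" would be read as the fragment of the DISCO link and 
would never reach the server.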

Concerning "practical and sufficiently effective", I liked a recent paper by 
Google about their plans for the Web-of-Data.

"Web-scale Data Integration: You can only afford to Pay As You Go"
http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p40.pdf

The basic line of argument is that we don't need completely unambiguous 
terms and schemata to provide useful services to the end user. Even if the 
answers are only approximate, they will be useful for the user. Google seems 
to handle this by using uncertainty on all levels of their architecture, 
including information extraction, schema matching and query routing. In the 
end, this uncertainty goes into their ranking algorithm and, as the 
experience from the Web shows, users are very happy with ranked approximate 
results where high-quality stuff tends to show up at the beginning of the 
list.

Cheers

Chris


--
Chris Bizer
Freie Universität Berlin
+49 30 838 54057
chris@bizer.de
www.bizer.de
----- Original Message ----- 
From: "Sandro Hawke" <sandro@w3.org>
To: "Pat Hayes" <phayes@ihmc.us>
Cc: <semantic-web@w3.org>
Sent: Tuesday, June 12, 2007 12:11 AM
Subject: homonym URIs (Re: What if an URI also is a URL)


>
>
> Pat Hayes <phayes@ihmc.us> writes:
>> Tim, as this discussion gets to the heart of what
>> I've been trying to argue for several years,
>> please take the comments below as intended in a
>> spirit of analysis rather than just pins and
>> angels.
>
> Pat, I'm going to jump in here, if you don't mind.  I think my position
> on these issues is pretty much the same as Tim's but I could be wrong.
> I don't argue that John's "dance" isn't required, just that part of the
> Semantic Web version of the dance is: don't make your URIs unnecessarily
> ambiguous.  One might even say: don't pun.
>
>> And what about a URI
>> that I own and wish it to denote, say, the planet
>> Venus, or my pet cat? What do I do, to attach the
>> URI to my intended referent for it?
>
> You publish a document (an ontology) so it's available through that URI.
> If it's a hash URI, you publish the ontology at the non-hash version.
> If it's a slash URI, you publish the ontology at the far end of a 303
> redirect.  And you content-negotiate HTML and RDF.
>
> So when users paste that URI into their browser, they get the official
> documentation about it.
>
> And when RDF software dereferences that URI, it gets some logical
> formulas which should be understood (like the HTML) to be asserted by the
> URI's owner/host/publisher.  Those formulas constrain the possible
> meanings of that URI, relative to other URIs.  They can't nail a URI to
> Venus, but they can use other ontologies to provide useful (and possibly
> very constraining) information, like that it's an astronomical body with
> a mass of about 5e+24kg.
>
> My advice here is, I confess, not widely followed.  But I hear more and
> more people converging on the idea that this is both practical and
> likely to be sufficiently effective.
>
>> The point surely is that URIs used to refer (not
>> as in HTTP, but as in OWL) do *not* have a
>> standardized meaning. Standards are certainly a
>> chore to create, but they only go so far. OWL
>> defines the meanings of the OWL namespace, but it
>> does not define the meanings of the FOAF
>> vocabulary,
>
> No, that's up to the owner(s) of the FOAF terms.
>
>> or the URIrefs used in, say,
>> ontologies published by the NIH or by JPL.
>
> And that's up to the NIH and JPL, respectively.
>
>> The
>> only way those meanings can be specified is by
>> writing ontologies: and finite ontologies do not
>> - cannot possibly - nail down referents
>> *uniquely*.
>
> Ah -- there we go.  There must be a long history of this subject in
> philosophy.  Can things ever be nailed down uniquely?  I haven't a clue.
> But that's the wrong question.  In this thread, I don't think we're
> talking about whether we can really be sure what we mean when we say
> such a URI denotes Venus.  Instead, we're talking about whether it's a
> good practice to use a single URI to denote clearly distinct things,
> such as:
>   (1) the second rock from the sun
>   (2) the Roman goddess of love
>   (3) a star tennis player
>   (4) ... etc
> The term "ambiguity" covers both these issues, but we don't need to
> combine them.   The first is a kind of imprecision, a fuzziness, while
> the second is the re-use of a word for a second meaning, a homonym.
> (Homonyms seem to be called "overloading" in computer programming.)
>
> I think we know how to work with homonyms, but since we're engineering a
> new system, it seems like a good design decision to forbid them, doesn't
> it?
>
>    -- Sandro
>
Received on Tuesday, 12 June 2007 08:29:33 UTC