Re: [semanticweb] Ontology of interests from Hamish Harvey on 2006-07-27 (semantic-web@w3.org from July 2006)

From: Hamish Harvey <hamish@hamishharvey.com>
Date: Thu, 27 Jul 2006 12:11:47 +0100
To: semantic-web@w3c.org
Message-ID: <8f9aaf260607270411q7552636p28b96810907b9342@mail.gmail.com>
On 26/07/06, Knud Hinnerk Möller <knud.moeller@deri.org> wrote:

> On 26.07.2006 at 10:35, Hamish Harvey wrote:

> > Presumably you can strongly encourage people (e.g. in the FOAF docs; I
> > haven't looked) to prefer pages from e.g. Wikipedia over some
> > arbitrary even if apparently canonical page. If the page isn't there,
> > add it.
>
> Yes, you could, and such a convention would be nice. However, it will be
> hard to get everyone to follow it.

Of course.

>  Again, the heart of the problem I
> mentioned (the "URI crisis") is that a lot of things can get messed up if
> people use the URLs of physical things on the internet (e.g. a web page) to
> _also_ denote an abstract concept that this URL is somehow related to.

Of course. But, as you point out, the URI crisis has been designed out
in this case.

> If
> you only have one homogeneous source of data or community, an informal
> convention like the one you suggest might be enough. However, if you imagine
> an agent traversing the whole wide SW, collecting and integrating data from
> all kinds of sources (and that's what would be so cool about the SW), then a
> solution like that could easily break.

Any reasoning applied over data collected from all over the Internet
had better be able to cope with all sorts of incompleteness and
inconsistency. There are no 100% solutions at that scale.

If you start coining URLs to identify the foaf:topics uniquely, how
are you going to persuade people to use *those*? The problem isn't
solved, if anything it's magnified, as now you have an n:m problem
over the foaf:topic relation.

You're going to have to start using imprecise methods---page content
similarity, for example---to get anywhere.
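To make "page content similarity" concrete, here is a minimal sketch in Python using Jaccard overlap of word sets. The two page texts are invented stand-ins for fetched page content, and the `words` and `jaccard` helpers are hypothetical names. Any real system would need something far more robust than this.

```python
import re

def words(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 0.0  # two empty pages: treat as no evidence of similarity
    return len(wa & wb) / len(wa | wb)

# Invented stand-ins for the text of two pages about the same subject.
page_a = "RDF is a framework for describing resources on the web"
page_b = "The Resource Description Framework RDF describes web resources"

print(round(jaccard(page_a, page_b), 2))
```

A score like this is only a hint that two pages might describe the same subject; it says nothing about who believes they do.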

> As said before, this problem does not
> arise in the foaf:interest example, due to the way this predicate is defined
> (i.e. the object _is_ a foaf:Document).

I was interested to see that. This seems to introduce the approach
used in Topic Maps of explicitly marking the use of a URI as a subject
indicator. So at least here you don't have to contend with a problem baked
into the Semantic Web at the RDF level.

Of course, encouraging the use of this approach cannot guarantee that
it is used so, as you note above, it can alleviate the problems relating
to the URI crisis, but it can't solve them.

Topic Maps don't add the next layer of saying formally "this document
describes *this* concept". Probably because TMs came from a world
populated by people who already knew that to attempt this could lead
only to madness. Concepts can't, in general, be pinned down like
that.

> > Besides, what's wrong with saying, without using another URI:
> >
> > <http://en.wikipedia.org/wiki/Resource_Description_Framework>
> > foaf:topic _:rdf .
> > <http://www.w3.org/RDF/> foaf:topic _:rdf .
> >
> > ?
>
> Using a blank node only works if these two statements were made in the same
> model. If they come from different places, you need an explicit URI.

Which of course they would be, but not necessarily in the same place
as the foaf:interest statements. You have statements from (up to)
three documents, one of which contains these two statements
establishing the identity of the concepts described in the two web
pages.
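A rough sketch of how an agent might combine the (up to) three documents, in Python with triples as plain tuples. The `ex:alice` and `ex:bob` identifiers, the document contents, and the helper names are all made up for illustration. The point is only that the blank node has to be consistent within the one document that makes both foaf:topic statements.

```python
from collections import defaultdict

WIKI = "http://en.wikipedia.org/wiki/Resource_Description_Framework"
W3C = "http://www.w3.org/RDF/"

# Hypothetical document 1: the two foaf:topic statements, sharing a blank node.
topics_doc = [
    (WIKI, "foaf:topic", "_:rdf"),
    (W3C, "foaf:topic", "_:rdf"),
]

# Hypothetical documents 2 and 3: interests stated elsewhere, by other people.
interests = [
    ("ex:alice", "foaf:interest", WIKI),
    ("ex:bob", "foaf:interest", W3C),
]

def shared_topic_pages(triples):
    """Map each page to the set of pages sharing its (document-local) topic node."""
    by_topic = defaultdict(set)
    for page, pred, topic in triples:
        if pred == "foaf:topic":
            by_topic[topic].add(page)
    return {page: pages for pages in by_topic.values() for page in pages}

def same_interest(a, b, interests, page_groups):
    """True if a and b are interested in pages that share a topic node."""
    pages_a = {o for s, p, o in interests if s == a and p == "foaf:interest"}
    pages_b = {o for s, p, o in interests if s == b and p == "foaf:interest"}
    # Without a topic grouping, a page is only "the same" as itself.
    return any(pb in page_groups.get(pa, {pa}) for pa in pages_a for pb in pages_b)

groups = shared_topic_pages(topics_doc)
print(same_interest("ex:alice", "ex:bob", interests, groups))
```

Without the topics document (an empty `page_groups`), the two interests stay unconnected, because the two people point at different pages.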

It then becomes important to know *who says* that the topics/subjects
are the same and whether you trust them, to deal somehow with
conflicts, and so on. In the big messy world of the SW, perhaps
finding such an assertion somewhere would be worth a little more than
finding that the documents referred to in foaf:interest statements are
textually similar. Looking at the pages yourself and making that
assertion is then worth much more again.
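The ordering of evidence suggested above could be sketched like this, again in Python. The three evidence kinds, their numeric ranks, the `ex:carol` source, and the `best_evidence` helper are assumptions made purely for illustration; how much to trust a given source is exactly the hard part and is reduced here to a simple set of trusted names.

```python
# Assumed ranking: textual similarity < an assertion found somewhere
# < an assertion you made yourself after looking at the pages.
EVIDENCE_RANK = {
    "textual-similarity": 1,
    "third-party-assertion": 2,
    "own-assertion": 3,
}

def best_evidence(evidence, trusted):
    """Return the strongest usable piece of evidence, or None if there is none.

    Third-party assertions count only if their source is in `trusted`.
    """
    usable = [
        e for e in evidence
        if e["kind"] != "third-party-assertion" or e["source"] in trusted
    ]
    return max(usable, key=lambda e: EVIDENCE_RANK[e["kind"]], default=None)

# Hypothetical evidence that two pages describe the same subject.
evidence = [
    {"kind": "textual-similarity", "source": "agent"},
    {"kind": "third-party-assertion", "source": "ex:carol"},
]

print(best_evidence(evidence, trusted={"ex:carol"}))
print(best_evidence(evidence, trusted=set()))
```

If the source of the assertion is not trusted, the agent falls back to the weaker similarity evidence. Dealing with conflicting assertions is left out entirely.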

Bernard picks up a more technical problem with using foaf:topic, but
also suggests a solution.

> > Another concern is the drift in meaning of assertions you'll get using
> > something like Wikipedia to provide a "controlled" vocabulary. When I
> > assert
> >
> > _:me foaf:interest <http://en.wikipedia.org/wiki/...> .
> >
> > I really mean I have an interest which is described by that page at
> > the time the assertion was made.
> >
> > Probably not such an issue in geek space with well defined technologies.
>
> No, I think that is indeed an issue - the well defined technologies do not
> define very well what using a URL like
> <http://en.wikipedia.org/wiki/Resource_Description_Framework>
>  actually means!

That seems to be a different issue. My point was that the concept
"RDF" is much better defined and less contentious than, say,
"communism". Although "Semantic Web", which describes an idea, not a
technology, probably comes closer to the latter than the former (even
to the extent of questions about whether it will ever actually exist
;).

The problem I was getting at has nothing to do with technologies being
precise about how a URI means something. In the end, you have to ground
your symbols (URIs) somewhere, and that somewhere has to be outside
what is defined by your technology standards; it always comes back to
natural language. Wikipedia is a handy collection of stable symbols
with natural language descriptions of subjects, so offers itself as a
grounding mechanism. Using it as such has some compelling advantages,
but as a semantic foundation it is also rather like quicksand, since
although the symbols are stable, the descriptions are not; the natural
language content of the pages can change at any time. That problem
remains whatever the details of the technology you are using, and
certainly isn't solved by making specific foaf:topic (or
skos:primarySubject) assertions about Wikipedia pages.

Cheers,
Hamish

-- 
Hamish Harvey
Research Associate, School of Civil Engineering and Geosciences,
Newcastle University
Received on Thursday, 27 July 2006 11:13:49 UTC