Re: Semantic Web Hackings from Dan Brickley on 2000-12-29 (www-rdf-interest@w3.org from December 2000)

From: Dan Brickley <danbri@w3.org>
Date: Fri, 29 Dec 2000 07:42:57 -0500 (EST)
To: Martin Bryan <mtbryan@sgml.u-net.com>
cc: <xml-dev@lists.xml.org>, <www-rdf-interest@w3.org>
Message-ID: <Pine.LNX.4.30.0012290705260.26357-100000@tux.w3.org>
On Fri, 29 Dec 2000, Martin Bryan wrote:

>
> Sean et al
> >      http://xmlns.com/foaf/0.1/mbox
> >      A web-identifiable Internet mailbox associated with
> >      exactly one owner.
>
> But what happens if you have multiple mailboxes or share mail addresses with
> others in a virtual workgroup? I have my main address, one for the virtual
> project I am currently working on and a number that identify working groups
> that I am part of. In different roles I use different addresses to have mail
> sent. How does one know that mtbryan@sgml.u-net.com =
> mtbryan@diffuse.org but represents only one part of xml-dev@lists.xml.org?

Using metadata ;)

In RDF terms: it's a property of a property. Some properties
(foaf:personalMailbox, foo:humanGenomeChecksum, chat:aolScreenName,
foaf:personalHomepage...) have the characteristic that they pick out
at-most-one individual. For any given value for one of these properties,
we know that there exists at-most-one identifiable resource who has
that particular value for that property. This is really useful: it
depoliticises URI space by allowing for multiple ways of identifying
(through description) the selfsame resource. So useful in fact that this
meta-property is cropping up in RDF extension vocabularies (eg. DAML, OIL,
FOAF...).
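A minimal sketch of "a property of a property" as plain data, assuming triples are Python tuples and prefixed names stand in for full URIs. DAML+OIL spells this meta-property as the class `daml:UnambiguousProperty` (OWL's later equivalent is `owl:InverseFunctionalProperty`). The particular declarations below are illustrative, not copied from any published schema.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
# DAML+OIL's class for properties whose values pick out at most one subject.
UNAMBIGUOUS = "daml:UnambiguousProperty"

# Illustrative schema triples: metadata *about* properties, not about people.
schema: Set[Triple] = {
    ("foaf:personalMailbox", RDF_TYPE, UNAMBIGUOUS),
    ("chat:aolScreenName", RDF_TYPE, UNAMBIGUOUS),
    ("foaf:name", RDF_TYPE, "rdf:Property"),  # many people share a name
}


def unambiguous_properties(triples: Set[Triple]) -> Set[str]:
    """Return the properties declared to pick out at most one resource per value."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == UNAMBIGUOUS}


print(sorted(unambiguous_properties(schema)))
# ['chat:aolScreenName', 'foaf:personalMailbox']
```

Because the declaration is ordinary RDF, a tool can learn which properties are safe to treat as identifying keys from the schema itself rather than from hard-coded rules.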

The trick is to define these properties in a way that works around the
real-world fuzziness. Not all mailboxes are personal, but most can be
considered to have a single owner. So we can define for eg.
'personalMailbox' as a relation between individuals and their (possibly
many) mailboxes. For any given personal mailbox, there is at most one
individual whose personal mailbox it is. Same for homepage,
aol-screenname or whatever. The fact that you might have multiple
personal mailboxes is just fine. That's just more data -- you can write
down in RDF/XML that there is a person who has mailboxes danbri@w3.org and
daniel.brickley@bristol.ac.uk (or you can extract the same from my PGP key,
acquiring info about my public key at the same time). The fact that some
mailboxes (eg. xml-dev@lists.xml.org) are not anybody's personal mailbox
is fine. The fact that some are shared is fine too: either we consider
them to have a primary owner or we don't. We're in the business of making
a cartoon caricature of the world here, not representing it in full
nitpicky detail.
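To make the many-to-one shape concrete, here is a sketch with hypothetical data: one anonymous person node with two personal mailboxes, and a shared list address that is nobody's personal mailbox. The property name `foaf:personalMailbox` follows the message; the node IDs, the `ex:listAddress` property and the strings-as-URIs encoding are assumptions for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
PERSONAL_MBOX = "foaf:personalMailbox"

# Hypothetical data: node IDs and ex:listAddress are illustrative only.
data: List[Triple] = [
    ("_:dan", PERSONAL_MBOX, "mailto:danbri@w3.org"),
    ("_:dan", PERSONAL_MBOX, "mailto:daniel.brickley@bristol.ac.uk"),
    # A shared list address: described, but nobody's personal mailbox.
    ("_:xmldev", "ex:listAddress", "mailto:xml-dev@lists.xml.org"),
]


def mailboxes_of(person: str, triples: List[Triple]) -> Set[str]:
    """One individual may have many personal mailboxes."""
    return {o for (s, p, o) in triples if s == person and p == PERSONAL_MBOX}


def owners_of(mailbox: str, triples: List[Triple]) -> Set[str]:
    """Nodes whose personal mailbox this is.

    The at-most-one reading says these nodes all denote a single
    individual; an empty set just means the mailbox is not personal.
    """
    return {s for (s, p, o) in triples if p == PERSONAL_MBOX and o == mailbox}


print(len(mailboxes_of("_:dan", data)))                 # 2
print(owners_of("mailto:danbri@w3.org", data))          # {'_:dan'}
print(owners_of("mailto:xml-dev@lists.xml.org", data))  # set()
```

The constraint only runs in one direction: many mailboxes per person is allowed, many owners per personal mailbox is not.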

Point of all this being that we shouldn't conflate individuals with their
various online representations and activities, but that we can
nevertheless use properties of the latter to indirectly identify the
former. People are just one (political) example; audio content
(MP3s/CDs) is another, equally political case to consider. Compare the
flak you'd get for attempting to create a single URI scheme for
representing people or MP3'd songs with the ease of using arbitrary
existing properties (screennames, mailboxes, tracklist checksums, etc.).
Identification through description seems to be the way to go, IMHO.
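Identification through description can be sketched as a lookup: rather than minting a URI for a person, ask for "the node that has this value for an at-most-one property" and then read off the rest of its description. The data and node IDs are hypothetical, and the `UNAMBIGUOUS` set is an assumption standing in for whatever a schema declares.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed: properties that pick out at most one resource per value.
UNAMBIGUOUS: Set[str] = {"foaf:personalMailbox", "chat:aolScreenName"}

# Hypothetical data about an anonymous (URI-less) node.
data: List[Triple] = [
    ("_:b1", "foaf:personalMailbox", "mailto:danbri@w3.org"),
    ("_:b1", "foaf:name", "Dan Brickley"),
]


def the_one_with(prop: str, value: str, triples: List[Triple]) -> Optional[str]:
    """Return the node described by (prop, value), or None if nothing matches.

    Only at-most-one properties are accepted: for other properties the
    description could fit many different resources.
    """
    if prop not in UNAMBIGUOUS:
        raise ValueError(f"{prop} does not identify a unique resource")
    matches = {s for (s, p, o) in triples if p == prop and o == value}
    # Several matches are several descriptions of the same resource;
    # pick a stable representative until the data is merged.
    return min(matches) if matches else None


def describe(node: str, triples: List[Triple]) -> Dict[str, Set[str]]:
    """Collect everything said about a node, grouped by property."""
    out: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if s == node:
            out.setdefault(p, set()).add(o)
    return out


who = the_one_with("foaf:personalMailbox", "mailto:danbri@w3.org", data)
print(describe(who, data)["foaf:name"])  # {'Dan Brickley'}
```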


(A slight aside...)

We claim RDF is good at merging data from multiple sources; in my
experience this is true. The current discussion suggests a crude taxonomy
of RDF data aggregation mechanisms:

(1) out-of-the-box aggregation ("naive graph merge")
    All RDF systems do this: because they use URIs as identifiers, data
    from multiple sources merges wherever the same URI appears.

(2) 2nd pass node convergence ("data smushing")
    As discussed above, strategies that merge RDF from multiple
    sources in such a way as to figure out (in some cases) when
    anonymously mentioned resources are descriptions of the same thing.

(3) Fancy Semantic Web inference stuff ("don't hold your breath...")
    As above but drawing additional conclusions based on complex rules
    and re-application of (2).
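A sketch of levels (1) and (2), assuming triples are Python tuples, URIs are plain strings, and blank nodes are strings starting with `_:`. Naive merge is set union after renaming blank nodes apart, since blank-node labels are local to the document they appear in. Smushing then uses union-find to collapse nodes that share a value for a property assumed to be at-most-one. The `SMUSH_ON` set, the `ex:worksFor` property and all data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed: properties that pick out at most one resource per value.
SMUSH_ON: Set[str] = {"foaf:personalMailbox"}


def naive_merge(sources: Iterable[Iterable[Triple]]) -> Set[Triple]:
    """(1) Union of all triples, with blank nodes renamed per source."""
    merged: Set[Triple] = set()
    for i, triples in enumerate(sources):
        def rename(term: str) -> str:
            return f"_:s{i}_{term[2:]}" if term.startswith("_:") else term
        for s, p, o in triples:
            merged.add((rename(s), p, rename(o)))
    return merged


def smush(triples: Set[Triple]) -> Set[Triple]:
    """(2) Merge nodes that share a value for any property in SMUSH_ON."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Prefer URIs over blank nodes, then the lexicographically smaller name.
        keep, drop = sorted((ra, rb), key=lambda n: (n.startswith("_:"), n))
        parent[drop] = keep

    first_seen: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        if p in SMUSH_ON:
            if (p, o) in first_seen:
                union(first_seen[(p, o)], s)
            else:
                first_seen[(p, o)] = s
    # Rewrite every node to its representative; duplicates collapse in the set.
    return {(find(s), p, find(o)) for s, p, o in triples}


def person_nodes(graph: Set[Triple]) -> Set[str]:
    return {s for (s, p, o) in graph if p == "foaf:personalMailbox"}


source_a = [("_:p", "foaf:personalMailbox", "mailto:danbri@w3.org"),
            ("_:p", "foaf:name", "Dan Brickley")]
source_b = [("_:p", "foaf:personalMailbox", "mailto:danbri@w3.org"),
            ("_:p", "ex:worksFor", "http://www.w3.org/")]

merged = naive_merge([source_a, source_b])
print(len(person_nodes(merged)))         # 2: two anonymous nodes
print(len(person_nodes(smush(merged))))  # 1: one person, three facts
```

Without the renaming step, the two `_:p` labels would collide and appear to merge "for free", which is exactly the accidental identity that document-local blank-node labels are meant to rule out. Level (3) would re-run `smush` after each round of rule-derived triples.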


From where I'm standing, (1) seems really handy, (2) is critical to
deploying this stuff in the grubby real world where things don't have
URIs, and (3) is, er, something to keep an eye on.

My working hypothesis (FOAF etc., more on which another time) is that (2),
ie. basic techniques for folding RDF data together even when URIs are
scarce, is enough to build something pretty cool. Sure, we have to make
some simplifying assumptions, but then that's what the Web's all about...

Dan
Received on Friday, 29 December 2000 07:43:04 UTC