RE: Explicit Disambiguation Via RDF bNodes, more Process from Narahari, Sateesh on 2002-04-26 (www-rdf-interest@w3.org from April 2002)

From: Narahari, Sateesh <Sateesh_Narahari@jdedwards.com>
Date: Fri, 26 Apr 2002 11:40:02 -0600
To: "'Sandro Hawke'" <sandro@w3.org>, www-rdf-interest@w3.org
Message-ID: <C5E6B2ABE291D5119A3800508B9553A50359A65D@cormails8.jdedwards.com>
The assumption that the Semantic Web cannot be messy is false. It is like
saying the Web cannot have 404s and an HTML page must ensure the existence of
every link it references.

So what's the big deal if the merged RDF graph is messy? The real world is
messy; should we refactor it?

I think aiming for a consistent system with no contradictions and elegant
graphs is laudable, but it is not within the scope of the Semantic Web, IMO.

Regards,
Sateesh 
-----Original Message-----
From: Sandro Hawke [mailto:sandro@w3.org]
Sent: Friday, April 26, 2002 11:08 AM
To: www-rdf-interest@w3.org
Subject: Explicit Disambiguation Via RDF bNodes, more Process



From the traffic on this list, there is clearly a problem with using
URIs as logical constant symbols.  We want a nearly-universally-shared
mapping between identifier strings and things they denote, but we
obviously don't have it, since we can't agree whether the string
"http://www.ibm.com/" denotes a way to access information, a
collection of information, a linguistic expression of some
information, or a company about which one can get some information!

If we don't stick to one consistent approach, we're going to have a
messier semantic web.  I don't think anything will break, but a large
merged RDF graph, coming from many sources, will have properties of
<http://www.microsoft.com> telling you when it was down, when it was
last defaced, what its stock price is, when it was last modified, who
its CEO is, that it's written in English, that it owns several
patents, that it's 26684 bytes long, etc.  It won't form a very
coherent picture.  Maybe that's okay.

I'm not sure what the right forum is for reaching rough consensus.  RDF Core
sadly declared the issue out of scope [1], and while the TAG might
make a recommendation, it might not be broad enough to reach real
consensus.  It's also not clear this matters much outside of RDF.

(Apparently the RDF Core WG thinks it's okay to use an HTTP URI to
denote a person and fetch some RDF from that URI.  If the RDF were
written by the person, it seems like it would be logically valid for
them to write
    <> dc:creator <>.   <> dc:subject <>.
The document, its creator, and its subject are all the same thing.
It's not a very coherent picture.   Maybe that's okay.)

Anyway.  I'm here to suggest a solution that works today.  It's not
perfect, but it avoids 97.3% [2] of the semantic overlap.

Background: you don't really need URIs in RDF when you have bNodes and
string literals, as long as you have at least one other symbol.  I
call that one extra symbol <http://www.w3.org/2001/12/uname#uname>.
(It's about the same as TimBL's <http://www.w3.org/2000/10/swap/log#uri>.)

Read 
     _:sandro uname:uname "http://www.w3.org/People/Sandro#me"
as
     In this document, we use the term _:sandro to denote the
     one object in the universe which has a uname which is the
     string "http://www.w3.org/People/Sandro#me".  If you find 
     something else with that same uname, it's really the same
     thing as _:sandro.
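Read this way, uname behaves like what OWL calls an inverse-functional property: two nodes carrying the same uname string denote the same thing, so a merged graph can collapse them. A minimal Python sketch of that merge ("smushing") step follows. It assumes a toy triple representation in which bNode labels are plain strings starting with "_:" and literals are wrapped in a hypothetical Lit class; smush is an illustrative name, not part of any RDF library.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union

UNAME = "http://www.w3.org/2001/12/uname#uname"


@dataclass(frozen=True)
class Lit:
    """A string literal (hypothetical toy type for this sketch)."""
    value: str


Term = Union[str, Lit]  # str: a bNode label like "_:sandro", or a URI
Triple = Tuple[Term, Term, Term]


def smush(triples: List[Triple]) -> Set[Triple]:
    """Merge bNodes that carry the same uname string."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    owner: Dict[str, str] = {}  # uname string -> first bNode seen with it
    for s, p, o in triples:
        if p == UNAME and isinstance(s, str) and isinstance(o, Lit):
            if o.value in owner:
                a, b = find(owner[o.value]), find(s)
                if a != b:
                    parent[b] = a  # same uname, so the same thing
            else:
                owner[o.value] = s

    def canon(t: Term) -> Term:
        # Literals are left alone; nodes map to their merged representative.
        return find(t) if isinstance(t, str) else t

    return {(canon(s), canon(p), canon(o)) for s, p, o in triples}


me = Lit("http://www.w3.org/People/Sandro#me")
merged = smush([
    ("_:sandro", UNAME, me),
    ("_:s2", UNAME, me),          # "something else with that same uname"
    ("_:s2", "_:knows", "_:x"),
])
```

Because the union-find follows chains, a node that shares one uname with a second node and a different uname with a third ends up merged with both.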

So instead of 
     <a> <b> <c>
you write
     _:a <http://www.w3.org/2001/12/uname#uname> "a".
     _:b <http://www.w3.org/2001/12/uname#uname> "b".
     _:c <http://www.w3.org/2001/12/uname#uname> "c".
     _:a _:b _:c.
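The rewrite into uname-normal-form is mechanical. A minimal Python sketch, assuming input triples whose terms are URI strings (string literals wrapped in a hypothetical Lit class); the function name and the _:n0, _:n1, ... labels are illustrative choices, not part of the uname proposal itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

UNAME = "http://www.w3.org/2001/12/uname#uname"


@dataclass(frozen=True)
class Lit:
    """A string literal (hypothetical toy type for this sketch)."""
    value: str


Term = Union[str, Lit]
Triple = Tuple[Term, Term, Term]


def to_uname_normal_form(triples: List[Tuple[str, str, Term]]) -> List[Triple]:
    """Replace every URI with a bNode plus one uname triple naming it."""
    labels: Dict[str, str] = {}
    out: List[Triple] = []

    def node(uri: str) -> str:
        # First time we see a URI: mint a bNode and record its uname.
        if uri not in labels:
            labels[uri] = f"_:n{len(labels)}"
            out.append((labels[uri], UNAME, Lit(uri)))
        return labels[uri]

    for s, p, o in triples:
        s_node = node(s)
        p_node = node(p)
        o_term = o if isinstance(o, Lit) else node(o)  # literals stay literals
        out.append((s_node, p_node, o_term))
    return out


result = to_uname_normal_form([("a", "b", "c")])
```

For the single triple <a> <b> <c> this yields three uname triples followed by _:n0 _:n1 _:n2, the same shape as the hand-written version with _:n0, _:n1, _:n2 standing in for _:a, _:b, _:c. A URI that appears more than once reuses its bNode, so only one uname triple is emitted per name.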

I called that uname-normal-form, and found it useful for such things
as comparing RDF graphs where a name had changed.  Now I see that this
approach could be helpful here.   The reason for our semantic
messiness is that we have different "uname" mappings.  Here are the
ones we seem to talk about:
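To illustrate why the normal form helps when comparing graphs where a name has changed, here is a rough Python sketch; unf and diff are hypothetical helpers, and uname strings are written as N3-style quoted strings for brevity. It assumes both graphs list their triples in the same order so that the positional bNode labels line up; a robust comparison would need bNode-aware graph isomorphism instead. Under that assumption, renaming <c> to <d> shows up as one changed uname triple rather than as every triple that mentions the name.

```python
from typing import Dict, List, Set, Tuple

UNAME = "http://www.w3.org/2001/12/uname#uname"
Triple = Tuple[str, str, str]


def unf(triples: List[Triple]) -> Set[Triple]:
    """Uname-normal-form of URI-only triples, bNodes labelled by first use."""
    labels: Dict[str, str] = {}
    out: Set[Triple] = set()

    def node(uri: str) -> str:
        if uri not in labels:
            labels[uri] = f"_:n{len(labels)}"
            out.add((labels[uri], UNAME, f'"{uri}"'))  # N3-style quoted string
        return labels[uri]

    for s, p, o in triples:
        s_node, p_node, o_node = node(s), node(p), node(o)
        out.add((s_node, p_node, o_node))
    return out


def diff(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (removed, added) triples."""
    return old - new, new - old


g1 = [("a", "b", "c"), ("c", "b", "a")]
g2 = [("a", "b", "d"), ("d", "b", "a")]  # <c> renamed to <d>
removed, added = diff(unf(g1), unf(g2))
```

A plain triple diff of the same two graphs reports both triples as removed and both as re-added.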

   # used by SOAP folks, CGI writers, ...
   _:computerSubsystem uname:communicationAddress
                                               "http://www.microsoft.com/".
   # _:computerSubsystem is the thing you POST to, the thing which
   # generates your dynamic content, etc.  It's also the thing which
   # receives the mail on a mailto: URI.

   # used by TimBL (this is log:uri)
   _:abstractDigitalContent uname:retreivalAddress
                                               "http://www.microsoft.com/".
   # _:abstractDigitalContent is the text, pictures, etc., which may be served
   # by _:computerSubsystem in many different formats and languages.

   # naively used by many
   _:negotiatedDigitalContent xuname:retreivalNegotiationAddress
                                               "http://www.microsoft.com/".
   # _:negotiatedDigitalContent is the thing (a string of bytes along
   # with a content-type), a form of _:abstractDigitalContent,
   # returned by _:computerSubsystem.   This one is NOT an unambiguous
   # property.

   # used by Mark Baker et al
   _:theCompany uname:markName "http://www.microsoft.com/".
   # _:theCompany is, as I understand it, the primary subject of 
   # _:negotiatedDigitalContent.  Unlike the others this makes
   # perfect sense in the absence of connectivity or communication.
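The payoff of separate uname properties is that merging keys on the (property, string) pair, so the company can no longer be conflated with its web content. A rough Python sketch under stated assumptions: the prefixed names below are used as plain strings rather than expanded to full URIs, only uname:communicationAddress, uname:retreivalAddress and uname:markName are treated as unambiguous (so xuname:retreivalNegotiationAddress never triggers a merge), and Lit and smush_by_property are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Lit:
    """A string literal (hypothetical toy type for this sketch)."""
    value: str


Term = Union[str, Lit]
Triple = Tuple[Term, Term, Term]

# Assumption: only these properties may be used to merge nodes.
UNAMBIGUOUS = {
    "uname:communicationAddress",
    "uname:retreivalAddress",
    "uname:markName",
}


def smush_by_property(triples: List[Triple]) -> Set[Triple]:
    """Merge bNodes only when they share the same (property, string) pair."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    owner: Dict[Tuple[str, str], str] = {}
    for s, p, o in triples:
        if p in UNAMBIGUOUS and isinstance(s, str) and isinstance(o, Lit):
            key = (p, o.value)  # the property is part of the key
            if key in owner:
                a, b = find(owner[key]), find(s)
                if a != b:
                    parent[b] = a
            else:
                owner[key] = s

    def canon(t: Term) -> Term:
        return find(t) if isinstance(t, str) else t

    return {(canon(s), canon(p), canon(o)) for s, p, o in triples}


MS = Lit("http://www.microsoft.com/")
merged = smush_by_property([
    ("_:co1", "uname:markName", MS),           # source 1: the company
    ("_:co2", "uname:markName", MS),           # source 2: the company again
    ("_:doc1", "uname:retreivalAddress", MS),  # the abstract content
    ("_:neg1", "xuname:retreivalNegotiationAddress", MS),
    ("_:neg2", "xuname:retreivalNegotiationAddress", MS),
])
```

The two company nodes collapse into one, while the content node and the two negotiated-content nodes stay distinct, even though every input triple uses the same string "http://www.microsoft.com/".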

I can't think of the proper name for uname:markName.  Fill in the
blank:
    Mark Baker is the one being or thing in the universe who has a
    __________ which is the string "http://www.markbaker.ca".

I think this all works, but of course it involves a lot more nodes.
The remaining 2.7% of incoherence comes from the uname predicate URIs
themselves, which are still simultaneously properties, web pages,
digital content, etc.

So take your pick: (1) use this approach, (2) allow some messy merged
graphs, or (3) achieve consensus.  (or find a better approach.)
Personally, I'd like (3) but I don't know how to do it.  Maybe when
people start actually merging graphs, there will be enough social
pressure on whoever looks the messiest to get them to shape up and
conform.  I wonder who that will be....

    -- sandro

[1]  http://www.w3.org/2000/03/rdf-tracking/#rdfms-resource-semantics
[2]  21% of all statistics are made up on the spot
Received on Friday, 26 April 2002 13:33:18 UTC