- From: Narahari, Sateesh <Sateesh_Narahari@jdedwards.com>
- Date: Fri, 26 Apr 2002 11:40:02 -0600
- To: "'Sandro Hawke'" <sandro@w3.org>, www-rdf-interest@w3.org
The assumption that the semantic web cannot be messy is false. It is like saying the web cannot have 404s and that an HTML page must ensure the existence of every link it references. So what's the big deal if the merged RDF graph is messy? The real world is messy; should we refactor it? I think aiming for a consistent system with no contradictions and elegant graphs is laudable, but it is not within the scope of the semantic web, IMO.

Regards,
Sateesh

-----Original Message-----
From: Sandro Hawke [mailto:sandro@w3.org]
Sent: Friday, April 26, 2002 11:08 AM
To: www-rdf-interest@w3.org
Subject: Explicit Disambiguation Via RDF bNodes, more Process

From the traffic on this list, there is clearly a problem with using URIs as logical constant symbols. We want a nearly-universally-shared mapping between identifier strings and things they denote, but we obviously don't have it, since we can't agree whether the string "http://www.ibm.com/" denotes a way to access information, a collection of information, a linguistic expression of some information, or a company about which one can get some information!

If we don't stick to one consistent approach, we're going to have a messier semantic web. I don't think anything will break, but a large merged RDF graph, coming from many sources, will have properties of <http://www.microsoft.com> telling you when it was down, when it was last defaced, what its stock price is, when it was last modified, who its CEO is, that it's written in English, that it owns several patents, that it's 26684 bytes long, etc. It won't form a very coherent picture. Maybe that's okay.

I'm not sure of the right forum for coming to rough consensus. RDF Core sadly declared the issue out of scope [1], and while the TAG might make a recommendation, it might not be broad enough to reach real consensus. It's also not clear this matters much outside of RDF.

(Apparently the RDF Core WG thinks it's okay to use an HTTP URI to denote a person and fetch some RDF from that URI.
If the RDF were written by the person, it seems like it would be logically valid for them to write

    <> dc:creator <>.
    <> dc:subject <>.

The document, its creator, and its subject are all the same thing. It's not a very coherent picture. Maybe that's okay.)

Anyway. I'm here to suggest a solution that works today. It's not perfect, but it avoids 97.3% [2] of the semantic overlap.

Background: you don't really need URIs in RDF when you have bNodes and string literals, as long as you have at least one other symbol. I call that one extra symbol <http://www.w3.org/2001/12/uname#uname>. (It's about the same as TimBL's <http://www.w3.org/2000/10/swap/log#uri>.)

Read

    _:sandro uname:uname "http://www.w3.org/People/Sandro#me"

as

    In this document, we use the term _:sandro to denote the one object in the universe which has a uname which is the string "http://www.w3.org/People/Sandro#me". If you find something else with that same uname, it's really the same thing as _:sandro.

So instead of

    <a> <b> <c>

you write

    _:a <http://www.w3.org/2001/12/uname#uname> "a".
    _:b <http://www.w3.org/2001/12/uname#uname> "b".
    _:c <http://www.w3.org/2001/12/uname#uname> "c".
    _:a _:b _:c.

I called that uname-normal-form, and found it useful for such things as comparing RDF graphs where a name had changed.

Now I see that this approach could be helpful here. The reason for our semantic messiness is that we have different "uname" mappings. Here are the ones we seem to talk about:

    # used by SOAP folks, CGI writers, ...
    _:computerSubsystem uname:communicationAddress "http://www.microsoft.com/".
    # _:computerSubsystem is the thing you POST to, the thing which
    # generates your dynamic content, etc. It's also the thing which
    # receives the mail on a mailto: URI.

    # used by TimBL (this is log:uri)
    _:abstractDigitalContent uname:retreivalAddress "http://www.microsoft.com/".
    # _:abstractDigitalContent is the text, pictures, etc which may be served
    # by _:computerSubsystem in many different formats and languages.

    # naively used by many
    _:negotiatedDigitalContent xuname:retreivalNegotiationAddress "http://www.microsoft.com/".
    # _:negotiatedDigitalContent is the thing (a string of bytes along
    # with a content-type), a form of _:abstractDigitalContent,
    # returned by _:computerSubsystem. This one is NOT an unambiguous
    # property.

    # used by Mark Baker et al
    _:theCompany uname:markName "http://www.microsoft.com/".
    # _:theCompany is, as I understand it, the primary subject of
    # _:negotiatedDigitalContent. Unlike the others this makes
    # perfect sense in the absence of connectivity or communication.

I can't think of the proper name for uname:markName. Fill in the blank: Mark Baker is the one being or thing in the universe who has a __________ which is the string "http://www.markbaker.ca".

I think this all works, but of course it involves a lot more nodes. The remaining 2.7% of incoherence comes from the uname predicate URIs themselves, which are still simultaneously properties, web pages, digital content, etc.

So take your pick: (1) use this approach, (2) allow some messy merged graphs, or (3) achieve consensus. (Or find a better approach.) Personally, I'd like (3) but I don't know how to do it. Maybe when people start actually merging graphs, there will be enough social pressure on whoever looks the messiest to get them to shape up and conform. I wonder who that will be....

    -- sandro

[1] http://www.w3.org/2000/03/rdf-tracking/#rdfms-resource-semantics
[2] 21% of all statistics are made up on the spot
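
For readers who want to experiment with the uname-normal-form rewriting described in the quoted message, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from the thread: the function name `to_uname_normal_form`, the generated bNode labels (`_:n0`, `_:n1`, ...), and the representation of triples as plain string tuples are all hypothetical, and every term in the input is assumed to be a URI (no literals or pre-existing bNodes).

```python
# The one remaining URI symbol, as named in the quoted message.
UNAME = "http://www.w3.org/2001/12/uname#uname"


def to_uname_normal_form(triples):
    """Replace every URI term with a bNode plus one uname statement.

    `triples` is an iterable of (subject, predicate, object) URI strings.
    Returns a list of triples: first the uname statements, in the order
    the URIs were first seen, then the rewritten original triples.
    """
    bnodes = {}        # URI string -> hypothetical bNode label
    uname_stmts = []   # one statement per distinct URI

    def bnode_for(uri):
        # Allocate a fresh bNode the first time a URI is seen,
        # and record its uname; reuse the same bNode afterwards.
        if uri not in bnodes:
            label = f"_:n{len(bnodes)}"
            bnodes[uri] = label
            uname_stmts.append((label, f"<{UNAME}>", f'"{uri}"'))
        return bnodes[uri]

    rewritten = [(bnode_for(s), bnode_for(p), bnode_for(o))
                 for s, p, o in triples]
    return uname_stmts + rewritten


if __name__ == "__main__":
    for s, p, o in to_uname_normal_form([("a", "b", "c")]):
        print(f"{s} {p} {o}.")
```

Note that the uname predicate itself is left as a URI, which corresponds to the residual incoherence the message attributes to the uname predicate URIs. Swapping `UNAME` for a different mapping predicate (for example one of the `uname:` properties listed in the message) would be the way to state which kind of thing each bNode is meant to denote.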
Received on Friday, 26 April 2002 13:33:18 UTC