Re: Converting SHOE to RDF: about 2/3 done; some gotchas from Dan Connolly on 2000-05-14 (www-rdf-interest@w3.org from May 2000)

From: Dan Connolly <connolly@w3.org>
Date: Sun, 14 May 2000 00:26:40 -0500
To: Sean Luke <seanl@cs.umd.edu>
CC: Jeff Heflin <heflin@cs.umd.edu>, www-rdf-interest@w3.org
Message-ID: <391E3910.6C2CCEF7@w3.org>
Sean Luke wrote:
> 
> [context everyone: Jeff does SHOE, Dan is working informally on a SHOE ->
> RDF converter and had asked some SHOE questions]
> 
> On Fri, 12 May 2000, Dan Connolly wrote:
> 
> > Jeff Heflin wrote:
> >
> > > <rdfs:Class rdf:about="http://schema.org/web#Web_Developer">
> > >   <rdfs:subClassOf rdf:resource="#Silly_Person"/>
> > > </rdfs:Class>
> > >
> > > I do not see any restrictions in the RDFS spec to prevent such a
> > > statement.
> >
> > Why should we prevent such a statement? Anyone can say anything
> > about anything, no?
> 
> [snip]
> 
> > > Also, as I understand it, the way namespaces are used in RDF is
> > > only to uniquely identify what object you're talking about, not which
> > > sets of definitions you subscribe to. Thus, if I state that I am
> > > Web_Developer, then do I also imply that I am a Silly_Person?
> >
> > If you say P and P->Q, then you imply Q, yes. But
> > if you say P and somebody else says P->Q, then a third
> > party may or may not decide to trust you both enough
> > to conclude Q.
> 
> What if a third party Foo says P->Q, you rely on this to make all sorts of
> statements about P under the assumption that your clients will use Foo's
> claim to understand that you're really talking about Q, and then Foo drops
> off the face of the earth?  Say, because the ILOVEYOU virus brings down
> their system for a week. They're still very trustworthy: everything they
> say is true.  But right now they're not saying anything, and your clients
> don't find their stuff and so interpret your semantics in a different way
> than you did when you posted your data originally.

If by "interpret your semantics in a different way" you mean they
get "I don't know" answers when I meant them to get "yes" answers,
then that seems like exactly the right thing... just as if
in normal conversation, I said "it's just like in the
movie Say Anything..." and they hadn't seen the movie.

But if my clients are making some sort of closed-world assumption
and assuming not(P) just because they can't find a statement
of P... and hence getting "no" answers where I meant
them to get "yes", then that's broken. So don't do that.
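
To make the distinction concrete, here is a minimal sketch of the two
behaviours. The names (Answer, ask, ask_closed_world) and the sample
statements are made up for illustration; they are not part of RDF or SHOE.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "I don't know"


def ask(statement, asserted, denied):
    """Open-world lookup: a missing statement is not evidence against it."""
    if statement in asserted:
        return Answer.YES
    if statement in denied:
        return Answer.NO
    return Answer.UNKNOWN


def ask_closed_world(statement, asserted):
    """The broken variant: treats 'not found' as 'false'."""
    return Answer.YES if statement in asserted else Answer.NO


# Hypothetical situation: Foo's "Web_Developer -> Silly_Person" statement
# is unreachable, so the client only has my own statement to go on.
visible = {("me", "type", "Web_Developer")}
query = ("me", "type", "Silly_Person")

open_world_result = ask(query, visible, denied=set())
closed_world_result = ask_closed_world(query, visible)
```

With Foo's statement missing, the open-world client ends up with
Answer.UNKNOWN, while the closed-world client wrongly concludes Answer.NO.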

>  Or Foo updated its
> claims and dropped a few statements that you relied on, not knowing that
> you were doing so.  You're left hanging in the wind.

Yes: if I rely on somebody, and they do something antisocial, I lose.
That seems like an accurate model of the real world, no?

But I don't have to trust them not to change their data... I can
include the MD5 of their data as of when I saw it in my link,
and say "only use the stuff at this address if the MD5 is l2k3jl23j".

cf


 "... In some cases one might want to constrain the simple
        invocation to protect the reader. We can use a conditional,
        for example, to require a particular checksum or digital
        signature:

        <foo:bar>
          <head>
            <if>
              <ds:hash rdf:about="part1.rdf">
                 md5:1287129371237..12738127398712</ds:hash>
              <then>
  ..."
  -- http://www.w3.org/DesignIssues/Toolbox
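
As a rough sketch of that checksum check on the consuming side: the
function name, the sample bytes, and the idea of storing the pinned
digest next to the link are assumptions for illustration, not anything
defined by RDF or by the Toolbox note. Only Python's standard hashlib
module is used.

```python
import hashlib


def matches_pinned_md5(data: bytes, pinned_md5: str) -> bool:
    """Return True only if data hashes to the digest recorded with the link."""
    return hashlib.md5(data).hexdigest() == pinned_md5.lower()


# Hypothetical snapshot of the other party's data as of when I saw it.
snapshot = b"<rdf:RDF>...</rdf:RDF>"
pinned = hashlib.md5(snapshot).hexdigest()  # digest I publish in my link

# Later, a client fetches the data again and checks it before using it.
unchanged_ok = matches_pinned_md5(snapshot, pinned)
changed_ok = matches_pinned_md5(snapshot + b"<!-- edited -->", pinned)
```

A client that gets False simply declines to use the fetched data.
Swapping in hashlib.sha256 would work the same way if a stronger hash
is wanted.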

> It seems to me that trust isn't a great model for this kind of stuff in a
> distributed, uncontrolled environment.

Really? hmm...

> I think that dealing with the
> weirdness caused by a lack of control inherent in a distributed system is
> one of the features that RDF needs to work more on.

Well... sure... RDF 1.0 is just a syntax for ground assertions
using 2-place predicates. There's lots more work to do on all
sorts of issues, combining information from untrusted sources
being a notable one.

cf
 http://www.w3.org/DesignIssues/Semantic.html


>  Because RDF lets
> "anyone say anything about anything", with no notion of authority at all,
> then people are free to redefine the semantics of a language in a very
> fine-grained, highly distributed fashion.

The RDF Schema specification prohibits changing schemas at all.
(mechanisms of enforcement aren't discussed, but I think
it's clear how to implement them.)

>  Combine this with a cooperative
> reliance on who said what about what in order to understand the proper
> semantics of some claim X, and you're gonna see a lot of semantic
> misinterpretation as little parts of the distributed RDF web come up and
> down.

As I say, yes, if party A relies on party B and party B does something
antisocial, party A can be screwed. That's life. Party A can take
defensive measures to whatever extent they choose, depending
on how much they mistrust B and what the cost of getting screwed
is.

> SHOE's model counters this in two ways, which Jeff is getting at I think.
> First, it separates schema from data, which at least attempts to provide
> *some* semblance of authority, or at least a well-defined semantic
> language so we're very clear on what I intend to be inferred when I say
> thing Foo.  SHOE also takes a stab at providing versioning in schema,
> since in the real world schema change over time, independent of the claims
> made using them.

I just read the SHOE versioning paper in the last few days...
SHOE just follows the simple
"if you want to publish a new schema, you use a new URI" rule that
RDF Schema (and COM and lots of other sensible systems) use:

	SHOE maintains each version of the ontology as a separate
	web page and an instance must state which version it
	commits to. As a result, data sources can upgrade to the
	new ontology at their own pace and some may never upgrade.

	-- section 3.2 Versioning
	http://www.cs.umd.edu/projects/plus/SHOE/pubs/#aaai2000

I don't see how the version numbers in a SHOE ontology have
any actual effect on anything... nor the schema IDs,
for that matter. But maybe I missed it... could you give
an example of how the version numbers actually matter?

>  Second for those few "inferential" statements that can
> be made in data (basically classification statements), SHOE's model does
> not permit you to rely on another person's statement.  So if person X says
> that Fido is a Dog, and you rely on this to state chases(Fido,Bell), where
> the first argument of chases must be a Dog, and then person X retracts his
> claim that Fido is a Dog (or the client doesn't see his statement, or it's
> unavailable, or whatever), in SHOE by saying chases(Fido,Bell), what
> you're *really* saying is threefold:
> 
>         1. I claim that chases(Fido,Bell).
>         2. I claim that Fido is a Dog.
>         3. I claim that Bell is a Cat.
> 
> Thus what person X said about Fido is really immaterial -- you're not
> relying on him for semantic context.  But in RDF this is not always the
> case, as Jeff had pointed out.  Without a full collection of statements,
> you could have a lot of disconnected claims without their full semantic
> understanding.

Hmm... I'm not sure what you mean by "full semantic understanding."
RDF has no built-in logic whatsoever. The "full semantic understanding"
depends on more than just the availability of various things...
it depends on what inference rules you choose to use, what
sort of logic, etc.
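
Here is a small sketch of how the same statements yield different
conclusions depending on the rules a reader applies. The tuple
representation, the DOMAIN/RANGE tables, and both function names are
assumptions made for illustration; the second function only
approximates the SHOE behaviour described above.

```python
# Hypothetical data and argument-type tables, following the Fido example.
DATA = {("Fido", "chases", "Bell")}
DOMAIN = {"chases": "Dog"}  # first argument of chases must be a Dog
RANGE = {"chases": "Cat"}   # second argument of chases must be a Cat


def no_rules(triples):
    """Bare reading: only what was literally asserted, no inference."""
    return set(triples)


def with_argument_types(triples, domain, range_):
    """Also add the classification claims implied by each predicate's
    declared argument types."""
    closure = set(triples)
    for s, p, o in triples:
        if p in domain:
            closure.add((s, "type", domain[p]))
        if p in range_:
            closure.add((o, "type", range_[p]))
    return closure


bare = no_rules(DATA)
typed = with_argument_types(DATA, DOMAIN, RANGE)
```

Under the first reading nothing is known about Fido's type; under the
second, "Fido is a Dog" and "Bell is a Cat" follow from the one
statement, without relying on anyone else's claims.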


> RDF would work great in a "sandbox environment" typical of most KR
> systems, where the system has the entire body of knowledge (so to speak)
> stored internally and so can interpret statements in its system under the
> semantics of the full universe of facts at its disposal.  But I think that
> RDF does not take into consideration the uncomfortable fact that the web
> is not a sandbox.  People say anything they want, even trusted ones.  And
> they can stop saying these things at any time, and for any reason.
> Relying on the availability, correctness, and completeness of others'
> statements in order to put your own into context is pretty dangerous.

The RDF 1.0 specification doesn't address issues of semantics in a
distributed system... heck, it doesn't even address the traditional
inference mechanisms used in sandboxed KR systems: no variables,
no if-then, no negation, etc.

But RDF 1.0 is intended to be the bottom layer of systems that
do address these uncomfortable facts... even model them
explicitly.

We only standardize stuff after we're pretty certain about it.
And at this point, the only thing we're pretty certain about
is that everybody needs to be able to make assertions
using 2-place predicates. The rest, we're all just experimenting
with.



-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Sunday, 14 May 2000 01:26:21 UTC