Re: dataset semantics

Pat,

Firstly I'd like to say that I greatly value your contributions and
have learned a lot from reading what you have written. It would be a
significant loss to the working group and the wider community were you
to leave. I would hope that the unpleasant tone of some of the recent
messages can be attributed to what happens too often with email when
people type too quickly without thinking how it will come across.

On Wed, 21 Dec 2011 00:27:51 -0600, Pat Hayes <phayes@ihmc.us> said:


    phayes> Taken to an extreme, this amounts to the claim
    phayes> that each IRI has a whole spectrum of meanings, determined
    phayes> by the graph in which it appears, and hence that every
    phayes> occurrence of it in a different graph is, in effect, a
    phayes> distinct IRI.  It is difficult to emphasise the extent to
    phayes> which this idea is wrong.

    >> As, according to you, this thing is independent of the context,
    >> we can stop making reasoners :)

    phayes> I can't even understand what this is supposed to mean, so
    phayes> I fail to follow your intended point.

Trying to unpick this. The examples from AZ hinge on using the
owl:sameAs predicate. In one sense sameAs has a very well defined
meaning in terms of the entailments that follow from using it. The
problem is that very often it is used in a different way (there have
been a couple of papers from Harry Halpin recently documenting this in
the wild).

So suppose we read ":phayes owl:sameAs 1" in AZ's example to mean,
"whatever is denoted by :phayes is similar enough for my purposes to
the number 1". The "for my purposes" is the context and is tied up in
the author's intention when writing the document. This ties into the
graph name inasmuch as the graph name is often used as a marker for
the context.

The first point is that, thought of in this way, the :phayes part and
whatever it denotes are stable. But this just shifts the problem to
the predicate, because then owl:sameAs would have to be allowed to
denote different things in different contexts.

Maybe one way to encode this context is to have a way of saying the
entailment regime that is intended to be used with a particular graph.
"This graph certified for use with OWL-DL", "You can safely use the
rules in foo.n3 on this graph", etc. Graphs with compatible
entailment regimes might safely be combined, otherwise not.
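
To make the idea concrete, here is a rough Python sketch of "combine
only if the regimes are compatible". It is purely illustrative: the
regime labels, the COMPATIBLE table and the merge() helper are all
hypothetical, and which regimes really are compatible is exactly the
thing somebody would have to define.

```python
# Hypothetical regime labels, loosely named after the examples above.
OWL_DL = "OWL-DL"
RDFS = "RDFS"
FOO_RULES = "rules:foo.n3"

# Assumed, made-up compatibility table: unordered pairs of regimes whose
# graphs may be merged. A regime is always compatible with itself.
COMPATIBLE = {
    frozenset({OWL_DL, RDFS}),
}


def compatible(regime_a, regime_b):
    """True if graphs under these two regimes may be combined."""
    return regime_a == regime_b or frozenset({regime_a, regime_b}) in COMPATIBLE


def merge(graph_a, regime_a, graph_b, regime_b):
    """Return the union of two sets of triples, refusing incompatible regimes."""
    if not compatible(regime_a, regime_b):
        raise ValueError(f"cannot safely combine {regime_a} with {regime_b}")
    return graph_a | graph_b
```

Graphs are plain Python sets of (subject, predicate, object) tuples
here, so merge() is just set union guarded by the regime check.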

We can say it is wrong to allow for URIs to have a spectrum of
meanings depending on the context. But then the barrier to authors
becomes very, perhaps unreasonably, high. We would be trying to force
them to write down exactly what they mean, to work out in advance all
the implications of what they have written, perhaps to run their work
through a reasoner, and then to debug the proofs whenever
contradictions or nonsense statements come out the other side.

You could argue that this is a good thing, and that people should do
this as a matter of course, in the same way that people should at
least check that their software compiles before publishing it. But
actually I suspect this burden means people ought to avoid using
vocabularies such as OWL and even RDFS, which have well-defined
entailment rules, and favour underspecified ones that can be counted
on not to send reasoners off the deep end.

In this case the utility of URIs becomes, at best, a way to find some
documentation for humans to read. The software for processing the
data is hand-crafted and strongly tied to particular data patterns,
full of heuristics, kludges and special cases. RDF serialisations
become an interchange format, no longer particularly self-describing
or coherent, but nevertheless far more useful than Excel spreadsheets
and Word documents because they share a common format and query
language.

Personally I don't find that very intellectually satisfying. I think
it is representative of the "linked data" approach of getting as much
data as possible published in a reasonably consistent machine-readable
way. That is immensely better than what we have had in the past, but
it still falls far short of what I imagine it could be.

Of these three,

  - Web of Data without explicit semantics
  - Globally coherent Semantic Web
  - Locally coherent (contextual) Semantic Web

we can do the first with RDF as it is, without worrying very much
about denotation or entailments. Apart from blessing the existence of
the fourth column (the graph name in a quad) and sorting out some more
modern serialisations, there is probably not too much for the WG to
do. The second is hard to get right and might ask too much of
publishers in any case, at least in the near to medium term. I suspect
the third might not be possible without radical changes that are
probably out of scope for this WG.

So I propose a tiny step forward - provide a way to link entailment
regimes such as [1] to a graph (understood simply as a name for a
possibly mutable set of triples with no additional baggage about how
those triples came to have that name). That way we will at least be
able to distinguish these cases, and people can record something
about how they intend the graph to be used.

Then if AZ says,

  :foo { :phayes owl:sameAs 1 }

and

  :foo :entailmentRegime <http://www.w3.org/ns/entailment/OWL-RDF-Based>.

he is clearly wrong. But if we have

  :foo :entailmentRegime :azWorld .

then we can shrug and say, "sure, whatever you say".
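
As a rough sketch of how a consumer might act on such a declaration
(Python, illustrative only): the :entailmentRegime property is the
hypothetical one proposed above, and how_to_treat() and its return
strings are made up for the example. It does no reasoning at all; it
only looks up what the author declared.

```python
OWL_RDF_BASED = "http://www.w3.org/ns/entailment/OWL-RDF-Based"
# The proposed property from this message; not an existing vocabulary term.
ENTAILMENT_REGIME = ":entailmentRegime"


def declared_regimes(metadata, graph_name):
    """Collect every regime declared for graph_name in a set of (s, p, o) triples."""
    return {o for (s, p, o) in metadata
            if s == graph_name and p == ENTAILMENT_REGIME}


def how_to_treat(metadata, graph_name):
    """Decide, from the declaration alone, how seriously to take a graph."""
    regimes = declared_regimes(metadata, graph_name)
    if OWL_RDF_BASED in regimes:
        # The author has signed up to OWL semantics, so hold them to it.
        return "hold to OWL semantics"
    if regimes:
        # Some author-defined regime such as :azWorld: "sure, whatever you say".
        return "author-defined regime"
    return "no declared regime"
```

With the two declarations above as metadata, :foo declared as
OWL-RDF-Based would be held to OWL semantics, while :foo declared as
:azWorld would simply be taken at the author's word.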

Hoping that something I have written here makes sense :)

-w

[1] http://www.w3.org/ns/entailment/

Received on Wednesday, 21 December 2011 10:55:55 UTC