a new spin on contexts -- do they really help with vocabulary mapping?

[This is a brainstorming discussion, maybe off-topic for the WG.]

On Mon, 2012-04-23 at 00:45 -0700, Pat Hayes wrote:
> First, regrets for next Wednesday, I will be driving through Texas. 

I drove across the panhandle once.  Surely, you can just lash the wheel,
put a rock on the gas pedal, and stretch out in the back seat to talk to
us.....     (not that we'll be talking about this, from what I gather.)

> Second, I have written up essentially the same proposal in a slightly different terminology which might (?) be more palatable, anyway it is there for inspection at http://www.w3.org/2011/rdf-wg/wiki/AnotherSpin

I'm just going to respond, right now, to the middle of the "Why Bother"
section, because I think everything flows from there.

As background, I subscribe to the philosophy that the meaning of an RDF
graph depends on the meanings of the predicates it uses.  So, to extend
the semantics of RDF to include equality, you just *use* owl:sameAs.  To
extend the semantics of RDF to include subclass reasoning, you just
*use* the predicate rdfs:subClassOf.   This seems very simple and
elegant, although I will grant it has not been well understood or well
deployed, to date, and isn't exactly how the specs are written.

I don't see a need for rdf:inherits -- I think the semantics should,
essentially, automatically 'inherit' each predicate.   I think this
should be implemented, in the general (low performance) case, by having
clients download inference rules from the predicate URLs. [1]
Reasoners can be specialized to use a tableau algorithm, for example,
instead of running rules, if the result will be the same (or predictably
and usefully different, I guess).   
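A toy sketch of "the meaning comes from the predicates", in Python. Each predicate IRI is associated with an inference rule, and the reasoner applies the rules of whichever predicates actually appear in the data until nothing new is derived. In the proposal the rules would be downloaded from the predicate URLs; here the RULES table is a hypothetical, hard-coded stand-in for that fetch, the ex: names are made up, and triples are plain tuples rather than any particular RDF library's API.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Triple, Set[Triple]], Set[Triple]]

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclass_rule(t: Triple, graph: Set[Triple]) -> Set[Triple]:
    """For t = (C, rdfs:subClassOf, D): instances of C are instances of D,
    and subclasses of C are subclasses of D (transitivity)."""
    c, _, d = t
    new = {(x, RDF_TYPE, d) for (x, p, y) in graph if p == RDF_TYPE and y == c}
    new |= {(b, SUBCLASS, d) for (b, p, y) in graph if p == SUBCLASS and y == c}
    return new


# Hypothetical stand-in for "rules fetched from each predicate's URL".
RULES: Dict[str, Rule] = {SUBCLASS: subclass_rule}


def closure(graph: Set[Triple]) -> Set[Triple]:
    """Apply the rule of every predicate used, repeatedly, until a fixpoint."""
    result = set(graph)
    while True:
        new: Set[Triple] = set()
        for t in result:
            rule = RULES.get(t[1])  # predicates with no known rule add nothing
            if rule is not None:
                new |= rule(t, result)
        if new <= result:
            return result
        result |= new


data = {
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", "ex:name", "Rex"),  # no rule is known for ex:name
}
inferred = closure(data) - data
# inferred: ex:rex is typed ex:Mammal and ex:Animal,
# and ex:Dog rdfs:subClassOf ex:Animal.
```

A specialized reasoner (tableau or otherwise) could replace `closure` as long as it derives the same triples.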

Trying to understand the Why Bother section...   I like the chemistry
example.   As I understand the science: there was Carbon, pretty well
understood, and then suddenly a hundred years ago, it started to look
like there was actually Carbon 12, Carbon 13, Carbon 14, and more.
Unseen, unknown, until radioactivity was understood.  

So, in RDF, we can imagine there's lots of data about chem:Carbon, and
then suddenly, starting around 1912, we have iso:Carbon12, iso:Carbon13,
etc.  Statistically, 99% of chem:Carbon is iso:Carbon12.  Only
0.0000000001% [2] of the chem:Carbon is iso:Carbon14, but still that
little bit is enormously useful, e.g. for carbon dating.  Chemical
formulas don't care about isotopes: propane is C3H8, whichever carbon
isotopes are used in it.  When we're just doing normal chemistry, we
want to ignore isotopes; when we're doing carbon dating, or looking at
precise atomic weights, we do not.
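A small illustration of "ignoring isotopes", assuming a hypothetical representation of a molecule as a list of atom labels ("C12", "C13", "C14", "H"). Projecting each isotope label onto its element before counting gives C3H8 for propane whichever carbon isotopes it contains.

```python
from collections import Counter
from typing import Iterable

# Hypothetical atom labels mapped to their element.
ELEMENT_OF = {"C12": "C", "C13": "C", "C14": "C", "H": "H"}


def formula(atoms: Iterable[str]) -> str:
    """Molecular formula after collapsing isotopes to elements:
    carbon first, then hydrogen, then any others alphabetically."""
    counts = Counter(ELEMENT_OF[a] for a in atoms)
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


ordinary = ["C12"] * 3 + ["H"] * 8
mixed = ["C12", "C14", "C13"] + ["H"] * 8
print(formula(ordinary), formula(mixed))  # C3H8 C3H8
```

Carbon dating needs the opposite view: the same molecule, with the isotope labels kept.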

So, how do we do this in RDF?   Can we do it without contexts?

Alice is an organic chemist, and all her software uses the chem:
vocabulary.  Her experimental results all talk about chem:Carbon.  She
doesn't care about isotopes.

Bob is a radiochemist.  Pretty much all of his software uses the iso:
vocabulary, because he works with specific isotopes.   His data includes
lots of references to iso:Carbon12, and other isotopes.

Things get interesting when Bob wants to use some of Alice's data, or
Alice wants to use some of Bob's data.

First attempt: use an OWL reasoner on the data, and mix in the "facts"
that:

        chem:Carbon owl:sameAs iso:Carbon12, iso:Carbon13, iso:Carbon14.
        
This will probably work for some things, and completely break for
others.  The chemical properties are the same, so that sounds okay.
Nearly everything Alice finds to be true for chem:Carbon is almost
certainly also true for each of the carbon isotopes. 

However, there are some properties (atomic weights, number of neutrons,
rate of radioactive decay) which are different.   If some of that is
given in the iso ontology, the reasoner will quickly determine (if it
looks) that the combined ontology is inconsistent.
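A sketch of why the sameAs mixin breaks, assuming a hypothetical functional property iso:atomicWeight with rounded, illustrative values. owl:sameAs is symmetric and transitive, so saying chem:Carbon is the same as each isotope also makes the isotopes the same as one another. A union-find over the sameAs links puts all four IRIs in one equivalence class, and a functional property with two different values in one class is the kind of contradiction a reasoner would report.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Objects may be IRIs (str) or literal values (here, int).
Triple = Tuple[str, str, object]


def find(parent: Dict[str, str], x: str) -> str:
    """Union-find lookup with path halving; unseen IRIs are their own class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def same_as_classes(triples: List[Triple]) -> Dict[str, str]:
    """Merge the subject and object of every owl:sameAs triple.
    Union-find makes the merge symmetric and transitive automatically."""
    parent: Dict[str, str] = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(parent, s)] = find(parent, str(o))
    return parent


def clashes(triples: List[Triple], functional: Set[str]) -> Dict[Tuple[str, str], Set[object]]:
    """Equivalence classes that give a functional property more than one value."""
    parent = same_as_classes(triples)
    values: Dict[Tuple[str, str], Set[object]] = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            values[(find(parent, s), p)].add(o)
    return {k: v for k, v in values.items() if len(v) > 1}


data: List[Triple] = [
    ("chem:Carbon", "owl:sameAs", "iso:Carbon12"),
    ("chem:Carbon", "owl:sameAs", "iso:Carbon13"),
    ("chem:Carbon", "owl:sameAs", "iso:Carbon14"),
    # Hypothetical property; weights rounded to whole numbers.
    ("iso:Carbon12", "iso:atomicWeight", 12),
    ("iso:Carbon13", "iso:atomicWeight", 13),
]

parent = same_as_classes(data)
# The isotopes have been merged with each other, not just with chem:Carbon.
assert find(parent, "iso:Carbon12") == find(parent, "iso:Carbon14")
# One class, two atomic weights: the combined ontology is inconsistent.
assert list(clashes(data, {"iso:atomicWeight"}).values()) == [{12, 13}]
```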

We'd like something a little more like a subclass relation.  If these
ontologies treated substances as classes containing, as instances, items
composed of that substance, the mixin 'facts' could be:

   iso:Carbon12 rdfs:subClassOf chem:Carbon .

That's much closer to true.  It could even be true, if chem:Carbon were
designed with an understanding of isotopes, but in our scenario, it
wasn't.   With the new understanding of carbon, each isotope has its own
atomic weight (carbon-12's is exactly 12, by definition), but before they
understood about isotopes, the atomic weight ended up being an average of
the isotopes' weights, weighted by how much of each was present in the
carbon the chemists worked with.    So our naive chem:Carbon definition
gives it a single atomic weight, which is not the same as that of any of
the isotopes, and a carbon-12 sample, being a chem:Carbon sample, would
be required to have that weight too.    So that subclass rule isn't quite
right either.
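A sketch of that near miss, under loudly stated assumptions: a hypothetical toy ontology in which a class can fix the atomic weight of every item that is an instance of it (roughly what an OWL hasValue restriction would do; an RDF property on the class itself would not be inherited). The subclass link correctly makes a carbon-12 sample a chem:Carbon sample, but it also hands that sample the naive mixture weight alongside its own.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: direct superclasses of each class.
SUPERCLASSES: Dict[str, List[str]] = {
    "iso:Carbon12": ["chem:Carbon"],  # the mixin "fact" from the text
    "chem:Carbon": [],
}
# Hypothetical per-class constraint: every instance has this atomic weight.
MEMBER_WEIGHT: Dict[str, float] = {
    "chem:Carbon": 12.011,  # Alice's naive, mixture-based value
    "iso:Carbon12": 12.0,   # exactly 12 by definition
}


def ancestors(cls: str) -> Set[str]:
    """The class itself plus everything reachable via subClassOf."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUPERCLASSES.get(c, []))
    return seen


def weights_for_instance_of(cls: str) -> Set[float]:
    """Every atomic weight an instance of cls is required to have."""
    return {MEMBER_WEIGHT[c] for c in ancestors(cls) if c in MEMBER_WEIGHT}


# A carbon-12 sample is correctly classified as chem:Carbon...
assert "chem:Carbon" in ancestors("iso:Carbon12")
# ...but it is also required to have two different weights at once.
assert weights_for_instance_of("iso:Carbon12") == {12.0, 12.011}
```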

Maybe that was Alice's experiment, measuring the atomic weight of
carbon.  She doesn't know about isotopes, so her answer won't be exactly
the weight of any of the isotopes; it'll be the weight of whatever
combination she happened to use.
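The arithmetic behind Alice's measurement, as a minimal sketch: a sample's atomic weight is the mean of its isotopes' masses, weighted by how much of each is present. The masses are the standard ones (carbon-12 exactly 12 u, carbon-13 about 13.003355 u); the abundances are approximate natural values plus one made-up enriched sample, so the second number stands for "whatever combination she happened to use", not real data.

```python
from typing import Dict

# Isotope masses in unified atomic mass units (u). Carbon-14 is left out:
# at roughly one part per trillion it doesn't change the result here.
MASS_U: Dict[str, float] = {"C12": 12.0, "C13": 13.003354835}


def atomic_weight(abundance: Dict[str, float]) -> float:
    """Mean isotope mass, weighted by the fraction of each isotope present."""
    total = sum(abundance.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"abundances sum to {total}, not 1")
    return sum(MASS_U[iso] * frac for iso, frac in abundance.items())


natural = {"C12": 0.9893, "C13": 0.0107}  # approximate natural abundances
enriched = {"C12": 0.90, "C13": 0.10}     # a hypothetical enriched sample

print(round(atomic_weight(natural), 3))   # 12.011
print(round(atomic_weight(enriched), 3))  # 12.1
```

Neither result equals the weight of any single isotope, which is the mismatch between chem:Carbon and iso:Carbon12.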

We have a world here where the ontologies line up 99% and maybe with
some work we can get another .9% or maybe .99%.  But never 100%.
There's carbon-12 and there's "carbon", which is a mixture of the
various carbon isotopes as they happened to occur for that particular
observer.

Can contexts help with this?  Is it better to use the same term for
these two concepts?   I wouldn't think so.

I think the answer here is to use "shims", such as these "facts" I've
used above.  I think people should publish various shims, for various
purposes.   The shims are web documents which say how to map from data
in one vocabulary to data with a similar meaning in another vocabulary.
I'm not sure how much they should be OWL vs RIF vs JavaScript (using
some convention not yet determined).    I think the shims will have to
be labeled in ways that help people, and sometimes machines, figure out
which ones are best for their purposes and understand in what ways they
are wrong/broken.  Maybe Bob writes one that allows him to use Alice's
data, then publishes it for others to use under similar circumstances.
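One way a shim might look, sketched in Python under heavy assumptions: the Shim structure, its fields, and the idea of listing predicates that don't carry over are all hypothetical, since the format (OWL, RIF, JavaScript, or something else) is still open. This one goes in the easier direction, letting Alice's chem: software read Bob's isotope data. It rewrites each carbon-isotope IRI to chem:Carbon, drops the predicate on which the vocabularies disagree, and carries labels saying what it is for and how it is wrong.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Shim:
    """A hypothetical published shim: a mapping plus labels saying what
    it is for and where it is known to be wrong."""
    name: str
    purpose: str
    known_limitations: str
    term_map: Dict[str, str] = field(default_factory=dict)
    dropped_predicates: FrozenSet[str] = frozenset()

    def apply(self, triples: List[Triple]) -> List[Triple]:
        """Rename mapped terms; skip triples whose predicate doesn't carry over."""
        m = self.term_map
        out: List[Triple] = []
        for s, p, o in triples:
            if p in self.dropped_predicates:
                continue
            out.append((m.get(s, s), m.get(p, p), m.get(o, o)))
        return out


iso_to_chem = Shim(
    name="iso-to-chem",
    purpose="Let chem: software read isotope-level carbon data.",
    known_limitations="Forgets which isotope; atomic weights are dropped.",
    term_map={f"iso:Carbon{n}": "chem:Carbon" for n in (12, 13, 14)},
    dropped_predicates=frozenset({"iso:atomicWeight"}),
)

bob_data: List[Triple] = [
    ("ex:sampleB1", "rdf:type", "iso:Carbon14"),
    ("iso:Carbon14", "iso:atomicWeight", "14.003"),  # hypothetical property
]
print(iso_to_chem.apply(bob_data))
# [('ex:sampleB1', 'rdf:type', 'chem:Carbon')]
```

The mapping only works because the two notions of carbon have different IRIs; the shim has nothing to key on otherwise.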

I can see how to do that as long as we use different IRIs for the
different notions of carbon.   If we used the same IRI but labeled the
graphs in some way, I think it would get harder.

(Change-over-time is a different use case, which I'd like to talk about
separately.)

    -- Sandro

[1] Pat, I think you were in the room when I gave this talk.  I don't
remember your reaction. 
http://www.w3.org/2009/Talks/1026-semrus/#%2831%29
[2] I love Wikipedia.

Received on Wednesday, 25 April 2012 02:27:53 UTC