Re: OWL equivalentClass question from Alan Ruttenberg on 2012-07-15 (semantic-web@w3.org from July 2012)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Sun, 15 Jul 2012 03:48:50 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: David Booth <david@dbooth.org>, Michael Schneider <schneid@fzi.de>, semantic-web@w3.org, nathan@webr3.org, W3C OWL Working Group <public-owl-wg@w3.org>
Message-ID: <CAFKQJ8n6rJwpCJT64CJ6w4Gye9e3HRK1n=qAB9oOqNFWPrGhKw@mail.gmail.com>
On Sat, Jul 14, 2012 at 10:52 PM, Pat Hayes <phayes@ihmc.us> wrote:

>
> On Jul 14, 2012, at 12:15 PM, Alan Ruttenberg wrote:
>
> >
> >
> > On Sat, Jul 14, 2012 at 12:35 AM, Pat Hayes <phayes@ihmc.us> wrote:
> > Alan has drawn my attention to this thread, which I confess I find
> rather confusing.
> >
> > I've used the "I'm confused" thing too. But I doubt you are. It's
> perfectly normal for the chair of one group, on seeing a change in a
> specification on which it depends, to try to figure out what the
> implications are for their spec. Second, David showed a misunderstanding of
> what the situation was from a logical point of view, and this made me worry
> that others (even the editors of the RDF 1.1 spec) might also share such
> misconceptions.
>
> Perhaps 'puzzling' would have been better. I really was, no rhetoric
> involved.
>

OK.

>
> >
> > First, some basics. Regarding skolemization, it is important to remember
> that skolemization is not a valid inference process, strictly speaking. If
> you start with a graph G containing a bnode and skolemize it to get another
> graph GS where the bnode has been replaced by a URI, then G does not entail
> GS.
> >
> > Good. That's important, as it means that the RDF document needs to
> specify in which situations, and with what consequences, skolemization may
> be done.
>
> It MAY be done at any time. The RDF specs do not set out to say what may
> or may not be done to RDF.

> You are free to do anything you like. What the specs do specify is what
> changes to RDF graphs are valid entailments. Skolemization is not a valid
> entailment. It is very close to being valid, but it is not strictly valid.
>
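Pat's point can be restated in first-order terms. Read the blank node in G as an existentially quantified variable and the skolem IRI as a fresh constant (written sk_1 here, a hypothetical name), using the `:alan :likes _:someone` triple that appears later in this thread. A sketch of the asymmetry:

```latex
\begin{align*}
G   &\equiv \exists x.\; \mathit{likes}(\mathit{alan}, x)
    && \text{blank node as existential variable}\\
G_S &\equiv \mathit{likes}(\mathit{alan}, \mathit{sk}_1)
    && \text{blank node replaced by a fresh constant}\\
G_S &\models G
    && \text{witness } x := \mathit{sk}_1\\
G   &\not\models G_S
    && \text{counter-model: } \mathit{likes} = \{(\mathit{alan}, c)\},\ \mathit{sk}_1 \mapsto d,\ d \neq c\\
G \models H &\iff G_S \models H
    && \text{for every } H \text{ not mentioning } \mathit{sk}_1
\end{align*}
```

The last line is the "subtle relation": any model of G can be turned into a model of G_S by letting sk_1 denote the witness, and that expansion cannot change the truth of an H that never mentions sk_1.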

It seems the spec contradicts what you say.

What the specs do specify is what changes to RDF graphs are valid
entailments
+
The RDF specs do not set out to say what may or may not be done to RDF
+
The draft at
http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html section
3.5 Replacing Blank Nodes with IRIs says: In situations where stronger
identification is needed, systems may systematically transform some or all
of the blank nodes in an RDF graph into IRIs
+
Skolemization is not a valid entailment

Am I not understanding something?


>
> >
> > The relationship between them is subtler: it is that **if H does not
> contain the skolem URI**, then G entails H iff GS entails H. Now, GS
> entails GS, of course, so you might think that this implies that G entails
> GS, but it doesn't, because GS of course *does* contain the skolem URI.
> >
> > It can't by what was specified - that the skolem URI should not appear
> anywhere else.
>
> But it does appear in the skolemized graph GS, by construction. That is
> why we call it 'skolemized'.
>

I misunderstood the point you are making here. What I think I understand: G
does not entail GS (+ subtle relation). What I don't understand: How
skolemization can be allowed at will but not break RDF entailment.
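For readers who want to check the relation mechanically, here is a minimal Python sketch of simple entailment based on the RDF Semantics interpolation lemma (G entails H iff some subgraph of G is an instance of H, i.e. H's blank nodes can be mapped onto terms of G). It assumes IRIs written as plain strings and ignores literals, datatypes and RDFS. `BNode`, `simply_entails` and `skolemize` are hypothetical helpers, and the genid base IRI is made up.

```python
from itertools import product


class BNode:
    """A blank node: it has no name in the abstract syntax, only identity."""


def terms(graph):
    """All subjects, predicates and objects used in a set of triples."""
    return {t for triple in graph for t in triple}


def simply_entails(g, h):
    """True if g simply entails h (IRIs and blank nodes only).

    Searches for a mapping of h's blank nodes to terms of g that sends
    every triple of h onto a triple of g (brute force, exponential).
    """
    h_bnodes = [t for t in terms(h) if isinstance(t, BNode)]
    candidates = list(terms(g))
    for image in product(candidates, repeat=len(h_bnodes)):
        m = dict(zip(h_bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False


def skolemize(graph, base):
    """Replace every blank node with a fresh IRI minted under `base`."""
    minted = {}

    def sk(t):
        if isinstance(t, BNode):
            if t not in minted:
                minted[t] = f"{base}{len(minted) + 1}"
            return minted[t]
        return t

    return {tuple(sk(t) for t in triple) for triple in graph}


# G: ":alan :likes _:someone"   GS: the same with a skolem IRI
G = {(":alan", ":likes", BNode())}
GS = skolemize(G, "http://example.org/.well-known/genid/")
# H mentions no skolem IRI: "alan likes something"
H = {(":alan", ":likes", BNode())}

print(simply_entails(GS, G))   # True
print(simply_entails(G, GS))   # False
print(simply_entails(G, H), simply_entails(GS, H))  # True True
```

GS entails G but not the other way round, while the two agree on every H that avoids the skolem IRI. The search is exponential in H's blank nodes and is only meant to make the definitions concrete.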


> > So, it is not at all surprising that the skolemization of a graph might
> have logical properties that are not shared by the unskolemized graph. Such
> a situation does not break RDF entailment, nor does it render skolemization
> impossible. It just means that you have to use skolemization carefully, as
> it is not a valid inference mode all by itself; but this always was the
> case.
>

Again, what I'm having trouble with is why the proposed spec says (only)
you can "In situations where stronger identification is needed". Nothing
about being careful, or how to be careful.


> > I will re-read the draft spec to see how this is stated.
> >
> > As to the OWL mappings. This is one instance of a general phenomenon,
> that when a 'higher' language (OWL-DL) is embedded into RDF, there are
> going to be restrictions on the legal forms that are used to encode the
> higher language. You will not get perfect freedom to perform even valid RDF
> entailments on the OWL-DL/RDF without risking making the RDF into something
> that is no longer a legal RDF encoding of OWL-DL syntax.
> >
> > Agreed for OWL2 under the DL semantics. However OWL according to the RDF
> semantics is a different story and that is part of the spec I worry about
> too.
> >
> > Put another way, the OWL-DL imposes its own syntactic and semantic
> restrictions which go beyond those imposed by RDF itself, and engines which
> need to use the OWL-DL/RDF *as OWL-DL* must be able to respect those
> OWL-DL-imposed restrictions.
> >
> > It is not only engines that may need OWL-DL/RDF, but users of those
> engines. A sanction for engines to do skolemization at will will affect
> those users inadvertently. Sometimes triple stores are used solely to
> *store* OWL.
>
> And it would be dangerous to skolemize such stores, for fear of breaking
> the OWL conventions. Forgive me, but this seems *obvious*, so I wonder why
> you are making such a big deal out of it.
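To make the OWL concern concrete: as I read the OWL 2 mapping to RDF graphs, anonymous class expressions such as restrictions are encoded with blank nodes, so a tool reading the triples back as OWL 2 DL expects a blank node in that position. The sketch below uses a hypothetical one-rule checker (not a real OWL 2 DL validator) to show a skolemized store failing that expectation. All helper names, the base IRI and the example ontology are invented.

```python
from itertools import count


class BNode:
    """A blank node: identity only, no global name."""


def skolemize(graph, base):
    """Replace each blank node with an IRI made from `base` plus a counter."""
    counter = count(1)
    minted = {}

    def sk(t):
        if isinstance(t, BNode):
            if t not in minted:
                minted[t] = f"{base}{next(counter)}"
            return minted[t]
        return t

    return {tuple(sk(t) for t in triple) for triple in graph}


def restrictions_are_anonymous(graph):
    """Hypothetical one-rule check: every owl:Restriction node is a blank node."""
    return all(
        isinstance(s, BNode)
        for (s, p, o) in graph
        if p == "rdf:type" and o == "owl:Restriction"
    )


# :Fan rdfs:subClassOf [ a owl:Restriction ; owl:onProperty :likes ;
#                        owl:someValuesFrom :Band ] .
r = BNode()
ontology = {
    (":Fan", "rdfs:subClassOf", r),
    (r, "rdf:type", "owl:Restriction"),
    (r, "owl:onProperty", ":likes"),
    (r, "owl:someValuesFrom", ":Band"),
}
stored = skolemize(ontology, "http://example.org/.well-known/genid/")

print(restrictions_are_anonymous(ontology))  # True
print(restrictions_are_anonymous(stored))    # False: restriction now has an IRI
```

The skolemized triples still entail the original ones under the RDF semantics; they have simply stopped matching the pattern an OWL 2 DL parser looks for.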


Because prior to this proposal about skolemization, I don't know of
anything in the RDF specs that would sanction corrupting (transforming) RDF
graphs in such a way. After this I need to worry about whether "systems"
(and this means any RDF processing system, it appears) will corrupt OWL that
is put through them. Forgive *me*, but this seems *obvious* too, so I
wonder why *you* are making such a big deal about my concern.


>  I must be missing something that you are assuming, or something (?)
>

Hmm. I believe my assumption is simple. If I use an RDF system according to
the specifications, then tools further up the semantic web stack won't
break.

>

> > This is hardly surprising. The "Full" subfamily is there for people who
> wish to have complete freedom at the RDF level, but they necessarily pay
> the price of sacrificing deductive efficiencies available only for the more
> restricted higher-level language.
> >
> > Yes, but they also define OWL using the RDF Semantics as a semantic
> extension of RDF (as it existed when the spec was finalized). Changes
> thereafter that change RDF semantics will be, and should be, examined
> carefully.
>
> Indeed. So far, nobody (except me) has suggested any changes in the RDF
> semantics, partly for this reason.
>

The concept document is supposed to be in sync with the semantics. It's
hard for me to understand how the RDF concept document offers that tools
"MAY" change the semantics of a graph.


>
> > So, overall: nothing here is particularly surprising or alarming, and
> nothing is (any more) broken (than the world has always been.)
> >
> > I'm not so sure. For example, looking at the current draft we see.
> >
> >
> > "Blank nodes do not have identifiers in the RDF abstract syntax. The
> blank node identifiers introduced by some concrete syntaxes have only local
> scope and are purely an artifact of the serialization."
> >
> > It is incorrect that blank nodes are purely an artifact of
> serialization. In *any* serialization that I am aware of. Please correct
> me if I am wrong.
>
> The blank node *identifiers* are artifacts of the serialization. In the
> RDF abstract syntax, blank nodes have no identifiers.
>

OK. I see that in place of this there must merely be a way to say whether
two bnodes are the same.
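The distinction Pat draws (labels like `_:b1` are per-document syntax, while blank nodes themselves have only identity) can be sketched as follows. `BNode` and `load` are hypothetical; a real parser scopes labels the same way, but this is not any particular library's API.

```python
class BNode:
    """A blank node: no identifier in the abstract syntax, only identity."""


def load(doc):
    """Build graph triples from one document's triples.

    Terms starting with '_:' are blank node labels whose scope is this
    document: equal labels here denote one blank node, but the same
    label in another document denotes a different one.
    """
    scope = {}

    def term(t):
        if t.startswith("_:"):
            if t not in scope:
                scope[t] = BNode()
            return scope[t]
        return t

    return {tuple(term(t) for t in triple) for triple in doc}


doc = [(":alan", ":likes", "_:b1"), ("_:b1", ":name", '"Bob"')]
g1 = load(doc)
g2 = load(doc)

liked = next(o for (s, p, o) in g1 if p == ":likes")
named = next(s for (s, p, o) in g1 if p == ":name")
print(liked is named)     # True: same label, same document
print(len(g1 | g2))       # 4: the two loads share no blank node
```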


> >
> > Then:
> >
> > "In situations where stronger identification is needed, systems may
> systematically transform some or all of the blank nodes in an RDF graph
> into IRIs [IRI]. Systems wishing to do this should mint a new, globally
> unique IRI (a Skolem IRI) for each blank node so transformed."
> >
> > This amounts to, from my point of view, entailing
>
> No, it does not say that this change is an entailment. However, I
> agree that this wording could be misleading, and perhaps we should make
> the situation clearer.
>
>
I used "entailing" to relate the statement in the proposed spec to the
sanctioned breaking of OWL serialized by any RDF system, not to say that
the skolemization is an entailment.

> : systems may systematically change OWL ontologies (under the
> DL-semantics) stored in them to become RDF that is no longer an OWL (under
> the DL semantics) ontology.
> >
> > That sounds bad to me. It definitely sounds more broken than things were
> before. Before I could put an OWL ontology into a named graph and get it
> out unscathed. Now I can't count on that.
>
> If all you do is put it in and then take it out, it will be unscathed. If
> someone makes a change to the graph while it is in there, it might get
> changed, yes. That includes skolemizing it.
>

It does not say that at all - how do you conclude this? The statement says
that "systems" can do this for reasons seemingly justified by the "system".
I don't consider a system a someone. This is not stated as something a user
controls, but rather as a prerogative of an implementer. Most importantly
it allows some entity *other than the author of the RDF* to make changes to
it.

Changes such as this should be viewed the same way one would view a system that
decided (for whatever reason) to change all xsd:decimal numbers into
xsd:floats. Now certainly someone should be able to write a system that
does exactly this. But they shouldn't be able to say it obeys RDF or OWL
semantics. If an RDF document is saying you can make transformations to the
content that change its meaning (it's hard to say it has the same meaning
if different entailments follow) then I think that's simply wrong and
should be flagged as such.

That's what I'm doing.
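The xsd:decimal analogy can be checked numerically. The sketch assumes a hypothetical system that stores each xsd:decimal as the nearest IEEE 754 single-precision value (the value space of xsd:float), using Python's `struct` module to do the rounding. Two distinct decimals collapse to the same float, so statements that distinguished them no longer do.

```python
import struct
from decimal import Decimal


def to_xsd_float(value):
    """Round a decimal to the nearest IEEE 754 single-precision number."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


a = Decimal("0.1")
b = Decimal("0.100000001")

print(a == b)                                  # False: distinct xsd:decimal values
print(to_xsd_float(a) == to_xsd_float(b))      # True: same xsd:float after conversion
print(Decimal(to_xsd_float(a)) == a)           # False: 0.1 is not stored exactly
```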


> > "This transformation does not change the meaning of an RDF graph,
> provided that the Skolem IRIs do not occur anywhere else."
> >
> > This also seems just wrong. Under what sense of "meaning" would this be
> true? You say above that this operation is not a valid inference model.
>
> We went around in circles on this wording. We needed an intuitively
> acceptable form of words which could be understood by people who cannot
> follow formal semantics, which conveys the basic idea. The sense of
> "meaning" is that the skolemized graph entails the same things as the
> original graph does, *provided* they don't contain the skolem URI. So the
> skolemized graph has the exact same, one might say, inferential capacity as
> the original graph, provided we are only testing it against graphs which do
> not contain the skolem URI.
>

a) Meaning isn't only inferential power. We also expect that if we take two
things that mean the same, then under meaning-preserving transformation
(like saving them in some system, or computing some statistic on them) they
continue to mean the same thing.
b) The people who can't follow formal semantics depend on us not to mess
around. Meaning is not an intuitively understandable thing.

I write
[1] :alan :likes _:someone.

In system 1 it gets transformed to

[2] :alan :likes <http://breakme.org/.well-known/genid/1>.

In system 2 it gets transformed to

[3] :alan :likes <http://wreckme.org/.well-known/genid/1>.

Now [2] doesn't mean [3] unless <http://breakme.org/.well-known/genid/1>
sameAs <http://wreckme.org/.well-known/genid/1>.

But that sameAs can't be concluded from anything. So I conclude [2] doesn't
mean the same as [3].

Do *you* think (or do you think "people who cannot follow formal
semantics") will think that they mean the same?
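The [1]/[2]/[3] example can be replayed in a few lines. The sketch assumes each system mints skolem IRIs by appending a counter to its own `/.well-known/genid/` base, as in the IRIs above; `skolemize` and `BNode` are hypothetical helpers, not part of any RDF library.

```python
from itertools import count


class BNode:
    """A blank node: identity only, no global name."""


def skolemize(graph, base):
    """Replace each blank node with an IRI made from `base` plus a counter."""
    counter = count(1)
    minted = {}

    def sk(t):
        if isinstance(t, BNode):
            if t not in minted:
                minted[t] = f"{base}{next(counter)}"
            return minted[t]
        return t

    return {tuple(sk(t) for t in triple) for triple in graph}


doc1 = {(":alan", ":likes", BNode())}                               # [1]
g2 = skolemize(doc1, "http://breakme.org/.well-known/genid/")       # [2]
g3 = skolemize(doc1, "http://wreckme.org/.well-known/genid/")       # [3]

print(g2)
print(g2 == g3)   # False: nothing relates the two minted IRIs
```

Each result entails [1], but neither entails the other, which is the formal counterpart of "[2] doesn't mean the same as [3]".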

What about aggregate operators like count (or count distinct) in SPARQL.
Will they work the same if we happen to combine RDF that has gone through
two paths which skolemize differently and are then merged?
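On the aggregate question, a rough Python analogue of `SELECT (COUNT(DISTINCT ?o) AS ?n) WHERE { :alan :likes ?o }` shows the effect. It assumes each system mints skolem IRIs by appending a counter to its own base IRI (a hypothetical scheme; all helper names are invented), and compares merging one path's output with itself against merging two paths' outputs.

```python
from itertools import count


class BNode:
    """A blank node: identity only, no global name."""


def skolemize(graph, base):
    """Replace each blank node with an IRI made from `base` plus a counter."""
    counter = count(1)
    minted = {}

    def sk(t):
        if isinstance(t, BNode):
            if t not in minted:
                minted[t] = f"{base}{next(counter)}"
            return minted[t]
        return t

    return {tuple(sk(t) for t in triple) for triple in graph}


def count_distinct_liked(graph, who=":alan"):
    """Analogue of SELECT (COUNT(DISTINCT ?o) AS ?n) WHERE { :alan :likes ?o }."""
    return len({o for (s, p, o) in graph if s == who and p == ":likes"})


doc1 = {(":alan", ":likes", BNode())}
via_system_1 = skolemize(doc1, "http://breakme.org/.well-known/genid/")
via_system_2 = skolemize(doc1, "http://wreckme.org/.well-known/genid/")

print(count_distinct_liked(via_system_1 | via_system_1))  # 1: same path twice
print(count_distinct_liked(via_system_1 | via_system_2))  # 2: two paths, two IRIs
```

The same source data yields a count of 1 or 2 depending only on how it travelled before being merged.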


> >
> > "Systems may wish to mint Skolem IRIs in such a way that they can
> recognize the IRIs as having been introduced solely to replace a blank
> node, and map back to the source blank node where possible."
> >
> > Where would it not be possible?
>
> Almost always. I would prefer to not have this particular wording in the
> spec, as it is logically meaningless.
>
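Pat's "almost always" can be illustrated. A system can mint skolem IRIs recognizably, here with a `/.well-known/genid/` path like the IRIs in Alan's example above, so anyone can tell an IRI *looks like* a skolem IRI; but mapping back to the source blank node needs the minting system's own table. `SkolemMinter` and `looks_like_skolem` are hypothetical names, and the path check is only a heuristic.

```python
from urllib.parse import urlsplit


class BNode:
    """A blank node: identity only, no global name."""


class SkolemMinter:
    """Mints skolem IRIs and remembers them, so only it can map them back."""

    def __init__(self, authority):
        self.prefix = authority.rstrip("/") + "/.well-known/genid/"
        self._iri_of = {}    # blank node -> IRI
        self._node_of = {}   # IRI -> blank node

    def mint(self, node):
        if node not in self._iri_of:
            iri = f"{self.prefix}{len(self._iri_of) + 1}"
            self._iri_of[node] = iri
            self._node_of[iri] = node
        return self._iri_of[node]

    def source_bnode(self, iri):
        """The blank node `iri` replaced, or None if this minter did not mint it."""
        return self._node_of.get(iri)


def looks_like_skolem(iri):
    """Heuristic: the IRI's path is under /.well-known/genid/."""
    return urlsplit(iri).path.startswith("/.well-known/genid/")


someone = BNode()
mine = SkolemMinter("http://breakme.org")
other = SkolemMinter("http://wreckme.org")
iri = mine.mint(someone)

print(looks_like_skolem(iri))             # True: anyone can recognize it
print(mine.source_bnode(iri) is someone)  # True: the minter can map back
print(other.source_bnode(iri))            # None: nobody else can
```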
> > Wouldn't this "feature" better be specified as part of the SPARQL
> specification?
>
> There is a strong contingent of "linked data" enthusiasts who want RDF to
> be blank-node-free, and this skolemization stuff is there partly to keep
> them (and the developers who make tools for them to use) happy.
>

You can probably guess exactly how much I care to keep people happy by
letting them wreck other people's stuff.


> By the way, skolemization was mentioned in the 2004 RDF specs, so it's not
> exactly something new.
>

IIRC skolemization was around substantially before the 2004 specs ;) I
don't have a problem with skolemization. I have a problem with writing a
spec that allows any conformant system to skolemize *my data* without my
permission. Particularly given that this transformation is "not itself
strictly a valid operation".


> > There you could say that given some keyword, blank nodes in the result
> should be skolemized, and that subsequent queries which retrieve the same
> blank nodes, asking them to be skolemized, MUST get the same skolems back
> each time.
> >
> > Sanctioning such changes for any process that handles RDF looks to me to
> be a bad idea.
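The alternative proposed here (skolemize only in query results, on request, and return the same skolems every time) might look roughly like this. The `Store` class, its `objects` method and the `skolemize` flag are hypothetical, not the SPARQL protocol or any real store's API; the point is that the stored graph keeps its blank nodes while the skolem table lives for the store's lifetime.

```python
import uuid


class BNode:
    """A blank node: identity only, no global name."""


class Store:
    """Hypothetical store that skolemizes query results, never stored data."""

    def __init__(self, authority):
        self.prefix = authority.rstrip("/") + "/.well-known/genid/"
        self.triples = set()
        self._skolems = {}  # blank node -> IRI, reused by every later query

    def _skolem(self, node):
        if node not in self._skolems:
            self._skolems[node] = self.prefix + uuid.uuid4().hex
        return self._skolems[node]

    def objects(self, subject, predicate, skolemize=False):
        """Objects of matching triples; blank nodes become IRIs on request."""
        return {
            self._skolem(o) if skolemize and isinstance(o, BNode) else o
            for (s, p, o) in self.triples
            if s == subject and p == predicate
        }


store = Store("http://example.org")
store.triples.add((":alan", ":likes", BNode()))

first = store.objects(":alan", ":likes", skolemize=True)
second = store.objects(":alan", ":likes", skolemize=True)
print(first == second)  # True: same skolem IRI on every request
print(any(isinstance(o, BNode) for (_, _, o) in store.triples))  # True: data untouched
```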
>
> I agree. I don't think that anything is being "sanctioned" in the sense
> that it can be done without any attention to the consequences. Even
> performing a valid inference might change a graph in a way which interferes
> with some engines (such as an OWL parser).
>

That's why even inference tends to be applied by choice, not applied "if
a system thinks it will produce better answers to queries" (sorry, that
query you wanted now takes 4 years. I know you liked that it took 10
seconds yesterday, but my engineers assure me that they are operating
within the bounds of the spec, and our contract only says we'll conform to
the spec)

I disagree with your conclusion about sanctioning. "MAY" is a technical
term that means precisely that an action is allowed; the spec makes no
qualification concerning consequences, and the previous spec did not allow
this. Nothing in this discussion has convinced me that this is anything but
a wrongheaded, non-backward-compatible change, or that the benefit it
brings is anywhere near the damage it will do.

Put the capability somewhere other than in RDF proper. Seems to me there's
plenty of possibilities.

-Alan

systems *may* systematically transform some or all of the blank nodes in an
RDF graph into IRIs

MAY   This word, or the adjective "OPTIONAL", mean that an item is
   truly optional.  One vendor may choose to include the item because a
   particular marketplace requires it or because the vendor feels that
   it enhances the product while another vendor may omit the same item.




> Pat
>
> >
> > I'm happy to hear explanations of how I am wrong in each case I list
> above - I'm anxious to learn. But let's stay away from the "I'm confused"
> rhetoric, please.
> >
> > -Alan
> >
> >
> >
> > On Jul 13, 2012, at 1:03 PM, Alan Ruttenberg wrote:
> >
> > >
> > >
> > > On Fri, Jul 13, 2012 at 1:47 PM, David Booth <david@dbooth.org> wrote:
> > > On Fri, 2012-07-13 at 13:08 -0400, Alan Ruttenberg wrote:
> > > But that would render skolemization impossible, and it would conflict
> > > with the treatment of blank nodes as existentially quantified variables
> > > http://www.w3.org/TR/rdf-mt/#unlabel
> > > since it would be like saying "there exists an x, but you're not allowed
> > > to name x with a URI".
> > > >
> > >
> > > It would be like saying, you can't change an expression "there exists
> an x" to "x". They don't mean the same thing. If you have "y" then it
> implies there exists an x. But it doesn't imply "x". Blank nodes, according
> to the RDF semantics, mean "there exists an x".
> > >
> > > As such, it would seem to break RDF entailment.
> > >
> > > And if this is correct I would expect there to be a formal objection
> to the proposal.
> > >
> > > Perhaps Michael could shed some light.
> > >
> > > -Alan
> > >
> > >
> > > --
> > > David Booth, Ph.D.
> > > http://dbooth.org/
> > >
> > > Opinions expressed herein are those of the author and do not necessarily
> > > reflect those of his employer.
> > >
> > >
> >
> > ------------------------------------------------------------
> > IHMC                                     (850)434 8903 or (650)494 3973
> > 40 South Alcaniz St.           (850)202 4416   office
> > Pensacola                            (850)202 4440   fax
> > FL 32502                              (850)291 0667   mobile
> > phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 15 July 2012 07:49:51 UTC