Re: OWL equivalentClass question from Pat Hayes on 2012-07-15 (public-owl-wg@w3.org from July 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 15 Jul 2012 11:42:20 -0500
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: David Booth <david@dbooth.org>, Michael Schneider <schneid@fzi.de>, semantic-web@w3.org, nathan@webr3.org, W3C OWL Working Group <public-owl-wg@w3.org>
Message-Id: <7F3E03CF-5650-463A-AD72-9D8CB78C1D1C@ihmc.us>
On Jul 15, 2012, at 2:48 AM, Alan Ruttenberg wrote:

> 
> 
> On Sat, Jul 14, 2012 at 10:52 PM, Pat Hayes <phayes@ihmc.us> wrote:
> 
> On Jul 14, 2012, at 12:15 PM, Alan Ruttenberg wrote:
> 
> >
> >
> > On Sat, Jul 14, 2012 at 12:35 AM, Pat Hayes <phayes@ihmc.us> wrote:
> > Alan has drawn my attention to this thread, which I confess I find rather confusing.
> >
> > I've used the "I'm confused" thing too. But I doubt you are. It's perfectly normal for the chair of one group, on seeing a change in a specification on which the group depends, to try to figure out what the implications are for its spec. Second, David showed a misunderstanding of what the situation was from a logical point of view, and this made me worry that others (even the editors of the RDF 1.1 spec) might also share such misconceptions.
> 
> Perhaps 'puzzling' would have been better. I really was, no rhetoric involved.
> 
> OK. 
> 
> >
> > First, some basics. Regarding skolemization, it is important to remember that skolemization is not a valid inference process, strictly speaking. If you start with a graph G containing a bnode and skolemize it to get another graph GS where the bnode has been replaced by a URI, then G does not entail GS.
> >
> > Good. That's important, as it means that the RDF document needs to specify in which situations, and with what consequences, skolemization may be done.
> 
> It MAY be done at any time. The RDF specs do not set out to say what may or may not be done to RDF.
> You are free to do anything you like. What the specs do specify is what changes to RDF graphs are valid entailments. Skolemization is not a valid entailment. It is very close to being valid, but it is not strictly valid.
> 
> It seems the spec contradicts what you say.
> 
> What the specs do specify is what changes to RDF graphs are valid entailments
> +
> The RDF specs do not set out to say what may or may not be done to RDF
> +
> The draft at http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html section 3.5 Replacing Blank Nodes with IRIs say: In situations where stronger identification is needed, systems may systematically transform some or all of the blank nodes in an RDF graph into IRIs
> +
> Skolemization is not a valid entailment
> 
> Am I not understanding something?

Maybe I am the one not understanding, but whatever the reason, you and I are failing to communicate, it seems. You have listed a number of facts above, from which it seems to me that nothing particularly interesting or noteworthy follows. I'm not quite sure what to say at this point. 

>  
> 
> >
> > The relationship between them is subtler: **if H does not contain the skolem URI**, then G entails H iff GS entails H. Now, GS entails GS, of course, so you might think that this implies that G entails GS, but it doesn't, because GS of course *does* contain the skolem URI.
> >
> > It can't by what was specified - that the skolem URI should not appear anywhere else.
> 
> But it does appear in the skolemized graph GS, by construction. That is why we call it 'skolemized'.
> 
> I misunderstood the point you are making here. What I think I understand: G does not entail GS (+ subtle relation). What I don't understand: How skolemization can be allowed at will but not break RDF entailment.   

What do you mean, "allowed at will"? Tell me what operations on RDF graphs are NOT allowed at will? You can make a cheese omelette out of an RDF graph as far as the specs are concerned. You can read the entire RDF 2004 semantics document and not find anything that prohibits any operation on any RDF graph. 

>  
> > So, it is not at all surprising that the skolemization of a graph might have logical properties that are not shared by the unskolemized graph. Such a situation does not break RDF entailment, nor does it render skolemization impossible. It just means that you have to use skolemization carefully, as it is not a valid inference mode all by itself; but this always was the case.
> 
> Again, what I'm having trouble with is why the proposed spec says (only) you can "In situations where stronger identification is needed". Nothing about being careful, or how to be careful.

The specs don't tell you how to cross the road, either. Why would a specification have to explain what "be careful" means? 

>  
> > I will re-read the draft spec to see how this is stated.
> >
> > As to the OWL mappings. This is one instance of a general phenomenon, that when a 'higher' language (OWL-DL) is embedded into RDF, there are going to be restrictions on the legal forms that are used to encode the higher language. You will not get perfect freedom to perform even valid RDF entailments on the OWL-DL/RDF without risking making the RDF into something that is no longer a legal RDF encoding of OWL-DL syntax.
> >
> > Agreed for OWL2 under the DL semantics. However OWL according to the RDF semantics is a different story and that is part of the spec I worry about too.
> >
> > Put another way, OWL-DL imposes its own syntactic and semantic restrictions which go beyond those imposed by RDF itself, and engines which need to use the OWL-DL/RDF *as OWL-DL* must be able to respect those OWL-DL-imposed restrictions.
> >
> > It is not only engines that may need OWL-DL/RDF, but users of those engines. A sanction for engines to do skolemization at will will affect those users inadvertently. Sometimes triple stores are used solely to *store* OWL.
> 
> And it would be dangerous to skolemize such stores, for fear of breaking the OWL conventions. Forgive me, but this seems *obvious*, so I wonder why you are making such a big deal out of it.
>  
> Because prior to this proposal about skolemization, I don't know of anything in the RDF specs that would sanction corrupting (transforming) RDF graphs in such a way.

There were always these possibilities. Clearly, OWL-DL imposes conditions on its representation in RDF which might get broken if you perform transformations on the RDF graph, even if those transformations are valid RDF consequences. Skolemization breaks OWL syntax, yes, but other transformations do too. Just omitting some of the OWL/RDF triples will break the OWL, for example. So the validity or non-validity **in RDF** of the transformation is irrelevant to whether or not it breaks OWL syntax rules. 
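
A minimal Python sketch can make this concrete. It rests on stated assumptions: triples are plain string tuples, and a term starting with "_:" is a blank node. `restriction_ok` is a hypothetical, heavily simplified check for a single OWL/RDF pattern (an existential restriction); it is not an OWL parser. Like the thread, it assumes the restriction node must remain a blank node. The sketch shows that a valid RDF entailment (dropping a triple, since every subgraph is entailed) and a non-valid one (skolemization) both break the pattern.

```python
# Hypothetical, heavily simplified check for ONE OWL/RDF pattern:
#   _:r rdf:type owl:Restriction ; owl:onProperty P ; owl:someValuesFrom C .
# Terms are plain strings; a term starting with "_:" is a blank node.

def is_bnode(term: str) -> bool:
    return term.startswith("_:")

def restriction_ok(graph: set[tuple[str, str, str]], node: str) -> bool:
    """True if `node` looks like a well-formed someValuesFrom restriction."""
    def objects(pred: str) -> list[str]:
        return [o for (s, p, o) in graph if s == node and p == pred]
    return (
        is_bnode(node)  # assumption: the pattern requires a blank node here
        and "owl:Restriction" in objects("rdf:type")
        and len(objects("owl:onProperty")) == 1
        and len(objects("owl:someValuesFrom")) == 1
    )

G = {
    (":Parent", "rdfs:subClassOf", "_:r"),
    ("_:r", "rdf:type", "owl:Restriction"),
    ("_:r", "owl:onProperty", ":hasChild"),
    ("_:r", "owl:someValuesFrom", ":Person"),
}

# 1. A valid RDF entailment: any subgraph of G is entailed by G.
sub = G - {("_:r", "owl:someValuesFrom", ":Person")}

# 2. Skolemization (not a valid entailment): rename _:r to a fresh IRI.
SK = "<http://example.org/.well-known/genid/r1>"  # hypothetical Skolem IRI
skolemized = {tuple(SK if t == "_:r" else t for t in triple) for triple in G}

print(restriction_ok(G, "_:r"))        # True
print(restriction_ok(sub, "_:r"))      # False: a required triple is missing
print(restriction_ok(skolemized, SK))  # False: the node is no longer a blank node
```

Whether a change counts as "valid in RDF" does not enter into the check at all, which is the point being made.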

> After this I need to worry about whether "systems" (and this means any RDF processing system, it appears) will corrupt OWL that is put through them.

This always was the case. This has never not been the case. If you want to preserve OWL syntax, you need to make sure you use tools that are OWL-savvy. 

> Forgive *me*, but this seems *obvious* too, so I wonder why *you* are making such a big deal about my concern.  

I think you have just discovered that the world is not as comfortable a place as you thought it was. But it never was that comfortable, is my point. 

>  
>  I must be missing something that you are assuming, or something (?)
> 
> Hmm. I believe my assumption is simple. If I use an RDF system according to the specifications, then tools further up the semantic web stack won't break.  

Sorry, wrong. If you want OWL/RDF to not break, you have to take account of the OWL specifications as well as the RDF specifications. I think that for any XXX past RDFS, present or future, if you want XXX/RDF to not break, you have to pay attention to XXX as well as to RDF. 

>  
> > This is hardly surprising. The "Full" subfamily is there for people who wish to have complete freedom at the RDF level, but they necessarily pay the price of sacrificing deductive efficiencies available only for the more restricted higher-level language.
> >
> > Yes, but they also define OWL using the RDF Semantics as a semantic extension of RDF (as it existed when the spec was finalized). Changes thereafter that change RDF semantics will, and should be, examined carefully.
> 
> Indeed. So far, nobody (except me) has suggested any changes in the RDF semantics, partly for this reason.
> 
> The concepts document is supposed to be in sync with the semantics. It's hard for me to understand how the RDF concepts document can say that tools "MAY" change the semantics of a graph. 

I have no problem with this. In the real world, people are going to be doing all kinds of things to RDF data that are not strictly logically valid. Obviously we don't want to make all this illegal or nonconformant. 

> > So, overall: nothing here is particularly surprising or alarming, and nothing is (any more) broken (than the world has always been.)
> >
> > I'm not so sure. For example, looking at the current draft we see.
> >
> >
> > "Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization."
> >
> > It is incorrect that blank nodes are purely an artifact of serialization, in *any* serialization that I am aware of. Please correct me if I am wrong.
> 
> The blank node *identifiers* are artifacts of the serialization. In the RDF abstract syntax, blank nodes have no identifiers.
> 
> OK. I see that in place of this there must merely be a way to say whether two bnodes are the same. 
>  
> >
> > Then:
> >
> > "In situations where stronger identification is needed, systems may systematically transform some or all of the blank nodes in an RDF graph into IRIs [IRI]. Systems wishing to do this should mint a new, globally unique IRI (a Skolem IRI) for each blank node so transformed."
> >
> > This amounts to, from my point of view, entailing
> 
> No, it does not say that this change is an entailment. However, I agree that this wording could be misleading, and perhaps we should make the situation clearer.
> 
> 
> I used "entailing" to relate the statement in the proposed spec to the sanctioned breaking of OWL serialized by any RDF system, not to say that the skolemization is an entailment. 

Sorry.

> 
> > : systems may systematically change OWL ontologies (under the DL-semantics) stored in them to become RDF that is no longer an OWL (under the DL semantics) ontology.
> >
> > That sounds bad to me. It definitely sounds more broken than things were before. Before, I could put an OWL ontology into a named graph and get it out unscathed. Now I can't count on that.
> 
> If all you do is put it in and then take it out, it will be unscathed. If someone makes a change to the graph while it is in there, it might get changed, yes. That includes skolemizing it.
> 
> It does not say that at all - how do you conclude this? The statement says that "systems" can do this for reasons seemingly justified by the "system". I don't consider a system a someone. This is not stated as something a user controls, but rather as a prerogative of an implementer. Most importantly it allows some entity *other than the author of the RDF* to make changes to it. 

Yes. Of COURSE anyone can make changes to any RDF. I can make changes to any HTML or any XML, for that matter. I can't overwrite the original files, but that's a matter of file access permissions, not of logical validity.

> 
> Changes such as this should be viewed the same way one would a system that decided (for whatever reason) to change all xsd:decimal numbers into xsd:floats. Now certainly someone should be able to write a system that does exactly this. But they shouldn't be able to say it obeys RDF or OWL semantics.

And we don't say that skolemization is valid, either. 

> If an RDF document is saying you can make transformations to the content that change its meaning (it's hard to say it has the same meaning if different entailments follow)

Well, it's NEARLY the case that the same entailments follow. (If RDF were second-order, they would be exactly the same entailments and skolemization would be valid. Sigh.) 

> then I think that's simply wrong and should be flagged as such. 
> 
> That's what I'm doing.
> 
> 
> > "This transformation does not change the meaning of an RDF graph, provided that the Skolem IRIs do not occur anywhere else."
> >
> > This also seems just wrong. Under what sense of "meaning" would this be true? You say above that this operation is not a valid inference mode.
> 
> We went around in circles on this wording. We needed an intuitively acceptable form of words which could be understood by people who cannot follow formal semantics, which conveys the basic idea. The sense of "meaning" is that the skolemized graph entails the same things as the original graph does, *provided* they don't contain the skolem URI. So the skolemized graph has the exact same, one might say, inferential capacity as the original graph, provided we are only testing it against graphs which do not contain the skolem URI.
> 
> a) Meaning isn't only inferential power. We also expect that if we take two things that mean the same, then under meaning-preserving transformation (like saving them in some system, or computing some statistic on them) they continue to mean the same thing.

Statistics on syntax is typically not preserved under even valid entailments, so that is a red herring. 

> b) The people who can't follow formal semantics depend on us not to mess around. Meaning is not an intuitively understandable thing.

OK, I agree, I don't like the use of the "meaning" word either. But we had to say something without using words like "entail" or "interpretation" or "valid". Can you suggest a different phrase?

> I write  
> [1] :alan :likes _:someone.
> 
> In system 1 it gets transformed to
> 
> [2] :alan :likes <http://breakme.org/.well-known/genid/1>.
> 
> In system 2 it gets transformed to 
> 
> [3] :alan :likes <http://wreckme.org/.well-known/genid/1>.
> 
> Now [2] doesn't mean [3] unless <http://breakme.org/.well-known/genid/1> sameAs <http://wreckme.org/.well-known/genid/1>.

True. They both entail the original, however. And if you had some way to know that these were skolem names, and could trace their provenance, then you *could* infer they were the same. 

But look, this is exactly like the situation without skolemization. Skolemization is just coining a new name for a thing that is known to exist. It is always possible for two sources to give different names for the same thing, and nobody to notice or record the necessary sameAs.  If you come across [2] and [3] in the wild, you might guess that the sameAs holds. Not strictly valid, but some data-cleaning-up-engines might make that inference, perhaps given rather more data to go on.  And you could validly infer [1] from either or both of them. So in a very real sense, no information has been lost by this skolemization process. You can still draw the same conclusions you could before the skolemization was done. 
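
These entailment relationships can be checked mechanically for simple entailment. The sketch below is a brute-force toy with stated assumptions: triples are string tuples, "_:" marks a blank node, and literals, datatypes, RDFS and OWL are ignored. Under simple entailment, G entails H when some mapping of H's blank nodes to terms of G turns every triple of H into a triple of G. The sketch confirms four things:

- [2] and [3] each entail [1], and [1] entails neither of them.
- [2] does not entail [3].
- The union of [2] and [3] still entails [1].
- For a graph H that does not mention the Skolem IRI, [1] and [2] give the same answer.

```python
from itertools import product

def is_bnode(t: str) -> bool:
    return t.startswith("_:")

def entails(g: set, h: set) -> bool:
    """Simple entailment, brute force (exponential; toy graphs only):
    search for a mapping of h's blank nodes to terms of g under which
    every triple of h is a triple of g."""
    h_bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    g_terms = sorted({t for triple in g for t in triple})
    for image in product(g_terms, repeat=len(h_bnodes)):
        m = dict(zip(h_bnodes, image))
        if all(tuple(m.get(t, t) for t in triple) in g for triple in h):
            return True
    return False

G1 = {(":alan", ":likes", "_:someone")}                                 # [1]
G2 = {(":alan", ":likes", "<http://breakme.org/.well-known/genid/1>")}  # [2]
G3 = {(":alan", ":likes", "<http://wreckme.org/.well-known/genid/1>")}  # [3]

print(entails(G2, G1), entails(G3, G1))  # True True: each entails [1]
print(entails(G1, G2))                   # False: skolemization is not valid
print(entails(G2, G3))                   # False: no sameAs can be concluded
print(entails(G2 | G3, G1))              # True: the union still entails [1]

# For any H that does not mention the Skolem IRI, [1] and [2] agree.
for H in [{(":alan", ":likes", "_:x")}, {(":alan", ":likes", ":bob")}, set()]:
    assert entails(G1, H) == entails(G2, H)
```

Because [2] and [3] contain no blank nodes, their plain set union coincides with the RDF merge.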

> But that sameAs can't be concluded from anything. So I conclude [2] doesn't mean the same as [3].

Correct. This, BTW, is why it is recommended that skolem URIs are chosen so that they can be universally recognized as such, and hence replaced by blank nodes when required. (Which also, by the way, goes a long way to solving the OWL issue: any OWL/RDF that has skolem names in it needs to be de-skolemized to make it back into legal OWL. This is a fast, one-pass, operation. I'm guessing that if skolem forms become common, a grad student will write a fast deskolemizer and it will get incorporated into OWL tools without any fuss.) 
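
Such a one-pass deskolemizer might look like the following minimal Python sketch. It assumes, hypothetically, that Skolem IRIs are recognizable by a "/.well-known/genid/" path segment, as in the example IRIs [2] and [3] in this thread. Triples are string tuples, "_:" marks a blank node, and repeated occurrences of one Skolem IRI map to the same blank node.

```python
import re

# Assumption: Skolem IRIs are recognizable by this path segment.
GENID = re.compile(r"^<[^>]*/\.well-known/genid/[^>]*>$")

def deskolemize(graph: set) -> set:
    """One pass over the triples: replace every recognizable Skolem IRI
    with a blank node, reusing the same blank node for repeated occurrences."""
    bnodes: dict[str, str] = {}

    def back(t: str) -> str:
        if GENID.match(t):
            return bnodes.setdefault(t, f"_:b{len(bnodes)}")
        return t

    return {tuple(back(t) for t in triple) for triple in graph}

merged = {
    (":alan", ":likes", "<http://breakme.org/.well-known/genid/1>"),  # [2]
    (":alan", ":likes", "<http://wreckme.org/.well-known/genid/1>"),  # [3]
}
print(deskolemize(merged))  # two triples, each with its own blank node
```

The result is simply equivalent to [1], so nothing is lost. The sketch does not collapse the two blank nodes into one, though; reducing the graph to its lean subgraph would be a separate step.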

> Do *you* think (or do you think "people who cannot follow formal semantics" will think) that they mean the same?
> 
> What about aggregate operators like count (or count distinct) in SPARQL. Will they work the same if we happen to combine RDF that has gone through two paths which skolemize differently and are then merged?

No, but they might well not work even if the two transformations were both deductively valid. 
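
A toy example illustrates this. It again assumes string triples with "_:" for blank nodes. `count_distinct_liked` is a hypothetical stand-in for a SPARQL query roughly like `SELECT (COUNT(DISTINCT ?o) AS ?n) WHERE { :alan :likes ?o }`. Both graphs in the union below are validly entailed by the original, and the union is logically equivalent to the original, yet the count changes.

```python
def count_distinct_liked(graph: set) -> int:
    """Rough analogue of COUNT(DISTINCT ?o) for :alan :likes ?o."""
    return len({o for (s, p, o) in graph if s == ":alan" and p == ":likes"})

original = {(":alan", ":likes", ":bob")}
# Existential generalization is a valid simple entailment:
# {:alan :likes :bob} entails {:alan :likes _:x}.
generalized = {(":alan", ":likes", "_:x")}
merged = original | generalized  # no shared blank nodes, so union == RDF merge

print(count_distinct_liked(original))  # 1
print(count_distinct_liked(merged))    # 2
```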

> >
> > "Systems may wish to mint Skolem IRIs in such a way that they can recognize the IRIs as having been introduced solely to replace a blank node, and map back to the source blank node where possible."
> >
> > Where would it not be possible?
> 
> Almost always. I would prefer to not have this particular wording in the spec, as it is logically meaningless.
> 
> > Wouldn't this "feature" better be specified as part of the SPARQL specification?
> 
> There is a strong contingent of "linked data" enthusiasts who want RDF to be blank-node-free, and this skolemization stuff is there partly to keep them (and the developers who make tools for them to use) happy.
> 
> You can probably guess exactly how much I care to keep people happy by letting them wreck other people's stuff. 

I don't see that anyone is telling anyone to wreck anything. 

> By the way, skolemization was mentioned in the 2004 RDF specs, so it's not exactly something new.
> 
> IIRC skolemization was around substantially before the 2004 specs ;) I don't have a problem with skolemization. I have a problem with writing a spec that allows any conformant system to skolemize *my data* without my permission.

Nobody is saying anyone has permission to do anything to your data without your permission. Permissions are one issue, valid entailments are another. They have almost nothing to do with one another, AFAICS. Now, I can draw conclusions from your data, of course, but that is my business, not yours. I might use strictly logically valid rules, or I might use non-valid rules, to draw my conclusions. (You say something exists, I decide to give it a name. The name is traceable to me, not to you.) But as I say, that is my business, and you, the publisher of some RDF, have no say over that. The act of publishing your RDF makes it publicly available to me to use as I see fit. 
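
Coining such a name is easy to sketch, under a few assumptions:

- Triples are string tuples, with "_:" for blank nodes.
- `skolemize` is a hypothetical helper.
- Skolem IRIs are minted under an authority the skolemizer controls.
- A random UUID provides global uniqueness.
- The IRIs use the "/.well-known/genid/" path seen in examples [2] and [3].

The authority makes each name traceable to whoever minted it, and two systems skolemizing the same document produce different names.

```python
import uuid

def skolemize(graph: set, authority: str) -> set:
    """Replace every blank node with a fresh IRI minted under `authority`.
    Repeated occurrences of one blank node get the same IRI."""
    names: dict[str, str] = {}

    def name(t: str) -> str:
        if not t.startswith("_:"):
            return t
        if t not in names:
            # uuid4 gives a practically collision-free local part
            names[t] = f"<{authority}/.well-known/genid/{uuid.uuid4().hex}>"
        return names[t]

    return {tuple(name(t) for t in triple) for triple in graph}

doc = {(":alan", ":likes", "_:someone")}       # [1]
mine = skolemize(doc, "http://breakme.org")    # shaped like [2]
theirs = skolemize(doc, "http://wreckme.org")  # shaped like [3]
print(len(mine | theirs))  # 2: nothing says the two names co-refer
```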

> Particularly given that this transformation is "not itself strictly a valid operation".
>  
> > There you could say that given some keyword, blank nodes in the result should be skolemized, and that subsequent queries which retrieve the same blank nodes, asking them to be skolemized, MUST get the same skolems back each time.
> >
> > Sanctioning such changes for any process that handles RDF looks to me to be a bad idea.
> 
> I agree. I don't think that anything is being "sanctioned" in the sense that it can be done without any attention to the consequences. Even performing a valid inference might change a graph in a way which interferes with some engines (such as an OWL parser).
> 
> That's why even inference tends to be applied by choice, not applied "if a system thinks it will produce better answers to queries" (sorry, that query you wanted now takes 4 years. I know you liked that it took 10 seconds yesterday, but my engineers assure me that they are operating within the bounds of the spec, and our contract only says we'll conform to the spec)
> 
> I disagree about your conclusion about sanctioning. "MAY" is a technical term that means precisely that an action is allowed,

Ah. The RDF specs use a special text format to capture the RFC 2119 meanings of the technical permissions vocabulary. This particular use was informal, and not intended to convey the strict RFC 2119 meaning. Perhaps we should make this clearer. 

>  the spec makes no qualification concerning consequences

Perhaps it should, yes. 

> , and the previous spec did not allow this.

No, the previous spec *did* allow this. The previous spec allowed ANYTHING to be done to RDF graphs.

> Nothing in this discussion has convinced me that this is anything but a wrongheaded, non-backward-compatible change, or that the benefit it brings is anywhere near the damage it will do. 
> 
> Put the capability somewhere other than in RDF proper. Seems to me there's plenty of possibilities.

I'm not sure what you mean by "RDF proper". All the spec does is draw attention to one of the many ways to transform RDF graphs into a different graph, and point out a use case for it. 

Pat

> 
> -Alan
> 
> systems may systematically transform some or all of the blank nodes in an RDF graph into IRIs
> MAY   This word, or the adjective "OPTIONAL", mean that an item is
>    truly optional.  One vendor may choose to include the item because a
>    particular marketplace requires it or because the vendor feels that
>    it enhances the product while another vendor may omit the same item.
> 
> 
>  
> Pat
> 
> >
> > I'm happy to hear explanations of how I am wrong in each case I list above - I'm anxious to learn. But let's stay away from the "I'm confused" rhetoric, please.
> >
> > -Alan
> >
> >
> >
> > On Jul 13, 2012, at 1:03 PM, Alan Ruttenberg wrote:
> >
> > >
> > >
> > > On Fri, Jul 13, 2012 at 1:47 PM, David Booth <david@dbooth.org> wrote:
> > > On Fri, 2012-07-13 at 13:08 -0400, Alan Ruttenberg wrote:
> > > But that would render skolemization impossible, and it would conflict
> > > with the treatment of blank nodes as existentially quantified variables
> > > http://www.w3.org/TR/rdf-mt/#unlabel
> > > since it would be like saying "there exists an x, but you're not allowed
> > > to name x with a URI".
> > > >
> > >
> > > It would be like saying, you can't change an expression "there exists an x" to "x". They don't mean the same thing. If you have "y" then it implies there exists an x. But it doesn't imply "x". Blank nodes, according to the RDF semantics, mean "there exists an x".
> > >
> > > As such, it would seem to break RDF entailment.
> > >
> > > And if this is correct I would expect there to be a formal objection to the proposal.
> > >
> > > Perhaps Michael could shed some light.
> > >
> > > -Alan
> > >
> > >
> > > --
> > > David Booth, Ph.D.
> > > http://dbooth.org/
> > >
> > > Opinions expressed herein are those of the author and do not necessarily
> > > reflect those of his employer.
> > >
> > >
> >
> > ------------------------------------------------------------
> > IHMC                                     (850)434 8903 or (650)494 3973
> > 40 South Alcaniz St.           (850)202 4416   office
> > Pensacola                            (850)202 4440   fax
> > FL 32502                              (850)291 0667   mobile
> > phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Sunday, 15 July 2012 16:42:58 UTC