Re: OWL equivalentClass question from Bijan Parsia on 2012-07-15 (semantic-web@w3.org from July 2012)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Sun, 15 Jul 2012 10:55:51 +0100
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: Pat Hayes <phayes@ihmc.us>, David Booth <david@dbooth.org>, Michael Schneider <schneid@fzi.de>, semantic-web@w3.org, nathan@webr3.org, W3C OWL Working Group <public-owl-wg@w3.org>
Message-Id: <8E3A1563-653A-4757-A39F-473361F043D5@cs.man.ac.uk>

Oh dear. I suppose I should weigh in.

On 15 Jul 2012, at 08:48, Alan Ruttenberg wrote:

> On Sat, Jul 14, 2012 at 10:52 PM, Pat Hayes <phayes@ihmc.us> wrote:
>
> On Jul 14, 2012, at 12:15 PM, Alan Ruttenberg wrote:
[snip]
> It MAY be done at any time. The RDF specs do not set out to say what may or may not be done to RDF.

I have to go with Alan here.

At least in the concepts document, MAY (in the RFC2119 sense) is used:
http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#conformance
and used in precisely that sense for IRItization:
http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-skolemization
as is clear by the RFC2119 SHOULD shortly after.

I'll note, in passing, that the document sometimes misuses MAY, e.g.,
"IRIs in the RDF abstract syntax must be absolute, and may contain a fragment identifier."

No no no no no no no. It's not *optional* to contain a fragment identifier (if the IRI originally contained one). It should be nonconforming to reject:
<http://ex.org#s> <http://ex.org#p> "object".
or worse, to transform it to:
<http://ex.org> <http://ex.org> "object".
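
To make the damage concrete, here is a minimal Python sketch of that lossy transformation. The helper name strip_fragments is hypothetical, and triples are modelled as plain tuples of strings purely for illustration; only urllib.parse.urldefrag is a real standard-library call.

```python
from urllib.parse import urldefrag

# The triple from the example above, as a plain tuple of strings.
triple = ("http://ex.org#s", "http://ex.org#p", "object")

def strip_fragments(t):
    """Hypothetical lossy step: drop the fragment from subject and predicate."""
    s, p, o = t
    return (urldefrag(s).url, urldefrag(p).url, o)

stripped = strip_fragments(triple)

# Subject and predicate are distinct IRIs before, and identical afterwards.
distinct_before = triple[0] != triple[1]
distinct_after = stripped[0] != stripped[1]
print(stripped, distinct_before, distinct_after)
```

The fragment is part of the identity of the IRI, so dropping it yields a different triple rather than a variant spelling of the same one.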

Part of the problem here is that you're using *conformance* terminology interchangeably with *grammar* specs. The right way to spec it is with a grammar of IRIs (including fragments).

I urge you to clean this up as soon as possible!

This leads to the other problem with using MAY etc.: You need to specify the class of processors (or the sort of operation) that these conformance requirements apply to. For example, a parser (IMHO) shouldn't lean graphs. In fact, I would hope that it is nonconforming for a parser to lean a graph (in the parsing phase). Contrariwise, it is acceptable for a parser to remove strictly duplicate triples. Now whether they MUST do so or merely SHOULD do so is an interesting question. (I think MUST here is good, esp. given the history, but I can imagine reasons not to, though you might want SPARQL engines to act as if the dataset is, indeed, a set.)

Similarly, a parser MUST NOT generate the ground RDFS closure of a graph as the output of the parser. I'd guess, anyway. I'd say that's non-conforming. An interesting system might do that, but then it has to advertise itself differently.

So, not all tools are free to do whatever they like so long as it is graph-equivalence preserving. And there's no reason why the group can't spec or talk about non-equivalence-preserving manipulations.

BUT, there obviously be dragons. For example, I notice in an issue:

"the fact that graphs merge easily. This will be added once the Working Group has finalised a design."

Oy. Graphs do *not* "merge easily". At least no more so than well-formed XML documents merge easily (and they do! It's quite easy to produce a well-formed XML document from two well-formed XML documents! Peasy, even.)

Depending on how you do it, merging two RDF graphs can destroy all sorts of properties (e.g., results of sparql queries, leanness).
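
As a rough illustration of the leanness point, the following Python sketch merges two graphs that are each lean and ends up with a non-lean result. Everything here is an assumption made for the example: graphs are sets of string tuples, blank nodes are strings starting with "_:", the ex: names are made up, and is_lean is a brute-force check suitable only for toy graphs.

```python
from itertools import product

def is_blank(term):
    # Assumption for this sketch: blank nodes are written "_:label".
    return term.startswith("_:")

def is_lean(graph):
    """Brute force: lean means no mapping of blank nodes to terms of the
    graph produces an instance that is a proper subgraph."""
    terms = sorted({t for triple in graph for t in triple})
    blanks = [t for t in terms if is_blank(t)]
    for images in product(terms, repeat=len(blanks)):
        mapping = dict(zip(blanks, images))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in graph}
        if instance < graph:  # proper subset
            return False
    return True

g1 = {("ex:a", "ex:p", "_:x")}   # "ex:a has some ex:p value"
g2 = {("ex:a", "ex:p", "ex:b")}  # "ex:a has ex:p value ex:b"

# g2 has no blank nodes, so no renaming is needed and the merge is the union.
merged = g1 | g2

print(is_lean(g1), is_lean(g2), is_lean(merged))
```

After the merge the blank-node triple is redundant, since mapping _:x to ex:b gives a proper subgraph. A query asking for the ex:p values of ex:a also gets one answer on each input and two on the merge.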

If you are going to keep this document (which always was rather confusing, I mean the "introduction" isn't an introduction, it's the concepts part! everything after the conformance is syntax!!! except 1.1 is syntax too! and so is 1.4), I strongly suggest having a section called Abstract Syntax wherein you specify, well, an abstract syntax. Though it annoyed me, I've found the use of UML + a grammar to be pretty damn helpful. The current natural language stuff is dangerous in its wooliness.

I would also suggest another section on "common operations" wherein you define all the common things one might do with a graph, with some discussion of the implications of each operation. So:

Parsing:
A parse operation takes an input, typically in the form of a string containing a purported instance of some concrete RDF serialization and either produces an RDF graph in abstract syntax or an error. Since an abstract syntax graph is a set, a conforming system MUST remove all explicit duplicates.

However, there are situations and components for which this is unreasonable. A streaming parser MAY skip deduping and leave that for downstream components. If a streaming parser is used directly for graph operations (e.g., counting triples), then the results may be inaccurate wrt the spec.
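
A small Python sketch of that difference, under loud assumptions: the line format is a toy whitespace-separated subset that only resembles N-Triples, and stream_triples and parse_graph are hypothetical names, not any real library's API.

```python
def stream_triples(text):
    """Hypothetical streaming parser: yields triples as read, no deduping."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not line.endswith("."):
            raise ValueError(f"missing terminating '.': {line!r}")
        # Split into subject, predicate, and the rest (the object).
        s, p, o = line[:-1].split(None, 2)
        yield (s, p, o.strip())

def parse_graph(text):
    """Parser producing an abstract-syntax graph: a set, so duplicates vanish."""
    return set(stream_triples(text))

doc = """\
<http://ex.org#s> <http://ex.org#p> "object" .
<http://ex.org#s> <http://ex.org#p> "object" .
<http://ex.org#s> <http://ex.org#q> "other" .
"""

streamed_count = sum(1 for _ in stream_triples(doc))
graph_count = len(parse_graph(doc))
print(streamed_count, graph_count)
```

Counting directly off the stream reports three triples, while the graph contains two.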

Serializing:
A serializing operation takes an input which is functionally equivalent to an abstract syntax graph and generates a string which conforms to some concrete syntax for RDF. [[BUNCH MORE STUFF]]

Note that there are very limited circumstances wherein a parsing-serialization cycle results in exactly the same string.
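
The round-trip point can be sketched in Python as follows. The parse and serialize functions are hypothetical toys over a whitespace-separated, N-Triples-like format, and the choice to emit triples in sorted order is just one arbitrary canonical form assumed for the example.

```python
def parse(text):
    """Toy parser: one triple per line, terminated by '.', result is a set."""
    graph = set()
    for line in text.splitlines():
        line = line.strip()
        if line:
            s, p, o = line.rstrip(".").split(None, 2)
            graph.add((s, p, o.strip()))
    return graph

def serialize(graph):
    """Toy serializer: sorted triples, single spaces, ' .' terminator."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(graph))

# Same graph, but with different ordering and spacing than the serializer uses.
original = (
    '<http://ex.org#s>   <http://ex.org#p> "b".\n'
    '<http://ex.org#s> <http://ex.org#p> "a" .\n'
)

round_tripped = serialize(parse(original))

same_string = round_tripped == original
same_graph = parse(round_tripped) == parse(original)
print(same_string, same_graph)
```

The graph survives the cycle but the string does not. The string is only reproduced when the input already happens to be in the serializer's own output form.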

Deduping:
If we consider a slight liberalization of the abstract syntax graph to be a bag rather than a set (as might be the output of a non-conforming parser), then deduping is the conversion of the bag version to the equivalent set version. Deduping is a common operation and is required for various other conforming operations. Users or applications should not rely on bag semantics for RDF graphs.
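
As a minimal sketch of the bag-to-set conversion, assuming a bag of triples is modelled with Python's collections.Counter (the dedupe name and the ex: terms are made up for illustration):

```python
from collections import Counter

def dedupe(bag):
    """Convert a bag (multiset) of triples to the equivalent set."""
    return {triple for triple, count in bag.items() if count > 0}

t1 = ("ex:s", "ex:p", '"object"')
t2 = ("ex:s", "ex:q", '"other"')

bag = Counter({t1: 3, t2: 1})  # t1 occurs three times in the bag
graph = dedupe(bag)

bag_size = sum(bag.values())
graph_size = len(graph)
print(bag_size, graph_size)
```

The multiplicities are discarded, which is exactly why nothing downstream should depend on them.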

Leaning:
etc.

Skolemizing:

Closing:

Etc.

Put all the sensible operations in one place and then use them elsewhere.

Cheers,
Bijan.

Received on Sunday, 15 July 2012 09:56:17 UTC