RE: RDF-Based Semantics and n-ary dataranges

Hi all!

Concerning the question of how to treat n-ary datatypes in the RDF-Based Semantics, I have decided to STOP NOW, given the extremely tight schedule and the strong existing dissent.

After sketching a possible solution along the lines I have in mind, I realized that, while such a solution seems feasible, developing it would require a lot of care and would take me significant time (at least several days) to bring into proper form. It would then need to be reviewed, perhaps revised, and so on. And all this would presuppose that we could settle on my view of things, which seems far from realistic to me. So it's too late! It's my fault for not having brought this topic to the table much earlier, perhaps at F2F4, where there would have been enough time to discuss and work on a solution.

This means that there will be no change to the RDF-Based Semantics concerning n-aries. This is exactly what Peter has proposed. It also means that, once I have answered the still outstanding review by Zhe (which I hope to do very quickly), the RDF-Based Semantics will be ready to go. So we don't need to discuss this topic at our next TC.

This does not mean, however, that I am happy with the outcome, and, in fact, I will have something to say below for those who care. But, as far as *I* am concerned, *I* do not intend to pursue this topic any further within this working group (I don't know whether people outside the WG will do so, but I am not going to beat the drum).

Anyway, now there *is* *some* solution for n-aries, with which some people in the WG are happy. I really believe it's broken (see below), but it doesn't matter much, because I don't believe that n-aries will be an immediate topic for existing RDF-ish implementations such as Jena or OWLIM. OWL 2 RL doesn't even support them. If n-aries ever become a topic for reasoners based on the RDF Semantics (and I can tell you that this is not unlikely), then there will be results from actual implementation efforts, which may or may not lead to a revision of the current treatment of n-aries in the RDF-Based Semantics... perhaps in OWL 3. :)

Cheers,
Michael


PS (long, only for those who care): Here is where I stand in the discussion about n-ary datatypes in the RDF-Based Semantics.

The domain of an RDF Semantics interpretation is a set of individuals, and every name occurring in the vocabulary of an RDF interpretation is mapped to such an individual in the domain, and not to anything else (not to subsets of the domain, and not to relations over the domain, as is possible, for example, in the Direct Semantics).

Of course, such an individual may be anything, because we have a /formal/ semantics, i.e. we do not put restrictions on what a name can denote. Consequently, one can even interpret some name in the vocabulary by a certain tuple or set. But this does not mean that one can also refer to (or talk about) such an individual in the form of a tuple or set. All the individuals in the domain of an RDF interpretation are, by default, opaque when viewed through the lens of the RDF Semantics, regardless of what we otherwise know about their nature or internal structure.

One can, indeed, be more specific for certain "concrete" individuals, but then one has to be explicit about it. If anything, the RDF Semantics is rather explicit in that it is /not/ intended to talk directly about sets or relations as instances of the domain of an RDF interpretation. If this were allowed, there would be absolutely no need for the IEXT property extension function and the ICEXT class extension function to exist. One could then simply map property names and class names directly to binary relations and sets within the domain, respectively.

To be clear: there is no problem with using some arbitrary individual in the domain, be it a cow (or whatever), as the denotation of, for example, the term "rdfs:Class". Likewise, there is no problem with having this cow be the argument of the ICEXT function, with the function value being the set of all classes. One can even put this cow into the data domain, thereby turning the cow into a data value. This doesn't hurt the cow in any way. There are no restrictions on which things can be used as individuals (not even for data values), and no restrictions on which thing a given name can denote. And this is not only OK for the RDF Semantics, but shouldn't be a problem for any /formal/ semantics one can think of, including the Direct Semantics. In fact, the data domain and the remaining part of the universe of any RDF interpretation should make a perfect pair of data domain and object domain for some Direct Semantics interpretation, no?
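To make the division of labour between IS, ICEXT and IEXT concrete, here is a minimal Python sketch under simplifying assumptions: the class `Individual`, the helpers `is_instance` and `holds`, the example vocabulary and the use of dictionaries for the mappings are hypothetical illustrations, not part of the RDF Semantics. Individuals carry no structure that the semantic conditions look at; a class or property gets its extension only via ICEXT or IEXT, so the cow can denote rdfs:Class (and even be a data value) without ever being treated as a set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """An opaque domain element; the semantic conditions only compare identities."""
    label: str  # for printing only, never inspected by the semantic conditions


# A tiny, hypothetical domain IR.
cow = Individual("cow")
person = Individual("person")
knows = Individual("knows")
alice = Individual("alice")

# IS: names denote individuals in IR, and nothing else.
IS = {
    "rdfs:Class": cow,  # the cow denotes rdfs:Class
    "ex:Person": person,
    "ex:knows": knows,
    "ex:alice": alice,
}

# ICEXT: individual -> class extension (a subset of IR).
# ICEXT(IS("rdfs:Class")) is the set of all classes, here {cow, person}.
ICEXT = {
    cow: frozenset({cow, person}),
    person: frozenset({alice}),
}

# IEXT: individual -> property extension (a subset of IR x IR).
IEXT = {
    knows: frozenset({(alice, alice)}),
}

# Nothing stops the cow from also being a data value.
LV = frozenset({1, 2, cow})


def is_instance(x: Individual, class_name: str) -> bool:
    """'x rdf:type C' holds iff x is in ICEXT(IS(C)); IS(C) itself is never used as a set."""
    return x in ICEXT.get(IS[class_name], frozenset())


def holds(s: Individual, prop_name: str, o: Individual) -> bool:
    """'s P o' holds iff the pair (s, o) is in IEXT(IS(P))."""
    return (s, o) in IEXT.get(IS[prop_name], frozenset())


print(is_instance(alice, "ex:Person"))   # True
print(is_instance(alice, "rdfs:Class"))  # False
print(holds(alice, "ex:knows", alice))   # True
```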

But, again, the mere fact that anything may be an individual (or data value) does not mean that one can refer to the specific aspects of all these "anythings". Whether a semantic condition of the RDF-Based Semantics can contain expressions such as "<x,y> in S" or "{ 1, {2,3} }" is determined exclusively by the /formalism/ we choose for defining the semantics. So this isn't really a question about the semantics itself, which essentially only determines how certain sets of names are mapped to certain sets of things, their subsets, and so on. It is rather a question about the formalism used to describe the semantics.

Of course, such a formalism should be expressive enough to allow everything the semantics wants to express. But no more than that! We cannot assume for the domain of an RDF interpretation that "everything that isn't explicitly disallowed (such as sets as instances of sets) is allowed". If this were true, then it would in particular be allowed to build RDF interpretations with a domain containing all its subsets as instances. This would break the system, because the powerset of each set has a larger cardinality than the set itself, and this would just be the most obvious formal problem resulting from such a "relaxed" assumption. Consequently, there must be some restrictions on what can and cannot be said about the domain of an RDF interpretation. But which restrictions these are is not obvious.
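The cardinality argument in the paragraph above is Cantor's theorem. The standard diagonal proof, written out with IR standing for the domain of an RDF interpretation, goes roughly as follows:

```latex
% Cantor's theorem for an arbitrary set S: no map f : S -> P(S) is surjective.
\begin{align*}
  &\text{Let } f : S \to \mathcal{P}(S) \text{ and } D = \{\, x \in S \mid x \notin f(x) \,\} \in \mathcal{P}(S).\\
  &\text{If } D = f(d) \text{ for some } d \in S, \text{ then } d \in D \iff d \notin f(d) = D, \text{ a contradiction.}\\
  &\text{Hence } |\mathcal{P}(S)| > |S|.\\
  &\text{If } \mathcal{P}(\mathit{IR}) \subseteq \mathit{IR}, \text{ then } g : \mathit{IR} \to \mathcal{P}(\mathit{IR}),\;
     g(x) = x \text{ for } x \in \mathcal{P}(\mathit{IR}),\; g(x) = \emptyset \text{ otherwise,}\\
  &\text{would be surjective, contradicting the above.}
\end{align*}
```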

Ideally, it should be stated /explicitly/ what can and cannot be said, i.e. one would have to say explicitly which formalism is used for the definition of a particular semantics (the RDF-Based Semantics or the Direct Semantics in our case, or some XYZ Semantics). In principle, every model theory has to make its formalism explicit, in order to determine what all these "and"s and "forall"s and "{...}" mean on the right-hand side of the tables defining the semantics. This can be done, for example, by pointing to a specific book.

We never do this in our documents, though. The typical rationale is probably something like "we use ordinary mathematics for this purpose". And this is, in general, perfectly fine with me... as long as we *really* stick to "ordinary mathematics" without any strange embellishments. So far, this has always been the case, AFAICT. But now there are suddenly ideas that one can have tuples as instances of subsets of the universe, i.e. tuples within "ordinary" sets. How many mathematics teachers (or even people with a PhD in mathematics) will consider an expression such as "<1,2> in IR" to be "ordinary mathematics"? Maybe some of them will agree that one *may* define a *variant* of "ordinary mathematics" that at least gives this statement a truth value ("false" :)). But in general, most people will consider this expression a meaningless sequence of symbols (mathematics teachers in particular, if their students write expressions of this form in a test :)). So, if we want to go down a path that allows us to write such expressions when defining the RDF-Based Semantics, then we have to be *explicit* and state what our concrete formalism is.

And we should then be explicit for /both/ semantics, because it would be strange to learn that different formalisms are used for the Direct Semantics and the RDF-Based Semantics. I, for one, never had the feeling that I was referring to different formalisms when trying to understand the definitions in the two semantics. Which different formalisms? If there were differences, the two semantics would be essentially incomparable, and the correspondence theorem would be (and would always have been) a comparison of apples and oranges.

But, to reiterate: at least for sets and binary relations, there is absolutely no need for such "strange embellishments" in the domain of an RDF interpretation, simply because of the IEXT and ICEXT functions. These functions were introduced precisely to avoid the need for such "strange embellishments". So why should n-ary relations and tuples suddenly be allowed as instances of the domain? Wouldn't it be natural to follow the path already taken and introduce n-ary extensions? To me, that's just the obvious next step, and it wouldn't compromise the basic notion of the RDF universe being a set of (opaquely perceived) individuals, and nothing else. And with n-ary extensions, n-tuples would have their obvious home, just as in the Direct Semantics.
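The n-ary extension idea was not worked out in this thread, so the following is only a hypothetical Python sketch of what it could look like by analogy with IEXT. The function name `IEXTN`, its keying by arity, the helper `satisfies` and the example data range `ex:lessThan` are all assumptions for illustration. The data range is still denoted by an opaque individual, and the n-tuples appear only in its extension, mirroring the reading (DR)^DT subset (Δ_D)^n of the Direct Semantics; the data domain LV itself contains no tuples.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Individual:
    """An opaque domain element, as in the RDF Semantics."""
    label: str


# Hypothetical data domain LV: a few integers, and no tuples.
LV = frozenset(range(4))

# The binary data range is denoted by an opaque individual, like any other name.
less_than = Individual("lessThan")
IS = {"ex:lessThan": less_than}

# Hypothetical n-ary extension function, keyed by (arity, individual).
# IEXTN[(2, less_than)] is a subset of LV x LV; the tuples live here,
# not inside the domain.
IEXTN = {
    (2, less_than): frozenset(
        (a, b) for a, b in product(LV, repeat=2) if a < b
    ),
}


def satisfies(range_name: str, *values: int) -> bool:
    """<v1,...,vn> satisfies the data range iff the tuple is in IEXTN(n, IS(name))."""
    key = (len(values), IS[range_name])
    return tuple(values) in IEXTN.get(key, frozenset())


print(satisfies("ex:lessThan", 1, 2))     # True
print(satisfies("ex:lessThan", 2, 1))     # False
print(satisfies("ex:lessThan", 1, 2, 3))  # False: no ternary extension defined
```

Keying by arity keeps one individual usable with extensions of different arities, much as one individual can have both an ICEXT and an IEXT value in RDFS; whether that would be desirable is left open here.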

Ok, so much for this, now back to real work! :)

>-----Original Message-----
>From: Peter F. Patel-Schneider [mailto:pfps@research.bell-labs.com]
>Sent: Friday, April 03, 2009 1:18 PM
>To: Michael Schneider
>Cc: public-owl-wg@w3.org
>Subject: Re: RDF-Based Semantics and n-ary dataranges
>
>From: "Michael Schneider" <schneid@fzi.de>
>Subject: RE: RDF-Based Semantics and n-ary dataranges
>Date: Fri, 3 Apr 2009 12:31:30 +0200
>
>> Hi Peter,
>>
>> I see a lot of "no"s to all of my statements, but I am not quite certain
>> that I understand your arguments.
>>
>> Do you claim that a set such as
>>
>>   S := { 1, <2,3>, <4,5,6> }
>>
>> is allowed to be a subset of the data domain? I mean, where "<,>" is not
>> just some random way to write down composite data values (such as the "/" in
>> terms like "2/3" for expressing rationals), but it is really meant to be the
>> n-tuple operator appearing in statements such as
>>
>>   <x,y> in DR
>>
>> ?
>
>Absolutely.  This is a perfectly fine data domain in RDF and thus in the
>OWL 2 RDF-Based semantics and, further, there is no reason not to have parts
>of the RDF-Based semantics that look inside these elements of the
>domain.  In fact, the entire idea behind datatypes is precisely that those
>elements of the data domain that matter do have some internal structure.
>
>> Or, more generally, do you say that all n-ary data ranges (or better, their
>> "value spaces") must be subsets of the data domain?
>
>The interpretation of every datatype and data range (unary or n-ary, it
>doesn't matter) would be a subset of the data domain.
>
>> Let me say that I would not be particularly happy with this sort of
>> "ontological mixup" (what about sets like "{ 1, {2}, {3,<{4,<5,6>}},7>},
>> {1,2}x{3,4,5} }"? they should then consequently be allowed as subsets of the
>> data domain as well!), but telling my reasons would probably lead too far.
>
>It is not that such sets *should* be allowed to be subsets of the data
>domain, it is that they *currently* are allowed as subsets of the data
>domain.  There is nothing in the RDF semantics that prohibits elements
>of the data domain from being *anything* - they could be integers, they
>could be pairs of integers, they could be infinite strings, they could
>even be actual people.
>
>> However, what is more important, I then do not correctly understand the
>> Direct Semantics:
>>
>>   Direct Semantics, 2.2.2: "Data Ranges"
>>   <http://www.w3.org/2007/OWL/wiki/Semantics#Data_Ranges>
>>
>> [[
>> All datatypes in OWL 2 are unary, so each datatype DT is interpreted as
>> a unary relation over Δ_D — that is, a set (DT)^DT subset Δ_D.
>> Data ranges, however, can be n-ary, as this allows implementations
>> to extend OWL 2 with built-in operations such as comparisons or
>> arithmetic.
>> --> An n-ary data range DR is interpreted as an n-ary relation (DR)^DT over Δ_D.
>>                                                       ^^^^^^^^         ^^^^
>> ]]
>>
>> Until now, I have understood the word "over" in the context of the
>word
>> "relation" to mean
>>
>>   (DR)^DT subset (?_D)^n
>>
>> But you seem to suggest that "over" means
>>
>>   (DR)^DT subset ?_D
>>
>> without the exponent "n", meaning that dataranges, independent of their
>> arity, are always subsets of the data domain?
>
>This is about the direct semantics, which may (and indeed does) have a
>different basis, so the question is not very germane here.
>
>> Since I am an official reviewer of the Direct Semantics, I feel obliged to
>> ask for clarification of the Direct Semantics on this point. Whatever the
>> actual meaning turns out to be, the RDF-Based Semantics will then need to
>> be aligned with the Direct Semantics.
>
>The two semantics do have to be aligned, true, but that doesn't mean
>that the two semantics have to look completely the same.  They already
>look quite different in many areas but nonetheless end up being in close
>alignment.
>
>> Michael
>
>peter
>
>
>
>>>-----Original Message-----
>>>From: Peter F. Patel-Schneider [mailto:pfps@research.bell-labs.com]
>>>Sent: Wednesday, April 01, 2009 11:10 PM
>>>To: Michael Schneider
>>>Cc: public-owl-wg@w3.org
>>>Subject: Re: RDF-Based Semantics and n-ary dataranges
>>>
>>>From: "Michael Schneider" <schneid@fzi.de>
>>>Subject: RE: RDF-Based Semantics and n-ary dataranges
>>>Date: Wed, 1 Apr 2009 21:44:54 +0200
>>>
>>>>>-----Original Message-----
>>>>>From: public-owl-wg-request@w3.org [mailto:public-owl-wg-request@w3.org]
>>>>>On Behalf Of Ian Horrocks
>>>>>Sent: Wednesday, April 01, 2009 8:47 PM
>>>>>To: W3C OWL Working Group
>>>>>Subject: RDF-Based Semantics and n-ary dataranges
>>>>>
>>>>>We didn't manage to conclude this discussion.
>>>>>
>>>>>Summary of (my understanding of) the discussion so far:
>>>
>>>[...]
>>>
>>>>>* the structure of n-ary restrictions is defined in SS&FS, but
>>>>>(hopefully) only the unary case can occur in conforming ontologies
>>>>>(as above)
>>>>>* Michael believes that as a result the RDF-Based semantics is broken
>>>>
>>>> Yes, it is _syntactically_ broken. It essentially contains an expression of
>>>> the form
>>>>
>>>>   "<x1,...,xn> in S"
>>>>
>>>> where "S" is defined to denote a subset of the object domain.
>>>
>>>I still don't understand why this can be considered to be syntactically
>>>or semantically or even pragmatically broken.
>>>
>>>It is entirely possible to have an OWL 2 Full interpretation
>>> I = <IR, IP, IEXT, IS, IL, LV>
>>>where LV and thus IR contains not only things like the integers but
>>>also things like pairs, triples, quads, quints, ... over the integers
>>>(or over reals, or over complex numbers, or even over elements in
>>>IR-LV).
>>>
>>>However, even if LV only contains "standard" data values there is
>>>nothing wrong with asking whether LV contains a tuple.  This is a
>>>perfectly well-formed question even in this case, it is just that the
>>>answer is then always false.  (Which is, of course, the expected and
>>>desirable answer.)
>>>
>>>> If something like this were written in the Direct Semantics, you would
>>>> certainly be horrified.
>>>
>>>Why?  Again, the answer would just be false.
>>>
>>>> And so you should be for the RDF-Based Semantics as
>>>> well.
>>>
>>>I'm certainly not horrified, and I don't see why anyone would be
>>>horrified.
>>>
>>>> Because this has nothing to do with the distinction between the Direct
>>>> Semantics and the RDF-Based Semantics. It only has to do with what can be
>>>> written syntactically in the set theory that underlies both our semantics.
>>>
>>>There is nothing in even set theory that requires that the atomic set
>>>elements not have some internal structure.
>>>
>>>> (There are other problems as well, but I think this is the simplest one to
>>>> acknowledge.)
>>>
>>>I don't see this problem, nor can I think of any other problems.
>>>
>>>> The problem is: interpretation functions under the semantics of RDF are
>>>> restricted to interpreting names by individuals (instances of the domain IR).
>>>> In addition (in RDFS), there are two functions that allow me to /indirectly/
>>>> talk about subsets of the domain IR (the class extension function
>>>> "ICEXT()"), and subsets of the product IRxIR (the property extension
>>>> function "IEXT()"). But there is not yet such a function (or a collection of
>>>> functions) that allows me to talk about subsets of the products IR^n for
>>>> arbitrary n.
>>>
>>>I don't follow this reasoning at all.  Certainly there is nothing (so
>>>far) that requires tuples to be present in IR, but there is also nothing
>>>(so far) that forbids tuples from being present in IR.
>>>
>>>> So the underlying logic may allow me to write statements as above, at least
>>>> for an "S" representing a set of n-ary tuples. The problem is that I do not
>>>> reach this functionality of the underlying logic from within the current
>>>> framework of the RDFS semantics. So I need to extend this framework. This is
>>>> what I suggest doing (before April 15th...).
>>>
>>>Again, I don't think that any change is required.  As far as I can see,
>>>the RDF-Based Semantics is currently entirely coherent.
>>>
>>>>>* Peter doesn't agree.
>>>
>>>Yep.
>>>
>>>>>Comments?
>>>>>
>>>>>Ian
>>>>>
>>>
>>>> Cheers,
>>>> Michael
>>>
>>>peter

--
Dipl.-Inform. Michael Schneider
Research Scientist, Dept. Information Process Engineering (IPE)
Tel  : +49-721-9654-726
Fax  : +49-721-9654-727
Email: michael.schneider@fzi.de
WWW  : http://www.fzi.de/michael.schneider
=======================================================================
FZI Forschungszentrum Informatik an der Universität Karlsruhe
Haid-und-Neu-Str. 10-14, D-76131 Karlsruhe
Tel.: +49-721-9654-0, Fax: +49-721-9654-959
Stiftung des bürgerlichen Rechts, Az 14-0563.1, RP Karlsruhe
Vorstand: Prof. Dr.-Ing. Rüdiger Dillmann, Dipl. Wi.-Ing. Michael Flor,
Prof. Dr. Dr. h.c. Wolffried Stucky, Prof. Dr. Rudi Studer
Vorsitzender des Kuratoriums: Ministerialdirigent Günther Leßnerkraus
=======================================================================

Received on Tuesday, 7 April 2009 12:28:05 UTC