(Ref.: ISSUE-12: Conjunction and disjunction) Semantics of resource set definitions

Following the document prepared by Phil [1], here are some notes on the
conjunction issue that may prove useful in deciding on the best option.

NB: I do not consider all the options in [1] separately, but just two
(corresponding to a variant of Option 2 [2] and to Option 4A [3])
as examples of the two proposed strategies, namely, explicit and
implicit semantics, respectively.

The purpose of a resource set definition is to denote a set of resources
in a way that is both safe and flexible. "Safe" means that it must not be
error-prone; "flexible" means that it should be possible to combine
constraints in possibly nested formulas using the ∧ (AND), ∨ (OR), and ¬
(NOT) Boolean operators.

In other words, resource set definitions should be an abstraction layer
over already existing formal tools, one that prevents errors but does not
limit their expressiveness.

Conjunction, disjunction, and negation are supported by corresponding
OWL properties (respectively, owl:intersectionOf, owl:unionOf,
owl:complementOf [4]). By using OWL properties, a resource set
definition denoting the set of resources hosted by machines whose name
ends with example.org or example.net, and where the path component of
their URI starts with foo or bar, can be expressed as follows:

[Example 1]

<wdr:Set>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:unionOf rdf:parseType="Collection">
      <wdr:hasHost>example.org</wdr:hasHost>
      <wdr:hasHost>example.net</wdr:hasHost>
    </owl:unionOf>
    <owl:unionOf rdf:parseType="Collection">
      <wdr:pathStartsWith>foo</wdr:pathStartsWith>
      <wdr:pathStartsWith>bar</wdr:pathStartsWith>
    </owl:unionOf>
  </owl:intersectionOf>
</wdr:Set>

whereas, if we wish to denote all the resources hosted either by
example.org, where the path component of their URIs starts with foo, or
example.net, where the path component of their URIs starts with bar, the
scope will be defined as follows:

[Example 2]

<wdr:Set>
  <owl:unionOf rdf:parseType="Collection">
    <owl:intersectionOf rdf:parseType="Collection">
      <wdr:hasHost>example.org</wdr:hasHost>
      <wdr:pathStartsWith>foo</wdr:pathStartsWith>
    </owl:intersectionOf>
    <owl:intersectionOf rdf:parseType="Collection">
      <wdr:hasHost>example.net</wdr:hasHost>
      <wdr:pathStartsWith>bar</wdr:pathStartsWith>
    </owl:intersectionOf>
  </owl:unionOf>
</wdr:Set>

More complex, nested formulas can also be expressed this way.
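
To make the explicit semantics concrete, here is a minimal Python sketch
that evaluates nested AND/OR/NOT formulas against a URI. It is not part
of any POWDER draft: the tuple encoding and the function name matches
are assumptions made for illustration, and it assumes that hasHost
matches the end of the host name and that pathStartsWith is tested
against the path without its leading slash.

```python
from urllib.parse import urlsplit

# Hypothetical encoding: ("and", f1, f2, ...), ("or", f1, f2, ...),
# ("not", f), or a leaf such as ("hasHost", "example.org").
def matches(formula, uri):
    op, *args = formula
    if op == "and":
        return all(matches(f, uri) for f in args)
    if op == "or":
        return any(matches(f, uri) for f in args)
    if op == "not":
        return not matches(args[0], uri)
    parts = urlsplit(uri)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")
    if op == "hasHost":
        return host.endswith(args[0])    # simplified "ends with" match
    if op == "pathStartsWith":
        return path.startswith(args[0])
    raise ValueError(f"unknown operator: {op}")

# Example 1: (org OR net) AND (foo OR bar)
example_1 = ("and",
             ("or", ("hasHost", "example.org"), ("hasHost", "example.net")),
             ("or", ("pathStartsWith", "foo"), ("pathStartsWith", "bar")))

# Example 2: (org AND foo) OR (net AND bar)
example_2 = ("or",
             ("and", ("hasHost", "example.org"), ("pathStartsWith", "foo")),
             ("and", ("hasHost", "example.net"), ("pathStartsWith", "bar")))
```

With these definitions, http://www.example.org/bar/x falls within
Example 1 but not within Example 2.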

NB: if this solution is adopted, negative RDF properties should be
expressed by using owl:complementOf, and not by specific properties.

The problem here is that AND/OR/NOT can be combined without any
constraint, which means that we may end up with incorrect scope
definitions, such as the following, which requires a single resource to
be hosted by both example.org and example.net:

[Example 3]

<wdr:Set>
  <owl:intersectionOf rdf:parseType="Collection">
    <wdr:hasHost>example.org</wdr:hasHost>
    <wdr:hasHost>example.net</wdr:hasHost>
  </owl:intersectionOf>
</wdr:Set>

Regular expressions (REs) have the same problem.

By contrast, the current implicit semantics of resource set definitions
(same properties in OR, different properties in AND) avoids errors such
as the one in the example above. However, the problem is that such an
approach has limited expressiveness.

For instance, Example 1 may be expressed as follows:

[Example 4]

<wdr:Set>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:hasHost>example.net</wdr:hasHost>
  <wdr:pathStartsWith>foo</wdr:pathStartsWith>
  <wdr:pathStartsWith>bar</wdr:pathStartsWith>
</wdr:Set>
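
The implicit rule (same properties in OR, different properties in AND)
can be sketched in Python as follows. This is only an illustration: the
MATCHERS table, the function name matches_set, and the simplified
matching rules are assumptions, not something defined by the drafts.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical matchers for a few properties (simplified matching rules).
MATCHERS = {
    "hasHost": lambda parts, v: (parts.hostname or "").endswith(v),
    "pathStartsWith": lambda parts, v: parts.path.lstrip("/").startswith(v),
    "pathEndsWith": lambda parts, v: parts.path.endswith(v),
}

def matches_set(constraints, uri):
    """constraints: list of (property, value) pairs, as in one wdr:Set."""
    parts = urlsplit(uri)
    by_property = defaultdict(list)
    for prop, value in constraints:
        by_property[prop].append(value)
    # OR among instances of the same property, AND across properties.
    return all(
        any(MATCHERS[prop](parts, v) for v in values)
        for prop, values in by_property.items()
    )

# Example 4: (org OR net) AND (foo OR bar)
example_4 = [
    ("hasHost", "example.org"),
    ("hasHost", "example.net"),
    ("pathStartsWith", "foo"),
    ("pathStartsWith", "bar"),
]
```

Note that the order of the constraints is irrelevant, and that there is
no way to pair a given host with a given path prefix.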

By contrast, Example 2 cannot be expressed by using this approach,
unless we support the possibility of denoting a DR scope by multiple
resource set definitions (see Option 4A [3]):

[Example 5]

<wdr:hasScope rdf:parseType="Collection">
  <wdr:Set>
    <wdr:hasHost>example.org</wdr:hasHost>
    <wdr:pathStartsWith>foo</wdr:pathStartsWith>
  </wdr:Set>
  <wdr:Set>
    <wdr:hasHost>example.net</wdr:hasHost>
    <wdr:pathStartsWith>bar</wdr:pathStartsWith>
  </wdr:Set>
</wdr:hasScope>

The problem here is that, if we wish to denote all the resources hosted
by example.org with a path starting with foo OR ending with bar, we must
specify redundant resource set definitions:

[Example 6]

<wdr:hasScope rdf:parseType="Collection">
  <wdr:Set>
    <wdr:hasHost>example.org</wdr:hasHost>
    <wdr:pathStartsWith>foo</wdr:pathStartsWith>
  </wdr:Set>
  <wdr:Set>
    <wdr:hasHost>example.org</wdr:hasHost>
    <wdr:pathEndsWith>bar</wdr:pathEndsWith>
  </wdr:Set>
</wdr:hasScope>
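
A minimal Python sketch of this reading, in which a scope is the union
of its resource set definitions while each definition keeps the implicit
AND between different properties. The function names and the dictionary
encoding are assumptions made for illustration only.

```python
from urllib.parse import urlsplit

def in_set(uri, host, path_starts=None, path_ends=None):
    """One resource set definition: all the given constraints are in AND."""
    parts = urlsplit(uri)
    path = parts.path.lstrip("/")
    return ((parts.hostname or "").endswith(host)
            and (path_starts is None or path.startswith(path_starts))
            and (path_ends is None or path.endswith(path_ends)))

def in_scope(uri, sets):
    """A scope made of several definitions: the definitions are in OR."""
    return any(in_set(uri, **s) for s in sets)

# Example 6: the host constraint has to be repeated in each definition.
example_6 = [
    {"host": "example.org", "path_starts": "foo"},
    {"host": "example.org", "path_ends": "bar"},
]
```

The repetition of the host in example_6 is exactly the redundancy
mentioned above.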

Of course in such cases we can use REs, but not if we are using grouping
by resource property.

However, this works with all the RDF properties we defined, except the
following:

*RDF properties using contains as matching rule*

    Such properties may be in AND. By contrast, properties using starts
with, ends with, or exact as matching rule cannot be in AND (e.g.,
the same string cannot start with two different substrings, unless one
is a prefix of the other).

    Possible solutions:

       1. do not support RDF properties using contains as matching rule:
use wdr:hasUri for them
       2. support RDF properties using contains as matching rule, use
them in case they should be in OR, but use wdr:hasUri in case they should
be in AND.
       3. support the possibility of specifying a set of patterns as
object of the RDF property, to be considered in AND:

          <wdr:pathContains>
            <wdr:pattern>foo</wdr:pattern>
            <wdr:pattern>bar</wdr:pattern>
          </wdr:pathContains>

       4. define a variant for such RDF properties
(wdr:pathAlsoContains), which will be in AND with the base one
(wdr:pathContains)

    Note that solutions 1 and 2 work only when resources are grouped by
address, but not by property, whereas solution 3 also applies to the latter.

    By contrast, solution 4 works only when there are exactly two
conjuncts; in fact

    <wdr:pathContains>foo</wdr:pathContains>
    <wdr:pathAlsoContains>bar</wdr:pathAlsoContains>

    means (foo AND bar), whereas

    <wdr:pathContains>foo</wdr:pathContains>
    <wdr:pathAlsoContains>bar</wdr:pathAlsoContains>
    <wdr:pathAlsoContains>boo</wdr:pathAlsoContains>

    means (foo AND (bar OR boo)), and not (foo AND bar AND boo).

    Moreover in the following case

    <wdr:pathContains>foo</wdr:pathContains>
    <wdr:pathContains>bar</wdr:pathContains>
    <wdr:pathAlsoContains>boo</wdr:pathAlsoContains>

    how can we interpret this? Is wdr:pathAlsoContains in AND with the
former wdr:pathContains, with the latter, or with both?
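
The behaviour of solution 4 can be checked with a short Python sketch.
It applies the implicit rule literally (instances of the same property
in OR, different properties in AND); the function name and argument
layout are hypothetical. For the last case above, this literal reading
gives ((foo OR bar) AND boo), which is only one of the possible
interpretations.

```python
def path_matches(path, contains, also_contains):
    """Literal implicit reading of wdr:pathContains / wdr:pathAlsoContains."""
    return (any(p in path for p in contains)            # same property: OR
            and any(p in path for p in also_contains))  # across properties: AND

# foo AND (bar OR boo): accepted although "boo" is missing from the path
three_conjuncts = path_matches("foo/bar", ["foo"], ["bar", "boo"])

# (foo OR bar) AND boo: accepted although "foo" is missing from the path
ambiguous_case = path_matches("bar/boo", ["foo", "bar"], ["boo"])
```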

*Negative RDF properties*

    Such properties may also be in AND, regardless of the matching
rule. In fact, if we say that a path should not start with foo, we can
also say that it must not start with bar. But is it really reasonable to
state that a path should not start with (end with, exactly match) foo OR
should not start with (end with, exactly match) bar? Such constraints
will fail to be satisfied only when a path starts with (ends with,
exactly matches) both foo AND bar, which is not possible. In fact,
formally, the constraints:

    <wdr:pathNotStartsWith>foo</wdr:pathNotStartsWith>
    <wdr:pathNotStartsWith>bar</wdr:pathNotStartsWith>

    correspond to

        ¬(pathStartsWith="foo") ∨ ¬(pathStartsWith="bar")

    which is equivalent to

        ¬(pathStartsWith="foo" ∧ pathStartsWith="bar")

    a statement which will be always true, since, as said previously, a
path cannot start with both foo and bar.

    The problem here is, again, with contains. A constraint stating that
a path must not contain foo OR must not contain bar fails only when a
path contains both foo AND bar, which is perfectly possible.
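
The difference between the two matching rules can be seen in a small
Python sketch, where multiple instances of a negative property are read
in OR. The helper neg_or and the sample paths are made up for
illustration.

```python
def neg_or(path, patterns, test):
    """NOT test(p1) OR NOT test(p2) OR ...: negative instances read in OR."""
    return any(not test(path, p) for p in patterns)

def starts(path, p):
    return path.startswith(p)

def contains(path, p):
    return p in path

paths = ["foo/x", "bar/x", "x/foo/bar", "baz"]

# starts with: the constraint holds for every sample path, so it excludes nothing
starts_results = [neg_or(p, ["foo", "bar"], starts) for p in paths]

# contains: the constraint fails for the path containing both foo and bar
contains_results = [neg_or(p, ["foo", "bar"], contains) for p in paths]
```

Here starts_results is True for every sample path, whereas
contains_results is False only for x/foo/bar.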

    This issue can be solved by allowing in a resource set definition
just one instance of RDF properties with matching rule starts with, ends
with, exact, and one or more instances of RDF properties with matching
rule contains. This can be obtained by using proper cardinality
constraints in the definition of wdr:Set.

    Moreover, we can support constraints stating that a path should not
start with (end with, exact) foo OR bar by adopting, for instance,
solution 3 above, and assuming that, when used "inside" negative
properties, multiple instances of wdr:pattern are in OR. In fact, a
constraint

    <wdr:pathNotStartsWith>
      <wdr:pattern>foo</wdr:pattern>
      <wdr:pattern>bar</wdr:pattern>
    </wdr:pathNotStartsWith>

    corresponds to

        ¬(pathStartsWith="foo" ∨ pathStartsWith="bar")

    which is equivalent to

        ¬(pathStartsWith="foo") ∧ ¬(pathStartsWith="bar")

    a statement which will be true when a path starts with neither foo
nor bar.
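
As a sanity check of this equivalence, here is a hedged Python sketch in
which the wdr:pattern instances inside a negative property are read in
OR. Both function names are hypothetical.

```python
def path_not_starts_with(path, patterns):
    """NOT (starts(p1) OR starts(p2) OR ...)."""
    return not any(path.startswith(p) for p in patterns)

def path_not_starts_with_de_morgan(path, patterns):
    """NOT starts(p1) AND NOT starts(p2) AND ...: the equivalent form."""
    return all(not path.startswith(p) for p in patterns)

sample_paths = ["foo/x", "bar/x", "baz", "x/foo"]
results = [path_not_starts_with(p, ["foo", "bar"]) for p in sample_paths]
```

Only the paths that start with neither foo nor bar (baz and x/foo)
satisfy the constraint, and both forms agree on every sample path.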

    Of course, we can also adopt solutions 1 and 2 for this purpose
(i.e., REs), but they do not apply when resources are grouped based on
their properties.

    Alternative (and equivalent) solution: negative properties are
always in AND, even though they are instances of the same property; also
instances of wdr:pattern are in AND. In such a case we have no
constraint on the number of instances of RDF properties with matching
rule starts with, ends with, exact, whereas wdr:pattern can be used only
"inside" negative properties with matching rule contains. Is this
strategy more intuitive than the former?

*RDF property wdr:hasProperty (wdr:hasNotProperty)*

    The wdr:hasProperty (wdr:hasNotProperty) property is used when a
scope is denoted in terms of resource properties, which can be either in
AND or OR.

    Here too the implicit approach can be applied: multiple instances
of wdr:hasProperty (wdr:hasNotProperty) in the same resource set
definition are in OR (AND), whereas instances of wdr:hasProperty and
wdr:hasNotProperty in the same resource set definition are in AND.

    The question is whether this principle should also be extended to
resource properties.

    For instance, consider a resource set definition denoting all the
T-Shirts which are either blue or red. By using the same implicit
semantics of wdr:hasProperty (wdr:hasNotProperty), this can be expressed
as follows:

    <wdr:Set>
      <wdr:hasProperty>
        <ex:cloth>t-shirt</ex:cloth>
        <ex:colour>blue</ex:colour>
        <ex:colour>red</ex:colour>
      </wdr:hasProperty>
    </wdr:Set>

    Similarly, a resource set definition denoting all the T-Shirts which
are not blue or red (i.e., ¬(blue ∨ red) = ¬blue ∧ ¬red) can be
expressed as follows:

    <wdr:Set>
      <wdr:hasProperty>
        <ex:cloth>t-shirt</ex:cloth>
      </wdr:hasProperty>
      <wdr:hasNotProperty>
        <ex:colour>blue</ex:colour>
        <ex:colour>red</ex:colour>
      </wdr:hasNotProperty>
    </wdr:Set>
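
A possible reading of these two examples, sketched in Python. A resource
is modelled as a dictionary from property names to sets of values; this
model, the function names, and the treatment of wdr:hasNotProperty (the
resource must have none of the listed values) are all assumptions made
for illustration.

```python
def group_matches(resource, group):
    """wdr:hasProperty: same property in OR, different properties in AND."""
    by_prop = {}
    for prop, value in group:
        by_prop.setdefault(prop, set()).add(value)
    return all(bool(resource.get(prop, set()) & values)
               for prop, values in by_prop.items())

def in_set(resource, has=(), has_not=()):
    # wdr:hasNotProperty: NOT (v1 OR v2 ...), i.e. NOT v1 AND NOT v2 ...
    excluded = any(value in resource.get(prop, set())
                   for prop, value in has_not)
    return group_matches(resource, has) and not excluded

red_shirt = {"ex:cloth": {"t-shirt"}, "ex:colour": {"red"}}
green_shirt = {"ex:cloth": {"t-shirt"}, "ex:colour": {"green"}}

blue_or_red = [("ex:cloth", "t-shirt"),
               ("ex:colour", "blue"), ("ex:colour", "red")]
not_blue_or_red = {
    "has": [("ex:cloth", "t-shirt")],
    "has_not": [("ex:colour", "blue"), ("ex:colour", "red")],
}
```

Under these assumptions, red_shirt belongs to the first set only and
green_shirt to the second set only.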

    It is to be investigated whether implicit semantics covers all the
possible resource set definitions using grouping by property. Of course,
owl:intersectionOf, owl:unionOf, and owl:complementOf do.

*To conclude*

The strategy based on explicit semantics (owl:intersectionOf,
owl:unionOf, owl:complementOf) is the more flexible and expressive, but
it is error-prone; the strategy based on implicit semantics is safer,
but we may have problems in finding the best solution when we have to
express complex resource set definitions. Should we support both, as we
support strings and REs (i.e., implicit semantics for "normal" resource
set definitions, explicit semantics for more complex ones)?


[1] http://www.w3.org/2007/powder/powder-grouping/conjunction
[2] http://www.w3.org/2007/powder/powder-grouping/conjunction#option1
[3] http://www.w3.org/2007/powder/powder-grouping/conjunction#option4a
[4] http://www.w3.org/TR/2004/REC-owl-ref-20040210/#Boolean

Received on Monday, 14 May 2007 08:57:36 UTC