Re: (Ref.: ISSUE-12: Conjunction and disjunction) Semantics of resource set definitions

Thanks very much Andrea for spending a lot of thinking time on this. I'm 
feeling a lot clearer on the problems and, I know you'll be pleased to 
hear, coming round very much to your initial point of view that we 
should support multiple scope/set statements in a DR - i.e. option 4A in 
the discussion doc [PA1].

In your discussion, you suggest 4 possible solutions to the pathContains 
issue. The complexities get more severe when we get into negatives and, 
from my perspective, we're getting a long way from a design 
fundamental: simplicity, with the real possibility that a 
semi-technically minded person could write a set definition by hand if 
necessary.

Also, forgive me, but whilst all your examples are valid XML, they're 
not all valid RDF since there's always the need for Class -> property -> 
Literal/Class etc. (RDF heads like to talk about 'striping').

I take your point entirely that, since OWL defines unionOf, 
intersectionOf etc., if we're going to use anything like this, we 
should use exactly the OWL properties - but they are properties, not 
classes. I've re-created the original option 2 [PA2] as option 2A. The 
RDF/XML and the graph are available at [PA3] and [PA4] respectively.

I won't bother pasting the RDF/XML in here - it's just too HORRIBLE!!! 
That said, it _is_ perfectly legitimate OWL DL using just our limited 
number of RDF properties so we are not, indeed cannot, preclude such an 
approach being taken. NB. use of intersectionOf and unionOf requires OWL 
DL, not OWL Lite - which gets us into more specialised inference engines.

One thing I noticed in my quick re-reading of the OWL specs was the 
line "...Each of the immediate contained expressions in the class 
definition further restricts the instances of the defined class. 
Instances of the class belong to the intersection of the restrictions" 
[PA5]. That is, there is precedent for an implication of combining 
properties with AND. So the set:

<wdr:Set>
   <wdr:hasHost>example.org</wdr:hasHost>
   <wdr:pathStartsWith>foo</wdr:pathStartsWith>
</wdr:Set>

is a neat and simple way of defining "all resources on example.org with 
a path beginning with foo" that is likely to be widely understood. We 
can use OWL Lite to limit cardinality to 0 or 1.
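
As a rough illustration of the implied AND, here is a minimal Python 
sketch. It is not taken from any draft: the function name set_matches is 
made up, hasHost is assumed to match the host itself or any subdomain of 
it, and the leading "/" of the path is assumed to be ignored before the 
comparison with pathStartsWith.

```python
from urllib.parse import urlsplit

def set_matches(url, has_host=None, path_starts_with=None):
    """Return True if url satisfies every constraint that is present (implicit AND).

    Hypothetical helper for illustration only. Each property may appear
    at most once (cardinality 0 or 1), so None means "not stated".
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.lstrip("/")  # assumption: compare without the leading slash

    if has_host is not None:
        # assumption: hasHost matches the host itself or any subdomain of it
        if host != has_host and not host.endswith("." + has_host):
            return False
    if path_starts_with is not None and not path.startswith(path_starts_with):
        return False
    return True
```

With these assumptions, set_matches("http://example.org/foo/page.html", 
"example.org", "foo") holds, while the same call for 
"http://example.org/bar" does not, because the path constraint fails.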

So the issue is how to handle unions - example.org or example.net where 
the path is foo or bar. Well, option 4A shows how that can be done - 
with a closed world Collection of Sets. A processor works through them 
until it either finds a match or comes to the end.
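
The "work through them until a match or the end" behaviour can be 
sketched in Python as below. This is only a sketch under assumptions: 
the dict representation of a Set, the names in_scope and SCOPE, the 
subdomain-tolerant host match and the stripping of the leading "/" are 
all invented for the example.

```python
from urllib.parse import urlsplit

def _host_ok(host, wanted):
    # assumption: a host matches itself or any of its subdomains
    return host == wanted or host.endswith("." + wanted)

def in_scope(url, sets):
    """Return the index of the first Set that url matches, or None.

    `sets` is an ordered, closed list of dicts standing in for wdr:Set
    elements (a representation assumed for this sketch).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.lstrip("/")  # assumption: leading slash is not significant

    for index, s in enumerate(sets):
        host_ok = "hasHost" not in s or _host_ok(host, s["hasHost"])
        path_ok = "pathStartsWith" not in s or path.startswith(s["pathStartsWith"])
        if host_ok and path_ok:  # AND inside a Set
            return index         # OR across Sets: stop at the first match
    return None                  # end of the closed list: not in scope

# example.org or example.net, path foo or bar: four Sets are needed
SCOPE = [
    {"hasHost": "example.org", "pathStartsWith": "foo"},
    {"hasHost": "example.org", "pathStartsWith": "bar"},
    {"hasHost": "example.net", "pathStartsWith": "foo"},
    {"hasHost": "example.net", "pathStartsWith": "bar"},
]
```

Note that spelling out the full "org or net, foo or bar" case this way 
takes one Set per combination.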

I'm happy with that approach in that I believe that the number of 
instances where content providers want to say something even as complex 
as the example.org/net plus foo/bar example will be very small.

With one exception... host lists.

It is very common for people to own multiple domain names and, since 
their content creation is likely to be carried out under a unified 
policy, for all the content to have a similar description. So I do feel 
we need to make a special case here somehow. My suggestion for this is 
that we define an RDF property of hasHostList which, guess what, takes a 
list of host names.

<wdr:Set>
   <wdr:hasHostList>example.org example.net</wdr:hasHostList>
</wdr:Set>

The maxCardinality would, again, be 1. The only problem now is how do we 
say (in code) "if you have the property hasHostList you cannot also have 
hasHost"? I think a clever bit of OWL might do this but I'm not sure. (Or 
maybe we say that hasHost alone takes a white space separated list??)
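
Whether OWL can state the exclusion is left open above; as a fallback, 
the rule could be checked in application code. The Python sketch below 
shows one way, assuming a Set has already been read into a list of 
(name, value) pairs. The function name hosts_for_set and that 
representation are hypothetical.

```python
def hosts_for_set(properties):
    """Return the list of host names a Set constrains, or [] if none.

    `properties` is a list of (name, value) pairs standing in for the
    properties of one wdr:Set (an assumed representation).
    Raises ValueError if the cardinality or exclusion rules are broken.
    """
    host = [v for n, v in properties if n == "hasHost"]
    host_list = [v for n, v in properties if n == "hasHostList"]

    if len(host) > 1 or len(host_list) > 1:
        raise ValueError("hasHost and hasHostList may each appear at most once")
    if host and host_list:
        raise ValueError("hasHost and hasHostList are mutually exclusive")

    if host_list:
        return host_list[0].split()  # split on any run of white space
    return host
```

A Set carrying both properties is rejected with a ValueError, and a 
hasHostList value is split on white space into its individual hosts.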

So let me posit an option 4B:

<wdr:WDR>
   <wdr:hasScope rdf:parseType="Collection">
      <wdr:Set>
       <wdr:hasHostList>example.org example.net</wdr:hasHostList>
       <wdr:pathStartsWith>foo</wdr:pathStartsWith>
     </wdr:Set>

      <wdr:Set>
       <wdr:hasHostList>example.org example.net</wdr:hasHostList>
       <wdr:pathStartsWith>bar</wdr:pathStartsWith>
     </wdr:Set>

   </wdr:hasScope>

   <wdr:hasDescription rdf:resource="http://www.example.org/foo" />
</wdr:WDR>

RDF is at [PA6], Graph is at [PA7].
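
To show how little processing the option 4B structure needs, here is a 
short Python sketch that reads the Sets in document order. It treats the 
snippet as plain XML rather than doing a real RDF parse, and the wdr 
namespace URI used here is a placeholder, since the real one is not 
given in this message. The name read_scope is also made up.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
WDR_NS = "http://example.org/wdr#"  # placeholder: real wdr namespace not given here

OPTION_4B = f"""
<wdr:WDR xmlns:wdr="{WDR_NS}" xmlns:rdf="{RDF_NS}">
  <wdr:hasScope rdf:parseType="Collection">
    <wdr:Set>
      <wdr:hasHostList>example.org example.net</wdr:hasHostList>
      <wdr:pathStartsWith>foo</wdr:pathStartsWith>
    </wdr:Set>
    <wdr:Set>
      <wdr:hasHostList>example.org example.net</wdr:hasHostList>
      <wdr:pathStartsWith>bar</wdr:pathStartsWith>
    </wdr:Set>
  </wdr:hasScope>
  <wdr:hasDescription rdf:resource="http://www.example.org/foo" />
</wdr:WDR>
"""

def read_scope(xml_text):
    """Return the Sets in document order as (hosts, path_prefix) tuples."""
    ns = {"wdr": WDR_NS}
    root = ET.fromstring(xml_text)
    result = []
    for s in root.findall("wdr:hasScope/wdr:Set", ns):
        # hasHostList holds a white space separated list of host names
        hosts = s.findtext("wdr:hasHostList", default="", namespaces=ns).split()
        prefix = s.findtext("wdr:pathStartsWith", default=None, namespaces=ns)
        result.append((hosts, prefix))
    return result
```

The result makes the repetition visible: both tuples carry the same 
host list and differ only in the path prefix.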

Two things bother me about this:
1. It makes a special case so that we can have a list of things because 
it's convenient. All the arguments about the value of a property having 
structure come back into play.
2. The host list needs to be repeated in each Set.

I've tried various ways to get around this using something like the Host 
Restrictions idea from RDF-CL but we end up with things like open lists, 
structured values or both - it's not looking pretty. My hope is that 
when we get on to talk about DR packages we can devise some sort of 
inheritance system but I think that should not be part of the grouping 
of resources doc.

So, in summary, I think my feelings on this are:

Option 1 - bad because of the reasons given in the doc: it is very 
woolly and we can't use OWL cardinality constraints. The idea of "combine 
these elements in the list with AND and these elements in the same list 
with OR" is just not good enough.

Option 2. Actually, option 2A. This uses existing OWL DL properties to 
produce unions and intersections but doesn't create any new properties. 
Good. We can use cardinality constraints. Again, good. It's v. 
complicated - Bad. But we can't stop people using it if they want to - 
we just got out of jail free.

Option 3. Everything has a white space separated list. Andrea's point 
about it not being possible to say example.org/foo OR example.net/bar is 
well made. We can't use this as simply as option 3 sets out.

Option 4 - modified to option 4B. This has a closed list of Sets, we can 
put cardinality constraints on everything. Some drawbacks - repetition 
of host list, use of collections entails more processing. But, right now 
it's one of my favourites.

Option 5. (Regular Expressions) Always possible so it's always there.

Option 6. Is my other favourite. We have to break out of RDF to derive 
the description of a given resource. By giving an XML literal we make 
this explicit. I want to play with some XPath queries on Option 6 
against SPARQL queries for option 4B to see whether this option is 
really as good as Jo says it is (I think it might be).

Option 7. Andrea's done a good destruction job on this - ignore.

Enough for now.

Phil.


[PA1] http://www.w3.org/2007/powder/powder-grouping/conjunction#option4a
[PA2] http://www.w3.org/2007/powder/powder-grouping/conjunction#option2
[PA3] http://www.fosi.org/projects/powder/option2a.rdf
[PA4] http://www.fosi.org/projects/powder/option2a.png
[PA5] See towards the end of section 3.1.1 of the OWL Guide 
http://www.w3.org/TR/2004/REC-owl-guide-20040210/#DefiningSimpleClasses
[PA6] http://www.fosi.org/projects/powder/option4b.rdf
[PA7] http://www.fosi.org/projects/powder/option4b.png

Andrea Perego wrote:
> Following the document prepared by Phil [1], some notes about the
> conjunction issue which may prove useful in order to decide the best option.
> 
> NB: I don't consider all the options in [1] separately, but just two
> (corresponding to a variant of Option 2 [2] and to Option 4A [3])
> as examples of the two proposed strategies, namely, explicit and
> implicit semantics, respectively.
> 
> The purpose of a resource set definition is to denote a set of resources
> in a both safe and flexible way. "Safe" means that it must not be
> error-prone; "flexible", that it should be possible to combine
> constraints in possibly nested formulas using the ∧ (AND), ∨ (OR), and ¬
> (NOT) Boolean operators.
> 
> In other words, resource set definitions should be an abstraction layer
> wrt already existing formal tools, which prevents errors but does not
> limit their expressiveness.
> 
> Conjunction, disjunction, and negation are supported by corresponding
> OWL properties (respectively, owl:intersectionOf, owl:unionOf,
> owl:complementOf [4]). By using OWL properties, a resource set
> definition denoting the set of resources hosted by machines with name
> ending with example.org and example.net, where the path component of
> their URI starts with foo or bar, can be expressed as follows:
> 
> [Example 1]
> 
> <wdr:Set>
>   <owl:intersectionOf rdf:parseType="Collection">
>     <owl:unionOf rdf:parseType="Collection">
>       <wdr:hasHost>example.org</wdr:hasHost>
>       <wdr:hasHost>example.net</wdr:hasHost>
>     </owl:unionOf>
>     <owl:unionOf rdf:parseType="Collection">
>       <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>       <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>     </owl:unionOf>
>   </owl:intersectionOf>
> </wdr:Set>
> 
> whereas, if we wish to denote all the resources hosted either by
> example.org, where the path component of their URIs starts with foo, or
> example.net, where the path component of their URIs starts with bar, the
> scope will be defined as follows:
> 
> [Example 2]
> 
> <wdr:Set>
>   <owl:unionOf rdf:parseType="Collection">
>     <owl:intersectionOf rdf:parseType="Collection">
>       <wdr:hasHost>example.org</wdr:hasHost>
>       <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>     </owl:intersectionOf>
>     <owl:intersectionOf rdf:parseType="Collection">
>       <wdr:hasHost>example.net</wdr:hasHost>
>       <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>     </owl:intersectionOf>
>   </owl:unionOf>
> </wdr:Set>
> 
> Also more complex, nested, formulas can be expressed this way.
> 
> NB: if this solution is adopted, negative RDF properties should be
> expressed by using owl:complementOf, and not by specific properties.
> 
> The problem here is that AND/OR/NOT can be combined with no constraint,
> and this means that we may have incorrect scope definitions, such as the
> following:
> 
> [Example 3]
> 
> <wdr:Set>
>   <owl:intersectionOf rdf:parseType="Collection">
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:hasHost>example.net</wdr:hasHost>
>   </owl:intersectionOf>
> </wdr:Set>
> 
> REs have the same problem.
> 
> By contrast, the current implicit semantics of resource set definitions
> (same properties in OR, different properties in AND), avoids errors as
> the ones in the example above. However, the problem is that such an
> approach has limited expressiveness.
> 
> For instance, Example 1 may be expressed as follows:
> 
> [Example 4]
> 
> <wdr:Set>
>   <wdr:hasHost>example.org</wdr:hasHost>
>   <wdr:hasHost>example.net</wdr:hasHost>
>   <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>   <wdr:pathStartsWith>bar</wdr:pathStartsWith>
> </wdr:Set>
> 
> By contrast, Example 2 cannot be expressed by using this approach,
> unless we support the possibility of denoting a DR scope by multiple
> resource set definitions (see Option 4A [3]):
> 
> [Example 5]
> 
> <wdr:hasScope rdf:parseType="Collection">
>   <wdr:Set>
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>   </wdr:Set>
>   <wdr:Set>
>     <wdr:hasHost>example.net</wdr:hasHost>
>     <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>   </wdr:Set>
> </wdr:hasScope>
> 
> The problem here is that, if we wish to denote all the resources hosted
> by example.org with a path starting with foo OR ending with bar, we must
> specify redundant resource set definitions:
> 
> [Example 6]
> 
> <wdr:hasScope rdf:parseType="Collection">
>   <wdr:Set>
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>   </wdr:Set>
>   <wdr:Set>
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:pathEndsWith>bar</wdr:pathEndsWith>
>   </wdr:Set>
> </wdr:hasScope>
> 
> Of course in such cases we can use REs, but not if we are using grouping
> by resource property.
> 
> However, this works with all the RDF properties we defined, except the
> following:
> 
> *RDF properties using as matching rule contains*
> 
>     Such properties may be in AND. By contrast, properties using as
> matching rule starts with, ends with, or exact cannot be in AND (e.g.,
> the same string cannot start with two different substrings).
> 
>     Possible solutions:
> 
>        1. do not support RDF properties using contains as matching rule:
> use wdr:hasUri for them
>        2. support RDF properties using contains as matching rule, use
> them in case they should be in OR, but use wdr:hasUri in case they should
> be in AND.
>        3. support the possibility of specifying a set of patterns as
> object of the RDF property, to be considered in AND:
> 
>           <wdr:pathContains>
>             <wdr:pattern>foo</wdr:pattern>
>             <wdr:pattern>bar</wdr:pattern>
>           </wdr:pathContains>
> 
>        4. define a variant for such RDF properties
> (wdr:pathAlsoContains), which will be in AND with the base one
> (wdr:pathContains)
> 
>     Note that solutions 1 and 2 work only when resources are grouped by
> address, but not by property, whereas solution 3 applies also to the latter.
> 
>     By contrast, solution 4 works only in case the conjuncts are two; in
> fact
> 
>     <wdr:pathContains>foo</wdr:pathContains>
>     <wdr:pathAlsoContains>bar</wdr:pathAlsoContains>
> 
>     means (foo AND bar), whereas
> 
>     <wdr:pathContains>foo</wdr:pathContains>
>     <wdr:pathAlsoContains>bar</wdr:pathAlsoContains>
>     <wdr:pathAlsoContains>boo</wdr:pathAlsoContains>
> 
>     means (foo AND (bar OR boo)), and not (foo AND bar AND boo).
> 
>     Moreover in the following case
> 
>     <wdr:pathContains>foo</wdr:pathContains>
>     <wdr:pathContains>bar</wdr:pathContains>
>     <wdr:pathAlsoContains>boo</wdr:pathAlsoContains>
> 
>     how can we interpret this? wdr:pathAlsoContains is in AND with the
> former wdr:pathContains, with the latter, or with both?
> 
> *Negative RDF properties*
> 
>     Also such properties may be in AND, independently from the matching
> rule. In fact, if we say that a path should not start with foo, we can
> also say that it must not start with bar. But is it really reasonable to
> state that a path should not start with (end with, exact) foo OR should
> not start with (end with, exact) bar? Such constraints will be NOT
> satisfied only when a path starts with (ends with, exact) both foo AND
> bar, which is not possible. In fact, formally, the constraints:
> 
>     <wdr:pathNotStartsWith>foo</wdr:pathNotStartsWith>
>     <wdr:pathNotStartsWith>bar</wdr:pathNotStartsWith>
> 
>     correspond to
> 
>         ¬(pathStartsWith="foo") ∨ ¬(pathStartsWith="bar")
> 
>     which is equivalent to
> 
>         ¬(pathStartsWith="foo" ∧ pathStartsWith="bar")
> 
>     a statement which will be always true, since, as said previously, a
> path cannot start with both foo and bar.
> 
>     The problem here is, again, with contains. A constraint stating that
> a path must not contain foo OR bar is NOT satisfied when a path contains
> both foo AND bar, which is perfectly possible.
> 
>     This issue can be solved by allowing in a resource set definition
> just one instance of RDF properties with matching rule starts with, ends
> with, exact, and one or more instances of RDF properties with matching
> rule contains. This can be obtained by using proper cardinality
> constraints in the definition of wdr:Set.
> 
>     Moreover, we can support constraints stating that a path should not
> start with (end with, exact) foo OR bar by adopting, for instance,
> solution 3 above, and assuming that, when used "inside" negative
> properties, multiple instances of wdr:pattern are in OR. In fact, a
> constraint
> 
>     <wdr:pathNotStartsWith>
>       <wdr:pattern>foo</wdr:pattern>
>       <wdr:pattern>bar</wdr:pattern>
>     </wdr:pathNotStartsWith>
> 
>     corresponds to
> 
>         ¬(pathStartsWith="foo" ∨ pathStartsWith="bar")
> 
>     which is equivalent to
> 
>         ¬(pathStartsWith="foo") ∧ ¬(pathStartsWith="bar")
> 
>     a statement which will be true when a path does not start with foo
> OR bar.
> 
>     Of course, we can adopt for this purpose also solutions 1 and 2
> (i.e., REs), but they do not apply when resources are grouped based on
> their properties.
> 
>     Alternative (and equivalent) solution: negative properties are
> always in AND, even though they are instances of the same property; also
> instances of wdr:pattern are in AND. In such a case we have no
> constraint on the number of instances of RDF properties with matching
> rule starts with, ends with, exact, whereas wdr:pattern can be used only
> "inside" negative properties with matching rule contains. Which is more
> intuitive, this strategy or the former?
> 
> *RDF property wdr:hasProperty (wdr:hasNotProperty)*
> 
>     The wdr:hasProperty (wdr:hasNotProperty) is used when a scope is
> denoted in terms of resource properties, which can be either in AND or OR.
> 
>     Also here the implicit approach can be applied (multiple instances
> of wdr:hasProperty (wdr:hasNotProperty) in the same resource set
> definition are in OR (AND), instances of wdr:hasProperty and
> wdr:hasNotProperty in the same resource set definition are in AND).
> 
>     The problem is whether this principle should be extended also to
> resource properties
> 
>     For instance, consider a resource set definition denoting all the
> T-Shirts which are either blue or red. By using the same implicit
> semantics of wdr:hasProperty (wdr:hasNotProperty), this can be expressed
> as follows:
> 
>     <wdr:Set>
>       <wdr:hasProperty>
>         <ex:cloth>t-shirt</ex:cloth>
>         <ex:colour>blue</ex:colour>
>         <ex:colour>red</ex:colour>
>       </wdr:hasProperty>
>     </wdr:Set>
> 
>     Similarly, a resource set definition denoting all the T-Shirts which
> are not blue or red (i.e., ¬(blue ∨ red) = ¬blue ∧ ¬red) can be
> expressed as follows:
> 
>     <wdr:Set>
>       <wdr:hasProperty>
>         <ex:cloth>t-shirt</ex:cloth>
>       </wdr:hasProperty>
>       <wdr:hasNotProperty>
>         <ex:colour>blue</ex:colour>
>         <ex:colour>red</ex:colour>
>       </wdr:hasNotProperty>
>     </wdr:Set>
> 
>     It is to be investigated whether implicit semantics covers all the
> possible resource set definitions using grouping by property. Of course,
> owl:intersectionOf, owl:unionOf, and owl:complementOf do.
> 
> *To conclude*
> 
> The strategy based on explicit semantics (wdr:intersectionOf,
> wdr:unionOf, wdr:complementOf) is the more flexible and expressive, but
> it is error-prone; the strategy based on implicit semantics is safer,
> but we may have problems in finding the best solution when we have to
> express complex resource set definitions. Should we support both, as we
> support strings and REs—i.e., implicit semantics for "normal" resource
> set definitions, explicit semantics for more complex ones?
> 
> 
> [1]http://www.w3.org/2007/powder/powder-grouping/conjunction
> [2]http://www.w3.org/2007/powder/powder-grouping/conjunction#option1
> [3]http://www.w3.org/2007/powder/powder-grouping/conjunction#option4a
> [4]http://www.w3.org/TR/2004/REC-owl-ref-20040210/#Boolean
> 

Received on Monday, 14 May 2007 12:11:45 UTC