- From: Phil Archer <parcher@icra.org>
- Date: Mon, 14 May 2007 13:11:36 +0100
- To: Public POWDER <public-powderwg@w3.org>
Thanks very much Andrea for spending a lot of thinking time on this. I'm
feeling a lot clearer on the problems and, I know you'll be pleased to
hear, coming round very much to your initial point of view that we should
support multiple scope/set statements in a DR - i.e. option 4A in the
discussion doc [PA1].

In your discussion, you suggest 4 possible solutions to the pathContains
issue. The complexities get more severe when we get into negatives and,
from my perspective, we're getting a long way away from a design
fundamental of simplicity with the real possibility that a
semi-technically minded person could write a set definition by hand if
necessary.

Also, forgive me, but whilst all your examples are valid XML, they're not
all valid RDF since there's always the need for Class -> property ->
Literal/Class etc. (RDF heads like to talk about 'striping'). I take your
point entirely that, since OWL defines unionOf, intersectionOf etc. that,
if we're going to use anything like this, we should use exactly the OWL
properties - but they are properties, not classes.

I've re-created the original option 2 [PA2] as option 2A. The RDF/XML and
the graph are available at [PA3] and [PA4] respectively. I won't bother
pasting the RDF/XML in here - it's just too HORRIBLE!!! That said, it _is_
perfectly legitimate OWL DL using just our limited number of RDF
properties so we are not, indeed cannot, preclude such an approach being
taken. NB. use of intersectionOf and unionOf requires OWL DL, not OWL
Lite - which gets us into more specialised inference engines.

One thing I noticed in my quick re-reading of the OWL specs, was the line
"...Each of the immediate contained expressions in the class definition
further restricts the instances of the defined class. Instances of the
class belong to the intersection of the restrictions" [PA5]. That is,
there is precedent for an implication of combining properties with AND.
So the set:

<wdr:Set>
  <wdr:hasHost>example.org</wdr:hasHost>
  <wdr:pathStartsWith>foo</wdr:pathStartsWith>
</wdr:Set>

is a neat and simple way of defining "all resources on example.org with a
path beginning with foo" that is likely to be widely understood. We can
use OWL Lite to limit cardinality to 0 or 1.

So the issue is how to handle unions - example.org or example.net where
the path is foo or bar. Well, option 4A shows how that can be done - with
a closed world Collection of Sets. A processor works through them until it
either finds a match or comes to the end. I'm happy with that approach in
that I believe that the number of instances where content providers want
to say something even as complex as the example.org/net plus foo/bar
example will be very small.

With one exception... host lists.

It is very common for people to own multiple domain names and, since their
content creation is likely to be carried out under a unified policy, for
all the content to have a similar description. So I do feel we need to
make a special case here somehow. My suggestion for this is that we define
an RDF property of hasHostList which, guess what, takes a list of host
names.

<wdr:Set>
  <wdr:hasHostList>example.org example.net</wdr:hasHostList>
</wdr:Set>

The maxCardinality would, again, be 1. The only problem now is how do we
say (in code) "if you have the property hasHostList you cannot also have
hasHost"? I think a clever bit of OWL might do this but I'm not sure. (Or
maybe we say that hasHost alone takes a white space separated list??)
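To make the matching rule concrete, here is a minimal Python sketch of how
a processor might test a URL against a Set and then against a closed
Collection of Sets. It is not part of any proposal: the function names
(host_matches, in_set, in_scope) and the dict layout are hypothetical, and
it assumes that a host matches a listed name or any subdomain of it, and
that the path is compared without its leading slash.

```python
from urllib.parse import urlsplit


def host_matches(host, candidate):
    # Assumption: a host matches the listed name itself or any subdomain of it.
    return host == candidate or host.endswith("." + candidate)


def in_set(url, host_list, path_starts_with=None):
    """Return True if url satisfies every property of one Set (implicit AND)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # hasHostList: white space separated names, any one of which may match (OR).
    if not any(host_matches(host, h.lower()) for h in host_list.split()):
        return False
    if path_starts_with is not None:
        # Assumption: the leading "/" is not part of the pattern.
        path = parts.path.lstrip("/")
        if not path.startswith(path_starts_with):
            return False
    return True


def in_scope(url, sets):
    """Work through the closed list of Sets until one matches or the list ends."""
    return any(
        in_set(url, s["hasHostList"], s.get("pathStartsWith")) for s in sets
    )


# Hypothetical scope: either host, path starting with foo or with bar.
SCOPE = [
    {"hasHostList": "example.org example.net", "pathStartsWith": "foo"},
    {"hasHostList": "example.org example.net", "pathStartsWith": "bar"},
]
```

With this layout, in_scope("http://www.example.net/bar/x.html", SCOPE)
would be checked against the first Set, fail on the path, and then succeed
on the second.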
So let me posit an option 4B:

<wdr:WDR>
  <wdr:hasScope rdf:parseType="Collection">
    <wdr:Set>
      <wdr:hasHostList>example.org example.net</wdr:hasHostList>
      <wdr:pathStartsWith>foo</wdr:pathStartsWith>
    </wdr:Set>
    <wdr:Set>
      <wdr:hasHostList>example.org example.net</wdr:hasHostList>
      <wdr:pathStartsWith>bar</wdr:pathStartsWith>
    </wdr:Set>
  </wdr:hasScope>
  <wdr:hasDescription rdf:resource="http://www.example.org/foo" />
</wdr:WDR>

RDF is at [PA6], Graph is at [PA7].

Two things bother me about this:

1. It makes a special case so that we can have a list of things because
it's convenient. All the arguments about the value of a property having
structure come back into play.

2. The host list needs to be repeated in each Set. I've tried various ways
to get around this using something like the Host Restrictions idea from
RDF-CL but we end up with things like open lists, structured values or
both - it's not looking pretty. My hope is that when we get on to talk
about DR packages we can devise some sort of inheritance system but I
think that should not be part of the grouping of resources doc.

So, in summary, I think my feelings on this are:

Option 1 - bad because of the reasons given in the doc: it is very woolly
and we can't use OWL cardinality constraints. The idea of "combine these
elements in the list with AND and these elements in the same list with OR"
is just not good enough.

Option 2. Actually, option 2A. This uses existing OWL DL properties to
produce unions and intersections but doesn't create any new properties.
Good. We can use cardinality constraints. Again, good. It's v.
complicated - Bad. But we can't stop people using it if they want to - we
just got out of jail free.

Option 3. Everything has a white space separated list. Andrea's point
about it not being possible to say example.org/foo OR example.net/bar is
well made. We can't use this as simply as option 3 sets out.

Option 4 - modified to option 4B. This has a closed list of Sets, we can
put cardinality constraints on everything. Some drawbacks - repetition of
host list, use of collections entails more processing. But, right now it's
one of my favourites.

Option 5. (Regular Expressions) Always possible so it's always there.

Option 6. Is my other favourite. We have to break out of RDF to derive the
description of a given resource. By giving an XML literal we make this
explicit. I want to play with some XPath queries on Option 6 against
SPARQL queries for option 4B to see whether this option is really as good
as Jo says it is (I think it might be).

Option 7. Andrea's done a good destruction job on this - ignore.

Enough for now.

Phil.

[PA1] http://www.w3.org/2007/powder/powder-grouping/conjunction#option4a
[PA2] http://www.w3.org/2007/powder/powder-grouping/conjunction#option2
[PA3] http://www.fosi.org/projects/powder/option2a.rdf
[PA4] http://www.fosi.org/projects/powder/option2a.png
[PA5] See towards the end of section 3.1.1 of the OWL Guide
      http://www.w3.org/TR/2004/REC-owl-guide-20040210/#DefiningSimpleClasses
[PA6] http://www.fosi.org/projects/powder/option4b.rdf
[PA7] http://www.fosi.org/projects/powder/option4b.png

Andrea Perego wrote:
> Following the document prepared by Phil [1], some notes about the
> conjunction issue which may prove useful in order to decide the best
> option.
>
> NB: I don't consider separately all the options in [1], but just two
> (corresponding to a variant of Option 2 [2] and to Option 4A [3])
> as examples of the two proposed strategies, namely, explicit and
> implicit semantics, respectively.
>
> The purpose of a resource set definition is to denote a set of resources
> in a way that is both safe and flexible. "Safe" means that it must not be
> error-prone; "flexible", that it should be possible to combine
> constraints in possibly nested formulas using the ∧ (AND), ∨ (OR), and ¬
> (NOT) Boolean operators.
>
> In other words, resource set definitions should be an abstraction layer
> wrt already existing formal tools, which prevents errors but does not
> limit their expressiveness.
>
> Conjunction, disjunction, and negation are supported by corresponding
> OWL properties (respectively, owl:intersectionOf, owl:unionOf,
> owl:complementOf [4]). By using OWL properties, a resource set
> definition denoting the set of resources hosted by machines with name
> ending with example.org and example.net, where the path component of
> their URI starts with foo or bar, can be expressed as follows:
>
> [Example 1]
>
> <wdr:Set>
>   <owl:intersectionOf rdf:parseType="Collection">
>     <owl:unionOf rdf:parseType="Collection">
>       <wdr:hasHost>example.org</wdr:hasHost>
>       <wdr:hasHost>example.net</wdr:hasHost>
>     </owl:unionOf>
>     <owl:unionOf rdf:parseType="Collection">
>       <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>       <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>     </owl:unionOf>
>   </owl:intersectionOf>
> </wdr:Set>
>
> whereas, if we wish to denote all the resources hosted either by
> example.org, where the path component of their URIs starts with foo, or
> example.net, where the path component of their URIs starts with bar, the
> scope will be defined as follows:
>
> [Example 2]
>
> <wdr:Set>
>   <owl:unionOf rdf:parseType="Collection">
>     <owl:intersectionOf rdf:parseType="Collection">
>       <wdr:hasHost>example.org</wdr:hasHost>
>       <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>     </owl:intersectionOf>
>     <owl:intersectionOf rdf:parseType="Collection">
>       <wdr:hasHost>example.net</wdr:hasHost>
>       <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>     </owl:intersectionOf>
>   </owl:unionOf>
> </wdr:Set>
>
> Also more complex, nested, formulas can be expressed this way.
>
> NB: if this solution is adopted, negative RDF properties should be
> expressed by using owl:complementOf, and not by specific properties.
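As an illustration of the explicit semantics, the following Python sketch
evaluates nested AND/OR/NOT formulas against a URL. It is only a sketch
under stated assumptions: the tuple encoding and the evaluate function are
hypothetical (they stand in for an OWL reasoner, which this is not), a host
is taken to match the given name or any subdomain of it, and paths are
compared without their leading slash.

```python
from urllib.parse import urlsplit


def evaluate(expr, url):
    """Evaluate a nested (operator, argument) formula against a URL."""
    op, arg = expr
    if op == "and":  # stands in for owl:intersectionOf
        return all(evaluate(e, url) for e in arg)
    if op == "or":  # stands in for owl:unionOf
        return any(evaluate(e, url) for e in arg)
    if op == "not":  # stands in for owl:complementOf
        return not evaluate(arg, url)
    parts = urlsplit(url)
    if op == "hasHost":
        host = (parts.hostname or "").lower()
        # Assumption: the name itself or any subdomain of it matches.
        return host == arg or host.endswith("." + arg)
    if op == "pathStartsWith":
        # Assumption: the leading "/" is not part of the pattern.
        return parts.path.lstrip("/").startswith(arg)
    raise ValueError("unknown operator: " + op)


# Example 1: (example.org OR example.net) AND (foo OR bar)
EXAMPLE_1 = ("and", [
    ("or", [("hasHost", "example.org"), ("hasHost", "example.net")]),
    ("or", [("pathStartsWith", "foo"), ("pathStartsWith", "bar")]),
])

# Example 2: (example.org AND foo) OR (example.net AND bar)
EXAMPLE_2 = ("or", [
    ("and", [("hasHost", "example.org"), ("pathStartsWith", "foo")]),
    ("and", [("hasHost", "example.net"), ("pathStartsWith", "bar")]),
])
```

A URL such as http://example.org/bar/x falls inside Example 1 but outside
Example 2, which is the difference between the two formulas.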
>
> The problem here is that AND/OR/NOT can be combined with no constraint,
> and this means that we may have incorrect scope definitions, as the
> following:
>
> [Example 3]
>
> <wdr:Set>
>   <owl:intersectionOf rdf:parseType="Collection">
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:hasHost>example.net</wdr:hasHost>
>   </owl:intersectionOf>
> </wdr:Set>
>
> REs have the same problem.
>
> By contrast, the current implicit semantics of resource set definitions
> (same properties in OR, different properties in AND), avoids errors such
> as the ones in the example above. However, the problem is that such an
> approach has limited expressiveness.
>
> For instance, Example 1 may be expressed as follows:
>
> [Example 4]
>
> <wdr:Set>
>   <wdr:hasHost>example.org</wdr:hasHost>
>   <wdr:hasHost>example.net</wdr:hasHost>
>   <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>   <wdr:pathStartsWith>bar</wdr:pathStartsWith>
> </wdr:Set>
>
> By contrast, Example 2 cannot be expressed by using this approach,
> unless we support the possibility of denoting a DR scope by multiple
> resource set definitions (see Option 4A [3]):
>
> [Example 5]
>
> <wdr:hasScope rdf:parseType="Collection">
>   <wdr:Set>
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>   </wdr:Set>
>   <wdr:Set>
>     <wdr:hasHost>example.net</wdr:hasHost>
>     <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>   </wdr:Set>
> </wdr:hasScope>
>
> The problem here is that, if we wish to denote all the resources hosted
> by example.org with a path starting with foo OR ending with bar, we must
> specify redundant resource set definitions:
>
> [Example 6]
>
> <wdr:hasScope rdf:parseType="Collection">
>   <wdr:Set>
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>   </wdr:Set>
>   <wdr:Set>
>     <wdr:hasHost>example.org</wdr:hasHost>
>     <wdr:pathEndsWith>bar</wdr:pathEndsWith>
>   </wdr:Set>
> </wdr:hasScope>
>
> Of course in such cases we can use REs, but not if we are using grouping
> by resource property.
>
> However, this works with all the RDF properties we defined, except the
> following:
>
> *RDF properties using as matching rule contains*
>
>   Such properties may be in AND. By contrast, properties using as
> matching rule starts with, ends with, or exact cannot be in AND (e.g.,
> the same string cannot start with two different substrings).
>
>   Possible solutions:
>
>   1. do not support RDF properties using contains as matching rule:
>      use wdr:hasUri for them
>   2. support RDF properties using contains as matching rule, use
>      them in case they should be in OR, but use wdr:hasUri in case they
>      should be in AND.
>   3. support the possibility of specifying a set of patterns as
>      object of the RDF property, to be considered in AND:
>
>      <wdr:pathContains>
>        <wdr:pattern>foo</wdr:pattern>
>        <wdr:pattern>bar</wdr:pattern>
>      </wdr:pathContains>
>
>   4. define a variant for such RDF properties
>      (wdr:pathAlsoContains), which will be in AND with the base one
>      (wdr:pathContains)
>
>   Note that solutions 1 and 2 work only when resources are grouped by
> address, but not by property, whereas solution 3 applies also to the
> latter.
>
>   By contrast, solution 4 works only in case the conjuncts are two; in
> fact
>
>   <wdr:pathContains>foo</wdr:pathContains>
>   <wdr:pathAlsoContains>bar</wdr:pathAlsoContains>
>
>   means (foo AND bar), whereas
>
>   <wdr:pathContains>foo</wdr:pathContains>
>   <wdr:pathAlsoContains>bar</wdr:pathAlsoContains>
>   <wdr:pathAlsoContains>boo</wdr:pathAlsoContains>
>
>   means (foo AND (bar OR boo)), and not (foo AND bar AND boo).
>
>   Moreover in the following case
>
>   <wdr:pathContains>foo</wdr:pathContains>
>   <wdr:pathContains>bar</wdr:pathContains>
>   <wdr:pathAlsoContains>boo</wdr:pathAlsoContains>
>
>   how can we interpret this? wdr:pathAlsoContains is in AND with the
> former wdr:pathContains, with the latter, or with both?
>
> *Negative RDF properties*
>
>   Also such properties may be in AND, independently from the matching
> rule.
> In fact, if we say that a path should not start with foo, we can
> also say that it must not start with bar. But is it really reasonable to
> state that a path should not start with (end with, exact) foo OR should
> not start with (end with, exact) bar? Such constraints will be NOT
> satisfied only when a path starts with (ends with, exact) both foo AND
> bar, which is not possible. In fact, formally, the constraints:
>
>   <wdr:pathNotStartsWith>foo</wdr:pathNotStartsWith>
>   <wdr:pathNotStartsWith>bar</wdr:pathNotStartsWith>
>
>   correspond to
>
>   ¬(pathStartsWith="foo") ∨ ¬(pathStartsWith="bar")
>
>   which is equivalent to
>
>   ¬(pathStartsWith="foo" ∧ pathStartsWith="bar")
>
>   a statement which will be always true, since, as said previously, a
> path cannot start with both foo and bar.
>
>   The problem here is, again, with contains. A constraint stating that
> a path must not contain foo OR bar is NOT satisfied when a path contains
> both foo AND bar, which is perfectly possible.
>
>   This issue can be solved by allowing in a resource set definition
> just one instance of RDF properties with matching rule starts with, ends
> with, exact, and one or more instances of RDF properties with matching
> rule contains. This can be obtained by using proper cardinality
> constraints in the definition of wdr:Set.
>
>   Moreover, we can support constraints stating that a path should not
> start with (end with, exact) foo OR bar by adopting, for instance,
> solution 3 above, and assuming that, when used "inside" negative
> properties, multiple instances of wdr:pattern are in OR. In fact, a
> constraint
>
>   <wdr:pathNotStartsWith>
>     <wdr:pattern>foo</wdr:pattern>
>     <wdr:pattern>bar</wdr:pattern>
>   </wdr:pathNotStartsWith>
>
>   corresponds to
>
>   ¬(pathStartsWith="foo" ∨ pathStartsWith="bar")
>
>   which is equivalent to
>
>   ¬(pathStartsWith="foo") ∧ ¬(pathStartsWith="bar")
>
>   a statement which will be true when a path does not start with foo
> OR bar.
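The De Morgan reasoning above can be checked with a few lines of Python.
This is only a sketch: the three helper names are hypothetical, and the
sample paths are made up for illustration. Note that "a path cannot start
with both" holds for foo and bar because neither is a prefix of the other;
it would not hold for a pair such as foo and foobar.

```python
def negated_starts_in_or(path, patterns):
    # Negations combined with OR: not A, or not B  ==  not (A and B)
    return any(not path.startswith(p) for p in patterns)


def negated_starts_in_and(path, patterns):
    # Negations combined with AND: not A, and not B  ==  not (A or B)
    return all(not path.startswith(p) for p in patterns)


def negated_contains_in_or(path, patterns):
    # Same OR reading, but for the "contains" matching rule.
    return any(p not in path for p in patterns)


PATTERNS = ["foo", "bar"]
SAMPLE_PATHS = ["foo/x", "bar/x", "baz", "foobar"]

# OR reading of "not starts with": satisfied by every sample path,
# so the constraint excludes nothing.
or_results = [negated_starts_in_or(p, PATTERNS) for p in SAMPLE_PATHS]

# AND reading of "not starts with": only paths starting with neither pass.
and_results = [negated_starts_in_and(p, PATTERNS) for p in SAMPLE_PATHS]

# OR reading of "not contains": fails only when both patterns occur.
contains_results = [negated_contains_in_or(p, PATTERNS) for p in SAMPLE_PATHS]
```

On these samples the OR reading of the starts-with constraint accepts
everything, the AND reading accepts only baz, and the OR reading of the
contains constraint rejects only foobar.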
>
>   Of course, we can adopt for this purpose also solutions 1 and 2
> (i.e., REs), but they do not apply when resources are grouped based on
> their properties.
>
>   Alternative (and equivalent) solution: negative properties are
> always in AND, even though they are instances of the same property; also
> instances of wdr:pattern are in AND. In such a case we have no
> constraint on the number of instances of RDF properties with matching
> rule starts with, ends with, exact, whereas wdr:pattern can be used only
> "inside" negative properties with matching rule contains. Is this
> strategy more intuitive than the former?
>
> *RDF property wdr:hasProperty (wdr:hasNotProperty)*
>
>   The wdr:hasProperty (wdr:hasNotProperty) is used when a scope is
> denoted in terms of resource properties, which can be either in AND or
> OR.
>
>   Also here the implicit approach can be applied (multiple instances
> of wdr:hasProperty (wdr:hasNotProperty) in the same resource set
> definition are in OR (AND), instances of wdr:hasProperty and
> wdr:hasNotProperty in the same resource set definition are in AND).
>
>   The problem is whether this principle should be extended also to
> resource properties.
>
>   For instance, consider a resource set definition denoting all the
> T-Shirts which are either blue or red.
> By using the same implicit
> semantics of wdr:hasProperty (wdr:hasNotProperty), this can be expressed
> as follows:
>
>   <wdr:Set>
>     <wdr:hasProperty>
>       <ex:cloth>t-shirt</ex:cloth>
>       <ex:colour>blue</ex:colour>
>       <ex:colour>red</ex:colour>
>     </wdr:hasProperty>
>   </wdr:Set>
>
>   Similarly, a resource set definition denoting all the T-Shirts which
> are not blue or red (i.e., ¬(blue ∨ red) = ¬blue ∧ ¬red) can be
> expressed as follows:
>
>   <wdr:Set>
>     <wdr:hasProperty>
>       <ex:cloth>t-shirt</ex:cloth>
>     </wdr:hasProperty>
>     <wdr:hasNotProperty>
>       <ex:colour>blue</ex:colour>
>       <ex:colour>red</ex:colour>
>     </wdr:hasNotProperty>
>   </wdr:Set>
>
>   It is to be investigated whether implicit semantics covers all the
> possible resource set definitions using grouping by property. Of course,
> owl:intersectionOf, owl:unionOf, and owl:complementOf do.
>
> *To conclude*
>
>   The strategy based on explicit semantics (wdr:intersectionOf,
> wdr:unionOf, wdr:complementOf) is the more flexible and expressive, but
> it is error-prone; the strategy based on implicit semantics is safer,
> but we may have problems in finding the best solution when we have to
> express complex resource set definitions. Should we support both, as we
> support strings and REs (i.e., implicit semantics for "normal" resource
> set definitions, explicit semantics for more complex ones)?
>
>
> [1] http://www.w3.org/2007/powder/powder-grouping/conjunction
> [2] http://www.w3.org/2007/powder/powder-grouping/conjunction#option1
> [3] http://www.w3.org/2007/powder/powder-grouping/conjunction#option4a
> [4] http://www.w3.org/TR/2004/REC-owl-ref-20040210/#Boolean
Received on Monday, 14 May 2007 12:11:45 UTC