- From: Phil Archer <parcher@icra.org>
- Date: Tue, 22 May 2007 16:27:19 +0100
- To: Public POWDER <public-powderwg@w3.org>
Right, after a while away from this issue, here we are again, looking at the conjunction document [1]. It feels as if we could spend an entire face to face meeting discussing this so let's see if we can avoid that!

In recent posts, Andrea has been arguing for the implicit semantics of option 1 so that our example of encoding "everything on example.com OR example.org with a path containing foo OR bar" would be written as at [2].

I agree with Andrea in so far as if we want to express relatively complex things then that's probably going to take some relatively complex code. I just want to keep it as simple as possible (of course!). I also believe it is very much in our interests to reduce the opportunity for the data we create in POWDER to be misused. In particular, I think it is generally a good thing to close off Resource Set definitions so that you can't publish further triples whose provenance needs to be taken into account before deciding whether to use them or not.

Where I disagree with Andrea is on whether the implicit semantics of [2] are the least worst option. I really don't like the idea that if you have two of a given property then you combine them with OR but different properties are combined with AND. It just sounds too woolly and error prone to me. And how would we encode those rules?

Limiting the cardinality of the various RDF properties is easy with OWL Lite. Thus I generally favour option 3 [3] in which we give a list of values as the value of the various RDF properties. Maybe a change in name of those properties might help clarify thinking. How about this:

<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.com example.org</wdr:hasAnyHostFrom>
  <wdr:pathContainsAnyOf>foo bar</wdr:pathContainsAnyOf>
</wdr:ResourceSet>

This is, again, a white space separated list but the altered RDF property name makes it easier to read. We might consider defining 'list' versions of the RDF properties we have so that the ones we have now (hasHost, hasScheme etc.)
remain as they are, taking a single value, but additional properties would take lists - but this seems overly redundant since a list of length 1, such as

<wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>

is valid.

So to recap, this gives us the advantage of being able to limit the cardinality of each of our set definition properties to 0 or 1 (adding to security). Each of these properties would be combined with logical AND.

Andrea makes good points about negation, since this:

(($host !~ /example.org/) || ($host !~ /example.net/))

is always true - a classic De Morgan trap I think. So again, maybe a change of RDF property name can help. How about this:

<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.org example.com</wdr:hasAnyHostFrom>
  <wdr:hasNotAnyHostFrom>search.example.org shopping.example.com</wdr:hasNotAnyHostFrom>
</wdr:ResourceSet>

This translates as "if the host IS ANY of these but NOT ANY of these, then it's in the Resource Set."

Lists only take us so far. Again, referring to Andrea's comments, what about anything on example.org with a path beginning with foo OR bar and resources on example.net with a path beginning with bar (only)? White space separated lists won't get us out of this - we need to use something like owl:unionOf.

OK, let's actually use owl:unionOf. Notice that owl:unionOf is a property, not a Class, therefore Andrea's code needs a little tweaking to give this:

<wdr:ResourceSet>
  <owl:unionOf rdf:parseType="Collection">
    <wdr:ResourceSet>
      <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
      <wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>
    </wdr:ResourceSet>
    <wdr:ResourceSet>
      <wdr:hasAnyHostFrom>example.net</wdr:hasAnyHostFrom>
      <wdr:pathStartsWithAnyOf>bar</wdr:pathStartsWithAnyOf>
    </wdr:ResourceSet>
  </owl:unionOf>
</wdr:ResourceSet>

We have two Resource Sets here (which are Classes) and we use the owl:unionOf predicate to create the union.
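
For anyone who finds the intended semantics easier to read as code, here is a minimal Python sketch of how a processor might evaluate the two definitions above. Everything in it is illustrative: the names ResourceSet, contains and in_union are hypothetical, and the matching rules (a host matches if it equals a listed value or is a subdomain of it; paths are compared after dropping the leading slash) are assumptions rather than anything the group has agreed.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


def _host_in(host: str, candidates: str) -> bool:
    """True if host equals, or is a subdomain of, any white space separated candidate."""
    return any(host == c or host.endswith("." + c) for c in candidates.split())


@dataclass(frozen=True)
class ResourceSet:
    # Each field mirrors one proposed property and holds its white space
    # separated list; an empty string means the property is absent.
    has_any_host_from: str = ""
    has_not_any_host_from: str = ""
    path_starts_with_any_of: str = ""

    def contains(self, uri: str) -> bool:
        parts = urlsplit(uri)
        host = (parts.hostname or "").lower()
        path = parts.path.lstrip("/")  # assumption: compare without the leading slash
        # Values inside one property are ORed; the properties themselves are ANDed.
        if self.has_any_host_from and not _host_in(host, self.has_any_host_from):
            return False
        if self.has_not_any_host_from and _host_in(host, self.has_not_any_host_from):
            return False
        if self.path_starts_with_any_of and not any(
            path.startswith(p) for p in self.path_starts_with_any_of.split()
        ):
            return False
        return True


def in_union(uri: str, *members: ResourceSet) -> bool:
    """owl:unionOf reading: the URI is in the union if any member set contains it."""
    return any(m.contains(uri) for m in members)


# The negation example: any of these hosts, but not any of those.
negated = ResourceSet(
    has_any_host_from="example.org example.com",
    has_not_any_host_from="search.example.org shopping.example.com",
)

# The owl:unionOf example: two closed member sets.
union = (
    ResourceSet(has_any_host_from="example.org", path_starts_with_any_of="foo bar"),
    ResourceSet(has_any_host_from="example.net", path_starts_with_any_of="bar"),
)
```

Because each property appears at most once and carries its whole list, there is no need for a rule saying "repeated properties are ORed"; the only combination rule left between properties is AND, and the union is explicit.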

More complex examples are possible but given that we're supporting regular expressions, and, if my line of argument holds, white space separated lists, the likelihood of a more complex Resource Set definition than that shown here seems remote - at least for the use cases under our consideration.

This retains the closed world objective. RDF Collections are closed world - but I admit it's not clear to me how to express the constraint that a Resource Set can have a sub set if it's the subject of an owl:unionOf, owl:intersectionOf or owl:complementOf predicate. Incidentally, using these set operators puts us firmly in OWL DL, not OWL Lite (and, if I understand it correctly, nested set operators might take us into OWL Full so they should be strongly discouraged).

So I think we're building up a picture here. If you want to define a set as simple as 'everything on example.com' (which remains the most likely scenario for our use cases) then you can do it really easily:

<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>
</wdr:ResourceSet>

If you want something a little more complicated - like multiple hosts - put them in a white space separated list. If you need to create slightly more complex but still relatively simple RS definitions that include multiple elements then that's possible too, as we've seen with the original example.com/org plus foo/bar example. We can define even more complex sets where we have (multiple definitions) OR (other multiple definitions) using OWL set operators. And if that isn't enough, you can always use a Regular Expression.

Actually, there's a thought: can you (meaningfully) have a white space separated list of regular expressions? Probably not - so that's one of our RDF properties that can only have a single value.

What about conjunctions of resources grouped by property? The group hasn't discussed this yet, but if we go with my current proposal, below, then how will that affect things?

Here's an RS definition for 'all resources on example.org that are in French'.

<wdr:Set>
  <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
  <wdr:resourcesWith rdf:parseType="Resource">
    <ex:lang>fr</ex:lang>
  </wdr:resourcesWith>
  <wdr:hasPropLookUp>
    <wdr:PropLookUp>
      <wdr:lookUpURI>$cURI</wdr:lookUpURI>
      <wdr:method rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
      <wdr:responseContains>Content-Language: fr</wdr:responseContains>
    </wdr:PropLookUp>
  </wdr:hasPropLookUp>
</wdr:Set>

So this says that the language must be French and the way to find out whether it is or not is to do a Head request to $cURI (the candidate resource's URI) and see if you get a header back that says "Content-Language: fr".

Can we use a white space separated list here? Sometimes, would be the answer, I guess. Imagine we wanted to define a set as all resources on example.org in French OR German. Try this:

<wdr:Set>
  <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
  <wdr:resourcesWith rdf:parseType="Resource">
    <ex:lang>fr de</ex:lang>
  </wdr:resourcesWith>
  <wdr:hasPropLookUp>
    <wdr:PropLookUp>
      <wdr:lookUpURI>$cURI</wdr:lookUpURI>
      <wdr:method rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
      <wdr:responseContains>"Content-Language: fr" "Content-Language: de"</wdr:responseContains>
    </wdr:PropLookUp>
  </wdr:hasPropLookUp>
</wdr:Set>

I've had to quote the list elements in the responseContains property but I don't think it's unusual to require quoting of strings if they are to include white space!

By way of an apology for the length of this post, let me summarise.

1. I don't like implied semantics and think we can do better.
2. We must surely accept complexity where complexity is being expressed.
3. Complexity should be as scarce as the use cases that demand it.
4. Changing the property names can make it clear (to humans) that the value is a list.
5. REs are supported anyway so they're always available for people who prefer them (like me).
6. We can use OWL set operators where we need a union of otherwise separate sets.
7. The multi-layered approach to conjunction can work just as well for RS definitions by property, notwithstanding the need to support quoted strings so that they can include white space.

Depending on your feedback, I'd like to write this up in the doc so it can be presented properly.

I would, however, like to include the XML-based approach in the doc [4] as an alternative to all this. Its principal attraction, for me, flows from the following argument:

It is likely that a generic RDF processor will be able to handle all aspects of a DR, without modification, except the Resource Set. Since the data in an RS definition needs to be handled slightly differently, it does seem to be logical to make that explicit by quoting an XML Literal within the RDF graph (which is what the pre-defined RDF datatype of XML Literal is designed to allow you to do).

Its principal problem, IMHO, is that the definition of something as simple as 'everything on example.org' should not require running a separate XML parser/XPath query.

I reckon we really need to see some SPARQL queries against the RS data examples to settle this one?

Cheers

Phil.

[1] http://www.w3.org/2007/powder/powder-grouping/conjunction
[2] http://www.w3.org/2007/powder/powder-grouping/option1.rdf and http://www.w3.org/2007/powder/powder-grouping/option1.png
[3] http://www.w3.org/2007/powder/powder-grouping/option3.rdf and http://www.w3.org/2007/powder/powder-grouping/option3.png
[4] http://www.w3.org/2007/powder/powder-grouping/conjunction#option6

Phil Archer wrote:
>
> A few small comments inline below
>
> Andrea Perego wrote:
>> Hi, Phil.
>>
>>> [snip]
>>>
>>> In your discussion, you suggest 4 possible solutions to the pathContains
>>> issue.
>>> The complexities get more severe when we get into negatives and,
>>> from my perspective, we're getting a long way away from a design
>>> fundamental of simplicity with the real possibility that a
>>> semi-technically minded person could write a set definition by hand if
>>> necessary.
>>
>> I think here we should consider if and why we should support negation.
>> It is not just to support as much flexibility as possible. As was
>> reported in a previous version of the grouping document, negation is
>> useful in order to simplify the specification of a scope by also
>> supporting exceptions.
>>
>> Suppose, for instance, that a given DR applies to a set of hosts
>> my.example.org, your.example.org, his.example.org, her.example.org,
>> our.example.org, but not to their.example.org.
>>
>> If negation is not supported, the scope of the DR must be specified as
>> follows:
>>
>> <wdr:Set>
>>   <wdr:hasHost>my.example.org</wdr:hasHost>
>>   <wdr:hasHost>your.example.org</wdr:hasHost>
>>   <wdr:hasHost>her.example.org</wdr:hasHost>
>>   <wdr:hasHost>his.example.org</wdr:hasHost>
>>   <wdr:hasHost>our.example.org</wdr:hasHost>
>> </wdr:Set>
>>
>> otherwise, if a wdr:hasNotHost property is available, we can reduce the
>> specification to
>>
>> <wdr:Set>
>>   <wdr:hasHost>example.org</wdr:hasHost>
>>   <wdr:hasNotHost>their.example.org</wdr:hasNotHost>
>> </wdr:Set>
>>
>> So the issue here, is to find a way of supporting negation in a safe and
>> possibly `intuitive' way.
>
> I am certain that negation should be included and your example seems
> entirely intuitive to me. If, starting from the most significant
> portion, the resource is on the example.org domain AND is NOT on
> their.example.org, then it's in the Set. Easy.
>
> [snip]
>>
>>> [snip] NB. use of intersectionOf and unionOf requires OWL
>>> DL, not OWL Lite - which gets us into more specialised inference
>>> engines.
>>
>> And, consequently, we may have undecidable resource set definitions
>> (which is not a nice thing).
>> The solution based on implicit semantics
>> (if resolved properly) is safe also with respect to this issue.
>
> Actually, no, it's OWL Full that does that. OWL DL is decidable (just
> more complicated than OWL Lite).
>
>>
>>> [snip: implicit conjunction inside a resource set definition -
>>> wdr:hasHostList property]
>>
>> I don't completely agree.
>>
>> If we assume that all properties in a wdr:Set are always in AND, saying
>> "all the resources hosted by example.org and a path starting with foo or
>> bar," will require two redundant resource set definitions:
>>
>> <wdr:Set>
>>   <wdr:hasHost>example.org</wdr:hasHost>
>>   <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>> </wdr:Set>
>>
>> <wdr:Set>
>>   <wdr:hasHost>example.org</wdr:hasHost>
>>   <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>> </wdr:Set>
>>
>> As you notice, this redundancy increases when we are talking of hosts,
>> and not of path patterns, but I think that the need itself of repeating
>> the same statement is far from being intuitive.
>>
>> I agree that it is preferable to combine *by default* all the properties
>> in a resource set definition with the same Boolean operator, but the
>> solution you propose has several drawbacks in terms of expressiveness.
>>
>> In other words, if we support AND (implicitly), we must support also OR
>> (explicitly) inside a resource set definition.
>
> Which brings us back to owl:unionOf and example 2A?
>
>> About the solutions to be
>> used for this, I'm not comfortable with space separated lists as object
>> of RDF properties (in such a case why not use an RE? we have just to
>> substitute a blank space with a `|'). Also, we are forgetting here
>> grouping by property. I'm not sure that the considerations above apply
>> also to them.
>
> I think these do apply to grouping by resource property. If the resource
> property in question is colour then you can have a white space separated
> list of colours. And I agree on the white space or | issue.
> But we're trying to find an alternative to using REs for those who don't
> like them and that is less error prone (noting that REs are always going
> to be supported).
>
>>
>> In other words, I'm for using RDF to express this. Of course, it may be
>> verbose, not necessarily human-friendly, and require a lot of processing.
>> This is why I consider the `original' implicit semantics of resource set
>> definitions (i.e., same properties in OR, different properties in AND)
>> preferable, even though it is not formally sound.
>
> OK, I misunderstood your thinking. I thought you were opposed to option
> 1. Ah well.
>
> Phil
>
>
>

--
Phil Archer
Chief Technical Officer,
Family Online Safety Institute
t. +44 (0)1473 434770
Skype: philarcher
w. http://www.fosi.org/people/philarcher/

Already labelled with ICRA? It's time to raise the bar on child protection standards by ensuring your site is ICRAchecked. See http://checked.icra.org/ for more info.

Received on Tuesday, 22 May 2007 15:51:04 UTC