- From: Phil Archer <parcher@icra.org>
- Date: Wed, 23 May 2007 15:32:12 +0100
- To: Public POWDER <public-powderwg@w3.org>
- CC: Jo Rabin <jo@linguafranca.org>
Thanks very much Kevin, I really appreciate you taking time to look at
this.

Keeping each property value to a single item, obviating the need for
list parsing, is a good benefit. The only drawback is that it means we
can't use OWL cardinality to restrict the number of, say,
hasPathStartsWith properties. That means that you can publish your DR
and then on my server I can publish an RDF triple that says

<your Resource Set's URI> wdr:hasPathStartsWith 'red'

and a semantic system could pick that up and add it to your DR
definition. True, the provenance of that triple can be checked, but
this is what I mean by being open, as opposed to closed world.

The other problem is that OWL set operators are predicates (properties)
that therefore must have Classes as their value. So in fact your
example would have to be written thus:

<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
  <owl:unionOf rdf:parseType="Collection">
    <wdr:ResourceSet>
      <wdr:pathStartsWith>foo</wdr:pathStartsWith>
    </wdr:ResourceSet>
    <wdr:ResourceSet>
      <wdr:pathStartsWith>bar</wdr:pathStartsWith>
    </wdr:ResourceSet>
  </owl:unionOf>
</wdr:ResourceSet>

as opposed to

<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
  <wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>
</wdr:ResourceSet>

Yes, there's more processing of the values, but that's something that
an application can normally do in a single line (in Perl certainly),
whereas extracting multiple values from multiple properties of multiple
sets in an OWL collection sounds like several SPARQL queries just to
get the data.

That said, it wouldn't surprise me if this is the solution an RDF head
would prefer. Hmmm...

But... your example does perhaps point towards the XML-based solution
proposed by Jo in the XG.

And talking of Jo... I know he and others feel that REs are a road to
confusion and error and, no doubt, in some cases that's true.
As I've worked with them a bit, I reckon REs are the easiest way
forward; that's what I expect to use most of the time and I guess you
would too. But we need an alternative as well. Also, as Andrea is
usually quick to point out, they don't work on RS defined by resource
property.

For all that though I'm awfully tempted to put this in IRC next time

PROPOSED RESOLUTION: Conjunctions are unnecessary since Regular
Expressions provide all the flexibility we need.

... but I'll keep that urge under control.

We always knew this would be the hard part to resolve!

Phil.

Smith, Kevin, VF-Group wrote:
> Hi Phil,
>
> Good work! Some thoughts:
>
> There is precedent for whitespace-delimited lists in element/attribute
> values, but would another option be to use owl:unionOf within the RS:
>
> <wdr:ResourceSet>
>   <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>   <owl:unionOf rdf:parseType="Collection">
>     <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>     <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>   </owl:unionOf>
> </wdr:ResourceSet>
>
> That may be more friendly to RDF parsers (i.e. no extra string
> operations needed to extract values). Not sure if that risks nested set
> operators and OWL Full, as you say.
>
> NB I was looking at Apache rewrite rules, since they also work on
> matching URIs and have a widespread following. It appears there has not
> been developed a higher-level language of matching, but a use of (often
> complex) REs. IMO this gives credence to the use of REs for our kind of
> matching use cases.
>
> Overall, happy to see this written up further.
>
> Cheers
> Kevin
>
> Kevin Smith
> Technology Strategist
> Vodafone Research & Development
> Mobile: +44 (0)7990 798 916
> Text: +44 (0)7825 106 554
> Email: kevin.smith@vodafone.com
>
> Vodafone Group Services Limited
> Registered Office: Vodafone House, The Connection,
> Newbury, Berkshire RG14 2FN
> Registered in England No 3802001/
>
> -----Original Message-----
> From: public-powderwg-request@w3.org
> [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
> Sent: 22 May 2007 16:27
> To: Public POWDER
> Subject: Re: (Ref.: ISSUE-12: Conjunction and disjunction) Semantics of
> resource set definitions
>
> Right, after a while away from this issue, here we are again, looking at
> the conjunction document [1].
>
> It feels as if we could spend an entire face to face meeting discussing
> this so let's see if we can avoid that!
>
> In recent posts, Andrea has been arguing for the implicit semantics of
> option 1 so that our example of encoding "everything on example.com OR
> example.org with a path containing foo OR bar" would be written as at
> [2].
>
> I agree with Andrea in so far as if we want to express relatively
> complex things then that's probably going to take some relatively
> complex code. I just want to keep it as simple as possible (of course!).
>
> I also believe it is very much in our interests to reduce the
> opportunity for the data we create in POWDER to be misused. In
> particular, I think it generally a good thing to close off Resource Set
> definitions so that you can't publish further triples whose provenance
> needs to be taken into account before deciding whether to use them or
> not.
>
> Where I disagree with Andrea is that the implicit semantics of [2] are
> the least worst option. I really don't like the idea that if you have
> two of a given property then you combine them with OR but different
> properties are combined with AND. It just sounds too woolly and error
> prone to me.
>
> And how would we encode those rules?
>
> Limiting the cardinality of the various RDF properties is easy with OWL
> Lite. Thus I generally favour option 3 [3] in which we give a list of
> values as the value of the various RDF properties. Maybe a change in
> name of those properties might help clarify thinking. How about this:
>
> <wdr:ResourceSet>
>   <wdr:hasAnyHostFrom>example.com example.org</wdr:hasAnyHostFrom>
>   <wdr:pathContainsAnyOf>foo bar</wdr:pathContainsAnyOf>
> </wdr:ResourceSet>
>
> This is, again, a white space separated list but the altered RDF
> property name makes it easier to read. We might consider defining 'list'
> versions of the RDF properties we have so that the ones we have now
> (hasHost, hasScheme etc.) remain as they are taking a single value, but
> additional properties would take lists - but this seems overly
> redundant since a list of length 1, such as
> <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom> is valid.
>
> So to recap, this gives us the advantage of being able to limit
> cardinality of each of our set definition properties to 0 or 1 (adding
> to security). Each of these properties would be combined with logical
> AND.
>
> Andrea makes good points about negation. Since this:
>
> (($host !~ /example.org/) || ($host !~ /example.net/))
>
> is always true - a classic De Morgan trap I think. So again, maybe a
> change of RDF property name can help. How about this
>
> <wdr:ResourceSet>
>   <wdr:hasAnyHostFrom>example.org example.com</wdr:hasAnyHostFrom>
>   <wdr:hasNotAnyHostFrom>search.example.org shopping.example.com
>   </wdr:hasNotAnyHostFrom>
> </wdr:ResourceSet>
>
> This translates as "if the host IS ANY of these but NOT ANY of these,
> then it's in the Resource Set."
>
> Lists only take us so far. Again, referring to Andrea's comments, what
> about anything on example.org with a path beginning with foo OR bar and
> resources on example.net with a path beginning with bar (only)?
> White space separated lists won't get us out of this - we need to use
> something like owl:unionOf.
>
> OK, let's actually use owl:unionOf.
>
> Notice that owl:unionOf is a property, not a Class, therefore, Andrea's
> code needs a little tweaking to give this:
>
> <wdr:ResourceSet>
>   <owl:unionOf rdf:parseType="Collection">
>
>     <wdr:ResourceSet>
>       <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>       <wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>
>     </wdr:ResourceSet>
>
>     <wdr:ResourceSet>
>       <wdr:hasAnyHostFrom>example.net</wdr:hasAnyHostFrom>
>       <wdr:pathStartsWithAnyOf>bar</wdr:pathStartsWithAnyOf>
>     </wdr:ResourceSet>
>
>   </owl:unionOf>
> </wdr:ResourceSet>
>
> We have two Resource Sets here (which are Classes) and we use the
> owl:unionOf predicate to create the union. More complex examples are
> possible but given that we're supporting regular expressions, and, if my
> line of argument holds, white space separated lists, the likelihood of a
> more complex Resource Set definition than that shown here seems remote -
> at least for the use cases under our consideration.
>
> This retains the closed world objective. RDF Collections are closed
> world - but I admit it's not clear to me how to encode the constraint
> that a Resource Set can have a sub set if it's the subject of an
> owl:unionOf, owl:intersectionOf or owl:complementOf predicate.
> Incidentally, using these set operators puts us firmly in OWL DL, not
> OWL Lite (and, if I understand it correctly, nested set operators might
> take us into OWL Full so they should be strongly discouraged).
>
> So I think we're building up a picture here.
>
> If you want to define a set as simple as 'everything on example.com'
> (which remains the most likely scenario for our use cases) then you can
> do it really easily
>
> <wdr:ResourceSet>
>   <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>
> </wdr:ResourceSet>
>
> If you want something a little more complicated - like multiple hosts -
> put them in a white space separated list.
>
> If you need to create slightly more complex but still relatively simple
> RS definitions that include multiple elements then that's possible too,
> as we've seen with the original example.com/org plus foo/bar example.
>
> We can define even more complex sets where we have (multiple
> definitions) OR (other multiple definitions) using OWL set operators.
>
> And if that isn't enough, you can always use a Regular Expression.
> Actually, there's a thought, can you (meaningfully) have a white space
> separated list of regular expressions? Probably not - so that's one of
> our RDF properties that can only have a single value.
>
> What about conjunctions of resources grouped by property? The group
> hasn't discussed this yet, but if we go with my current proposal, below,
> then how will that affect things?
>
> Here's an RS definition for 'all resources on example.org that are in
> French'.
>
> <wdr:Set>
>   <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>
>   <wdr:resourcesWith rdf:parseType="Resource">
>     <ex:lang>fr</ex:lang>
>   </wdr:resourcesWith>
>
>   <wdr:hasPropLookUp>
>     <wdr:PropLookUp>
>       <wdr:lookUpURI>$cURI</wdr:lookUpURI>
>       <wdr:method
>         rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
>       <wdr:responseContains>Content-Language: fr</wdr:responseContains>
>     </wdr:PropLookUp>
>   </wdr:hasPropLookUp>
>
> </wdr:Set>
>
> So this says that the language must be French and the way to find out
> whether it is or not is to do a HEAD request to $cURI (the candidate
> resource's URI) and see if you get a header back that says
> "Content-Language: fr".
>
> Can we use a white space separated list here? Sometimes, would be the
> answer, I guess. Imagine we wanted to define a set as all resources on
> example.org in French OR German. Try this:
>
> <wdr:Set>
>   <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>
>   <wdr:resourcesWith rdf:parseType="Resource">
>     <ex:lang>fr de</ex:lang>
>   </wdr:resourcesWith>
>
>   <wdr:hasPropLookUp>
>     <wdr:PropLookUp>
>       <wdr:lookUpURI>$cURI</wdr:lookUpURI>
>       <wdr:method
>         rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
>       <wdr:responseContains>"Content-Language: fr"
>         "Content-Language: de"</wdr:responseContains>
>     </wdr:PropLookUp>
>   </wdr:hasPropLookUp>
>
> </wdr:Set>
>
> I've had to quote the list elements in the responseContains property but
> I don't think it's unusual to require quoting of strings if they are to
> include white space!
>
> By way of an apology for the length of this post, let me summarise.
>
> 1. I don't like implied semantics and think we can do better.
> 2. We must surely accept complexity where complexity is being expressed
> 3. Complexity should be as scarce as the use cases that demand it
> 4. Changing the property names can make it clear (to humans) that the
>    value is a list
> 5. REs are supported anyway so they're always available for people who
>    prefer them (like me)
> 6. We can use OWL set operators where we need a union of otherwise
>    separate sets.
> 7. The multi-layered approach to conjunction can work just as well for
>    RS definitions by property, notwithstanding the need to support
>    quoted strings so that they can include white space.
>
> Depending on your feedback, I'd like to write this up in the doc so it
> can be presented properly. I would, however, like to include the
> XML-based approach in the doc [4] as an alternative to all this.
>
> Its principal attraction, for me, flows from the following argument: It
> is likely that a generic RDF processor will be able to handle all
> aspects of a DR, without modification, except the Resource Set. Since
> the data in an RS definition needs to be handled slightly differently,
> it does seem to be logical to make that explicit by quoting an XML
> Literal within the RDF graph (which is what the pre-defined RDF datatype
> of XML Literal is designed to allow you to do).
>
> Its principal problem, IMHO, is that the definition of something as
> simple as 'everything on example.org' should not require running a
> separate XML parser/XPath query. I reckon we really need to see some
> SPARQL queries against the RS data examples to settle this one?
>
> Cheers
>
> Phil.
>
>
> [1] http://www.w3.org/2007/powder/powder-grouping/conjunction
>
> [2] http://www.w3.org/2007/powder/powder-grouping/option1.rdf and
> http://www.w3.org/2007/powder/powder-grouping/option1.png
>
> [3] http://www.w3.org/2007/powder/powder-grouping/option3.rdf and
> http://www.w3.org/2007/powder/powder-grouping/option3.png
>
> [4] http://www.w3.org/2007/powder/powder-grouping/conjunction#option6
>
>
> Phil Archer wrote:
>> A few small comments inline below
>>
>> Andrea Perego wrote:
>>> Hi, Phil.
>>>
>>>> [snip]
>>>>
>>>> In your discussion, you suggest 4 possible solutions to the
>>>> pathContains issue. The complexities get more severe when we get
>>>> into negatives and, from my perspective, we're getting a long way
>>>> away from a design fundamental of simplicity with the real
>>>> possibility that a semi-technically minded person could write a set
>>>> definition by hand if necessary.
>>> I think here we should consider if and why we should support negation.
>>> It is not just to support as much flexibility as possible.
>>> As was reported in a previous version of the grouping document,
>>> negation is useful in order to simplify the specification of a scope
>>> by also supporting exceptions.
>>>
>>> Suppose, for instance, that a given DR applies to a set of hosts
>>> my.example.org, your.example.org, his.example.org, her.example.org,
>>> our.example.org, but not to their.example.org.
>>>
>>> If negation is not supported, the scope of the DR must be specified as
>>> follows:
>>>
>>> <wdr:Set>
>>>   <wdr:hasHost>my.example.org</wdr:hasHost>
>>>   <wdr:hasHost>your.example.org</wdr:hasHost>
>>>   <wdr:hasHost>her.example.org</wdr:hasHost>
>>>   <wdr:hasHost>his.example.org</wdr:hasHost>
>>>   <wdr:hasHost>our.example.org</wdr:hasHost>
>>> </wdr:Set>
>>>
>>> otherwise, if a wdr:hasNotHost property is available, we can reduce the
>>> specification to
>>>
>>> <wdr:Set>
>>>   <wdr:hasHost>example.org</wdr:hasHost>
>>>   <wdr:hasNotHost>their.example.org</wdr:hasNotHost>
>>> </wdr:Set>
>>>
>>> So the issue here, is to find a way of supporting negation in a safe and
>>> possibly `intuitive' way.
>> I am certain that negation should be included and your example seems
>> entirely intuitive to me. If, starting from the most significant
>> portion, the resource is on the example.org domain AND is NOT on
>> their.example.org, then it's in the Set. Easy.
>>
>> [snip]
>>>> [snip] NB. use of intersectionOf and unionOf requires OWL
>>>> DL, not OWL Lite - which gets us into more specialised inference
>>>> engines.
>>> And, consequently, we may have undecidable resource set definitions
>>> (which is not a nice thing). The solution based on implicit semantics
>>> (if resolved properly) is safe also with respect to this issue.
>> Actually, no, it's OWL Full that does that. OWL DL is closed world (just
>> more complicated than OWL Lite).
>>
>>>> [snip: implicit conjunction inside a resource set definition -
>>>> wdr:hosHostList property]
>>> I don't completely agree.
>>>
>>> If we assume that all properties in a wdr:Set are always in AND, saying
>>> "all the resources hosted by example.org and a path starting with foo or
>>> bar," will require two redundant resource set definitions:
>>>
>>> <wdr:Set>
>>>   <wdr:hasHost>example.org</wdr:hasHost>
>>>   <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>>> </wdr:Set>
>>>
>>> <wdr:Set>
>>>   <wdr:hasHost>example.org</wdr:hasHost>
>>>   <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>>> </wdr:Set>
>>>
>>> As you notice, this redundancy increases when we are talking of hosts,
>>> and not of path patterns, but I think that the need itself of repeating
>>> the same statement is far from being intuitive.
>>>
>>> I agree that it is preferable to combine *by default* all the properties
>>> in a resource set definition with the same Boolean operator, but the
>>> solution you propose has several drawbacks in terms of expressiveness.
>>> In other words, if we support AND (implicitly), we must support also OR
>>> (explicitly) inside a resource set definition.
>> Which brings us back to owl:unionOf and example 2A?
>>
>>> About the solutions to be
>>> used for this, I'm not comfortable with space separated lists as object
>>> of RDF properties (in such a case why not using a RE? we have just to
>>> substitute a blank space with a `|'). Also, we are forgetting here
>>> grouping by property. I'm not sure that the considerations above apply
>>> also to them.
>> I think these do apply to grouping by resource property. If the resource
>> property in question is colour then you can have a white space separated
>> list of colours. And I agree on the white space or | issue. But we're
>> trying to find an alternative to using REs for those who don't like them
>> and that is less error prone (noting that REs are always going to be
>> supported).
>>
>>> In other words, I'm for using RDF to express this.
>>> Of course, it may be verbose, not necessarily human-friendly, and
>>> require a lot of processing. This is why I consider the `original'
>>> implicit semantics of resource set definitions (i.e., same properties
>>> in OR, different properties in AND) preferable, even though it is not
>>> formally sound.
>> OK, I misunderstood your thinking. I thought you were opposed to option
>> 1. Ah well.
>>
>> Phil
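
The list semantics argued for in this thread (items inside one property
combined with OR, different properties combined with AND, and
owl:unionOf for a union of otherwise separate sets) can be illustrated
with a minimal Python sketch. This is not taken from any POWDER
document: the dict representation, the function names host_in, matches
and matches_union, the rule that a listed host also covers its
subdomains, and the stripping of the leading "/" from the path are all
assumptions made for illustration. Only the property names come from
the examples in the message.

```python
from urllib.parse import urlsplit


def host_in(host, candidates):
    """True if host equals, or is a subdomain of, any listed host (assumed rule)."""
    return any(host == c or host.endswith("." + c) for c in candidates)


def matches(uri, resource_set):
    """Test a URI against one Resource Set.

    resource_set is a hypothetical dict mapping a property name to its
    white space separated value string. Items within one property are
    ORed; the properties themselves are ANDed.
    """
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    path = parts.path.lstrip("/")  # assumption: compare without the leading "/"
    checks = {
        "hasAnyHostFrom": lambda items: host_in(host, items),
        # NOT (a OR b): avoids the "(NOT a) OR (NOT b)" form that is always true
        "hasNotAnyHostFrom": lambda items: not host_in(host, items),
        "pathStartsWithAnyOf": lambda items: any(path.startswith(i) for i in items),
        "pathContainsAnyOf": lambda items: any(i in path for i in items),
    }
    # An unknown property name raises KeyError rather than being ignored.
    return all(checks[prop](value.split()) for prop, value in resource_set.items())


def matches_union(uri, resource_sets):
    """Union of otherwise separate Resource Sets, as with owl:unionOf."""
    return any(matches(uri, rs) for rs in resource_sets)


# The negation example: host IS ANY of these but NOT ANY of those.
HOSTS = {
    "hasAnyHostFrom": "example.org example.com",
    "hasNotAnyHostFrom": "search.example.org shopping.example.com",
}

# The union example: example.org with foo OR bar, example.net with bar only.
UNION = [
    {"hasAnyHostFrom": "example.org", "pathStartsWithAnyOf": "foo bar"},
    {"hasAnyHostFrom": "example.net", "pathStartsWithAnyOf": "bar"},
]
```

With these definitions, matches("http://search.example.org/", HOSTS)
is False because the excluded host wins over the broader
hasAnyHostFrom entry, and matches_union("http://example.net/foo", UNION)
is False because only "bar" is listed for example.net. Splitting the
value is the single line of string processing mentioned in the reply;
everything else is ordinary Boolean combination.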
Received on Wednesday, 23 May 2007 14:32:19 UTC