Re: (Ref.: ISSUE-12: Conjunction and disjunction) Semantics of resource set definitions

David wrote:

I see where you are coming from. I think it depends how you view an RS 
definition. The model is that the RS definition implies a Set of 
Resources. A candidate resource is a member of that Set IF it meets the 
conditions.

[..]
> I think
> that by using them we're relying on people to associate the white space 
> with logical AND. 

No, I'm saying that white space equates to logical OR. So you have

<wdr:hasAnyHostFrom>host1 host2</wdr:hasAnyHostFrom>
<wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>

which means that a candidate resource is a member of the RS IF it's on 
(host1 OR host2) AND its path starts with (foo OR bar).

So, we are saying explicitly that you treat the members of the list 
given for each RDF property as equal alternatives (OR) and combine the 
lists with AND.
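
To make that reading concrete, here is a minimal sketch in Python. It is an illustration only, not part of the POWDER documents: the function name in_resource_set is hypothetical, hosts are compared by exact match, and paths are compared without their leading slash. All of those are assumptions.

```python
from urllib.parse import urlsplit


def in_resource_set(uri, hosts, path_prefixes):
    """Return True if uri is on ANY listed host AND its path starts with ANY listed prefix.

    hosts and path_prefixes are white space separated lists, as in the
    wdr:hasAnyHostFrom and wdr:pathStartsWithAnyOf values above.
    """
    parts = urlsplit(uri)
    # OR within a list: the host only has to equal one member.
    host_ok = parts.hostname in hosts.split()
    # Assumption: prefixes are written without the leading slash.
    path = parts.path.lstrip("/")
    path_ok = any(path.startswith(prefix) for prefix in path_prefixes.split())
    # AND between the two properties.
    return host_ok and path_ok
```

A resource at http://host1/foo/page is in the set; one at http://host1/baz or http://host3/foo is not.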

Where I think your confusion comes in - and you won't be alone in this - 
is in seeing the definitions as complete in their own right. So that the 
Resource Set is everything on host1 AND host2 with a path that starts 
with foo OR bar. Seen that way around, yes, you're right, option 3 is no 
better than option 1 and the use of AND or OR is not explicit.

The main document [1] does, I hope, make it clear which way round we're 
thinking?

Does this help?


> I am not sure what the precedent is that Kevin
> mentions but personally i see no reason why a white space (in this 
> implementation) could not be seen as any other type of logical operator. 
> FWIW i think the argument against option 3 should be the same as that 
> against option 1.
> 
> I also don't like the REGEX option. I think that expecting the 
> implementers to be proficient in REGEX as well as RDF may be asking a 
> little too much. Yes, it does provide a solution to our dilemma but I 
> think we can find a better option.

We've resolved to support REs as one method. We're also committed to 
providing alternatives for people who don't like them. Also, in the doc 
we say that if you can define a set using simple string matching and 
without using an RE you should (because it's less error prone).

> 
> I can't choose my favourite option as the arguments for and against are 
> all very compelling. I do think it is important to have a closed DR scope. 
> It's probably then going to come down to a question of which is more 
> important - ease of implementation (use/understanding) or ease of 
> processing.

I'm working on a series of SPARQL queries that I hope will help to move 
us forward.

Phil.

[1] http://www.w3.org/2007/powder/powder-grouping/conjunction

> 
> ----- Original Message ----- From: "Phil Archer" <parcher@icra.org>
> To: "Public POWDER" <public-powderwg@w3.org>
> Cc: "Jo Rabin" <jo@linguafranca.org>
> Sent: Wednesday, May 23, 2007 3:32 PM
> Subject: Re: (Ref.: ISSUE-12: Conjunction and disjunction) Semantics of 
> resource set definitions
> 
> 
>>
>> Thanks very much Kevin, I really appreciate you taking time to look at 
>> this.
>>
>> Keeping each property value to a single item, obviating the need for 
>> list parsing, is a good benefit. The only drawback is that it means we 
>> can't use OWL cardinality to restrict the number of, say, 
>> hasPathStartsWith properties. That means that you can publish your DR 
>> and then on my server I can publish an RDF triple that says
>>
>> <your Resource Set's URI> wdr:hasPathStartsWith 'red'
>>
>> And a semantic system could pick that up and add it to your DR 
>> definition. True, the provenance of that triple can be checked, but 
>> this is what I mean by being open, as opposed to closed world.
>>
>> The other problem is that OWL set operators are predicates 
>> (properties) that therefore must have Classes as their value. So in 
>> fact your example would have to be written thus:
>>
>> <wdr:ResourceSet>
>>   <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>>
>>   <owl:unionOf rdf:parseType="Collection">
>>
>>     <wdr:ResourceSet>
>>       <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>>     </wdr:ResourceSet>
>>
>>     <wdr:ResourceSet>
>>       <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>>     </wdr:ResourceSet>
>>
>>   </owl:unionOf>
>> </wdr:ResourceSet>
>>
>> as opposed to
>>
>> <wdr:ResourceSet>
>>   <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>>   <wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>
>> </wdr:ResourceSet>
>>
>> Yes, there's more processing of the values, but that's something that 
>> an application can do in a single line normally (in Perl certainly) 
>> whereas to extract multiple values from multiple properties of 
>> multiple sets in an OWL collection - that sounds like several SPARQL 
>> queries just to get the data. That said, it wouldn't surprise me if 
>> this is the solution an RDF head would prefer. Hmmm...
>>
>> But... your example does perhaps point towards the XML-based solution 
>> proposed by Jo in the XG. And talking of Jo...
>>
>> I know he and others feel that REs are a road to confusion and error 
>> and, no doubt, in some cases that's true. As I've worked with them a 
>> bit I reckon that's the easiest way forward but, well, that's what I 
>> expect to use most of the time and I guess you would too. But we need 
>> alternatives as well. Also, as Andrea is usually quick to point out, 
>> they don't work on RS defined by resource property. For all that 
>> though I'm awfully tempted to put this in IRC next time
>>
>> PROPOSED RESOLUTION: Conjunctions are unnecessary since Regular 
>> Expressions provide all the flexibility we need.
>>
>> ... but I'll keep that urge under control.
>>
>> We always knew this would be the hard part to resolve!
>>
>> Phil.
>>
>> Smith, Kevin, VF-Group wrote:
>>> HI Phil,
>>>
>>> Good work! Some thoughts:
>>>
>>> There is precedent for whitespace-delimited lists in element/attribute
>>> values, but would another option be to use owl:unionOf within the RS:
>>>
>>>       <wdr:ResourceSet>
>>>         <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>>>         <owl:unionOf rdf:parseType="Collection">
>>>           <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>>>           <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>>>         </owl:unionOf>
>>>       </wdr:ResourceSet>
>>>
>>> That may be more friendly to RDF parsers (i.e. no extra string
>>> operations needed to extract values). Not sure if that risks nested set
>>> operators and OWL Full, as you say.
>>>
>>> NB I was looking at Apache rewrite rules, since they also work on
>>> matching URIs and have a widespread following. It appears there has not
>>> been developed a higher-level language of matching, but a use of (often
>>> complex) REs. IMO this gives credence to the use of REs for our kind of
>>> matching use cases.
>>>
>>> Overall, happy to see this written up further.
>>>
>>> -----Original Message-----
>>> From: public-powderwg-request@w3.org
>>> [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
>>> Sent: 22 May 2007 16:27
>>> To: Public POWDER
>>> Subject: Re: (Ref.: ISSUE-12: Conjunction and disjunction) Semantics of
>>> resource set definitions
>>>
>>>
>>> Right, after a while away from this issue, here we are again, looking at
>>> the conjunction document [1].
>>>
>>> It feels as if we could spend an entire face to face meeting 
>>> discussing this so let's see if we can avoid that!
>>>
>>> In recent posts, Andrea has been arguing for the implicit semantics 
>>> of option 1 so that our example of encoding "everything on 
>>> example.com OR example.org with a path containing foo OR bar" would 
>>> be written as at [2].
>>>
>>> I agree with Andrea in so far as if we want to express relatively 
>>> complex things then that's probably going to take some relatively 
>>> complex code. I just want to keep it as simple as possible (of course!).
>>>
>>> I also believe it is very much in our interests to reduce the 
>>> opportunity for the data we create in POWDER to be misused. In 
>>> particular, I think it generally a good thing to close off Resource 
>>> Set definitions so that you can't publish further triples whose 
>>> provenance needs to be taken into account before deciding whether to 
>>> use them or not.
>>>
>>> Where I disagree with Andrea is that the implicit semantics of [2] 
>>> are the least worst option. I really don't like the idea that if you 
>>> have two of a given property then you combine them with OR but 
>>> different properties are combined with AND. It just sounds too woolly 
>>> and error prone to me.
>>>
>>> And how would we encode those rules?
>>>
>>> Limiting the cardinality of the various RDF properties is easy with 
>>> OWL Lite. Thus I generally favour option 3 [3] in which we give a 
>>> list of values as the value of the various RDF properties. Maybe a 
>>> change in name of those properties might help clarify thinking. How 
>>> about this:
>>>
>>> <wdr:ResourceSet>
>>>    <wdr:hasAnyHostFrom>example.com example.org</wdr:hasAnyHostFrom>
>>>    <wdr:pathContainsAnyOf>foo bar</wdr:pathContainsAnyOf>
>>> </wdr:ResourceSet>
>>>
>>> This is, again, a white space separated list but the altered RDF 
>>> property name makes it easier to read. We might consider defining 'list'
>>> versions of the RDF properties we have so that the ones we have now 
>>> (hasHost, hasScheme etc.) remain as they are taking a single value, 
>>> but additional properties would take lists - but this seems overly 
>>> redundant since a list of length 1, such as 
>>> <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom> is valid.
>>>
>>> So to recap, this gives us the advantage of being able to limit 
>>> cardinality of each of our set definition properties to 0 or 1 
>>> (adding to security). Each of these properties would be combined with 
>>> logical AND.
>>>
>>> Andrea makes good points about negation, since this:
>>>
>>> (($host !~ /example.org/) || ($host !~ /example.net/))
>>>
>>> is always true - a classic De Morgan trap I think. So again, maybe a 
>>> change of RDF property name can help. How about this
>>>
>>> <wdr:ResourceSet>
>>>    <wdr:hasAnyHostFrom>example.org example.com</wdr:hasAnyHostFrom>
>>>    <wdr:hasNotAnyHostFrom>search.example.org shopping.example.com
>>>                                          </wdr:hasNotAnyHostFrom>
>>> </wdr:ResourceSet>
>>>
>>> This translates as "if the host IS ANY of these but NOT ANY of these, 
>>> then it's in the Resource Set."
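
A short Python sketch may help show the trap and the intended reading side by side. It is illustrative only: the helper names are hypothetical, and it assumes that a host "is from" a listed domain when it equals that domain or is a subdomain of it, which is how the example above appears to be meant.

```python
def on_domain(host, domain):
    # Assumption: subdomains count, so search.example.org is "on" example.org.
    return host == domain or host.endswith("." + domain)


def naive_exclusion_passes(host, excluded):
    # The trap: (not A) OR (not B) passes for every host as soon as
    # two different values are listed.
    return any(not on_domain(host, d) for d in excluded.split())


def exclusion_passes(host, excluded):
    # Intended reading: NOT (A OR B), i.e. the host is on none of them.
    return not any(on_domain(host, d) for d in excluded.split())


def in_resource_set(host, any_host_from, not_any_host_from):
    # IS ANY of these AND is NOT ANY of those.
    included = any(on_domain(host, d) for d in any_host_from.split())
    return included and exclusion_passes(host, not_any_host_from)
```

With the lists from the example, www.example.org is in the set and search.example.org is not, whereas the naive test would let search.example.org through.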
>>>
>>> Lists only take us so far. Again, referring to Andrea's comments, 
>>> what about anything on example.org with a path beginning with foo OR 
>>> bar and resources on example.com with a path beginning with bar 
>>> (only). White space separated lists won't get us out of this - we 
>>> need to use something like owl:unionOf.
>>>
>>> OK, let's actually use owl:unionOf.
>>>
>>> Notice that owl:unionOf is a property, not a Class, therefore, 
>>> Andrea's code needs a little tweaking to give this:
>>>
>>> <wdr:ResourceSet>
>>>   <owl:unionOf rdf:parseType="Collection">
>>>
>>>     <wdr:ResourceSet>
>>>       <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>>>       <wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>
>>>     </wdr:ResourceSet>
>>>
>>>     <wdr:ResourceSet>
>>>       <wdr:hasAnyHostFrom>example.net</wdr:hasAnyHostFrom>
>>>       <wdr:pathStartsWithAnyOf>bar</wdr:pathStartsWithAnyOf>
>>>     </wdr:ResourceSet>
>>>
>>>   </owl:unionOf>
>>> </wdr:ResourceSet>
>>>
>>> We have two Resource Sets here (which are Classes) and we use the 
>>> owl:unionOf predicate to create the union. More complex examples are 
>>> possible but given that we're supporting regular expressions, and, if my
>>> line of argument holds, white space separated lists, the likelihood of a
>>> more complex Resource Set definition than that shown here seems remote -
>>> at least for the use cases under our consideration.
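
As a rough sketch of how an application might evaluate such a union, assuming the member definitions have already been extracted from the RDF into plain dictionaries (the dictionary layout, the function names and exact host matching are all assumptions made for illustration):

```python
from urllib.parse import urlsplit


def matches(uri, definition):
    """One member set: ANY listed host AND ANY listed path prefix."""
    parts = urlsplit(uri)
    # Assumption: prefixes are written without the leading slash.
    path = parts.path.lstrip("/")
    host_ok = parts.hostname in definition["hasAnyHostFrom"].split()
    path_ok = any(path.startswith(p)
                  for p in definition["pathStartsWithAnyOf"].split())
    return host_ok and path_ok


def in_union(uri, definitions):
    # Union of sets: the candidate only has to be in one member.
    return any(matches(uri, d) for d in definitions)


# The two member Resource Sets from the listing above.
UNION = [
    {"hasAnyHostFrom": "example.org", "pathStartsWithAnyOf": "foo bar"},
    {"hasAnyHostFrom": "example.net", "pathStartsWithAnyOf": "bar"},
]
```

Here http://example.net/bar is in the union but http://example.net/foo is not, which is the asymmetry a single pair of white space separated lists cannot express.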
>>>
>>> This retains the closed world objective. RDF Collections are closed 
>>> world - but I admit it's not clear to me how to express the constraint 
>>> that a Resource Set can have a sub set only if it's the subject of an 
>>> owl:unionOf, owl:intersectionOf or owl:complementOf predicate. 
>>> Incidentally, using these set operators puts us firmly in OWL DL, not 
>>> OWL Lite (and, if I understand it correctly, nested set operators 
>>> might take us into OWL Full so they should be strongly discouraged).
>>>
>>> So I think we're building up a picture here.
>>>
>>> If you want to define a set as simple as 'everything on example.com' 
>>> (which remains the most likely scenario for our use cases) then you 
>>> can do it really easily
>>>
>>> <wdr:ResourceSet>
>>>    <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>
>>> </wdr:ResourceSet>
>>>
>>> If you want something a little more complicated - like multiple hosts 
>>> - put them in a white space separated list.
>>>
>>> If you need to create slightly more complex but still relatively 
>>> simple RS definitions that include multiple elements then that's 
>>> possible too, as we've seen with the original example.com/org plus 
>>> foo/bar example.
>>>
>>> We can define even more complex sets where we have (multiple 
>>> definitions) OR (other multiple definitions) using OWL set operators.
>>>
>>> And if that isn't enough, you can always use a Regular Expression. 
>>> Actually, there's a thought, can you (meaningfully) have a white 
>>> space separated list of regular expressions?? probably not - so 
>>> that's one of our RDF properties that can only have a single value.
>>>
>>> What about conjunctions of resources grouped by property? The group 
>>> hasn't discussed this yet, but if we go with my current proposal, below,
>>> then how will that affect things?
>>>
>>> Here's an RS definition for 'all resources on example.org that are in 
>>> French'.
>>>
>>> <wdr:Set>
>>>    <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>>>
>>>    <wdr:resourcesWith rdf:parseType="Resource">
>>>      <ex:lang>fr</ex:lang>
>>>    </wdr:resourcesWith >
>>>
>>>    <wdr:hasPropLookUp>
>>>      <wdr:PropLookUp>
>>>        <wdr:lookUpURI>$cURI</wdr:lookUpURI>
>>>        <wdr:method
>>> rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
>>>        <wdr:responseContains>Content-Language: fr</wdr:responseContains>
>>>      </wdr:PropLookUp>
>>>    </wdr:hasPropLookUp>
>>>
>>> </wdr:Set>
>>>
>>> So this says that the language must be French and the way to find out 
>>> whether it is or not is to do a Head request to $cURI (the candidate 
>>> resource's URI) and see if you get a header back that says 
>>> "Content-Language: fr".
>>>
>>> Can we use a white space separated list here? Sometimes, would be the 
>>> answer, I guess. Imagine we wanted to define a set as all resources 
>>> on example.org in French OR German. Try this:
>>>
>>> <wdr:Set>
>>>    <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
>>>
>>>    <wdr:resourcesWith rdf:parseType="Resource">
>>>      <ex:lang>fr de</ex:lang>
>>>    </wdr:resourcesWith >
>>>
>>>    <wdr:hasPropLookUp>
>>>      <wdr:PropLookUp>
>>>        <wdr:lookUpURI>$cURI</wdr:lookUpURI>
>>>        <wdr:method
>>> rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
>>>        <wdr:responseContains>"Content-Language: fr"
>>>                            "Content-Language: de"</wdr:responseContains>
>>>      </wdr:PropLookUp>
>>>    </wdr:hasPropLookUp>
>>>
>>> </wdr:Set>
>>>
>>> I've had to quote the list elements in the responseContains property but
>>> I don't think it's unusual to require quoting of strings if they are 
>>> to include white space!
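
For what it's worth, splitting a white space separated list that may contain quoted items is still close to a one-liner in most languages. The sketch below uses Python's standard shlex module and assumes shell-style quoting rules are acceptable for these values (so a stray apostrophe or backslash in a value would need escaping); the function names are hypothetical.

```python
import shlex


def parse_list(value):
    """Split a white space separated list, keeping quoted items whole."""
    return shlex.split(value)


def response_matches(header_text, response_contains):
    # OR within the list: any one of the items may appear in the response.
    return any(item in header_text for item in parse_list(response_contains))


VALUE = '''"Content-Language: fr"
           "Content-Language: de"'''
```

parse_list(VALUE) yields the two header strings, and an unquoted list such as "fr de" splits into its two items as before.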
>>>
>>> By way of an apology for the length of this post, let me summarise.
>>>
>>> 1. I don't like implied semantics and think we can do better.
>>> 2. We must surely accept complexity where complexity is being expressed
>>> 3. Complexity should be as scarce as the use cases that demand it
>>> 4. Changing the property names can make it clear (to humans) that the 
>>> value is a list
>>> 5. REs are supported anyway so they're always available for people 
>>> who prefer them (like me)
>>> 6. We can use OWL set operators where we need a union of otherwise 
>>> separate sets.
>>> 7. The multi-layered approach to conjunction can work just as well 
>>> for RS definitions by property, notwithstanding the need to support 
>>> quoted strings so that they can include white space.
>>>
>>> Depending on your feedback, I'd like to write this up in the doc so 
>>> it can be presented properly. I would, however, like to include the 
>>> XML-based approach in the doc [4] as an alternative to all this.
>>>
>>> Its principal attraction, for me, flows from the following argument: 
>>> It is likely that a generic RDF processor will be able to handle all 
>>> aspects of a DR, without modification, except the Resource Set. Since 
>>> the data in an RS definition needs to be handled slightly 
>>> differently, it does seem to be logical to make that explicit by 
>>> quoting an XML Literal within the RDF graph (which is what the 
>>> pre-defined RDF datatype of XML Literal is designed to allow you to do).
>>>
>>> Its principal problem, IMHO, is that the definition of something as 
>>> simple as 'everything on example.org' should not require running a 
>>> separate XML parser/XPath query. I reckon we really need to see some 
>>> SPARQL queries against the RS data examples to settle this one??
>>>
>>> Cheers
>>>
>>> Phil.
>>>
>>>
>>> [1] http://www.w3.org/2007/powder/powder-grouping/conjunction
>>>
>>> [2] http://www.w3.org/2007/powder/powder-grouping/option1.rdf and 
>>> http://www.w3.org/2007/powder/powder-grouping/option1.png
>>>
>>> [3] http://www.w3.org/2007/powder/powder-grouping/option3.rdf and 
>>> http://www.w3.org/2007/powder/powder-grouping/option3.png
>>>
>>> [4] http://www.w3.org/2007/powder/powder-grouping/conjunction#option6
>>>
>>>
>>> Phil Archer wrote:
>>>> A few small comments inline below
>>>>
>>>> Andrea Perego wrote:
>>>>> Hi, Phil.
>>>>>
>>>>>> [snip]
>>>>>>
>>>>>> In your discussion, you suggest 4 possible solutions to the pathContains
>>>>>> issue. The complexities get more severe when we get into negatives and,
>>>>>> from my perspective, we're getting a long way away from a design
>>>>>> fundamental of simplicity with the real possibility that a
>>>>>> semi-technically minded person could write a set definition by hand if
>>>>>> necessary.
>>>>> I think here we should consider if and why we should support negation.
>>>>> It is not just to support as much flexibility as possible. As was
>>>>> reported in a previous version of the grouping document, negation is
>>>>> useful in order to simplify the specification of a scope by also
>>>>> supporting exceptions.
>>>>>
>>>>> Suppose, for instance, that a given DR applies to a set of hosts
>>>>> my.example.org, your.example.org, his.example.org, her.example.org,
>>>>> our.example.org, but not to their.example.org.
>>>>>
>>>>> If negation is not supported, the scope of the DR must be specified as
>>>>> follows:
>>>>>
>>>>> <wdr:Set>
>>>>>   <wdr:hasHost>my.example.org</wdr:hasHost>
>>>>>   <wdr:hasHost>your.example.org</wdr:hasHost>
>>>>>   <wdr:hasHost>her.example.org</wdr:hasHost>
>>>>>   <wdr:hasHost>his.example.org</wdr:hasHost>
>>>>>   <wdr:hasHost>our.example.org</wdr:hasHost>
>>>>> </wdr:Set>
>>>>>
>>>>> otherwise, if a wdr:hasNotHost property is available, we can reduce the
>>>>> specification to
>>>>>
>>>>> <wdr:Set>
>>>>>   <wdr:hasHost>example.org</wdr:hasHost>
>>>>>   <wdr:hasNotHost>their.example.org</wdr:hasNotHost>
>>>>> </wdr:Set>
>>>>>
>>>>> So the issue here, is to find a way of supporting negation in a safe and
>>>>> possibly `intuitive' way.
>>>> I am certain that negation should be included and your example seems 
>>>> entirely intuitive to me. If, starting from the most significant 
>>>> portion, the resource is on the example.org domain AND is NOT on 
>>>> their.example.org, then it's in the Set. Easy.
>>>>
>>>> [snip]
>>>>>> [snip] NB. use of intersectionOf and unionOf requires OWL
>>>>>> DL, not OWL Lite - which gets us into more specialised inference 
>>>>>> engines.
>>>>> And, consequently, we may have undecidable resource set definitions
>>>>> (which is not a nice thing). The solution based on implicit semantics
>>>>> (if resolved properly) is safe also with respect to this issue.
>>>> Actually, no, it's OWL Full that does that. OWL DL is closed world (just
>>>> more complicated than OWL Lite).
>>>>
>>>>>> [snip: implicit conjunction inside a resource set definition - 
>>>>>> wdr:hasHostList property]
>>>>> I don't completely agree.
>>>>>
>>>>> If we assume that all properties in a wdr:Set are always in AND, saying
>>>>> "all the resources hosted by example.org and a path starting with foo or
>>>>> bar," will require two redundant resource set definitions:
>>>>>
>>>>> <wdr:Set>
>>>>>   <wdr:hasHost>example.org</wdr:hasHost>
>>>>>   <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>>>>> </wdr:Set>
>>>>>
>>>>> <wdr:Set>
>>>>>   <wdr:hasHost>example.org</wdr:hasHost>
>>>>>   <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>>>>> </wdr:Set>
>>>>>
>>>>> As you notice, this redundancy increases when we are talking of hosts,
>>>>> and not of path patterns, but I think that the need itself of repeating
>>>>> the same statement is far from being intuitive.
>>>>>
>>>>> I agree that it is preferable to combine *by default* all the properties
>>>>> in a resource set definition with the same Boolean operator, but the
>>>>> solution you propose has several drawbacks in terms of expressiveness.
>>>>> In other words, if we support AND (implicitly), we must support also OR
>>>>> (explicitly) inside a resource set definition.
>>>> Which brings us back to owl:unionOf and example 2A?
>>>>
>>>>> About the solutions to be
>>>>> used for this, I'm not comfortable with space separated lists as object
>>>>> of RDF properties (in such a case why not using a RE? we have just to
>>>>> substitute a blank space with a `|'). Also, we are forgetting here
>>>>> grouping by property. I'm not sure that the considerations above apply
>>>>> also to them.
>>>> I think these do apply to grouping by resource property. If the resource
>>>> property in question is colour then you can have a white space separated
>>>> list of colours. And I agree on the white space or | issue. But 
>>>> we're trying to find an alternative to using REs for those who don't 
>>>> like them and that is less error prone (noting that REs are always 
>>>> going to be supported).
>>>>
>>>>> In other words, I'm for using RDF to express this. Of course, it may be
>>>>> verbose, not necessarily human-friendly, and require a lot of processing.
>>>>> This is why I consider the `original' implicit semantics of resource set
>>>>> definitions (i.e., same properties in OR, different properties in AND)
>>>>> preferable, even though it is not formally sound.
>>>> OK, I misunderstood your thinking. I thought you were opposed to option
>>>> 1. Ah well.
>>>>
>>>> Phil
>>>>
>>>>
>>>>
>>
>>
> 

Received on Thursday, 24 May 2007 13:30:52 UTC