Re: (Ref.: ISSUE-12: Conjunction and disjunction) Semantics of resource set definitions

From: Phil Archer <parcher@icra.org>
Date: Tue, 22 May 2007 16:27:19 +0100
Message-ID: <46530BD7.1060806@icra.org>
To: Public POWDER <public-powderwg@w3.org>

Right, after a while away from this issue, here we are again, looking at 
the conjunction document [1].

It feels as if we could spend an entire face to face meeting discussing 
this so let's see if we can avoid that!

In recent posts, Andrea has been arguing for the implicit semantics of 
option 1 so that our example of encoding "everything on example.com OR 
example.org with a path containing foo OR bar" would be written as at [2].

I agree with Andrea in so far as if we want to express relatively 
complex things then that's probably going to take some relatively 
complex code. I just want to keep it as simple as possible (of course!). 
I also believe it is very much in our interests to reduce the 
opportunity for the data we create in POWDER to be misused. In 
particular, I think it generally a good thing to close off Resource Set 
definitions so that you can't publish further triples whose provenance 
needs to be taken into account before deciding whether to use them or not.

Where I disagree with Andrea is that the implicit semantics of [2] are 
the least worst option. I really don't like the idea that if you have 
two of a given property then you combine them with OR but different 
properties are combined with AND. It just sounds too woolly and error 
prone to me.

And how would we encode those rules?

Limiting the cardinality of the various RDF properties is easy with OWL 
Lite. Thus I generally favour option 3 [3] in which we give a list of 
values as the value of the various RDF properties. Maybe a change in 
name of those properties might help clarify thinking. How about this:

<wdr:ResourceSet>
   <wdr:hasAnyHostFrom>example.com example.org</wdr:hasAnyHostFrom>
   <wdr:pathContainsAnyOf>foo bar</wdr:pathContainsAnyOf>
</wdr:ResourceSet>

This is, again, a white space separated list, but the altered RDF 
property name makes it easier to read. We might consider defining 'list' 
versions of the RDF properties we have, so that the existing ones 
(hasHost, hasScheme etc.) remain as they are, taking a single value, 
while additional properties take lists. But this seems overly 
redundant, since a list of length 1, such as 
<wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>, is valid.

So to recap, this gives us the advantage of being able to limit 
cardinality of each of our set definition properties to 0 or 1 (adding 
to security). Each of these properties would be combined with logical AND.
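
To make those semantics concrete, here is a minimal Python sketch of how 
a processor might evaluate such a definition. The dictionary keys mirror 
the property names proposed above; the function names, and the rule that 
a host value also matches its subdomains, are assumptions made for 
illustration only, not anything the group has agreed.

```python
from urllib.parse import urlsplit


def host_matches(host, candidate):
    """True if host equals candidate or is a subdomain of it (assumed rule)."""
    host, candidate = host.lower(), candidate.lower()
    return host == candidate or host.endswith("." + candidate)


def in_resource_set(url, definition):
    """Evaluate a definition given as {property name: white space separated list}.

    Each property appears at most once. Values within one list are
    alternatives (OR); the properties themselves are combined with AND.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    checks = {
        "hasAnyHostFrom": lambda v: host_matches(host, v),
        "pathContainsAnyOf": lambda v: v in parts.path,
    }
    for prop, value_list in definition.items():
        test = checks[prop]
        # OR across the values of a single property
        if not any(test(v) for v in value_list.split()):
            return False  # AND across properties: one failure is enough
    return True


definition = {
    "hasAnyHostFrom": "example.com example.org",
    "pathContainsAnyOf": "foo bar",
}
print(in_resource_set("http://example.org/a/foo/index.html", definition))  # True
print(in_resource_set("http://example.net/foo", definition))               # False
```

Because each property can occur only once, the evaluator never has to 
guess how repeated properties combine.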

Andrea makes good points about negation, since this:

(($host !~ /example.org/) || ($host !~ /example.net/))

is always true - a classic De Morgan trap, I think. So again, maybe a 
change of RDF property name can help. How about this:

<wdr:ResourceSet>
   <wdr:hasAnyHostFrom>example.org example.com</wdr:hasAnyHostFrom>
   <wdr:hasNotAnyHostFrom>search.example.org shopping.example.com
                                         </wdr:hasNotAnyHostFrom>
</wdr:ResourceSet>

This translates as "if the host IS ANY of these but NOT ANY of these, 
then it's in the Resource Set."
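
The difference between the trap and the intended reading can be shown 
with a small Python sketch. The function names are hypothetical, and 
host matching by "equal or subdomain" is an assumption for illustration.

```python
def host_matches(host, candidate):
    """True if host equals candidate or is a subdomain of it (assumed rule)."""
    host, candidate = host.lower(), candidate.lower()
    return host == candidate or host.endswith("." + candidate)


def naive_exclusion(host):
    """(NOT example.org) OR (NOT example.net): the De Morgan trap."""
    return (not host_matches(host, "example.org")) or (
        not host_matches(host, "example.net")
    )


def has_not_any_host_from(host, value_list):
    """NOT (a OR b ...), i.e. (NOT a) AND (NOT b) ..."""
    return not any(host_matches(host, v) for v in value_list.split())


def in_resource_set(host, include, exclude):
    """Host IS ANY of `include` but NOT ANY of `exclude`."""
    is_any = any(host_matches(host, v) for v in include.split())
    return is_any and has_not_any_host_from(host, exclude)


include = "example.org example.com"
exclude = "search.example.org shopping.example.com"

# No host can sit under both domains at once, so the naive form accepts all.
print(all(naive_exclusion(h) for h in ["example.org", "example.net", "x.test"]))

for h in ["www.example.org", "search.example.org", "shopping.example.com"]:
    print(h, in_resource_set(h, include, exclude))
```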

Lists only take us so far. Again, referring to Andrea's comments, what 
about anything on example.org with a path beginning with foo OR bar, 
plus resources on example.net with a path beginning with bar (only)? 
White space separated lists won't get us out of this - we need to use 
something like owl:unionOf.

OK, let's actually use owl:unionOf.

Notice that owl:unionOf is a property, not a Class, so Andrea's 
code needs a little tweaking to give this:

<wdr:ResourceSet>
  <owl:unionOf rdf:parseType="Collection">

    <wdr:ResourceSet>
      <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
      <wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>
    </wdr:ResourceSet>

    <wdr:ResourceSet>
      <wdr:hasAnyHostFrom>example.net</wdr:hasAnyHostFrom>
      <wdr:pathStartsWithAnyOf>bar</wdr:pathStartsWithAnyOf>
    </wdr:ResourceSet>

  </owl:unionOf>
</wdr:ResourceSet>

We have two Resource Sets here (which are Classes) and we use the 
owl:unionOf predicate to create the union. More complex examples are 
possible but given that we're supporting regular expressions, and, if my 
line of argument holds, white space separated lists, the likelihood of a 
more complex Resource Set definition than that shown here seems remote - 
at least for the use cases under our consideration.
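
As a rough illustration of what the union means for a processor, here is 
a Python sketch in which each member set is evaluated on its own and the 
union is simply "any member matches". The function names are 
hypothetical, and two details are assumptions: hosts match their 
subdomains, and the path is compared without its leading slash.

```python
from urllib.parse import urlsplit


def host_matches(host, candidate):
    """True if host equals candidate or is a subdomain of it (assumed rule)."""
    host, candidate = host.lower(), candidate.lower()
    return host == candidate or host.endswith("." + candidate)


def matches_simple(url, definition):
    """One member set: OR within each list, AND across properties."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")  # assumption: compare without leading slash
    checks = {
        "hasAnyHostFrom": lambda v: host_matches(host, v),
        "pathStartsWithAnyOf": lambda v: path.startswith(v),
    }
    return all(
        any(checks[prop](v) for v in values.split())
        for prop, values in definition.items()
    )


def matches_union(url, members):
    """The union: a resource is in the set if any member set contains it."""
    return any(matches_simple(url, member) for member in members)


union = [
    {"hasAnyHostFrom": "example.org", "pathStartsWithAnyOf": "foo bar"},
    {"hasAnyHostFrom": "example.net", "pathStartsWithAnyOf": "bar"},
]
print(matches_union("http://example.org/foo/page", union))  # True
print(matches_union("http://example.net/foo/page", union))  # False
```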

This retains the closed world objective. RDF Collections are closed 
world - but I admit it's not clear to me how we would encode the 
constraint that a Resource Set can have a sub set if it's the subject of 
an owl:unionOf, owl:intersectionOf or owl:complementOf predicate. 
Incidentally, using these set operators puts us firmly in OWL DL, not 
OWL Lite (and, if I understand it correctly, nested set operators might 
take us into OWL Full so they should be strongly discouraged).

So I think we're building up a picture here.

If you want to define a set as simple as 'everything on example.com' 
(which remains the most likely scenario for our use cases) then you can 
do it really easily:

<wdr:ResourceSet>
   <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>
</wdr:ResourceSet>

If you want something a little more complicated - like multiple hosts - 
put them in a white space separated list.

If you need to create slightly more complex but still relatively simple 
RS definitions that include multiple elements then that's possible too, 
as we've seen with the original example.com/org plus foo/bar example.

We can define even more complex sets where we have (multiple 
definitions) OR (other multiple definitions) using OWL set operators.

And if that isn't enough, you can always use a Regular Expression. 
Actually, there's a thought: can you (meaningfully) have a white space 
separated list of regular expressions? Probably not - so that's one of 
our RDF properties that can only have a single value.
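
One way to see how the list form and the RE form relate: a white space 
separated list of literal values can be translated mechanically into a 
single alternation, whereas a regular expression may itself contain a 
space and so cannot safely be split. A Python sketch, with a 
hypothetical helper name:

```python
import re


def list_to_regex(value_list):
    """Turn a white space separated list of literals into one alternation.

    Each value is escaped so that characters such as '.' stay literal.
    """
    alternatives = [re.escape(v) for v in value_list.split()]
    return re.compile("|".join(alternatives))


pattern = list_to_regex("foo bar")
print(pattern.pattern)                                  # foo|bar
print(bool(pattern.search("/docs/bar/index.html")))     # True

# Splitting a regular expression on white space changes its meaning:
print("foo bar|baz".split())                            # ['foo', 'bar|baz']
```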

What about conjunctions of resources grouped by property? The group 
hasn't discussed this yet, but if we go with my current proposal, below, 
then how will that affect things?

Here's an RS definition for 'all resources on example.org that are in 
French':

<wdr:Set>
   <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>

   <wdr:resourcesWith rdf:parseType="Resource">
     <ex:lang>fr</ex:lang>
   </wdr:resourcesWith>

   <wdr:hasPropLookUp>
     <wdr:PropLookUp>
       <wdr:lookUpURI>$cURI</wdr:lookUpURI>
       <wdr:method rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
       <wdr:responseContains>Content-Language: fr</wdr:responseContains>
     </wdr:PropLookUp>
   </wdr:hasPropLookUp>

</wdr:Set>

So this says that the language must be French and the way to find out 
whether it is or not is to do a Head request to $cURI (the candidate 
resource's URI) and see if you get a header back that says 
"Content-Language: fr".

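A sketch of how a processor might carry out that look-up, in Python 
using only the standard library. The function names are hypothetical, 
and treating responseContains as a case-insensitive substring test 
against each "Name: value" header line is an assumption, not something 
the proposal defines.

```python
import urllib.request


def build_lookup_request(candidate_uri):
    """Build (but do not send) a HEAD request for the candidate resource."""
    return urllib.request.Request(candidate_uri, method="HEAD")


def response_contains(headers, expected):
    """Check (name, value) header pairs for the expected 'Name: value' text.

    Assumption: a case-insensitive substring match on each header line.
    """
    lines = [f"{name}: {value}" for name, value in headers]
    return any(expected.lower() in line.lower() for line in lines)


def language_lookup(candidate_uri, expected="Content-Language: fr"):
    """Send the HEAD request and test the response headers (needs network)."""
    request = build_lookup_request(candidate_uri)
    with urllib.request.urlopen(request) as response:
        return response_contains(response.headers.items(), expected)


# Offline illustration with canned headers instead of a live request.
sample_headers = [("Content-Type", "text/html"), ("Content-Language", "fr")]
print(response_contains(sample_headers, "Content-Language: fr"))  # True
```
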
Can we use a white space separated list here? Sometimes, would be the 
answer, I guess. Imagine we wanted to define a set as all resources on 
example.org in French OR German. Try this:

<wdr:Set>
   <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>

   <wdr:resourcesWith rdf:parseType="Resource">
     <ex:lang>fr de</ex:lang>
   </wdr:resourcesWith>

   <wdr:hasPropLookUp>
     <wdr:PropLookUp>
       <wdr:lookUpURI>$cURI</wdr:lookUpURI>
       <wdr:method rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
       <wdr:responseContains>"Content-Language: fr"
                           "Content-Language: de"</wdr:responseContains>
     </wdr:PropLookUp>
   </wdr:hasPropLookUp>

</wdr:Set>

I've had to quote the list elements in the responseContains property but 
I don't think it's unusual to require quoting of strings if they are to 
include white space!
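
Parsing such a list is not hard. As a sketch, Python's standard shlex 
module already splits on white space while honouring double quotes; the 
helper name is hypothetical, and note that shlex also treats single 
quotes and backslashes specially, which a real POWDER processor might 
not want.

```python
import shlex


def parse_value_list(text):
    """Split a white space separated list, keeping quoted strings whole."""
    return shlex.split(text)


print(parse_value_list("fr de"))
print(parse_value_list('"Content-Language: fr"\n   "Content-Language: de"'))
```

The first call gives two plain tokens; the second gives the two header 
strings with their internal spaces intact.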

By way of an apology for the length of this post, let me summarise.

1. I don't like implied semantics and think we can do better.
2. We must surely accept complexity where complexity is being expressed.
3. Complexity should be as scarce as the use cases that demand it.
4. Changing the property names can make it clear (to humans) that the 
value is a list.
5. REs are supported anyway so they're always available for people who 
prefer them (like me).
6. We can use OWL set operators where we need a union of otherwise 
separate sets.
7. The multi-layered approach to conjunction can work just as well for 
RS definitions by property, notwithstanding the need to support quoted 
strings so that they can include white space.

Depending on your feedback, I'd like to write this up in the doc so it 
can be presented properly. I would, however, like to include the 
XML-based approach in the doc [4] as an alternative to all this.

Its principal attraction, for me, flows from the following argument: It 
is likely that a generic RDF processor will be able to handle all 
aspects of a DR, without modification, except the Resource Set. Since 
the data in an RS definition needs to be handled slightly differently, 
it does seem to be logical to make that explicit by quoting an XML 
Literal within the RDF graph (which is what the pre-defined RDF datatype 
of XML Literal is designed to allow you to do).

Its principal problem, IMHO, is that the definition of something as 
simple as 'everything on example.org' should not require running a 
separate XML parser/XPath query. I reckon we really need to see some 
SPARQL queries against the RS data examples to settle this one??

Cheers

Phil.


[1] http://www.w3.org/2007/powder/powder-grouping/conjunction

[2] http://www.w3.org/2007/powder/powder-grouping/option1.rdf and 
http://www.w3.org/2007/powder/powder-grouping/option1.png

[3] http://www.w3.org/2007/powder/powder-grouping/option3.rdf and 
http://www.w3.org/2007/powder/powder-grouping/option3.png

[4] http://www.w3.org/2007/powder/powder-grouping/conjunction#option6


Phil Archer wrote:
> 
> A few small comments inline below
> 
> Andrea Perego wrote:
>> Hi, Phil.
>>
>>> [snip]
>>>
>>> In your discussion, you suggest 4 possible solutions to the pathContains
>>> issue. The complexities get more severe when we get into negatives and,
>>> from my perspective, we're getting a long way away from a design
>>> fundamental of simplicity with the real possibility that a
>>> semi-technically minded person could write a set definition by hand if
>>> necessary.
>>
>> I think here we should consider if and why we should support negation.
>> It is not just to support as much flexibility as possible. As was
>> reported in a previous version of the grouping document, negation is
>> useful in order to simplify the specification of a scope by also
>> supporting exceptions.
>>
>> Suppose, for instance, that a given DR applies to a set of hosts
>> my.example.org, your.example.org, his.example.org, her.example.org,
>> our.example.org, but not to their.example.org.
>>
>> If negation is not supported, the scope of the DR must be specified as
>> follows:
>>
>> <wdr:Set>
>>   <wdr:hasHost>my.example.org</wdr:hasHost>
>>   <wdr:hasHost>your.example.org</wdr:hasHost>
>>   <wdr:hasHost>her.example.org</wdr:hasHost>
>>   <wdr:hasHost>his.example.org</wdr:hasHost>
>>   <wdr:hasHost>our.example.org</wdr:hasHost>
>> </wdr:Set>
>>
>> otherwise, if a wdr:hasNotHost property is available, we can reduce the
>> specification to
>>
>> <wdr:Set>
>>   <wdr:hasHost>example.org</wdr:hasHost>
>>   <wdr:hasNotHost>their.example.org</wdr:hasNotHost>
>> </wdr:Set>
>>
>> So the issue here, is to find a way of supporting negation in a safe and
>>  possibly `intuitive' way.
> 
> I am certain that negation should be included and your example seems 
> entirely intuitive to me. If, starting from the most significant 
> portion, the resource is on the example.org domain AND is NOT on 
> their.example.org, then it's in the Set. Easy.
> 
> [snip]
>>
>>> [snip] NB. use of intersectionOf and unionOf requires OWL
>>> DL, not OWL Lite - which gets us into more specialised inference 
>>> engines.
>>
>> And, consequently, we may have undecidable resource set definitions
>> (which is not a nice thing). The solution based on implicit semantics
>> (if resolved properly) is safe also with respect to this issue.
> 
> Actually, no, it's OWL Full that does that. OWL DL is decidable (just 
> more complicated than OWL Lite).
> 
>>
>>> [snip: implicit conjunction inside a resource set definition - 
>>> wdr:hosHostList property]
>>
>> I don't completely agree.
>>
>> If we assume that all properties in a wdr:Set are always in AND, saying
>> "all the resources hosted by example.org and a path starting with foo or
>> bar," will require two redundant resource set definitions:
>>
>> <wdr:Set>
>>   <wdr:hasHost>example.org</wdr:hasHost>
>>   <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>> </wdr:Set>
>>
>> <wdr:Set>
>>   <wdr:hasHost>example.org</wdr:hasHost>
>>   <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>> </wdr:Set>
>>
>> As you notice, this redundancy increases when we are talking of hosts,
>> and not of path patterns, but I think that the need itself of repeating
>> the same statement is far from being intuitive.
>>
>> I agree that it is preferable to combine *by default* all the properties
>> in a resource set definition with the same Boolean operator, but the
>> solution you propose has several drawbacks in terms of expressiveness.
>>
>> In other words, if we support AND (implicitly), we must support also OR
>> (explicitly) inside a resource set definition. 
> 
> Which brings us back to owl:unionOf and example 2A?
> 
>> About the solutions to be
>> used for this, I'm not comfortable with space separated lists as object
>> of RDF properties (in such a case why not using a RE? we have just to
>> substitute a blank space with a `|'). Also, we are forgetting here
>> grouping by property. I'm not sure that the considerations above apply
>> also to them.
> 
> I think these do apply to grouping by resource property. If the resource 
> property in question is colour then you can have a white space separated 
> list of colours. And I agree on the white space or | issue. But we're 
> trying to find an alternative to using REs for those who don't like them 
> and that is less error prone (noting that REs are always going to be 
> supported).
> 
>>
>> In other words, I'm for using RDF to express this. Of course, it may be
>> verbose, not necessarily human-friendly, and require a lot of processing.
>> This is why I consider the `original' implicit semantics of resource set
>> definitions (i.e., same properties in OR, different properties in AND)
>> preferable, even though it is not formally sound.
> 
> OK, I misunderstood your thinking. I thought you were opposed to option 
> 1. Ah well.
> 
> Phil
> 
> 
> 

-- 
Phil Archer
Chief Technical Officer,
Family Online Safety Institute
t. +44 (0)1473 434770
Skype: philarcher
w. http://www.fosi.org/people/philarcher/

Already labelled with ICRA? It's time to raise the bar on child 
protection standards by ensuring your site is ICRAchecked.
See http://checked.icra.org/ for more info.
Received on Tuesday, 22 May 2007 15:51:04 GMT
