- From: Phil Archer <parcher@icra.org>
- Date: Tue, 22 May 2007 16:27:19 +0100
- To: Public POWDER <public-powderwg@w3.org>
Right, after a while away from this issue, here we are again, looking at
the conjunction document [1].
It feels as if we could spend an entire face-to-face meeting discussing
this so let's see if we can avoid that!
In recent posts, Andrea has been arguing for the implicit semantics of
option 1 so that our example of encoding "everything on example.com OR
example.org with a path containing foo OR bar" would be written as at [2].
I agree with Andrea in so far as if we want to express relatively
complex things then that's probably going to take some relatively
complex code. I just want to keep it as simple as possible (of course!).
I also believe it is very much in our interests to reduce the
opportunity for the data we create in POWDER to be misused. In
particular, I think it is generally a good thing to close off Resource Set
definitions so that you can't publish further triples whose provenance
needs to be taken into account before deciding whether to use them or not.
Where I disagree with Andrea is that the implicit semantics of [2] are
the least worst option. I really don't like the idea that if you have
two of a given property then you combine them with OR but different
properties are combined with AND. It just sounds too woolly and error
prone to me.
And how would we encode those rules?
Limiting the cardinality of the various RDF properties is easy with OWL
Lite. Thus I generally favour option 3 [3] in which we give a list of
values as the value of the various RDF properties. Maybe a change in
name of those properties might help clarify thinking. How about this:
<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.com example.org</wdr:hasAnyHostFrom>
  <wdr:pathContainsAnyOf>foo bar</wdr:pathContainsAnyOf>
</wdr:ResourceSet>
This is, again, a white space separated list but the altered RDF
property name makes it easier to read. We might consider defining 'list'
versions of the RDF properties we have so that the ones we have now
(hasHost, hasScheme etc.) remain as they are taking a single value, but
additional properties would take lists - but this seems redundant,
since a list of length 1, such as
<wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom> is valid.
So to recap, this gives us the advantage of being able to limit
cardinality of each of our set definition properties to 0 or 1 (adding
to security). Each of these properties would be combined with logical AND.
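As a rough illustration of these semantics (a property matches if the
candidate matches ANY value in its list; the properties are then
combined with AND), here is a minimal Python sketch. The function names,
the dict representation of a Resource Set and the matching rules (a
listed host also covers its subdomains; "contains" means substring) are
assumptions for illustration only, not anything the group has agreed.

```python
from urllib.parse import urlparse


def host_matches(host: str, domain: str) -> bool:
    # Assumption: a listed domain also covers its subdomains
    return host == domain or host.endswith("." + domain)


def in_resource_set(uri: str, rs: dict) -> bool:
    """Hypothetical evaluator: OR within each list, AND across properties."""
    parts = urlparse(uri)
    host = parts.hostname or ""
    checks = []
    if "hasAnyHostFrom" in rs:
        hosts = rs["hasAnyHostFrom"].split()  # white space separated list
        checks.append(any(host_matches(host, d) for d in hosts))
    if "pathContainsAnyOf" in rs:
        fragments = rs["pathContainsAnyOf"].split()
        checks.append(any(f in parts.path for f in fragments))
    # all() of an empty list is True, so an empty definition matches
    # everything here; a real spec would have to say what that means
    return all(checks)


rs = {"hasAnyHostFrom": "example.com example.org",
      "pathContainsAnyOf": "foo bar"}
print(in_resource_set("http://www.example.org/a/foo.html", rs))  # True
print(in_resource_set("http://example.com/baz.html", rs))        # False
```

Because each property appears at most once, the evaluator never has to
guess whether two occurrences of the same property mean OR or AND.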
Andrea makes good points about negation, since this:
(($host !~ /example.org/) || ($host !~ /example.net/))
is true for any host that does not match both patterns (in other words,
almost always): a classic De Morgan trap, I think. So again, maybe a
change of RDF property name can help. How about this:
<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.org example.com</wdr:hasAnyHostFrom>
  <wdr:hasNotAnyHostFrom>search.example.org shopping.example.com</wdr:hasNotAnyHostFrom>
</wdr:ResourceSet>
This translates as "if the host IS ANY of these but NOT ANY of these,
then it's in the Resource Set."
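A small Python sketch of why "NOT ANY" is the safe reading: it is the
negation of the whole OR, which by De Morgan is an AND of the individual
negations, whereas OR-ing the negations (the trap above) lets excluded
hosts back in. The function names and the subdomain-matching rule are
assumptions made for this illustration.

```python
def host_matches(host: str, domain: str) -> bool:
    # Assumption: a listed domain also covers its subdomains
    return host == domain or host.endswith("." + domain)


def naive_exclusion(host: str, excluded: list) -> bool:
    # The trap: OR-ing the negations is true as soon as the host
    # differs from any one excluded entry
    return any(not host_matches(host, d) for d in excluded)


def not_any_host_from(host: str, excluded: list) -> bool:
    # De Morgan: NOT (m1 OR m2 ...) == (NOT m1) AND (NOT m2) ...
    return not any(host_matches(host, d) for d in excluded)


def in_set(host: str, any_of: str, not_any_of: str) -> bool:
    # "IS ANY of these but NOT ANY of these"
    return (any(host_matches(host, d) for d in any_of.split())
            and not_any_host_from(host, not_any_of.split()))


ANY_OF = "example.org example.com"
NOT_ANY_OF = "search.example.org shopping.example.com"
print(in_set("www.example.org", ANY_OF, NOT_ANY_OF))     # True
print(in_set("search.example.org", ANY_OF, NOT_ANY_OF))  # False
print(naive_exclusion("search.example.org", NOT_ANY_OF.split()))  # True (wrong)
```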
Lists only take us so far. Again, referring to Andrea's comments, what
about 'anything on example.org with a path beginning with foo OR bar,
plus resources on example.com with a path beginning with bar (only)'?
White space separated lists won't get us out of this - we need to use
something like owl:unionOf.
OK, let's actually use owl:unionOf.
Notice that owl:unionOf is a property, not a Class, so Andrea's
code needs a little tweaking to give this:
<wdr:ResourceSet>
  <owl:unionOf rdf:parseType="Collection">
    <wdr:ResourceSet>
      <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
      <wdr:pathStartsWithAnyOf>foo bar</wdr:pathStartsWithAnyOf>
    </wdr:ResourceSet>
    <wdr:ResourceSet>
      <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>
      <wdr:pathStartsWithAnyOf>bar</wdr:pathStartsWithAnyOf>
    </wdr:ResourceSet>
  </owl:unionOf>
</wdr:ResourceSet>
We have two Resource Sets here (which are Classes) and we use the
owl:unionOf predicate to create the union. More complex examples are
possible but given that we're supporting regular expressions, and, if my
line of argument holds, white space separated lists, the likelihood of a
more complex Resource Set definition than that shown here seems remote -
at least for the use cases under our consideration.
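To make the union semantics concrete, here is a minimal Python sketch of
the two member sets described in the prose (example.org with a path
starting foo or bar; example.com with a path starting bar only). The
helper names, the dict layout and the rule of comparing paths without
their leading slash are assumptions for illustration.

```python
from urllib.parse import urlparse


def host_matches(host: str, domain: str) -> bool:
    # Assumption: a listed domain also covers its subdomains
    return host == domain or host.endswith("." + domain)


def in_member_set(uri: str, rs: dict) -> bool:
    # Inside one member set: OR within each list, AND across properties
    parts = urlparse(uri)
    host = parts.hostname or ""
    path = parts.path.lstrip("/")  # assumption: ignore the leading slash
    return (any(host_matches(host, d) for d in rs["hasAnyHostFrom"].split())
            and any(path.startswith(p)
                    for p in rs["pathStartsWithAnyOf"].split()))


def in_union(uri: str, members: list) -> bool:
    # owl:unionOf: a resource is in the union if it is in ANY member set
    return any(in_member_set(uri, rs) for rs in members)


union = [
    {"hasAnyHostFrom": "example.org", "pathStartsWithAnyOf": "foo bar"},
    {"hasAnyHostFrom": "example.com", "pathStartsWithAnyOf": "bar"},
]
print(in_union("http://example.org/foo/page", union))  # True
print(in_union("http://example.com/foo/page", union))  # False
```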
This retains the closed world objective. RDF Collections are closed
world - but I admit it's not clear to me how we would express the
constraint that a Resource Set can have a subset only if it's the
subject of an owl:unionOf, owl:intersectionOf or owl:complementOf
predicate. Incidentally, using these set operators puts us firmly in
OWL DL, not OWL Lite (and, if I understand it correctly, nested set
operators might take us into OWL Full, so they should be strongly
discouraged).
So I think we're building up a picture here.
If you want to define a set as simple as 'everything on example.com'
(which remains the most likely scenario for our use cases) then you can
do it really easily:
<wdr:ResourceSet>
  <wdr:hasAnyHostFrom>example.com</wdr:hasAnyHostFrom>
</wdr:ResourceSet>
If you want something a little more complicated - like multiple hosts -
put them in a white space separated list.
If you need to create slightly more complex but still relatively simple
RS definitions that include multiple elements then that's possible too,
as we've seen with the original example.com/org plus foo/bar example.
We can define even more complex sets where we have (multiple
definitions) OR (other multiple definitions) using OWL set operators.
And if that isn't enough, you can always use a Regular Expression.
Actually, here's a thought: can you (meaningfully) have a white space
separated list of regular expressions? Probably not, since a regular
expression can itself contain white space, so that's one of our RDF
properties that can only have a single value.
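A short Python sketch of the asymmetry: a list of plain strings can be
turned mechanically into one RE (Andrea's point about substituting '|'
for a space, with escaping added), but splitting an RE on white space can
break it apart. The helper name and the sample RE value are hypothetical.

```python
import re


def list_to_regex(values: str) -> str:
    # Escape each literal so that '.' is not a wildcard, then OR them with '|'
    return "|".join(re.escape(v) for v in values.split())


pattern = list_to_regex("example.com example.org")
print(bool(re.fullmatch(pattern, "example.org")))  # True
print(bool(re.fullmatch(pattern, "exampleXorg")))  # False

# A single RE that contains a literal space (hypothetical value)
regex_value = r"^/docs/annual report"
print(regex_value.split())  # ['^/docs/annual', 'report']
```

Going from list to RE is lossless; going the other way is not, which
supports keeping the RE property single-valued.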
What about conjunctions of resources grouped by property? The group
hasn't discussed this yet, but if we go with my current proposal, below,
then how will that affect things?
Here's an RS definition for 'all resources on example.org that are in
French':
<wdr:Set>
  <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
  <wdr:resourcesWith rdf:parseType="Resource">
    <ex:lang>fr</ex:lang>
  </wdr:resourcesWith>
  <wdr:hasPropLookUp>
    <wdr:PropLookUp>
      <wdr:lookUpURI>$cURI</wdr:lookUpURI>
      <wdr:method rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
      <wdr:responseContains>Content-Language: fr</wdr:responseContains>
    </wdr:PropLookUp>
  </wdr:hasPropLookUp>
</wdr:Set>
So this says that the language must be French and the way to find out
whether it is or not is to do a HEAD request to $cURI (the candidate
resource's URI) and see if you get a header back that says
"Content-Language: fr".
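A minimal Python sketch of that lookup, assuming that $cURI is simply
replaced by the candidate URI and that responseContains means "this
header has exactly this value". Both rules, and all function names, are
assumptions for illustration; the fetch step is injectable so the
matching logic can be checked without network access.

```python
import urllib.request
from typing import Callable, Dict


def response_contains(headers: Dict[str, str], required: str) -> bool:
    # Split "Name: value"; HTTP header names are case-insensitive,
    # so headers are expected with lower-cased keys
    name, _, value = required.partition(":")
    return headers.get(name.strip().lower()) == value.strip()


def head_headers(uri: str) -> Dict[str, str]:
    # Issue an HTTP HEAD request and return the response headers
    req = urllib.request.Request(uri, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return {k.lower(): v for k, v in resp.getheaders()}


def prop_lookup(candidate_uri: str, lookup_template: str, required: str,
                fetch: Callable[[str], Dict[str, str]] = head_headers) -> bool:
    # Substitute the candidate resource's URI for $cURI, then test the response
    lookup_uri = lookup_template.replace("$cURI", candidate_uri)
    return response_contains(fetch(lookup_uri), required)
```

With a stub in place of the network call, prop_lookup("http://example.org/page", "$cURI", "Content-Language: fr", fetch=lambda u: {"content-language": "fr"}) returns True. Exact matching is a simplification: a header such as "Content-Language: fr, en" would fail it, so a real rule would have to choose between exact, substring or language-range matching.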
Can we use a white space separated list here? Sometimes, I guess.
Imagine we wanted to define a set as all resources on example.org in
French OR German. Try this:
<wdr:Set>
  <wdr:hasAnyHostFrom>example.org</wdr:hasAnyHostFrom>
  <wdr:resourcesWith rdf:parseType="Resource">
    <ex:lang>fr de</ex:lang>
  </wdr:resourcesWith>
  <wdr:hasPropLookUp>
    <wdr:PropLookUp>
      <wdr:lookUpURI>$cURI</wdr:lookUpURI>
      <wdr:method rdf:resource="http://www.w3.org/2006/http#HeadRequest" />
      <wdr:responseContains>"Content-Language: fr"
        "Content-Language: de"</wdr:responseContains>
    </wdr:PropLookUp>
  </wdr:hasPropLookUp>
</wdr:Set>
I've had to quote the list elements in the responseContains property but
I don't think it's unusual to require quoting of strings if they are to
include white space!
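One off-the-shelf way to tokenise such values is Python's shlex module,
whose shell-like rules split on white space but keep double-quoted items
together. Using shell quoting rules here is an assumption borrowed for
illustration, not something the group has defined.

```python
import shlex


def parse_list(value: str) -> list:
    # Shell-like rules: white space separates items, and double quotes
    # group an item that itself contains spaces
    return shlex.split(value)


print(parse_list("fr de"))
print(parse_list('"Content-Language: fr"\n"Content-Language: de"'))
```

One caveat with borrowing these rules: an unquoted apostrophe starts a
quotation, so parse_list("it's") raises ValueError ("No closing
quotation"), and backslashes are treated as escapes. A POWDER quoting
rule would need to state how such characters are handled.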
By way of an apology for the length of this post, let me summarise.
1. I don't like implied semantics and think we can do better.
2. We must surely accept complexity where complexity is being expressed
3. Complexity should be as scarce as the use cases that demand it
4. Changing the property names can make it clear (to humans) that the
value is a list
5. REs are supported anyway so they're always available for people who
prefer them (like me)
6. We can use OWL set operators where we need a union of otherwise
separate sets.
7. The multi-layered approach to conjunction can work just as well for
RS definitions by property, notwithstanding the need to support quoted
strings so that they can include white space.
Depending on your feedback, I'd like to write this up in the doc so it
can be presented properly. I would, however, like to include the
XML-based approach in the doc [4] as an alternative to all this.
Its principal attraction, for me, flows from the following argument: it
is likely that a generic RDF processor will be able to handle all
aspects of a DR, without modification, except the Resource Set. Since
the data in an RS definition needs to be handled slightly differently,
it seems logical to make that explicit by quoting an XML Literal
within the RDF graph (which is what the predefined RDF datatype
rdf:XMLLiteral is designed to allow you to do).
Its principal problem, IMHO, is that the definition of something as
simple as 'everything on example.org' should not require running a
separate XML parser/XPath query. I reckon we really need to see some
SPARQL queries against the RS data examples to settle this one??
Cheers
Phil.
[1] http://www.w3.org/2007/powder/powder-grouping/conjunction
[2] http://www.w3.org/2007/powder/powder-grouping/option1.rdf and
http://www.w3.org/2007/powder/powder-grouping/option1.png
[3] http://www.w3.org/2007/powder/powder-grouping/option3.rdf and
http://www.w3.org/2007/powder/powder-grouping/option3.png
[4] http://www.w3.org/2007/powder/powder-grouping/conjunction#option6
Phil Archer wrote:
>
> A few small comments inline below
>
> Andrea Perego wrote:
>> Hi, Phil.
>>
>>> [snip]
>>>
>>> In your discussion, you suggest 4 possible solutions to the pathContains
>>> issue. The complexities get more severe when we get into negatives and,
>>> from my perspective, we're getting a long way away from a design
>>> fundamental of simplicity with the real possibility that a
>>> semi-technically minded person could write a set definition by hand if
>>> necessary.
>>
>> I think here we should consider if and why we should support negation.
>> It is not just to support as much flexibility as possible. As was
>> reported in a previous version of the grouping document, negation is
>> useful in order to simplify the specification of a scope by also
>> supporting exceptions.
>>
>> Suppose, for instance, that a given DR applies to a set of hosts
>> my.example.org, your.example.org, his.example.org, her.example.org,
>> our.example.org, but not to their.example.org.
>>
>> If negation is not supported, the scope of the DR must be specified as
>> follows:
>>
>> <wdr:Set>
>> <wdr:hasHost>my.example.org</wdr:hasHost>
>> <wdr:hasHost>your.example.org</wdr:hasHost>
>> <wdr:hasHost>her.example.org</wdr:hasHost>
>> <wdr:hasHost>his.example.org</wdr:hasHost>
>> <wdr:hasHost>our.example.org</wdr:hasHost>
>> </wdr:Set>
>>
>> otherwise, if a wdr:hasNotHost property is available, we can reduce the
>> specification to
>>
>> <wdr:Set>
>> <wdr:hasHost>example.org</wdr:hasHost>
>> <wdr:hasNotHost>their.example.org</wdr:hasNotHost>
>> </wdr:Set>
>>
>> So the issue here, is to find a way of supporting negation in a safe and
>> possibly `intuitive' way.
>
> I am certain that negation should be included and your example seems
> entirely intuitive to me. If, starting from the most significant
> portion, the resource is on the example.org domain AND is NOT on
> their.example.org, then it's in the Set. Easy.
>
> [snip]
>>
>>> [snip] NB. use of intersectionOf and unionOf requires OWL
>>> DL, not OWL Lite - which gets us into more specialised inference
>>> engines.
>>
>> And, consequently, we may have undecidable resource set definitions
>> (which is not a nice thing). The solution based on implicit semantics
>> (if resolved properly) is safe also with respect to this issue.
>
> Actually, no, it's OWL Full that does that. OWL DL is closed world (just
> more complicated than OWL Lite).
>
>>
>>> [snip: implicit conjunction inside a resource set definition -
>>> wdr:hasHostList property]
>>
>> I don't completely agree.
>>
>> If we assume that all properties in a wdr:Set are always in AND, saying
>> "all the resources hosted by example.org and a path starting with foo or
>> bar," will require two redundant resource set definitions:
>>
>> <wdr:Set>
>> <wdr:hasHost>example.org</wdr:hasHost>
>> <wdr:pathStartsWith>foo</wdr:pathStartsWith>
>> </wdr:Set>
>>
>> <wdr:Set>
>> <wdr:hasHost>example.org</wdr:hasHost>
>> <wdr:pathStartsWith>bar</wdr:pathStartsWith>
>> </wdr:Set>
>>
>> As you notice, this redundancy increases when we are talking of hosts,
>> and not of path patterns, but I think that the need itself of repeating
>> the same statement is far from being intuitive.
>>
>> I agree that it is preferable to combine *by default* all the properties
>> in a resource set definition with the same Boolean operator, but the
>> solution you propose has several drawbacks in terms of expressiveness.
>>
>> In other words, if we support AND (implicitly), we must support also OR
>> (explicitly) inside a resource set definition.
>
> Which brings us back to owl:unionOf and example 2A?
>
>> About the solutions to be
>> used for this, I'm not comfortable with space separated lists as object
>> of RDF properties (in such a case, why not use an RE? We just have to
>> substitute a blank space with a `|'). Also, we are forgetting here
>> grouping by property. I'm not sure that the considerations above apply
>> also to them.
>
> I think these do apply to grouping by resource property. If the resource
> property in question is colour then you can have a white space separated
> list of colours. And I agree on the white space or | issue. But we're
> trying to find an alternative to using REs for those who don't like them
> and that is less error prone (noting that REs are always going to be
> supported).
>
>>
>> In other words, I'm for using RDF to express this. Of course, it may be
>> verbose, not necessarily human-friendly, and require a lot of processing.
>> This is why I consider the `original' implicit semantics of resource set
>> definitions (i.e., same properties in OR, different properties in AND)
>> preferable, even though it is not formally sound.
>
> OK, I misunderstood your thinking. I thought you were opposed to option
> 1. Ah well.
>
> Phil
>
>
>
--
Phil Archer
Chief Technical Officer,
Family Online Safety Institute
t. +44 (0)1473 434770
Skype: philarcher
w. http://www.fosi.org/people/philarcher/
Received on Tuesday, 22 May 2007 15:51:04 UTC