Re: Another attempt at 'cascading DRs' from Andrea Perego on 2008-02-13 (public-powderwg@w3.org from February 2008)

From: Andrea Perego <andrea.perego@uninsubria.it>
Date: Wed, 13 Feb 2008 16:43:46 +0100
To: Public POWDER <public-powderwg@w3.org>
Message-ID: <47B31032.1050201@uninsubria.it>
Well, there are several issues here, so let me start with the one 
related to the overriding mechanism.

In theory, we can (operationally) enforce a mechanism able to determine 
whether a DR scope S1 is more *specific* than another DR scope S2 wrt a 
given resource. For this purpose, we just have to check whether the set 
of URIs denoted by the regex corresponding to S1 is included in the set 
of URIs denoted by the regex corresponding to S2. Which is exactly what 
you said, Kai.
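
To make the check concrete, here is a minimal sketch. Deciding inclusion 
between arbitrary regexes is harder, so I assume here that a scope is 
just a host plus a path prefix. The names Scope and is_more_specific are 
hypothetical, they are not POWDER terms.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Scope:
    """Hypothetical simplified scope: a host (with subdomains) plus a path prefix."""
    host: str                # "example.org" covers example.org and *.example.org
    path_prefix: str = "/"

    def matches(self, uri: str) -> bool:
        parts = urlsplit(uri)
        hostname = parts.hostname or ""
        host_ok = hostname == self.host or hostname.endswith("." + self.host)
        return host_ok and (parts.path or "/").startswith(self.path_prefix)


def is_more_specific(s1: Scope, s2: Scope) -> bool:
    """True if the URIs denoted by s1 are a strict subset of those denoted by s2."""
    host_included = s1.host == s2.host or s1.host.endswith("." + s2.host)
    path_included = s1.path_prefix.startswith(s2.path_prefix)
    return host_included and path_included and s1 != s2
```

With these assumptions, Scope("example.org", "/foo") is more specific 
than Scope("example.org"), but not the other way round.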

However, I'm not sure that this mechanism can work in our scenario, as 
Phil pointed out, unless we take a precise decision on what resource 
properties are denoted by the descriptors contained in a DR (all the 
properties of a set of resources? only those shared by all of them? only 
those shared by most of them?).

Note that the option we have followed so far implies that the 
descriptors denote the characteristics SHARED BY MOST OF the resources 
in the DR scope. Moreover, in order to simplify the specification of 
DRs, we assumed that, if the scope of a given DR1 denotes a superset of 
the resources denoted by the scope of a given DR2, the descriptors in 
DR1 also apply to (are inherited by) DR2. Operationally, this is obtained 
by computing the union of all the descriptors in the DRs applying to a 
given resource.

So if we say that resources hosted by example.org are blue and safe for 
children, whereas those hosted by example.org and having a path starting 
with foo are not safe for children, a resource having URI 
http://www.example.org/foo/ will be blue, safe for children, and not 
safe for children. Hence the issue of how we can deal with 
conflicting descriptions.
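
A small sketch of the union-based behaviour, to show where the conflict 
comes from. The scopes are again simplified to host plus path prefix, 
and the property names (color, childSafe) and the DRS list are 
illustrative assumptions, not an actual POWDER vocabulary.

```python
from urllib.parse import urlsplit

# Each DR is (host, path_prefix, descriptors); all values are illustrative.
DRS = [
    ("example.org", "/", {("color", "blue"), ("childSafe", "true")}),
    ("example.org", "/foo", {("childSafe", "false")}),
]


def applies(host: str, path_prefix: str, uri: str) -> bool:
    """True if the URI falls within the (simplified) scope."""
    parts = urlsplit(uri)
    hostname = parts.hostname or ""
    host_ok = hostname == host or hostname.endswith("." + host)
    return host_ok and (parts.path or "/").startswith(path_prefix)


def descriptors_for(uri: str) -> set:
    """Union of the descriptors of every DR whose scope covers the URI."""
    result = set()
    for host, path_prefix, descriptors in DRS:
        if applies(host, path_prefix, uri):
            result |= descriptors
    return result


print(sorted(descriptors_for("http://www.example.org/foo/")))
# [('childSafe', 'false'), ('childSafe', 'true'), ('color', 'blue')]
```

The resource inherits both childSafe values, which is exactly the 
situation described above.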

The first point is: how can we check whether two given descriptors are 
in conflict? For instance, if a given DR includes descriptors 
mime-type=text/html and mime-type=image/jpeg, this is not necessarily a 
conflict. Note that the DR scope denotes a *set* of resources, not a 
single, physical resource, so it makes sense to say that in that set of 
resources you have both HTML files and JPEG images. Moreover, if you 
consider a Web page as a resource, since it may include multimedia 
content, different MIME types may apply to it. Similarly, it makes sense 
to have childSafe=true and childSafe=false: this simply says that some 
resources in the set are safe for children, whereas others are not.

That said, if we require that no more than one descriptor for a given 
property, e.g., childSafe, can be associated with a resource, this can 
be done by stating this restriction in the RDF/OWL schema defining that 
property. Having done that, if we have two DRs saying:

DR1: example.org     -> childSafe=true
DR2: example.org/foo -> childSafe=false

we decide the prevailing one based on the specificity of the scope 
definition wrt the resource's URI.

Note that this solution requires checking the definition of property 
childSafe in the corresponding RDF schema. Operationally, this means:

1. Retrieve the RDF/OWL schema defining childSafe
2. If there's no restriction on the cardinality of childSafe, both 
descriptors apply; otherwise,
3. If the scope definition of one of the DRs is more specific than the 
other wrt the resource's URI, the former overrides the latter; otherwise,
4. If the two scope definitions are equivalent/incomparable... what? 
(ignore both descriptors?)
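
The steps above could be sketched as follows. This is only an outline 
under strong assumptions: the schema lookup of step 1 is replaced by a 
hard-coded set (SINGLE_VALUED), both DRs are on the same host, and scopes 
are plain path prefixes. The function name resolve is hypothetical.

```python
from typing import Optional

# Assumption: properties the schema restricts to at most one value per
# resource. In practice this would come from the retrieved RDF/OWL schema.
SINGLE_VALUED = {"childSafe"}


def resolve(prop: str, uri_path: str, dr_a: tuple, dr_b: tuple) -> Optional[set]:
    """Each DR is (path_prefix, value); both must cover uri_path.

    Returns the values that apply, or None for the open case of step 4.
    """
    (prefix_a, value_a), (prefix_b, value_b) = dr_a, dr_b
    if not (uri_path.startswith(prefix_a) and uri_path.startswith(prefix_b)):
        raise ValueError("both DRs must apply to the resource")
    if prop not in SINGLE_VALUED:
        # Step 2: no cardinality restriction, so both descriptors apply.
        return {value_a, value_b}
    # Step 3: two prefixes of the same path are always comparable,
    # and the longer one denotes the more specific scope.
    if len(prefix_a) > len(prefix_b):
        return {value_a}
    if len(prefix_b) > len(prefix_a):
        return {value_b}
    # Step 4: equivalent scopes; what to do here is still undecided.
    return None
```

For DR1 and DR2 above, resolve("childSafe", "/foo/", ("/", "true"), 
("/foo", "false")) yields {"false"}. Note that with path prefixes the 
"incomparable" case cannot occur; it only arises with general regexes.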

An alternative to default overriding is to specify explicit overriding 
rules. But in this case too we have to check whether two descriptors 
are in conflict or not, based on their definition.

Andrea


Scheppe, Kai-Dietrich wrote:
> Hmmm...come to think of it...I think Andrea could really help us with
> this :-)
> 
> I am thinking if
> 
> - scope A = a large area
> - scope B = a small area
> 
> And
> 
> - B is a subset of A, meaning B is contained within A but through some
> property different.
> 
> If I transfer this onto URIs, I can talk about paths.
> 
> A = *.t-online.de
> A is a large scope
> 
> B = www.t-online.de/c/01/02/03/01020304.html 
> B is a small scope, but is also a subset of A
> 
> 
> Irrespective of what precisely is listed in the path, it /*IS*/ more
> specific than that used for the larger scope.
> 
> Some examples:
> 
> www.t-online.de/c/01/02/03/ 
> www.t-online.de/c/01/02/
> www.t-online.de/c/01/
> www.t-online.de/c/
> 
> Are all subsets, but
> 
> www.someotherdomain.de/c/01/02/03/01020304.html
> 
> is not.
> 
> 
> To summarize:
> 
> In CSS I define a style sheet and override this, on a "local" level,
> with an inline style.
> Here I would define the scope globally and everything that is within
> that scope can override, on a local level, the global definition.
> 
> Restating the link-back mechanism, if the "local" DR would link back to
> the "global" DR you make content discoverable.
> 
> 
> Does that make sense?
> 
> 
> Kai
> 
> 
> 
>  
> 
>> -----Original Message-----
>> From: public-powderwg-request@w3.org 
>> [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
>> Sent: Tuesday, February 12, 2008 3:14 PM
>> To: Public POWDER
>> Subject: Re: Another attempt at 'cascading DRs'
>>
>>
>> The problem is that your DR_2 directly contradicts DR_1 and 
>> both are created by the same person, meaning "I think 
>> http://www.t-online.de/c/01/02/03/01020304.html is text AND I 
>> think it's an image." Something has to change. Yes, we can 
>> have conflicting DRs but these would generally be created 
>> by different people, so it comes down to whom you trust. If 
>> the same person creates two conflicting DRs, you can't use 
>> trust as a means to separate them.
>>
>> In my proposal today I've effectively removed the scope 
>> entirely from the DR so it becomes impossible to go from the 
>> DR to the resource - i.e. 
>> eliminates the first problem you identify.
>>
>> On the upside, no machine can find the two descriptions and 
>> infer that they both apply to the same thing and therefore 
>> end up with a logical paradox. The downside is that you then 
>> can't use the DR as a discovery mechanism, only to provide 
>> information about what you already have.
>>
>> That's perhaps a bit of an over-statement. If you publish a 
>> POWDER doc that has an aboutHosts attribute of t-online.de 
>> and you have a description that says 'text' and one that says 
>> 'image' then it would be reasonable to infer that t-online.de 
>> includes both text and images - you just wouldn't know 
>> exactly where they were. You would, however, be able to crawl 
>> the site looking for links to "...#text" and "...#image" and 
>> create a catalogue/site map with relative ease.
>>
>> Taking your last paragraph: I'm not sure how you would define 
>> 'locally based' - it sounds very much like the linkFrom 
>> attribute already posited in the published doc [1] - i.e. you 
>> make it local by linking to it, yes? 
>> So the operation here is "use the cached DR that covers all 
>> of t-online.de" and the image would then have an HTTP header 
>> that linked to
>> DR_2 that says "I can override DR_1." You still need the link 
>> to create the property of being local. OK, I can see that 
>> working operationally but it seems to have a real danger of 
>> creating data that, when taken in isolation, can be contradictory.
>>
>> Another option might be to make more use of the issue date 
>> with the idea that if you have two conflicting DRs then the 
>> most recently issued one wins. What worries me here is the 
>> caching. If I get a DR that says t-online.de is all text, I 
>> might cache that. If there is no validUntil date, in theory 
>> it's valid until the entropic heat death of the universe. If 
>> there is a validUntil date, OK, I should only cache it up 
>> until that date. Either way, the new, more recent DR can 
>> legitimately be ignored by an optimised system until the 
>> original expires. Again, this problem disappears if there is no scope.
>>
>> P
>>
>> [1] http://www.w3.org/TR/2007/WD-powder-dr-20070925/#noPattern
>>
>>
>> Scheppe, Kai-Dietrich wrote:
>>> Hi,
>>>
>>> I have a question:
>>>
>>> The problem really doesn't exist when going from the 
>> resource to the DR.
>>> It does exist when going from the DR to the resource.
>>>
>>> But since we say at some point that conflicting DRs can 
>>> exist...afterall different CA could have different opinions about 
>>> content, it is up to the user to decide which DR he believes.
>>>
>>> Can't this principle apply here as well?
>>>
>>> Or better, if it doesn't apply here, why does it apply in general?
>>> And if it does and if this is a problem, how do we solve 
>> it...with the 
>>> knowledge that solving that problem would also solve this problem?
>>>
>>>
>>> Either way, I think if we just say that
>>>
>>> DR_1 says all content on t-online.de is text based
>>> DR_2 says that 
>> http://www.t-online.de/c/01/02/03/01020304.html is an 
>>> image
>>>
>>> then it is up to the peruser to decide whether to download this 
>>> resource.
>>>
>>> We could defuse the problem somewhat by requiring the more locally 
>>> based DR to refer to the more globally based DR.  This way an 
>>> application could create its own set of exceptions.
>>> So in the example above DR_2 would contain a link to DR_1.
>>>
>>>
>>> However, the problem centers on dealing with unknown DRs.
>>>
>>>
>>> -- Kai
>>>
>>>
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: public-powderwg-request@w3.org 
>>>> [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
>>>> Sent: Tuesday, February 12, 2008 1:17 PM
>>>> To: Public POWDER
>>>> Subject: Another attempt at 'cascading DRs'
>>>>
>>>>
>>>> The basic POWDER model has a resource that describes a lot 
>> of other 
>>>> resources. A processor may start at the descriptive resource (the 
>>>> POWDER
>>>> document) and discover the resources it describes. To aid 
>> discovery 
>>>> of the description, a resource may link to the POWDER document 
>>>> that describes it.
>>>>
>>>> In some important circumstances however this doesn't work. 
>>>> POWDER's Grouping mechanism [1] (currently under revision 
>> with a new 
>>>> draft due for publication v. soon) assumes that by 
>> examining a URI, 
>>>> one can deduce which description of a collection applies to it. If 
>>>> URIs don't follow a particular pattern, such as numerical URIs 
>>>> generated by some content management systems, we need a different 
>>>> mechanism: we must rely on the link from the described resource to 
>>>> point to the correct description.
>>>>
>>>> We've discussed this a lot, most recently in Athens, and 
>> we know we 
>>>> need to solve it. We also know that it's impractical in a 
>> commercial 
>>>> workflow to need to edit the POWDER document continually, 
>> adding in 
>>>> lists of exceptions to rules. We need to work more along the CSS 
>>>> model where there is a central file that carries the 
>> defined styles. 
>>>> Which style, indeed, which stylesheet, is applicable, is defined 
>>>> within the document for which it contains the styles. HTTP 
>> and client 
>>>> caching ensures that stylesheets need only be accessed once until 
>>>> updated.
>>>>
>>>> A Package of DRs, as currently defined at [2], has an attribute 
>>>> 'aboutHosts'. The structure of packages is going to be modified a 
>>>> little in the near future but this feature is a very 
>> useful one for 
>>>> processing efficiency. The plan now is to make it so that, where 
>>>> present, the aboutHosts guarantees that the DRs in the 
>> package do not 
>>>> cover any resources on domains other than those listed (it doesn't 
>>>> guarantee that all resources on those domains are described by the 
>>>> way, just that if the aboutHosts property lists 
>> example.org then you 
>>>> can be sure that it does not describe anything on example.com).
>>>>
>>>> OK, hold on to that and look at this:
>>>>
>>>> 1  <?xml version="1.0"?>
>>>> 2   <POWDER xmlns="http://www.w3.org/2007/05/powder#"
>>>> 3           xmlns:ex="http://example.org/vocab#"
>>>> 4           
>> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>>>> 5    <attribution>
>>>> 6      <maker>http://authority.example.org/foaf.rdf#me</maker>
>>>> 7      <aboutHosts>example.org</aboutHosts>
>>>> 8    </attribution>
>>>>
>>>> 9    <Descriptors xml:id="red">
>>>> 10     <ex:color>red</ex:color>
>>>> 11   </Descriptors>
>>>>
>>>> 12   <Descriptors xml:id="blue">
>>>> 13     <ex:color>blue</ex:color>
>>>> 14   </Descriptors>
>>>>
>>>> 15 </POWDER>
>>>>
>>>> This is POWDER doc with its required attribution. Line 7 
>> adds in an 
>>>> aboutHosts element.
>>>>
>>>> But there are no DRs here, just two descriptions (and if 
>> there are no 
>>>> DRs there is no requirement for a URISet). A 'red page' on 
>>>> example.org would include a link element with an href attribute of 
>>>> "...powder.xml#red", blue pages would have #blue as the fragment 
>>>> identifier. The aboutHosts element prevents other domains 
>> pointing to 
>>>> this POWDER doc and claiming (quite possibly falsely) that 
>> the entity 
>>>> described at http://authority.example.org/foaf.rdf#me 
>> described them 
>>>> - that is, the POWDER doc author has a mechanism for 
>> restricting the 
>>>> scope of the descriptions without actually having a URISet.
>>>>
>>>> But... what's the POWDER-S version of this, i.e. the output of the 
>>>> GRDDL transform with formal semantics? Well, I guess it 
>> ends up just 
>>>> being:
>>>>
>>>> <rdf:Description rdf:about="">
>>>>    <foaf:maker
>>>> rdf:resource="http://authority.example.org/foaf.rdf#me" /> 
>>>> </rdf:Description>
>>>>
>>>> <owl:Class rdf:nodeID="red">
>>>>    <owl:intersectionOf rdf:parseType="Collection">
>>>>      <owl:Restriction>
>>>>        <owl:onProperty
>>>> rdf:resource="http://example.org/vocab#color" />
>>>>        <owl:hasValue>red</owl:hasValue>
>>>>      </owl:Restriction>
>>>>    </owl:intersectionOf>
>>>> </owl:Class>
>>>>
>>>> <owl:Class rdf:nodeID="blue">
>>>>    <owl:intersectionOf rdf:parseType="Collection">
>>>>      <owl:Restriction>
>>>>        <owl:onProperty
>>>> rdf:resource="http://example.org/vocab#color" />
>>>>        <owl:hasValue>blue</owl:hasValue>
>>>>      </owl:Restriction>
>>>>    </owl:intersectionOf>
>>>> </owl:Class>
>>>>
>>>> Notice that a) the aboutHosts element is not copied from the 
>>>> operational semantics - it's not needed here and, I think 
>> I'm right 
>>>> in saying, won't be of any value in the ordered list in a POWDER-S 
>>>> doc either. It could be included but I'm not sure that it 
>> will add a 
>>>> great deal.
>>>> b) there is no subClassOf relation asserted - which is good because
>>>> c) there is no URIset to be a sub class of the descriptors.
>>>>
>>>>
>>>> Three questions for people with the appropriate knowledge:
>>>>
>>>> So the XSLT here must only assert the sub class 
>> relationship if there 
>>>> is a URISet. Doable?
>>>>
>>>> I understand that, formally, creating a blank node in an RDF graph 
>>>> means that the universe is so arranged that there is at least one 
>>>> resource that has the properties given by those of the blank node. 
>>>> Does creating an OWL class in this way get us off this hook?
>>>>
>>>> How does this look, Kai?
>>>>
>>>> N.B. I'm trying to avoid having to create server-side 
>> software that 
>>>> returns triples with the described resource's URI as the subject - 
>>>> that's clearly the semantically pure way, but it's impractical.
>>>>
>>>> I'm asking all this because it obviously affects the rules on what 
>>>> MUST and SHOULD and MAY be in a POWDER doc - something Andrea's 
>>>> poised to encode in the schema and Kevin is poised to 
>> enshrine in the 
>>>> XSLT.
>>>>
>>>> Phil.
>>>>
>>>>
>>>>
>>>> [1] http://www.w3.org/TR/2007/WD-powder-grouping-20071031/
>>>> [2] 
>>>> http://www.w3.org/TR/2007/WD-powder-dr-20070925/#package-structure
Received on Wednesday, 13 February 2008 15:44:10 UTC