Re: Another attempt at 'cascading DRs' from Phil Archer on 2008-02-14 (public-powderwg@w3.org from February 2008)

From: Phil Archer <parcher@icra.org>
Date: Thu, 14 Feb 2008 15:45:31 +0000
To: Public POWDER <public-powderwg@w3.org>
Message-ID: <47B4621B.4090209@icra.org>
I agree with Andrea's arguments but am far less happy than he is (and than
Kai is, judging by his message since I drafted this) to think in terms of
the value of one property overriding the value of another according to
some processing rules. I much prefer the data to be unambiguous. Defining
which values override which is OK for CSS, since both the vocabulary and
the values are defined there, but we want to support unrestricted
descriptive vocabularies.

But I wonder what elements of CSS we can borrow - after all, it's a
widely used, understood and liked model.

I think we can keep the following:
1. A centralised file linked from anywhere that gets cached
2. The styles are only applied to documents that link to it.

We probably _could_ define a way to embed POWDER directly in HTML. The
microformat people seem to have no difficulty doing this and GRDDL can
get you from structured data in HTML to RDF easily enough - but I'm
hoping we don't need to.

I believe the model I posted the other day [1] achieves what we can do
without falling foul of messy processing rules (which, by the way, would
mean abandoning any relationship with RDF and the Semantic Web). Like
all other POWDER documents, the ones we're interested in here have an
attribution/maker. They may also have temporal validity etc., and an
aboutHosts element that effectively restricts all descriptions within
the doc to the listed hosts (if we want to firm this up further we may have
to define another semantic extension). An application uses the
attribution data to decide whether or not to trust the descriptions.

Descriptions within the doc are not scoped individually, so we are not
creating a URISet. A resource can link directly to a description; on
finding that link, an application can decide whether the data is valid
and whether to trust it. If it does, it has the description. Including
the link, with a fragment identifier, in a doc effectively describes that
document, just as a stylesheet link styles it. Change the link to change
the description.
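A minimal Python sketch of that decision, assuming the POWDER document has already been fetched and parsed into a hypothetical `PowderDoc` record. The `trusted` set and the `valid_until` field stand in for whatever trust policy and temporal-validity data a real processor would apply; none of these names come from the draft.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional, Set
from urllib.parse import urldefrag


@dataclass
class PowderDoc:
    """Hypothetical parsed POWDER doc: attribution plus fragment-addressed descriptions."""
    maker: str
    valid_until: Optional[datetime] = None
    descriptions: Dict[str, Dict[str, str]] = field(default_factory=dict)


def resolve_link(href: str, docs: Dict[str, PowderDoc], trusted: Set[str],
                 now: datetime) -> Optional[Dict[str, str]]:
    """Return the description a resource links to, or None if it is rejected."""
    doc_uri, fragment = urldefrag(href)
    doc = docs.get(doc_uri)  # stands in for an HTTP GET plus caching
    if doc is None or not fragment:
        return None
    if doc.maker not in trusted:  # the attribution drives the trust decision
        return None
    if doc.valid_until is not None and now > doc.valid_until:
        return None  # the description is no longer valid
    return doc.descriptions.get(fragment)


# Usage: a 'red page' on example.org links to powder.xml#red.
MAKER = "http://authority.example.org/foaf.rdf#me"
DOCS = {
    "http://example.org/powder.xml": PowderDoc(
        maker=MAKER,
        descriptions={"red": {"ex:color": "red"}, "blue": {"ex:color": "blue"}},
    )
}
NOW = datetime(2008, 2, 14, tzinfo=timezone.utc)
print(resolve_link("http://example.org/powder.xml#red", DOCS, {MAKER}, NOW))
```

Nothing here needs a URISet: the only scoping is the link itself, and the only gatekeeping is trust in the maker and the validity period.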

This is identical in operation to the current RDF-CL-based system used
by t-online.de - so it would appear to be workable ;-). Of course one
can include multiple link elements in a document to provide multiple
descriptions.

It means that the definition of what a POWDER document MUST contain is
limited solely to attribution, since containing a DR element becomes
optional. Within a DR, the descriptors may be in a separate document.
All this may make validating such a doc more complex, let alone the
XSLT/GRDDL transform to POWDER-S, but, well... that's life.

Phil.


Andrea Perego wrote:
> 
> Well, there are several issues here, so let me start with the one 
> related to the overriding mechanism.
> 
> In theory, we can (operationally) enforce a mechanism able to determine 
> whether a DR scope S1 is more *specific* than another DR scope S2 wrt a 
> given resource. For this purpose, we just have to check whether the set 
> of URIs denoted by the regex corresponding to S1 is included in the set 
> of URIs denoted by the regex corresponding to S2. Which is exactly what 
> you said, Kai.
> 
> However, I'm not sure that this mechanism can work in our scenario, as 
> Phil pointed out, unless we make a precise decision on which resource 
> properties are denoted by the descriptors contained in a DR (all the 
> properties of a set of resources? only those shared by all of them? only 
> those shared by most of them?).
> 
> Note that the option we have followed so far implies that the 
> descriptors denote the characteristics SHARED BY MOST OF the resources 
> in the DR scope. Moreover, in order to simplify the specification of 
> DRs, we assumed that, if the scope of a given DR1 denotes a superset of 
> the resources denoted by the scope of a given DR2, the descriptors in 
> DR1 apply also to (are inherited by) DR2. Operationally, this is obtained 
> by computing the union of all the descriptors in the DRs applying to a 
> given resource.
> 
> So if we say that resources hosted by example.org are blue and safe for 
> children, whereas those hosted by example.org and having a path starting 
> with foo are not safe for children, a resource having URI 
> http://www.example.org/foo/ will be blue, safe for children, and not 
> safe for children. This raises the issue of how we can deal with 
> conflicting descriptions.
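To make the inheritance-by-union rule concrete, here is a small Python sketch, assuming each DR is reduced to a regular expression over URIs plus a set of (property, value) descriptors; the patterns and the `describe` helper are illustrative only. Applied to http://www.example.org/foo/, it yields exactly the contradictory result described above.

```python
import re
from typing import List, Set, Tuple

Descriptor = Tuple[str, object]

# Hypothetical DRs as (scope regex, descriptors); the patterns are illustrative.
DRS: List[Tuple[str, Set[Descriptor]]] = [
    (r"^https?://(www\.)?example\.org/", {("color", "blue"), ("childSafe", True)}),
    (r"^https?://(www\.)?example\.org/foo", {("childSafe", False)}),
]


def describe(uri: str) -> Set[Descriptor]:
    """Union of the descriptors of every DR whose scope matches `uri`."""
    result: Set[Descriptor] = set()
    for pattern, descriptors in DRS:
        if re.match(pattern, uri):
            result |= descriptors
    return result


print(describe("http://www.example.org/foo/"))
# The set contains ('color', 'blue'), ('childSafe', True) and ('childSafe', False).
```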
> 
> The first point is: how can we check whether two given descriptors are 
> in conflict? For instance, if a given DR includes the descriptors 
> mime-type=text/html and mime-type=image/jpeg, this is not necessarily a 
> conflict. Note that the DR scope denotes a *set* of resources, not a 
> single, physical resource, so it makes sense to say that the set 
> contains both HTML files and JPEG images. Moreover, if you consider a 
> Web page as a resource, since it may include multimedia content, 
> different MIME types may apply to it. Similarly, it makes sense to have 
> childSafe=true and childSafe=false: this simply says that some 
> resources in the set are safe for children, whereas others are not.
> 
> That said, if we require that a resource cannot be associated with 
> more than one descriptor for a given property, e.g., childSafe, we can 
> state this restriction in the RDF/OWL schema defining that property. 
> Having done that, if we have two DRs saying:
> 
> DR1: example.org     -> childSafe=true
> DR2: example.org/foo -> childSafe=false
> 
> we decide which one prevails based on the specificity of the scope 
> definition wrt the resource's URI.
> 
> Note that this solution requires checking the definition of property 
> childSafe in the corresponding RDF schema. Operationally this means:
> 
> 1. Retrieve the RDF/OWL schema defining childSafe
> 2. If there's no restriction on the cardinality of childSafe, both 
> descriptors apply; otherwise,
> 3. If the scope definition of one of the DRs is more specific than the 
> other wrt the resource's URI, the former overrides the latter; otherwise,
> 4. If the two scope definitions are equivalent/incomparable... what? 
> (ignore both descriptors?)
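The four steps above could be prototyped roughly as follows. This is a hedged Python sketch, not a proposal for the spec: it replaces regex scopes with a simplified host-plus-path-prefix `Scope` (so that inclusion becomes a cheap prefix test), stands in for the schema lookup of step 1 with a hypothetical `SINGLE_VALUED` set, and resolves step 4 by ignoring both descriptors, which is only one of the options Andrea leaves open.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Scope:
    """Simplified scope: exact host plus path prefix (not a general regex)."""
    host: str
    path_prefix: str = "/"

    def covers(self, other: "Scope") -> bool:
        """True if every URI in `other` is also in this scope."""
        return self.host == other.host and other.path_prefix.startswith(self.path_prefix)


@dataclass
class DR:
    scope: Scope
    prop: str
    value: object


# Step 1 stand-in: properties whose (hypothetical) schema allows a single value.
SINGLE_VALUED: Set[str] = {"childSafe"}


def resolve(prop: str, applicable: List[DR]) -> Optional[List[object]]:
    """Values of `prop` for one resource, given the DRs whose scope contains it."""
    drs = [d for d in applicable if d.prop == prop]
    if not drs or prop not in SINGLE_VALUED:
        return [d.value for d in drs]  # step 2: no restriction, all apply
    for d in drs:
        others = [o for o in drs if o is not d]
        # Step 3: d wins if its scope is strictly narrower than every other one.
        if all(o.scope.covers(d.scope) and o.scope != d.scope for o in others):
            return [d.value]
    return None  # step 4: equivalent/incomparable scopes; here both are ignored


DR1 = DR(Scope("example.org"), "childSafe", True)
DR2 = DR(Scope("example.org", "/foo"), "childSafe", False)
print(resolve("childSafe", [DR1, DR2]))  # [False]
```

The prefix model is what keeps step 3 tractable; deciding inclusion between arbitrary regexes is considerably harder.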
> 
> An alternative to default overriding is to specify explicit overriding 
> rules. But in this case too, we have to check whether two descriptors 
> are in conflict or not, based on their definition.
> 
> Andrea
> 
> 
> Scheppe, Kai-Dietrich wrote:
>> Hmmm...come to think of it...I think Andrea could really help us with
>> this :-)
>>
>> I am thinking if
>>
>> - scope A = a large area
>> - scope B = a small area
>>
>> And
>>
>> - B is a subset of A, meaning B is contained within A but differs in
>> some property.
>>
>> If I transfer this onto URIs, I can talk about paths.
>>
>> A = *.t-online.de
>> A is a large scope
>>
>> B = www.t-online.de/c/01/02/03/01020304.html
>> B is a small scope, but is also a subset of A
>>
>>
>> Irrespective of what precisely is listed in the path, it *IS* more
>> specific than that used for the larger scope.
>>
>> Some examples:
>>
>> www.t-online.de/c/01/02/03/
>> www.t-online.de/c/01/02/
>> www.t-online.de/c/01/
>> www.t-online.de/c/
>>
>> Are all subsets, but
>>
>> www.someotherdomain.de/c/01/02/03/01020304.html
>>
>> is not.
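Kai's subset test can be sketched in Python as below, assuming each example denotes the prefix scope of everything beneath it and that a `*.` host pattern also matches the bare domain; `host_matches` and `is_subscope` are hypothetical helpers, not POWDER Grouping syntax.

```python
from urllib.parse import urlsplit


def host_matches(pattern: str, host: str) -> bool:
    """'*.t-online.de' matches t-online.de itself and any subdomain (an assumption)."""
    if pattern.startswith("*."):
        base = pattern[2:]
        return host == base or host.endswith("." + base)
    return host == pattern


def is_subscope(inner: str, outer_host: str, outer_path: str = "/") -> bool:
    """True if the prefix scope rooted at `inner` lies inside the outer scope."""
    parts = urlsplit(inner if "://" in inner else "http://" + inner)
    path = parts.path or "/"
    return host_matches(outer_host, parts.hostname or "") and path.startswith(outer_path)


for uri in ["www.t-online.de/c/01/02/03/01020304.html",
            "www.t-online.de/c/01/02/03/",
            "www.t-online.de/c/",
            "www.someotherdomain.de/c/01/02/03/01020304.html"]:
    print(uri, is_subscope(uri, "*.t-online.de"))
```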
>>
>>
>> To summarize:
>>
>> In CSS I define a style sheet and override this, on a "local" level,
>> with an inline style.
>> Here I would define the scope globally and everything that is within
>> that scope can override, on a local level, the global definition.
>>
>> To restate the link-back mechanism: if the "local" DR links back to
>> the "global" DR, you make content discoverable.
>>
>>
>> Does that make sense?
>>
>>
>> Kai
>>
>>
>>
>>  
>>
>>> -----Original Message-----
>>> From: public-powderwg-request@w3.org 
>>> [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
>>> Sent: Tuesday, February 12, 2008 3:14 PM
>>> To: Public POWDER
>>> Subject: Re: Another attempt at 'cascading DRs'
>>>
>>>
>>> The problem is that your DR_2 directly contradicts DR_1 and both are 
>>> created by the same person, meaning "I think 
>>> http://www.t-online.de/c/01/02/03/01020304.html is text AND I think 
>>> it's an image." Something has to change. Yes, we can have 
>>> conflicting DRs but these would generally be created by different 
>>> people, so it comes down to whom you trust. If the same person 
>>> creates two conflicting DRs, you can't use trust as a means to 
>>> separate them.
>>>
>>> In my proposal today I've effectively removed the scope entirely from 
>>> the DR so it becomes impossible to go from the DR to the resource - 
>>> i.e. it eliminates the first problem you identify.
>>>
>>> On the upside, no machine can find the two descriptions and infer 
>>> that they both apply to the same thing and therefore end up with a 
>>> logical paradox. The downside is that you then can't use the DR as a 
>>> discovery mechanism, only to provide information about what you 
>>> already have.
>>>
>>> That's perhaps a bit of an over-statement. If you publish a POWDER 
>>> doc that has an aboutHosts attribute of t-online.de and you have a 
>>> description that says 'text' and one that says 'image' then it would 
>>> be reasonable to infer that t-online.de includes both text and images 
>>> - you just wouldn't know exactly where they were. You would, however, 
>>> be able to crawl the site looking for links to "...#text" and 
>>> "...#image" and create a catalogue/site map with relative ease.
>>>
>>> Taking your last paragraph: I'm not sure how you would define 
>>> 'locally based' - it sounds very much like the linkFrom attribute 
>>> already posited in the published doc [1] - i.e. you make it local by 
>>> linking to it, yes? So the operation here is "use the cached DR that 
>>> covers all of t-online.de" and the image would then have an HTTP 
>>> header that linked to
>>> DR_2 that says "I can override DR_1." You still need the link to 
>>> create the property of being local. OK, I can see that working 
>>> operationally but it seems to have a real danger of creating data 
>>> that, when taken in isolation, can be contradictory.
>>>
>>> Another option might be to make more use of the issue date with the 
>>> idea that if you have two conflicting DRs then the most recently 
>>> issued one wins. What worries me here is the caching. If I get a DR 
>>> that says t-online.de is all text, I might cache that. If there is no 
>>> validUntil date, in theory it's valid until the entropic heat death 
>>> of the universe. If there is a validUntil date, OK, I should only 
>>> cache it up until that date. Either way, the new, more recent DR can 
>>> legitimately be ignored by an optimised system until the original 
>>> expires. Again, this problem disappears if there is no scope.
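The caching concern can be illustrated with a short Python sketch. The `CachedDR` record and the `newest` and `must_refetch` helpers are hypothetical, and the refetch policy models an optimised client that honours validUntil and nothing else.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CachedDR:
    dr_id: str
    issued: datetime
    valid_until: Optional[datetime] = None  # None: no validUntil given


def newest(a: CachedDR, b: CachedDR) -> CachedDR:
    """'Most recently issued wins' rule for two conflicting DRs."""
    return a if a.issued >= b.issued else b


def must_refetch(entry: CachedDR, now: datetime) -> bool:
    """An optimised client refetches a cached DR only once it has expired."""
    return entry.valid_until is not None and now >= entry.valid_until


dr1 = CachedDR("DR_1: t-online.de is all text", issued=datetime(2008, 1, 1))
dr2 = CachedDR("DR_2: 01020304.html is an image", issued=datetime(2008, 2, 1))
print(newest(dr1, dr2).dr_id)                   # DR_2: 01020304.html is an image
print(must_refetch(dr1, datetime(2100, 1, 1)))  # False
```

With no validUntil on DR_1, `must_refetch` never returns True, so a client holding DR_1 has no reason to go looking for DR_2, even though the issue-date rule would prefer it.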
>>>
>>> P
>>>
>>> [1] http://www.w3.org/TR/2007/WD-powder-dr-20070925/#noPattern
>>>
>>>
>>> Scheppe, Kai-Dietrich wrote:
>>>> Hi,
>>>>
>>>> I have a question:
>>>>
>>>> The problem really doesn't exist when going from the resource to the DR.
>>>> It does exist when going from the DR to the resource.
>>>>
>>>> But since we say at some point that conflicting DRs can exist
>>>> (after all, different CAs could have different opinions about
>>>> content), it is up to the user to decide which DR to believe.
>>>>
>>>> Can't this principle apply here as well?
>>>>
>>>> Or better, if it doesn't apply here, why does it apply in general?
>>>> And if it does and if this is a problem, how do we solve it... with the
>>>> knowledge that solving that problem would also solve this problem?
>>>>
>>>>
>>>> Either way, I think if we just say that
>>>>
>>>> DR_1 says all content on t-online.de is text-based
>>>> DR_2 says that http://www.t-online.de/c/01/02/03/01020304.html is an
>>>> image
>>>>
>>>> then it is up to the user to decide whether to download this
>>>> resource.
>>>>
>>>> We could defuse the problem somewhat by requiring the more locally 
>>>> based DR to refer to the more globally based DR.  This way an 
>>>> application could create its own set of exceptions.
>>>> So in the example above DR_2 would contain a link to DR_1.
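A rough Python sketch of how an application might build that set of exceptions, assuming each local DR carries a pointer to the global DR it refines. The `refines`, `covers` and `says` field names and the dictionary layout are invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical DR records. A "local" DR names the "global" DR it refines via
# `refines` (an invented field standing in for Kai's link back to DR_1).
DRS: Dict[str, dict] = {
    "DR_1": {"covers": "t-online.de", "refines": None, "says": "text"},
    "DR_2": {"covers": "http://www.t-online.de/c/01/02/03/01020304.html",
             "refines": "DR_1", "says": "image"},
}


def build_exceptions(drs: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each global DR id to the ids of the local DRs that link back to it."""
    exceptions: Dict[str, List[str]] = {}
    for dr_id, dr in drs.items():
        parent: Optional[str] = dr["refines"]
        if parent is not None:
            exceptions.setdefault(parent, []).append(dr_id)
    return exceptions


def describe_with_exceptions(uri: str, global_id: str, drs: Dict[str, dict]) -> str:
    """Prefer a local DR for exactly this URI; otherwise fall back to the global DR."""
    for local_id in build_exceptions(drs).get(global_id, []):
        if drs[local_id]["covers"] == uri:
            return drs[local_id]["says"]
    return drs[global_id]["says"]


print(describe_with_exceptions("http://www.t-online.de/c/01/02/03/01020304.html", "DR_1", DRS))  # image
print(describe_with_exceptions("http://www.t-online.de/index.html", "DR_1", DRS))  # text
```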
>>>>
>>>>
>>>> However, the problem centers on dealing with unknown DRs.
>>>>
>>>>
>>>> -- Kai
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: public-powderwg-request@w3.org 
>>>>> [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
>>>>> Sent: Tuesday, February 12, 2008 1:17 PM
>>>>> To: Public POWDER
>>>>> Subject: Another attempt at 'cascading DRs'
>>>>>
>>>>>
>>>>> The basic POWDER model has a resource that describes a lot of other
>>>>> resources. A processor may start at the descriptive resource (the
>>>>> POWDER document) and discover the resources it describes. To aid
>>>>> discovery of the description, a resource may link to the POWDER
>>>>> document that describes it.
>>>>>
>>>>> In some important circumstances, however, this doesn't work. POWDER's
>>>>> Grouping mechanism [1] (currently under revision, with a new draft due
>>>>> for publication very soon) assumes that by examining a URI, one can
>>>>> deduce which description of a collection applies to it. If URIs don't
>>>>> follow a particular pattern, such as the numerical URIs generated by
>>>>> some content management systems, we need a different mechanism: we
>>>>> must rely on the link from the described resource to point to the
>>>>> correct description.
>>>>>
>>>>> We've discussed this a lot, most recently in Athens, and we know we
>>>>> need to solve it. We also know that it's impractical in a commercial
>>>>> workflow to have to edit the POWDER document continually, adding in
>>>>> lists of exceptions to rules. We need to work more along the lines of
>>>>> the CSS model, where there is a central file that carries the defined
>>>>> styles. Which style (indeed, which stylesheet) is applicable is
>>>>> defined within the document to which the styles apply. HTTP and
>>>>> client caching ensure that stylesheets need only be accessed once
>>>>> until updated.
>>>>>
>>>>> A Package of DRs, as currently defined at [2], has an attribute
>>>>> 'aboutHosts'. The structure of packages is going to be modified a
>>>>> little in the near future, but this feature is a very useful one for
>>>>> processing efficiency. The plan now is that, where present,
>>>>> aboutHosts guarantees that the DRs in the package do not cover any
>>>>> resources on domains other than those listed. (It doesn't guarantee
>>>>> that all resources on those domains are described, by the way, just
>>>>> that if aboutHosts lists example.org then you can be sure that the
>>>>> package does not describe anything on example.com.)
>>>>>
>>>>> OK, hold on to that and look at this:
>>>>>
>>>>> 1  <?xml version="1.0"?>
>>>>> 2   <POWDER xmlns="http://www.w3.org/2007/05/powder#"
>>>>> 3           xmlns:ex="http://example.org/vocab#"
>>>>> 4           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>>>>> 5    <attribution>
>>>>> 6      <maker>http://authority.example.org/foaf.rdf#me</maker>
>>>>> 7      <aboutHosts>example.org</aboutHosts>
>>>>> 8    </attribution>
>>>>>
>>>>> 9    <Descriptors xml:id="red">
>>>>> 10     <ex:color>red</ex:color>
>>>>> 11   </Descriptors>
>>>>>
>>>>> 12   <Descriptors xml:id="blue">
>>>>> 13     <ex:color>blue</ex:color>
>>>>> 14   </Descriptors>
>>>>>
>>>>> 15 </POWDER>
>>>>>
>>>>> This is a POWDER doc with its required attribution. Line 7 adds an
>>>>> aboutHosts element.
>>>>>
>>>>> But there are no DRs here, just two descriptions (and if there are no
>>>>> DRs, there is no requirement for a URISet). A 'red page' on
>>>>> example.org would include a link element with an href attribute of
>>>>> "...powder.xml#red"; blue pages would have #blue as the fragment
>>>>> identifier. The aboutHosts element prevents other domains pointing to
>>>>> this POWDER doc and claiming (quite possibly falsely) that the entity
>>>>> described at http://authority.example.org/foaf.rdf#me described them.
>>>>> That is, the POWDER doc author has a mechanism for restricting the
>>>>> scope of the descriptions without actually having a URISet.
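The lookup described above can be sketched with Python's standard `xml.etree.ElementTree`, using the example document lightly trimmed. The `lookup` and `host_allowed` helpers are hypothetical, and treating www.example.org as covered by an aboutHosts value of example.org is an assumption, not something the draft states.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from urllib.parse import urlsplit

POWDER_NS = {"p": "http://www.w3.org/2007/05/powder#"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<?xml version="1.0"?>
<POWDER xmlns="http://www.w3.org/2007/05/powder#"
        xmlns:ex="http://example.org/vocab#">
  <attribution>
    <maker>http://authority.example.org/foaf.rdf#me</maker>
    <aboutHosts>example.org</aboutHosts>
  </attribution>
  <Descriptors xml:id="red"><ex:color>red</ex:color></Descriptors>
  <Descriptors xml:id="blue"><ex:color>blue</ex:color></Descriptors>
</POWDER>"""


def host_allowed(resource_uri: str, about_hosts: List[str]) -> bool:
    # Assumption: a listed host also covers its subdomains (www.example.org).
    host = urlsplit(resource_uri).hostname or ""
    return any(host == h or host.endswith("." + h) for h in about_hosts)


def lookup(doc_xml: str, resource_uri: str, fragment: str) -> Optional[Dict[str, str]]:
    """Descriptors for `fragment`, or None if the linking resource is out of bounds."""
    root = ET.fromstring(doc_xml)
    hosts = [h.text.strip() for h in root.findall("p:attribution/p:aboutHosts", POWDER_NS)
             if h.text]
    if hosts and not host_allowed(resource_uri, hosts):
        return None  # another domain is pointing at this doc: ignore the claim
    for desc in root.findall("p:Descriptors", POWDER_NS):
        if desc.get(XML_ID) == fragment:
            # Keys use ElementTree's {namespace}localname form.
            return {child.tag: (child.text or "") for child in desc}
    return None


print(lookup(DOC, "http://www.example.org/page.html", "red"))
print(lookup(DOC, "http://www.example.com/page.html", "red"))  # None
```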
>>>>>
>>>>> But... what's the POWDER-S version of this, i.e. the output of the
>>>>> GRDDL transform with formal semantics? Well, I guess it ends up just
>>>>> being:
>>>>>
>>>>> <rdf:Description rdf:about="">
>>>>>    <foaf:maker rdf:resource="http://authority.example.org/foaf.rdf#me" />
>>>>> </rdf:Description>
>>>>>
>>>>> <owl:Class rdf:nodeID="red">
>>>>>    <owl:intersectionOf rdf:parseType="Collection">
>>>>>      <owl:Restriction>
>>>>>        <owl:onProperty rdf:resource="http://example.org/vocab#color" />
>>>>>        <owl:hasValue>red</owl:hasValue>
>>>>>      </owl:Restriction>
>>>>>    </owl:intersectionOf>
>>>>> </owl:Class>
>>>>>
>>>>> <owl:Class rdf:nodeID="blue">
>>>>>    <owl:intersectionOf rdf:parseType="Collection">
>>>>>      <owl:Restriction>
>>>>>        <owl:onProperty rdf:resource="http://example.org/vocab#color" />
>>>>>        <owl:hasValue>blue</owl:hasValue>
>>>>>      </owl:Restriction>
>>>>>    </owl:intersectionOf>
>>>>> </owl:Class>
>>>>>
>>>>> Notice that
>>>>> a) the aboutHosts element is not copied from the operational
>>>>> semantics - it's not needed here and, I think I'm right in saying,
>>>>> won't be of any value in the ordered list in a POWDER-S doc either.
>>>>> It could be included but I'm not sure that it would add a great deal;
>>>>> b) there is no subClassOf relation asserted - which is good because
>>>>> c) there is no URISet to be a sub class of the descriptors.
>>>>>
>>>>>
>>>>> Three questions for people with the appropriate knowledge:
>>>>>
>>>>> So the XSLT here must only assert the sub class relationship if there
>>>>> is a URISet. Doable?
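On the XSLT question, the conditional itself looks simple. As a hedged illustration in Python rather than XSLT, the sketch below emits simplified POWDER-S triples (the `owl:intersectionOf` list is omitted and the blank-node labels are invented) and adds `rdfs:subClassOf` only when a Descriptors block is used by a DR that has a URISet. The `to_powder_s` function and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_powder_s(descriptors: Dict[str, Dict[str, str]],
                urisets: Dict[str, str]) -> List[Triple]:
    """Emit simplified POWDER-S triples.

    descriptors: Descriptors xml:id -> {property: value}
    urisets: Descriptors xml:id -> id of the URISet class of the DR using it
             (empty when the doc has no DRs)
    """
    triples: List[Triple] = []
    for desc_id, props in descriptors.items():
        cls = "_:" + desc_id
        triples.append((cls, "rdf:type", "owl:Class"))
        for i, (prop, value) in enumerate(props.items()):
            # One owl:Restriction per descriptor; the owl:intersectionOf
            # list that ties them to the class is omitted for brevity.
            r = f"_:{desc_id}_r{i}"
            triples += [(r, "rdf:type", "owl:Restriction"),
                        (r, "owl:onProperty", prop),
                        (r, "owl:hasValue", value)]
        # Only a DR with a URISet gets the subClassOf assertion.
        if desc_id in urisets:
            triples.append(("_:" + urisets[desc_id], "rdfs:subClassOf", cls))
    return triples


no_drs = to_powder_s({"red": {"ex:color": "red"}, "blue": {"ex:color": "blue"}}, {})
print(any(p == "rdfs:subClassOf" for _, p, _ in no_drs))  # False
```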
>>>>>
>>>>> I understand that, formally, creating a blank node in an RDF graph 
>>>>> means that the universe is so arranged that there is at least one 
>>>>> resource that has the properties given by those of the blank node. 
>>>>> Does creating an OWL class in this way get us off this hook?
>>>>>
>>>>> How does this look, Kai?
>>>>>
>>>>> N.B. I'm trying to avoid having to create server-side software that
>>>>> returns triples with the described resource's URI as the subject -
>>>>> that's clearly the semantically pure way, but it's impractical.
>>>>>
>>>>> I'm asking all this because it obviously affects the rules on what
>>>>> MUST and SHOULD and MAY be in a POWDER doc - something Andrea's
>>>>> poised to encode in the schema and Kevin is poised to enshrine in the
>>>>> XSLT.
>>>>>
>>>>> Phil.
>>>>>
>>>>>
>>>>>
>>>>> [1] http://www.w3.org/TR/2007/WD-powder-grouping-20071031/
>>>>> [2] http://www.w3.org/TR/2007/WD-powder-dr-20070925/#package-structure
>
Received on Thursday, 14 February 2008 15:45:59 UTC