Possible New Model (was Re: status report - formal layer) from Phil Archer on 2008-01-21 (public-powderwg@w3.org from January 2008)

From: Phil Archer <parcher@icra.org>
Date: Mon, 21 Jan 2008 15:18:35 +0000
To: Jeremy Carroll <jjc@hpl.hp.com>
CC: public-powderwg@w3.org
Message-ID: <4794B7CB.7010904@icra.org>
As promised on Friday afternoon, here are some musings on a possible new 
structure for Operational POWDER (POWDER-O), taking into account the 
recent discussion.


Example 1
=========
RDF/XML: http://www.fosi.org/projects/powder/dr-o1.rdf
Graph: http://www.fosi.org/projects/powder/servlet_63142.png

Let's begin with a simple case: all resources dereferenced from a URI 
whose host component ends with example.org are red and square. (I got 
these line numbers from the RDF validator but I've knocked off the 
namespace declarations.)

11:   <rdf:Description rdf:about="">
12:     <foaf:maker rdf:resource="http://authority.example.org/foaf.rdf#me" />
13:     <dcterms:issued>2007-12-14</dcterms:issued>
14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
16:   </rdf:Description>
17:
18:   <wdr:DR rdf:ID="DR_1">
19:     <wdr:hasScope rdf:parseType="Resource">
20:       <wdr:includeHosts>example.org</wdr:includeHosts>
21:       <wdr:hasDescriptors rdf:parseType="Resource">
22:         <ex:colour>red</ex:colour>
23:         <ex:shape>square</ex:shape>
24:       </wdr:hasDescriptors>
25:     </wdr:hasScope>
26:   </wdr:DR>

The document was made by "...#me" on 14th December 2007 and is valid for 
2008. If you trust "...#me" and today's date is in 2008, then the DR in 
the document can be transformed into its semantic encoding (DR-S) and 
the RDF merged into your triple store for processing.
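The trust-then-merge step just described can be sketched in Python as 
below. This is only an illustration under stated assumptions: the 
attribution block is taken to be already parsed into a hypothetical 
PowderDoc record, validUntil is treated as inclusive, and the DR-S 
output is stood in for by plain (subject, predicate, object) tuples. 
None of these names are POWDER vocabulary.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PowderDoc:
    """Hypothetical in-memory form of one DR-O document: its attribution
    block (lines 11-16) plus the triples its DR-S transformation yields."""
    maker: str
    valid_from: date
    valid_until: date
    triples: set = field(default_factory=set)


def acceptable(doc: PowderDoc, trusted_makers: set, today: date) -> bool:
    # Trust the maker, and today must lie in the (assumed inclusive) window.
    return doc.maker in trusted_makers and doc.valid_from <= today <= doc.valid_until


def load_trusted(docs, trusted_makers: set, today: date) -> set:
    # Merge the triples of every acceptable document; the rest are never loaded.
    store = set()
    for doc in docs:
        if acceptable(doc, trusted_makers, today):
            store |= doc.triples
    return store
```

Filtering whole documents before the merge means the triple store only 
ever holds statements from documents you already decided to act on.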

The single DR in the document has:
1. a URI set defined solely in terms of its host (example.org);
2. two descriptors from the ex namespace.
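Read operationally, Example 1 only needs a host test. The sketch below 
is one possible reading, not a definitive one: "ending with example.org" 
is taken to mean example.org itself or any subdomain of it, so 
badexample.org does not match (an assumption; a literal string-suffix 
test would match it). host_matches and describe are hypothetical names.

```python
from urllib.parse import urlsplit

# DR_1 from Example 1, hand-copied into a dict (layout is hypothetical).
DR_1 = {"includeHosts": "example.org",
        "descriptors": {"colour": "red", "shape": "square"}}


def host_matches(uri: str, host: str) -> bool:
    # Assumed semantics: the URI's host is `host` or a subdomain of it.
    hostname = (urlsplit(uri).hostname or "").lower()
    return hostname == host or hostname.endswith("." + host)


def describe(uri: str, dr: dict) -> dict:
    # Return the DR's descriptors if the URI is in its URI set, else nothing.
    if host_matches(uri, dr["includeHosts"]):
        return dict(dr["descriptors"])
    return {}
```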

Now let's make it a little more complicated and have one DR but two URI 
sets.

Example 2
=========
RDF/XML: http://www.fosi.org/projects/powder/dr-o2.rdf
Graph: http://www.fosi.org/projects/powder/servlet_63180.png

11:   <rdf:Description rdf:about="">
12:     <foaf:maker rdf:resource="http://authority.example.org/foaf.rdf#me" />
13:     <dcterms:issued>2007-12-14</dcterms:issued>
14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
16:   </rdf:Description>
17:
18:   <wdr:DR rdf:ID="DR_1">
19:
20:     <wdr:hasScope rdf:parseType="Collection">
21:
22:       <wdr:URIset rdf:ID="URIset_1">
23:         <wdr:includeHosts>example.org</wdr:includeHosts>
24:         <wdr:includePathStartsWith>/foo</wdr:includePathStartsWith>
25:         <wdr:hasDescriptors rdf:parseType="Resource">
26:           <ex:colour>red</ex:colour>
27:         </wdr:hasDescriptors>
28:       </wdr:URIset>
29:
30:       <wdr:URIset rdf:about="#URIset_2" />
31:
32:     </wdr:hasScope>
33:
34:   </wdr:DR>
35:
36:   <wdr:URIset rdf:ID="URIset_2">
37:     <wdr:includeHosts>example.org</wdr:includeHosts>
38:     <wdr:hasDescriptors rdf:parseType="Resource">
39:       <ex:colour>blue</ex:colour>
40:     </wdr:hasDescriptors>
41:   </wdr:URIset>

The attribution and validity information (lines 11 - 16) is unchanged. 
Now, though, our DR contains two URI sets and two attendant 
descriptions. All resources identified by URIs that have host 
components ending with example.org are blue, except those whose path 
starts with /foo, which are red.
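The rule at work here appears to be that the URI sets in the Collection 
are tried in document order and the first that matches supplies the 
description. A sketch under that assumption, with Example 2's 
Collection hand-copied into Python dicts (the keys mirror the wdr 
properties, but the structure and function names are hypothetical):

```python
from urllib.parse import urlsplit

# Example 2's Collection in document order: exception first, then default.
SCOPE = [
    {"includeHosts": "example.org", "includePathStartsWith": "/foo",
     "descriptors": {"colour": "red"}},
    {"includeHosts": "example.org",
     "descriptors": {"colour": "blue"}},
]


def in_uri_set(uri: str, uri_set: dict) -> bool:
    parts = urlsplit(uri)
    host = parts.hostname or ""
    want = uri_set["includeHosts"]
    # Assumed: the host is example.org itself or one of its subdomains.
    if not (host == want or host.endswith("." + want)):
        return False
    prefix = uri_set.get("includePathStartsWith")
    return prefix is None or parts.path.startswith(prefix)


def describe(uri: str, scope: list) -> dict:
    # Try each URI set in Collection order; the first match wins.
    for uri_set in scope:
        if in_uri_set(uri, uri_set):
            return dict(uri_set["descriptors"])
    return {}
```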

If the trust and validity information is to your satisfaction, then you 
can transform this operational data into two DR-S instances and merge 
the RDF. N.B. The transformation must draw on data from both URI sets, thus:

<wdr:URISet rdf:ID="URISet_1">
   <owl:intersectionOf rdf:parseType="Collection">
     <owl:Restriction>
       <owl:onProperty rdf:resource="&wdr;includeHosts" />
       <owl:hasValue>example.org</owl:hasValue>
     </owl:Restriction>
     <owl:Restriction>
       <owl:onProperty rdf:resource="&wdr;includePathStartsWith" />
       <owl:hasValue>/foo</owl:hasValue>
     </owl:Restriction>
   </owl:intersectionOf>
</wdr:URISet>

And

<wdr:URISet rdf:ID="URISet_2">
   <owl:intersectionOf rdf:parseType="Collection">
     <owl:Restriction>
       <owl:onProperty rdf:resource="&wdr;includeHosts" />
       <owl:hasValue>example.org</owl:hasValue>
     </owl:Restriction>
     <owl:Restriction>
       <owl:onProperty rdf:resource="&wdr;excludePathStartsWith" />
       <owl:hasValue>/foo</owl:hasValue>
     </owl:Restriction>
   </owl:intersectionOf>
</wdr:URISet>

The excludePathStartsWith property in URISet_2 is generated by:

1. Noting the defining features of URISet_1 (example.org and /foo)
2. Noting the defining features of URISet_2 (example.org)
3. Negating those features present in 1 but absent in 2 
(example.org and NOT /foo)

This is going to get complicated when there are n URISets in the 
sequence but hey, let's be optimistic...
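One way the three steps might generalise to n URI sets is sketched 
below; it is an assumption, not a worked-out design. Each URI set is 
modelled as a frozenset of (property, value) features, listed in 
Collection order, and every earlier set j contributes an exclusion 
group: the features j has that the later set k lacks. A single-feature 
group maps onto one exclude* property, as in step 3. A larger group 
needs a disjunction (NOT a OR NOT b), which no single property 
expresses, and an empty group means set k can never apply. The 
include-to-exclude renaming is hypothetical beyond excludePathStartsWith.

```python
from typing import Optional

# A feature is a (property, value) pair, e.g. ("includeHosts", "example.org").
Feature = tuple[str, str]


def effective_sets(uri_sets: list[frozenset[Feature]]) -> list[dict]:
    result = []
    for k, own in enumerate(uri_sets):
        # For each earlier set j: the features j has that set k lacks.
        # A resource already in k is also in j exactly when it matches them all.
        groups = [uri_sets[j] - own for j in range(k)]
        result.append({
            "include": own,
            "exclude_groups": groups,
            # An empty group means set j covers all of set k.
            "shadowed": any(not g for g in groups),
        })
    return result


def as_exclude_property(group: frozenset[Feature]) -> Optional[Feature]:
    # Only a single-feature group maps onto one exclude* property (step 3).
    if len(group) != 1:
        return None
    prop, value = next(iter(group))
    return prop.replace("include", "exclude", 1), value
```

With Example 2's two sets, URISet_2 gets the single group 
{includePathStartsWith /foo}, which becomes excludePathStartsWith /foo 
as in the OWL above.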

We can add further DRs to the document iff the validity information is 
the same. I won't copy it out here, but such an example is available:

Example 3
=========
RDF/XML: http://www.fosi.org/projects/powder/dr-o3.rdf
Graph: http://www.fosi.org/projects/powder/servlet_63202.png

This says that everything on example.org is blue except things with a 
URI path starting with /foo, which are red. Separately, everything on 
example.org is circular except things with a URI path starting with 
/bar, which are square. All assertions are subject to the same 
validity/trust conditions. You process the Collection within each DR to 
get these results or, if you want to merge the RDF into your triple 
store, you perform the transformation to get the full OWL-based version.
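Processing Example 3 without the OWL transformation might look like the 
sketch below: each DR's Collection is handled on its own (first match 
wins), and the answers from the two DRs are then combined. The data 
layout is a hypothetical simplification; since every URI set here is on 
example.org, only the path prefix is kept, with None marking the 
'default' set.

```python
from urllib.parse import urlsplit

# Example 3 as two independent DRs under one attribution block.
DRS = [
    [("/foo", {"colour": "red"}), (None, {"colour": "blue"})],
    [("/bar", {"shape": "square"}), (None, {"shape": "circular"})],
]


def on_example_org(host: str) -> bool:
    return host == "example.org" or host.endswith(".example.org")


def describe(uri: str, drs: list) -> dict:
    parts = urlsplit(uri)
    result = {}
    if not on_example_org(parts.hostname or ""):
        return result
    for scope in drs:
        # Within one DR, the first matching URI set wins...
        for prefix, descriptors in scope:
            if prefix is None or parts.path.startswith(prefix):
                result.update(descriptors)
                break
    # ...and the DRs' answers are then simply combined.
    return result
```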

Two people providing different descriptions with different validity 
dates will need to publish separate RDF/XML instances.

N.B.

I have written these examples with the exceptional case inside the DR 
and the 'default' description (there's no other word for it) as a 
separate block. There is no structural difference, but it does reflect 
what I expect to be the workflow reality: we're describing everything 
in a given URI set as being like _this_ *except* things that have 
_these_ things in their URIs, which are like _this_ instead. Less 
abstractly: there is no sex, drugs or rock and roll on fosi.org _except_ 
where the URI path component begins with /associates, where there might 
be. Operationally, you write the general case first and then worry about 
the exceptions in a separate thought process.

All of the RDF/XML instances given so far have a single URI so that a 
Web site can include an identical link element pointing to such a file 
irrespective of whether the resource is red, blue, square or circular - 
the POWDER client will sort it out.

But, where the Web site has no (discernible or usable) URI structure, 
this isn't good enough. We need to include a new conditional statement, 
linkFrom, that says that if you trust "...#me" and today's date is 
within range *and* the resource includes a link to a specific DR, _then_ 
it is valid.
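A sketch of that extra linkFrom condition, assuming the resource is 
HTML and that "includes a link to a specific DR" means a <link> element 
whose href equals the DR's URI exactly. The rel value is not checked, 
relative hrefs are not resolved, and dr_applies is a hypothetical name; 
the parser is Python's standard html.parser.HTMLParser.

```python
from datetime import date
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gathers the href of every <link> element (rel is deliberately ignored)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def dr_applies(dr_uri: str, *, maker: str, valid_from: date, valid_until: date,
               link_from: bool, resource_html: str,
               trusted_makers: set, today: date) -> bool:
    # The usual conditions: trusted maker and today inside the validity window.
    if maker not in trusted_makers or not (valid_from <= today <= valid_until):
        return False
    # Without linkFrom, trust and validity are enough.
    if not link_from:
        return True
    # With linkFrom, the resource itself must also link to this specific DR.
    collector = LinkCollector()
    collector.feed(resource_html)
    collector.close()
    return dr_uri in collector.hrefs
```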

The next example shows this.

Example 4
=========
RDF/XML: http://www.fosi.org/projects/powder/dr-o4.rdf
Graph: http://www.fosi.org/projects/powder/servlet_63230.png

11:   <rdf:Description rdf:about="">
12:     <foaf:maker rdf:resource="http://authority.example.org/foaf.rdf#me" />
13:     <dcterms:issued>2007-12-14</dcterms:issued>
14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
16:     <wdr:linkFrom>true</wdr:linkFrom>
17:   </rdf:Description>
18:
19:   <wdr:DR rdf:ID="DR_1">
20:     <wdr:hasScope rdf:parseType="Resource">
21:       <wdr:includeHosts>example.org</wdr:includeHosts>
22:       <wdr:hasDescriptors rdf:parseType="Resource">
23:         <ex:texture>smooth</ex:texture>
24:       </wdr:hasDescriptors>
25:     </wdr:hasScope>
26:   </wdr:DR>
27:
28:
29:   <wdr:DR rdf:ID="DR_2">
30:     <wdr:hasScope rdf:parseType="Resource">
31:       <wdr:includeHosts>example.org</wdr:includeHosts>
32:       <wdr:includePathStartsWith>/sawn</wdr:includePathStartsWith>
33:       <wdr:hasDescriptors rdf:parseType="Resource">
34:         <ex:texture>rough</ex:texture>
35:       </wdr:hasDescriptors>
36:     </wdr:hasScope>
37:   </wdr:DR>

Notice line 16, which introduces the linkFrom element.

The URI sets for the two DRs are then identical (everything on 
example.org), so you need to refer to the linkFrom element to decide 
which is applicable (actually, you'd probably define the URI set once in 
a separate block and refer to it from both DRs). I know this is a pain - 
deliberately publishing two sets of triples that say different things 
about the same subjects - but, as they say in Vladivostok, c'est la guerre.

What about storing lots of DRs in a single RDF/XML instance? Well, it's 
clear that we can't. If you're a content provider and you need to have 
several DRs covering different domains of interest then you simply 
create multiple RDF/XML files and link as you need to. If you're a 
labelling authority you store your DRs in a database with a front end 
like, oh I dunno, http://repository.icra.org/label?id=1 :-) which calls, 
I mean, would call a script that would return a single RDF/XML instance.

This might mean we re-visit the issue of whether we want to put a hint 
in the link element as to what vocabularies are used in a given DR.

If this is along the right lines then it seems to me we need to revisit 
the question of whether a DR-O is written in RDF/XML, just XML or 
something in between. All the examples above are valid RDF/XML but we 
have not got rid of the problem of generic RDF tools sucking in the 
triples and trying to make sense of them out of context. Personally I'm 
tending towards DR-Os being written in XML with only DR-Ss in RDF. It 
seems that GRDDL cannot transform RDF/XML and, I think I'm right in 
saying, XSLT will have difficulty too.

I'll do some more playing around with that now...

Phil.



Phil Archer wrote:
> 
> Good, this feels as if we're making progress (or rather, you're making 
> progress in a promising direction :-)).
> 
> I'll do some more playing around on Monday morning and see if I come up 
> against anything we're missing.
> 
> Have a good weekend and thank you.
> 
> Phil.
> 
> Jeremy Carroll wrote:
>>
>> Phil Archer wrote:
>>>
>>>
>>> Jeremy Carroll wrote:
>>> [snip]
>>>>
>>>> If we choose to make the GRDDL transform make the DR-S include the 
>>>> subClassOf relationship as above, then we have the issue that in a 
>>>> package (or any collection of DRs) some of the DRs may be valid and 
>>>> some may be invalid, and all the subClassOf relationships are in the 
>>>> same file, and it is unclear how to distinguish the ones we want to 
>>>> claim (the valid ones), from the ones we don't (the invalid ones).
>>>
>>> I take this point. It may be that we can do something about it 
>>> though. We have so far taken the view that a DR should be 
>>> self-contained and that a package is therefore a group of 
>>> self-contained units. Doing this means that the validity information 
>>> (and attribution) is NOT inherited by DRs in the package. However... 
>>> we then had to introduce the idea of using dcterms:isPartOf to force 
>>> the processing of these "discrete DRs" in a particular order [1]. In 
>>> such a scenario, yes, each DR would have its own validity and 
>>> attribution.
>>>
>>> But it doesn't have to be this way...
>>>
>>> It would be possible I think to work with the package carrying the 
>>> validity information that was then inherited by the DRs within that 
>>> package - which I think from what you say would make life easier?
>>>
>>>
>>
>>
>> Yes - I was thinking along these lines.
>>
>> I was discussing this with Stuart - a possible view is then:
>>
>> The unit of a POWDER description is a document, which may contain a 
>> single wdr:DR or a single wdr:Package.
>>
>> Either way the document has information pertinent to the relevance of 
>> the document:
>> e.g. validity and who vouches for it.
>>
>> Operationally the process of trust is as follows:
>>
>> for each possible document that you might be considering, you read 
>> that document, then understanding what that document says about 
>> itself, if you are satisfied that you want to act on that document 
>> (e.g. it is valid, and is vouched for by an appropriate authority), 
>> then you load it into your knowledge base (formally corresponding to 
>> an RDF merge using the POWDER-S GRDDL result)
>>
>> The resulting RDF graph consists of only valid POWDER DRs, which have 
>> been vouched for by appropriate authorities.
>>
>> From the formal side, the motivations for doing it this way are:
>> - it is known that temporal logics (i.e. dealing with time in a 
>> logical way) is a hard problem
>> - it is known that dealing with trust in logic is a hard problem
>> - it is clear that POWDER deals with both time and trust, but in 
>> simple ways
>> - hence it feels inappropriate to do the time and trust parts in the 
>> formal logical layer, but to deal with them in a pragmatic layer prior 
>> to, but informed by, the logical treatment
>>
>> It is a limitation of the current RDF technology that it is hard to 
>> talk about part of an RDF graph, and its validity, or who vouches for 
>> that part - hence the desire to talk about documents containing 
>> RDF/XML that expresses those parts of the graph.
>> I think it is possible to design documents of the 'right' size so that:
>>
>> - validity and vouching are pertinent on a document by document level 
>> (and not on a finer grain)
>> - documents are large enough that the small scale powder user need 
>> only write one document for their site, or maybe two.
>>
>> - that the expectations on large publishers who may need to make 
>> declarations that fit into complex workflows are intelligible and not 
>> too burdensome.
>>
>> Jeremy
>>
Received on Monday, 21 January 2008 15:18:56 UTC