Re: Possible New Model (was Re: status report - formal layer) from Andrea Perego on 2008-01-30 (public-powderwg@w3.org from January 2008)

From: Andrea Perego <andrea.perego@uninsubria.it>
Date: Wed, 30 Jan 2008 09:54:04 +0100
To: public-powderwg@w3.org
CC: Jeremy Carroll <jjc@hpl.hp.com>
Message-ID: <47A03B2C.1080400@uninsubria.it>

Thanks for having worked this out, Phil.

I would like to add just one note concerning an old issue, namely whether
DR Descriptors should be included in the resource set definition or not.

Following Jeremy's mail [1], what we called the resource set definition 
(corresponding to wdr:ResourceSet, now renamed wdr:URISet) denotes a set 
of URIs, and not a set of resources.

I think this solves the issue above. In fact, if DR Descriptors were 
nested in wdr:URISet, they would describe the content/characteristics of 
a set of URIs, and not of the set of resources having a URI that 
satisfies the resource set definition.

So, referring to one of your examples, the following DR-S:

<wdr:DR rdf:ID="DR_1">
   <wdr:hasScope rdf:parseType="Resource">
     <wdr:includeHosts>example.org</wdr:includeHosts>
     <wdr:hasDescriptors rdf:parseType="Resource">
       <ex:colour>red</ex:colour>
       <ex:shape>square</ex:shape>
     </wdr:hasDescriptors>
   </wdr:hasScope>
</wdr:DR>

should be rewritten:

<wdr:DR rdf:ID="DR_1">
   <wdr:hasScope rdf:parseType="Resource">
     <wdr:includeHosts>example.org</wdr:includeHosts>
   </wdr:hasScope>
   <wdr:hasDescriptors rdf:parseType="Resource">
     <ex:colour>red</ex:colour>
     <ex:shape>square</ex:shape>
   </wdr:hasDescriptors>
</wdr:DR>

Similarly, the following DR-O:

<DR xmlns="http://www.w3.org/2007/05/powder#"
     xmlns:ex="http://example.org/vocab#">
   <maker>http://authority.example.org/foaf.rdf#me</maker>
   <issued>2007-12-14</issued>
   <validFrom>2008-01-01</validFrom>
   <validUntil>2008-12-31</validUntil>
   <URISet>
     <includeHosts>example.org</includeHosts>
     <Descriptors ex:colour="red" ex:shape="square" />
   </URISet>
</DR>

should be:

<DR xmlns="http://www.w3.org/2007/05/powder#"
     xmlns:ex="http://example.org/vocab#">
   <maker>http://authority.example.org/foaf.rdf#me</maker>
   <issued>2007-12-14</issued>
   <validFrom>2008-01-01</validFrom>
   <validUntil>2008-12-31</validUntil>
   <URISet>
     <includeHosts>example.org</includeHosts>
   </URISet>
   <Descriptors ex:colour="red" ex:shape="square" />
</DR>
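
For what it's worth, the change is mechanical. The following is only a
minimal sketch in Python, assuming the XML-only DR-O layout used in Phil's
examples; the helper name hoist_descriptors is hypothetical and not part of
any POWDER tool. It moves each Descriptors element out of its URISet so
that it becomes a child of the DR:

```python
import xml.etree.ElementTree as ET

# Namespace taken from the examples in this thread.
POWDER = "{http://www.w3.org/2007/05/powder#}"


def hoist_descriptors(dr):
    """Move every Descriptors child of a URISet up into the enclosing DR.

    Hypothetical helper: `dr` is the parsed DR element, which is modified
    in place and returned.
    """
    for uri_set in dr.findall(POWDER + "URISet"):
        for descriptors in uri_set.findall(POWDER + "Descriptors"):
            uri_set.remove(descriptors)
            # Re-insert immediately after the URISet it was nested in.
            dr.insert(list(dr).index(uri_set) + 1, descriptors)
    return dr


old_dr = """
<DR xmlns="http://www.w3.org/2007/05/powder#"
    xmlns:ex="http://example.org/vocab#">
  <URISet>
    <includeHosts>example.org</includeHosts>
    <Descriptors ex:colour="red" ex:shape="square" />
  </URISet>
</DR>
"""

new_dr = hoist_descriptors(ET.fromstring(old_dr))
print(ET.tostring(new_dr, encoding="unicode"))
```

The serialised output uses generated namespace prefixes, but the element
structure is the one proposed above.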



Andrea

--------
[1] http://lists.w3.org/Archives/Public/public-powderwg/2007Dec/0045


Phil Archer wrote:
> 
> As a follow up, I've tried expressing the same 4 examples in just XML - 
> see what you think.
> 
> Example 1
> =========
> RDF/XML: http://www.fosi.org/projects/powder/dr-o1.rdf
> XML only: http://www.fosi.org/projects/powder/dr-o1.xml
> 
> Which is short and sweet
> 
> <DR xmlns="http://www.w3.org/2007/05/powder#"
>     xmlns:ex="http://example.org/vocab#">
> 
>   <maker>http://authority.example.org/foaf.rdf#me</maker>
>   <issued>2007-12-14</issued>
>   <validFrom>2008-01-01</validFrom>
>   <validUntil>2008-12-31</validUntil>
> 
>   <URISet>
>     <includeHosts>example.org</includeHosts>
>     <Descriptors ex:colour="red" ex:shape="square" />
>   </URISet>
> </DR>
> 
> The first 4 lines of data would need to be transformed into other 
> namespaces - maker becomes <> foaf:maker 
> http://authority.example.org/foaf.rdf#me for example.
> 
> Then we get to the URI Set which encloses the description which, in this 
> case, comprises two simple properties/literal values - we can express 
> those as XML attributes.
> 
> How about a site that is generally blue but where the /foo area is red? I 
> reckon there are 2 ways to do this in XML:
> 
> Example 2
> =========
> RDF/XML: http://www.fosi.org/projects/powder/dr-o2.rdf
> XML Only: http://www.fosi.org/projects/powder/dr-o2.xml
> OR: http://www.fosi.org/projects/powder/dr-o2a.xml
> 
> The first of these has two separate URI sets:
> 
> <URISet>
>   <includeHosts>example.org</includeHosts>
>   <includePathStartsWith>/foo</includePathStartsWith>
>   <Descriptors ex:colour="red" />
> </URISet>
> 
> <URISet>
>   <includeHosts>example.org</includeHosts>
>   <Descriptors ex:colour="blue" />
> </URISet>
> 
> Whilst the second makes it more explicit that the /foo section is a sub 
> set of the overall example.org set thus:
> 
> <URISet>
>   <includeHosts>example.org</includeHosts>
>   <Descriptors ex:colour="blue">
> 
>     <URISet>
>       <includePathStartsWith>/foo</includePathStartsWith>
>       <Descriptors ex:colour="red" />
>     </URISet>
> 
>   </Descriptors>
> </URISet>
> 
> This is akin to previous work done in this area by Jo Rabin and Mark 
> Nottingham [2, 3]. As before, GRDDL has a lot to do here (it still needs 
> to work out that the URI set for the blue resources is those on 
> example.org where the path does *not* start with /foo).
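
The matching that the transform has to reproduce for the nested form can be
sketched as follows. This is only an illustration, assuming that
includeHosts matches any host equal to or ending with ".example.org" and
that includePathStartsWith is a plain prefix test on the URI path; the
function names are hypothetical:

```python
from urllib.parse import urlsplit


def host_matches(host, suffix):
    """True if host is the suffix itself or a subdomain of it (assumed rule)."""
    return host == suffix or host.endswith("." + suffix)


def colour_for(uri):
    """Return the ex:colour that the nested example assigns to a URI."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if not host_matches(host, "example.org"):
        return None  # outside the outer URI set: nothing is asserted
    if parts.path.startswith("/foo"):
        return "red"  # inner URI set overrides the outer description
    return "blue"  # on example.org and the path does not start with /foo


print(colour_for("http://www.example.org/foo/page.html"))
print(colour_for("http://www.example.org/index.html"))
```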
> 
> Adding identifiers to the URISets is easy enough, as example 3 shows:
> 
> Example 3
> =========
> RDF/XML: http://www.fosi.org/projects/powder/dr-o3.rdf
> XML: http://www.fosi.org/projects/powder/dr-o3.xml
> 
> But the problem posed by example 4 doesn't go away. In this example, we 
> have two identical URI sets with different descriptions. Which one is 
> applied is conditional on which one the described resource links to.
> 
> Example 4
> =========
> 
> RDF/XML: http://www.fosi.org/projects/powder/dr-o4.rdf
> XML: http://www.fosi.org/projects/powder/dr-o4.xml
> 
> So each of the examples in my previous mail can be written in simpler 
> XML. Operationally, one would expect content providers and labelling 
> authorities alike to use the simpler XML version and the GRDDL transform 
> to a full RDF/OWL DR-S to be useful in situations where reasoning was 
> required or where an application already made use of semantics. But what 
> are the pros and cons?
> 
> In favour of XML only for DR-O:
>  - it does not convey potentially misleading data. It's just flat XML 
> that you can GRDDL (transform) if you wish to.
>  - it is more compact
>  - DR-O and DR-S would have different MIME types
>  - GRDDL can't work with RDF/XML
>  - We'd keep the "Semantic Web is a waste of time" merchants happy.
> 
> In favour of RDF/XML:
>  - The argument that seemed to carry this when we discussed it (I can't 
> find it in the minutes) was that having two different file and MIME 
> types would lead to confusion (everyone would get it wrong - goodness 
> knows MIME types are often wrongly declared).
> - There is more flexibility in what can be expressed in RDF/XML, 
> particularly in terms of the descriptions.
> - A DR-O is closer to a DR-S so you get into the Sem Web mindset from 
> the start, rather than it being seen (by some surely) as an add-on.
> - We'd risk upsetting the Semantic Web merchants and, potentially, be 
> seen to be undermining the SW activity as a whole.
> - I can think of at least one person (the Bard of Chiswick) who would 
> crow for years.
> 
> I'm not convinced about the first of these arguments against. Day to 
> day, it's going to be XML that gets used so seeing the RDF MIME type 
> would probably be unusual. I've just tried to create an example to show 
> the second one without success so I'm not convinced of that one either! 
> The rest, to be honest, are space fillers.
> 
> If - *if* - we can create a nice XSLT that goes from a DR written in XML 
> to one written in OWL, then personally I'm sold.
> 
> Phil.
> 
> 
> [1] http://lists.w3.org/Archives/Public/public-powderwg/2008Jan/0018.html
> [2] http://www.w3.org/2005/Incubator/wcl/matching.html
> [3] http://www.w3.org/TR/urispace
> 
> 
> Phil Archer wrote:
>>
>> As promised on Friday afternoon, here are some musings on a possible 
>> new structure for Operational POWDER (POWDER-O), taking into account 
>> the recent discussion.
>>
>>
>> Example 1
>> =========
>> RDF/XML: http://www.fosi.org/projects/powder/dr-o1.rdf
>> Graph: http://www.fosi.org/projects/powder/servlet_63142.png
>>
>> Let's begin with a simple case, that all resources dereferenced from a 
>> URI with a host component ending with example.org are red and square 
>> (I got these line numbers from the RDF validator but I've knocked off 
>> the namespace declarations).
>>
>> 11:   <rdf:Description rdf:about="">
>> 12:     <foaf:maker 
>> rdf:resource="http://authority.example.org/foaf.rdf#me" />
>> 13:     <dcterms:issued>2007-12-14</dcterms:issued>
>> 14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
>> 15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
>> 16:   </rdf:Description>
>> 17:
>> 18:   <wdr:DR rdf:ID="DR_1">
>> 19:     <wdr:hasScope rdf:parseType="Resource">
>> 20:       <wdr:includeHosts>example.org</wdr:includeHosts>
>> 21:       <wdr:hasDescriptors rdf:parseType="Resource">
>> 22:         <ex:colour>red</ex:colour>
>> 23:         <ex:shape>square</ex:shape>
>> 24:       </wdr:hasDescriptors>
>> 25:     </wdr:hasScope>
>> 26:   </wdr:DR>
>>
>> The document was made by "...#me" on 14th December 2007 and is valid 
>> for 2008. If you trust "...#me" and today's date is in 2008, then the 
>> DR in the document can be transformed into its semantic encoding 
>> (DR-S) and the RDF merged into your triple store for processing.
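
The check described in that paragraph can be sketched as below. This is a
minimal illustration only, assuming that trust is a simple membership test
against a locally configured set of makers and that validUntil is
inclusive; the function and variable names are hypothetical:

```python
from datetime import date


def may_process(maker, valid_from, valid_until, trusted_makers, today):
    """Decide whether a DR document should be transformed and merged.

    The document is acted on only if its maker is trusted and today's
    date falls inside the validity period (both ends assumed inclusive).
    """
    return maker in trusted_makers and valid_from <= today <= valid_until


ME = "http://authority.example.org/foaf.rdf#me"
trusted = {ME}

print(may_process(ME, date(2008, 1, 1), date(2008, 12, 31),
                  trusted, date(2008, 1, 30)))
```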
>>
>> The single DR in the document has a URI set defined solely in terms of 
>> its host (example.org) and two descriptors from the ex namespace.
>>
>> Now let's make it a little more complicated and have one DR but two 
>> URI sets.
>>
>> Example 2
>> =========
>> RDF/XML: http://www.fosi.org/projects/powder/dr-o2.rdf
>> Graph: http://www.fosi.org/projects/powder/servlet_63180.png
>>
>> 11:   <rdf:Description rdf:about="">
>> 12:     <foaf:maker 
>> rdf:resource="http://authority.example.org/foaf.rdf#me" />
>> 13:     <dcterms:issued>2007-12-14</dcterms:issued>
>> 14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
>> 15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
>> 16:   </rdf:Description>
>> 17:
>> 18:   <wdr:DR rdf:ID="DR_1">
>> 19:
>> 20:     <wdr:hasScope rdf:parseType="Collection">
>> 21:
>> 22:       <wdr:URIset rdf:ID="URIset_1">
>> 23:         <wdr:includeHosts>example.org</wdr:includeHosts>
>> 24:         <wdr:includePathStartsWith>/foo</wdr:includePathStartsWith>
>> 25:         <wdr:hasDescriptors rdf:parseType="Resource">
>> 26:           <ex:colour>red</ex:colour>
>> 27:         </wdr:hasDescriptors>
>> 28:         </wdr:URIset>
>> 29:
>> 30:        <wdr:URIset rdf:about="#URIset_2" />
>> 31:
>> 32:      </wdr:hasScope>
>> 33:
>> 34:   </wdr:DR>
>> 35:
>> 36:   <wdr:URIset rdf:ID="URIset_2">
>> 37:     <wdr:includeHosts>example.org</wdr:includeHosts>
>> 38:     <wdr:hasDescriptors rdf:parseType="Resource">
>> 39:       <ex:colour>blue</ex:colour>
>> 40:     </wdr:hasDescriptors>
>> 41:   </wdr:URIset>
>>
>> The attribution and validity information (lines 11 - 16) remain 
>> unchanged. Now though our DR contains two URIsets and two attendant 
>> descriptions. All resources identified by URIs that have host 
>> components ending with example.org are blue except those where the 
>> path starts with /foo which are red.
>>
>> If the trust and validity information is to your satisfaction, then 
>> you can transform this operational data into two DR-S instances and 
>> merge the RDF. N.B. The transformation must contain data from both URI 
>> sets thus:
>>
>> <wdr:URISet rdf:ID="URISet_1">
>>   <owl:intersectionOf rdf:parseType="Collection">
>>     <owl:Restriction>
>>       <owl:onProperty rdf:resource="&wdr;includeHosts" />
>>       <owl:hasValue>example.org</owl:hasValue>
>>     </owl:Restriction>
>>     <owl:Restriction>
>>       *<owl:onProperty rdf:resource="&wdr;includePathStartsWith" />*
>>       <owl:hasValue>/foo</owl:hasValue>
>>     </owl:Restriction>
>>   </owl:intersectionOf>
>> </wdr:URISet>
>>
>> And
>>
>> <wdr:URISet rdf:ID="URISet_2">
>>   <owl:intersectionOf rdf:parseType="Collection">
>>     <owl:Restriction>
>>       <owl:onProperty rdf:resource="&wdr;includeHosts" />
>>       <owl:hasValue>example.org</owl:hasValue>
>>     </owl:Restriction>
>>     <owl:Restriction>
>>       *<owl:onProperty rdf:resource="&wdr;excludePathStartsWith" />*
>>       <owl:hasValue>/foo</owl:hasValue>
>>     </owl:Restriction>
>>   </owl:intersectionOf>
>> </wdr:URISet>
>>
>> The excludePathStartsWith property in URISet 2 is generated by:
>>
>> 1. Noting the defining features of URISet 1 (example.org and /foo)
>> 2. Noting the defining features of URISet 2 (example.org)
>> 3. Taking the inverse of those present in 1 but absent in 2. 
>> (example.org and NOT /foo)
>>
>> This is going to get complicated when there are n URISets in the 
>> sequence but hey, let's be optimistic...
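
The three steps listed above can be sketched in Python. This is only a
rough illustration under stated assumptions: each URI set is modelled as a
list of (property, value) pairs in document order, an "include..." property
is negated by renaming it to "exclude...", and only the simple case from
the example is handled, where an earlier set narrows a later one by exactly
one constraint. All names are hypothetical:

```python
def negate(constraint):
    """Turn ("includeX", value) into ("excludeX", value)."""
    prop, value = constraint
    if not prop.startswith("include"):
        raise ValueError("can only negate include* properties: " + prop)
    return ("exclude" + prop[len("include"):], value)


def derive_exclusions(uri_sets):
    """Add to each URI set the exclusions implied by the sets before it."""
    result = []
    for i, current in enumerate(uri_sets):
        derived = list(current)
        for earlier in uri_sets[:i]:
            # Only an earlier set that repeats all of the current set's
            # constraints (i.e. narrows it) forces an exclusion.
            if not all(c in earlier for c in current):
                continue
            extra = [c for c in earlier if c not in current]
            if len(extra) == 1:
                derived.append(negate(extra[0]))
            elif len(extra) > 1:
                # Negating several constraints at once needs a disjunction,
                # which this sketch does not attempt.
                raise NotImplementedError("more than one differing constraint")
        result.append(derived)
    return result


uri_set_1 = [("includeHosts", "example.org"),
             ("includePathStartsWith", "/foo")]
uri_set_2 = [("includeHosts", "example.org")]

for derived in derive_exclusions([uri_set_1, uri_set_2]):
    print(derived)
```

The general case with n URI sets is deliberately left out, since negating a
set that differs in more than one constraint cannot be expressed as a flat
list of exclusions.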
>>
>> We can add further DRs into the document iff the validity information 
>> is the same. I won't copy it out here but such an example is available:
>>
>> Example 3
>> =========
>> RDF/XML: http://www.fosi.org/projects/powder/dr-o3.rdf
>> Graph: http://www.fosi.org/projects/powder/servlet_63202.png
>>
>> This says that everything on example.org is blue except things with a 
>> URI path starting with /foo which are red. Separately, everything on 
>> example.org is circular except things with a URI path starting with 
>> /bar which are square. All assertions are subject to the same 
>> validity/trust conditions. And you process the Collections within each 
>> of the DRs to get these results or, if you want to merge the RDF into 
>> your triple store, you perform the transformation to get the full 
>> OWL-based version.
>>
>> Two people providing different descriptions with different validity 
>> dates will need to publish separate RDF/XML instances.
>>
>> N.B.
>>
>> I have written these examples with the exceptional case written within 
>> the DR and the 'default' description (there's no other word for it) as 
>> a separate block. There is no difference in structure but it does 
>> reflect what I expect to be the workflow reality - we're describing 
>> everything in a given URI set as being like _this_ *except* things 
>> that have _these_ things in their URIs which are like _this_ instead. 
>> Less abstract: there is no sex, drugs or rock and roll on fosi.org 
>> _except_ where the URI path component begins with /associates where 
>> there might be. Operationally, you write the general case first and 
>> then worry about the exceptions in a separate thought process.
>>
>> All of the RDF/XML instances given so far have a single URI so that a 
>> Web site can include an identical link element pointing to such a file 
>> irrespective of whether the resource is red, blue, square or circular 
>> - the POWDER client will sort it out.
>>
>> But, where the Web site has no (discernible or usable) URI structure, 
>> this isn't good enough. We need to include a new conditional 
>> statement, linkFrom, that says that if you trust "...#me" and today's 
>> date is within range *and* the resource includes a link to a specific 
>> DR, _then_ it is valid.
>>
>> The next example shows this.
>>
>> Example 4
>> =========
>> RDF/XML: http://www.fosi.org/projects/powder/dr-o4.rdf
>> Graph: http://www.fosi.org/projects/powder/servlet_63230.png
>>
>> 11:   <rdf:Description rdf:about="">
>> 12:     <foaf:maker 
>> rdf:resource="http://authority.example.org/foaf.rdf#me" />
>> 13:     <dcterms:issued>2007-12-14</dcterms:issued>
>> 14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
>> 15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
>> 16:     <wdr:linkFrom>true</wdr:linkFrom>
>> 17:   </rdf:Description>
>> 18:
>> 19:   <wdr:DR rdf:ID="DR_1">
>> 20:     <wdr:hasScope rdf:parseType="Resource">
>> 21:       <wdr:includeHosts>example.org</wdr:includeHosts>
>> 22:       <wdr:hasDescriptors rdf:parseType="Resource">
>> 23:         <ex:texture>smooth</ex:texture>
>> 24:       </wdr:hasDescriptors>
>> 25:     </wdr:hasScope>
>> 26:   </wdr:DR>
>> 27:
>> 28:
>> 29:   <wdr:DR rdf:ID="DR_2">
>> 30:     <wdr:hasScope rdf:parseType="Resource">
>> 31:       <wdr:includeHosts>example.org</wdr:includeHosts>
>> 32:       <wdr:includePathStartsWith>/sawn</wdr:includePathStartsWith>
>> 33:       <wdr:hasDescriptors rdf:parseType="Resource">
>> 34:         <ex:texture>rough</ex:texture>
>> 35:       </wdr:hasDescriptors>
>> 36:     </wdr:hasScope>
>> 37:   </wdr:DR>
>>
>> Notice line 16 which introduces the linkFrom element.
>>
>> Then the URI sets for each DR are identical - everything on 
>> example.org - you need to refer to the linkFrom element to decide 
>> which is applicable (actually, you'd probably define the URI set once 
>> in a separate block and refer to it from both DRs). I know this is a 
>> pain - deliberately publishing two sets of triples that say different 
>> things about the same subjects - but, as they say in Vladivostok, 
>> c'est la guerre.
>>
>> What about storing lots of DRs in a single RDF/XML instance? Well, 
>> it's clear that we can't. If you're a content provider and you need to 
>> have several DRs covering different domains of interest then you 
>> simply create multiple RDF/XML files and link as you need to. If 
>> you're a labelling authority you store your DRs in a database with a 
>> front end like, oh I dunno, http://repository.icra.org/label?id=1 :-) 
>> which calls, I mean, would call a script that would return a single 
>> RDF/XML instance.
>>
>> This might mean we re-visit the issue of whether we want to put a hint 
>> in the link element as to what vocabularies are used in a given DR.
>>
>> If this is along the right lines then it seems to me we need to 
>> revisit the question of whether a DR-O is written in RDF/XML, just XML 
>> or something in between. All the examples above are valid RDF/XML but 
>> we have not got rid of the problem of generic RDF tools sucking in the 
>> triples and trying to make sense of them out of context. Personally 
>> I'm tending towards DR-Os being written in XML with only DR-Ss in RDF. 
>> It seems that GRDDL cannot transform RDF/XML and, I think I'm right in 
>> saying, that XSLT will have difficulty too.
>>
>> I'll do some more playing around with that now...
>>
>> Phil.
>>
>>
>>
>> Phil Archer wrote:
>>>
>>> Good, this feels as if we're making progress (or rather, you're 
>>> making progress in a promising direction :-)).
>>>
>>> I'll do some more playing around on Monday morning and see if I come 
>>> up against anything we're missing.
>>>
>>> Have a good weekend and thank you.
>>>
>>> Phil.
>>>
>>> Jeremy Carroll wrote:
>>>>
>>>> Phil Archer wrote:
>>>>>
>>>>>
>>>>> Jeremy Carroll wrote:
>>>>> [snip]
>>>>>>
>>>>>> If we choose to make the GRDDL transform make the DR-S include the 
>>>>>> subClassOf relationship as above, then we have the issue that in a 
>>>>>> package (or any collection of DRs) some of the DRs may be valid 
>>>>>> and some may be invalid, and all the subClassOf relationships are 
>>>>>> in the same file, and it is unclear how to distinguish the ones we 
>>>>>> want to claim (the valid ones), from the ones we don't (the 
>>>>>> invalid ones).
>>>>>
>>>>> I take this point. It may be that we can do something about it 
>>>>> though. We have so far taken the view that a DR should be 
>>>>> self-contained and that a package is therefore a group of 
>>>>> self-contained units. Doing this means that the validity 
>>>>> information (and attribution) is NOT inherited by DRs in the 
>>>>> package. However... we then had to introduce the idea of using 
>>>>> dcterms:isPartOf to force the processing of these "discrete DRs" in 
>>>>> a particular order [1]. In such a scenario, yes, each DR would have 
>>>>> its own validity and attribution.
>>>>>
>>>>> But it doesn't have to be this way...
>>>>>
>>>>> It would be possible I think to work with the package carrying the 
>>>>> validity information that was then inherited by the DRs within that 
>>>>> package - which I think from what you say would make life easier?
>>>>>
>>>>>
>>>>
>>>>
>>>> Yes - I was thinking along these lines.
>>>>
>>>> I was discussing this with Stuart - a possible view is then:
>>>>
>>>> The unit of a POWDER description is a document, which may contain a 
>>>> single wdr:DR or a single wdr:Package.
>>>>
>>>> Either way the document has information pertinent to the relevance 
>>>> of the document:
>>>> e.g. validity and who vouches for it.
>>>>
>>>> Operationally the process of trust is as follows:
>>>>
>>>> for each possible document that you might be considering, you read 
>>>> that document, then understanding what that document says about 
>>>> itself, if you are satisfied that you want to act on that document 
>>>> (e.g. it is valid, and is vouched for by an appropriate authority), 
>>>> then you load it into your knowledge base (formally corresponding to 
>>>> an RDF merge using the POWDER-S GRDDL result)
>>>>
>>>> The resulting RDF graph consists of only valid POWDER DRs, which 
>>>> have been vouched for by appropriate authorities.
>>>>
>>>> From the formal side the motivations for doing it this way are:
>>>> - it is known that temporal logic (i.e. dealing with time in a 
>>>> logical way) is a hard problem
>>>> - it is known that dealing with trust in logic is a hard problem
>>>> - it is clear that POWDER deals with both time and trust, but in 
>>>> simple ways
>>>> - hence it feels inappropriate to do the time and trust parts in the 
>>>> formal logical layer, and better to deal with them in a pragmatic 
>>>> layer prior to, but informed by, the logical treatment
>>>>
>>>> It is a limitation of the current RDF technology that it is hard to 
>>>> talk about part of an RDF graph, and its validity, or who vouches 
>>>> for that part - hence the desire to talk about documents containing 
>>>> RDF/XML that expresses those parts of the graph.
>>>> I think it is possible to design documents of the 'right' size so that:
>>>>
>>>> - validity and vouching are pertinent on a document by document 
>>>> level (and not on a finer grain)
>>>> - documents are large enough that the small scale powder user need 
>>>> only write one document for their site, or maybe two.
>>>>
>>>> - the expectations on large publishers who may need to make 
>>>> declarations that fit into complex workflows are intelligible and 
>>>> not too burdensome.
>>>>
>>>> Jeremy
Received on Wednesday, 30 January 2008 08:54:30 UTC