Re: Possible New Model (was Re: status report - formal layer) from Phil Archer on 2008-01-21 (public-powderwg@w3.org from January 2008)

From: Phil Archer <parcher@icra.org>
Date: Mon, 21 Jan 2008 17:22:45 +0000
To: public-powderwg@w3.org
CC: Jeremy Carroll <jjc@hpl.hp.com>
Message-ID: <4794D4E5.4050200@icra.org>
As a follow-up, I've tried expressing the same 4 examples in just XML - 
see what you think.

Example 1
=========
RDF/XML: http://www.fosi.org/projects/powder/dr-o1.rdf
XML only: http://www.fosi.org/projects/powder/dr-o1.xml

It's short and sweet:

<DR xmlns="http://www.w3.org/2007/05/powder#"
     xmlns:ex="http://example.org/vocab#">

   <maker>http://authority.example.org/foaf.rdf#me</maker>
   <issued>2007-12-14</issued>
   <validFrom>2008-01-01</validFrom>
   <validUntil>2008-12-31</validUntil>

   <URISet>
     <includeHosts>example.org</includeHosts>
     <Descriptors ex:colour="red" ex:shape="square" />
   </URISet>
</DR>

The first 4 lines of data would need to be transformed into other 
namespaces; for example, maker becomes <> foaf:maker 
<http://authority.example.org/foaf.rdf#me>.

Then we get to the URI Set which encloses the description which, in this 
case, comprises two simple properties/literal values - we can express 
those as XML attributes.
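To make that transformation concrete, here is a minimal Python sketch (standing in for the eventual GRDDL/XSLT) that reads the Example 1 XML and emits RDF-style triples. The mapping table, the use of "" for the document itself (<>), and the "_:setN" blank-node labels are my assumptions for illustration, not part of any agreed POWDER design; it also attaches descriptors directly to the URI set node rather than via hasDescriptors, to keep it short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. WDR is the default namespace of the XML-only DR in
# Example 1; FOAF and DCTERMS are the vocabularies the RDF/XML version uses.
WDR = "http://www.w3.org/2007/05/powder#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Assumed mapping from the flat XML elements to RDF properties.
ATTRIBUTION_MAP = {
    WDR + "maker": FOAF + "maker",
    WDR + "issued": DCTERMS + "issued",
    WDR + "validFrom": WDR + "validFrom",
    WDR + "validUntil": WDR + "validUntil",
}


def q(local):
    """ElementTree's '{namespace}local' form for a POWDER element name."""
    return "{%s}%s" % (WDR, local)


def expand(name):
    """Turn ElementTree's '{namespace}local' form into a plain URI string."""
    if name.startswith("{"):
        ns, _, local = name[1:].partition("}")
        return ns + local
    return name


def dr_to_triples(xml_text, doc_uri=""):
    """Return (subject, predicate, object) tuples for one XML-only DR."""
    root = ET.fromstring(xml_text)
    triples = []
    # Attribution and validity: direct children of <DR> with a known mapping.
    for child in root:
        prop = ATTRIBUTION_MAP.get(expand(child.tag))
        if prop is not None:
            triples.append((doc_uri, prop, child.text.strip()))
    # Each URI set becomes a blank node (hypothetical "_:setN" labels).
    for n, uriset in enumerate(root.findall(q("URISet")), start=1):
        node = "_:set%d" % n
        for host in uriset.findall(q("includeHosts")):
            triples.append((node, WDR + "includeHosts", host.text.strip()))
        # Each namespaced attribute on <Descriptors> is a property/literal pair.
        for desc in uriset.findall(q("Descriptors")):
            for attr, value in desc.attrib.items():
                triples.append((node, expand(attr), value))
    return triples


EXAMPLE_1 = """
<DR xmlns="http://www.w3.org/2007/05/powder#"
    xmlns:ex="http://example.org/vocab#">
  <maker>http://authority.example.org/foaf.rdf#me</maker>
  <issued>2007-12-14</issued>
  <validFrom>2008-01-01</validFrom>
  <validUntil>2008-12-31</validUntil>
  <URISet>
    <includeHosts>example.org</includeHosts>
    <Descriptors ex:colour="red" ex:shape="square" />
  </URISet>
</DR>
""".strip()

for triple in dr_to_triples(EXAMPLE_1):
    print(triple)
```

The point is how little the flat XML needs: element names map one-to-one onto properties, and the descriptor attributes already carry their own namespaces.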

How about a site that is generally blue but where the /foo area is red? I 
reckon there are 2 ways to do this in XML:

Example 2
=========
RDF/XML: http://www.fosi.org/projects/powder/dr-o2.rdf
XML Only: http://www.fosi.org/projects/powder/dr-o2.xml
OR: http://www.fosi.org/projects/powder/dr-o2a.xml

The first of these has two separate URI sets:

<URISet>
   <includeHosts>example.org</includeHosts>
   <includePathStartsWith>/foo</includePathStartsWith>
   <Descriptors ex:colour="red" />
</URISet>

<URISet>
   <includeHosts>example.org</includeHosts>
   <Descriptors ex:colour="blue" />
</URISet>

Whilst the second makes it more explicit that the /foo section is a 
subset of the overall example.org set, thus:

<URISet>
   <includeHosts>example.org</includeHosts>
   <Descriptors ex:colour="blue">

     <URISet>
       <includePathStartsWith>/foo</includePathStartsWith>
       <Descriptors ex:colour="red" />
     </URISet>

   </Descriptors>
</URISet>

This is akin to previous work done in this area by Jo Rabin and Mark 
Nottingham [2, 3]. As before, GRDDL has a lot to do here (it still needs 
to work out that the URI set for the blue resources is those on 
example.org where the path does *not* start with /foo).
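As a rough illustration of what a processor (GRDDL or otherwise) has to do with the nested form, here is a minimal Python sketch that looks up the description for a given host and path, letting the innermost matching URISet win. The "innermost wins" rule and the host-matching rule (exact host or subdomain) are assumptions on my part, in the spirit of the URISpace work cited above.

```python
import xml.etree.ElementTree as ET

WDR = "http://www.w3.org/2007/05/powder#"


def q(local):
    """ElementTree's '{namespace}local' form for a POWDER element name."""
    return "{%s}%s" % (WDR, local)


def local_name(name):
    """Strip the '{namespace}' part from an ElementTree name."""
    return name.rsplit("}", 1)[-1]


def set_matches(uriset, host, path):
    """True if host/path meet this URISet's own constraints."""
    hosts = [h.text.strip() for h in uriset.findall(q("includeHosts"))]
    prefixes = [p.text.strip() for p in uriset.findall(q("includePathStartsWith"))]
    # Assumption: a host matches if it equals, or is a subdomain of, a listed host.
    if hosts and not any(host == h or host.endswith("." + h) for h in hosts):
        return False
    if prefixes and not any(path.startswith(p) for p in prefixes):
        return False
    return True


def describe(uriset, host, path, inherited=None):
    """Descriptors for host/path; the innermost matching URISet wins."""
    if not set_matches(uriset, host, path):
        return inherited
    result = inherited
    for desc in uriset.findall(q("Descriptors")):
        # This set's own descriptors replace anything inherited from outside.
        result = {local_name(k): v for k, v in desc.attrib.items()}
        # A nested URISet refines this one (its host is implicitly inherited).
        for child in desc.findall(q("URISet")):
            result = describe(child, host, path, result)
    return result


EXAMPLE_2A = """
<DR xmlns="http://www.w3.org/2007/05/powder#"
    xmlns:ex="http://example.org/vocab#">
  <URISet>
    <includeHosts>example.org</includeHosts>
    <Descriptors ex:colour="blue">
      <URISet>
        <includePathStartsWith>/foo</includePathStartsWith>
        <Descriptors ex:colour="red" />
      </URISet>
    </Descriptors>
  </URISet>
</DR>
""".strip()

top = ET.fromstring(EXAMPLE_2A).find(q("URISet"))
print(describe(top, "www.example.org", "/foo/page.html"))  # {'colour': 'red'}
print(describe(top, "example.org", "/bar"))                # {'colour': 'blue'}
print(describe(top, "example.com", "/foo"))                # None
```

Note that "blue means example.org and NOT /foo" never has to be written down here; it falls out of the lookup order. It is the OWL version that needs the explicit exclusion.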

Adding identifiers to the URISets is easy enough, as example 3 shows:

Example 3
=========
RDF/XML: http://www.fosi.org/projects/powder/dr-o3.rdf
XML: http://www.fosi.org/projects/powder/dr-o3.xml

But the problem posed by example 4 doesn't go away. In this example, we 
have two identical URI sets with different descriptions. Which one is 
applied is conditional on which one the described resource links to.

Example 4
=========

RDF/XML: http://www.fosi.org/projects/powder/dr-o4.rdf
XML: http://www.fosi.org/projects/powder/dr-o4.xml
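For Example 4 the choice between DRs has to come from the resource itself. Here is a minimal Python sketch of the selection step a client might perform, assuming the resource's link elements have already been resolved to DR identifiers such as DR_1. The DR class, the function names and the host-matching rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DR:
    ident: str         # hypothetical: the DR's fragment identifier, e.g. "DR_1"
    hosts: tuple       # includeHosts values
    descriptors: dict  # descriptor property -> literal value


def in_scope(dr, host):
    # Assumption: a host matches if it equals, or is a subdomain of, a listed host.
    return any(host == h or host.endswith("." + h) for h in dr.hosts)


def applicable(drs, host, link_from, linked_ids):
    """Return the DRs that apply to a resource served from `host`.

    When the document sets linkFrom, being in scope is not enough: the
    resource must also link to the DR. `linked_ids` holds the DR
    identifiers found in the resource's link elements (extracting them
    is outside this sketch).
    """
    candidates = [dr for dr in drs if in_scope(dr, host)]
    if link_from:
        candidates = [dr for dr in candidates if dr.ident in linked_ids]
    return candidates


drs = [
    DR("DR_1", ("example.org",), {"texture": "smooth"}),
    DR("DR_2", ("example.org",), {"texture": "rough"}),
]

# Both DRs cover the same URIs; only the link tells them apart.
print([dr.descriptors for dr in applicable(drs, "example.org", True, {"DR_2"})])
# [{'texture': 'rough'}]
```

Without linkFrom, both DRs would be returned and the client would be left with two contradictory descriptions of the same resources.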

So each of the examples in my previous mail can be written in simpler 
XML. Operationally, one would expect content providers and labelling 
authorities alike to use the simpler XML version, with the GRDDL transform 
to a full RDF/OWL DR-S being used where reasoning is required or where an 
application already makes use of semantics. But what are the pros and cons?

In favour of XML only for DR-O:
  - it does not convey potentially misleading data. It's just flat XML 
that you can GRDDL (transform) if you wish to.
  - it is more compact
  - DR-O and DR-S would have different MIME types
  - GRDDL can't work with RDF/XML
  - We'd keep the "Semantic Web is a waste of time" merchants happy.

In favour of RDF/XML:
  - The argument that seemed to carry this when we discussed it (I can't 
find it in the minutes) was that having two different file and MIME 
types would lead to confusion (everyone would get it wrong - goodness 
knows MIME types are often wrongly declared).
  - There is more flexibility in what can be expressed in RDF/XML, 
particularly in terms of the descriptions.
  - A DR-O is closer to a DR-S so you get into the Sem Web mindset from 
the start, rather than it being seen (by some, surely) as an add-on.
  - We'd risk upsetting the Semantic Web merchants and, potentially, being 
seen to be undermining the SW activity as a whole.
  - I can think of at least one person (the Bard of Chiswick) who would 
crow for years.

I'm not convinced about the first of these arguments against. Day to 
day, it's going to be XML that gets used, so seeing the RDF MIME type 
would probably be unusual. I've just tried, without success, to create an 
example showing the second one, so I'm not convinced of that one either! 
The rest, to be honest, are space fillers.

If - *if* - we can create a nice XSLT that goes from a DR written in XML 
to one written in OWL, then personally I'm sold.
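As a feasibility check for the output side of such a transform, here is a minimal Python sketch (Python standing in for XSLT, which is what we would actually want) that builds a wdr:URISet as an owl:intersectionOf of owl:hasValue restrictions, the shape used for the DR-S URI sets in my earlier mail. I'm assuming the wdr: prefix binds to the same namespace as the XML-only examples; ElementTree writes the full property URIs rather than &wdr; entity references.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
WDR = "http://www.w3.org/2007/05/powder#"  # assumed to be the wdr: namespace

# Use the familiar prefixes when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)
ET.register_namespace("wdr", WDR)


def uriset_to_owl(set_id, restrictions):
    """Build <wdr:URISet rdf:ID=...> as an owl:intersectionOf restrictions.

    `restrictions` is a list of (local property name, literal value) pairs,
    e.g. ("includeHosts", "example.org").
    """
    uriset = ET.Element("{%s}URISet" % WDR, {"{%s}ID" % RDF: set_id})
    inter = ET.SubElement(uriset, "{%s}intersectionOf" % OWL,
                          {"{%s}parseType" % RDF: "Collection"})
    for prop, value in restrictions:
        restriction = ET.SubElement(inter, "{%s}Restriction" % OWL)
        ET.SubElement(restriction, "{%s}onProperty" % OWL,
                      {"{%s}resource" % RDF: WDR + prop})
        ET.SubElement(restriction, "{%s}hasValue" % OWL).text = value
    return uriset


owl_set = uriset_to_owl("URISet_2", [("includeHosts", "example.org"),
                                     ("excludePathStartsWith", "/foo")])
print(ET.tostring(owl_set, encoding="unicode"))
```

Generating this shape is mechanical; the hard part, as noted above, is working out which exclude restrictions each set needs.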

Phil.


[1] http://lists.w3.org/Archives/Public/public-powderwg/2008Jan/0018.html
[2] http://www.w3.org/2005/Incubator/wcl/matching.html
[3] http://www.w3.org/TR/urispace


Phil Archer wrote:
> 
> As promised on Friday afternoon, here are some musings on a possible new 
> structure for Operational POWDER (POWDER-O), taking into account the 
> recent discussion.
> 
> 
> Example 1
> =========
> RDF/XML: http://www.fosi.org/projects/powder/dr-o1.rdf
> Graph: http://www.fosi.org/projects/powder/servlet_63142.png
> 
> Let's begin with a simple case: all resources dereferenced from a 
> URI with a host component ending with example.org are red and square (I 
> got these line numbers from the RDF validator but I've knocked off the 
> namespace declarations).
> 
> 11:   <rdf:Description rdf:about="">
> 12:     <foaf:maker 
> rdf:resource="http://authority.example.org/foaf.rdf#me" />
> 13:     <dcterms:issued>2007-12-14</dcterms:issued>
> 14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
> 15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
> 16:   </rdf:Description>
> 17:
> 18:   <wdr:DR rdf:ID="DR_1">
> 19:     <wdr:hasScope rdf:parseType="Resource">
> 20:       <wdr:includeHosts>example.org</wdr:includeHosts>
> 21:       <wdr:hasDescriptors rdf:parseType="Resource">
> 22:         <ex:colour>red</ex:colour>
> 23:         <ex:shape>square</ex:shape>
> 24:       </wdr:hasDescriptors>
> 25:     </wdr:hasScope>
> 26:   </wdr:DR>
> 
> The document was made by "...#me" on 14th December 2007 and is valid for 
> 2008. If you trust "...#me" and today's date is in 2008, then the DR in 
> the document can be transformed into its semantic encoding (DR-S) and 
> the RDF merged into your triple store for processing.
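The trust and validity test above is simple enough to sketch in a few lines of Python. This is a minimal sketch only: the function name and the set of trusted makers are hypothetical, and I'm assuming the validity window is inclusive at both ends.

```python
from datetime import date


def should_load(maker, valid_from, valid_until, trusted_makers, today):
    """Decide whether a DR-O document should be transformed and merged.

    maker                    -- the foaf:maker URI the document gives for itself
    valid_from / valid_until -- ISO 8601 dates, as in the examples
    trusted_makers           -- set of maker URIs this client trusts (hypothetical)
    today                    -- the date to test against
    The validity window is treated as inclusive at both ends (an assumption).
    """
    if maker not in trusted_makers:
        return False
    start = date.fromisoformat(valid_from)
    end = date.fromisoformat(valid_until)
    return start <= today <= end


me = "http://authority.example.org/foaf.rdf#me"
trusted = {me}
print(should_load(me, "2008-01-01", "2008-12-31", trusted, date(2008, 1, 21)))  # True
print(should_load(me, "2008-01-01", "2008-12-31", trusted, date(2009, 1, 1)))   # False
```

Only documents that pass this test go on to the DR-S transformation, so nothing time- or trust-related has to survive into the triple store.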
> 
> The single DR in the document has:
> 1. a URI set defined solely in terms of its host (example.org), and
> 2. two descriptors from the ex namespace.
> 
> Now let's make it a little more complicated and have one DR but two URI 
> sets.
> 
> Example 2
> =========
> RDF/XML: http://www.fosi.org/projects/powder/dr-o2.rdf
> Graph: http://www.fosi.org/projects/powder/servlet_63180.png
> 
> 11:   <rdf:Description rdf:about="">
> 12:     <foaf:maker 
> rdf:resource="http://authority.example.org/foaf.rdf#me" />
> 13:     <dcterms:issued>2007-12-14</dcterms:issued>
> 14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
> 15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
> 16:   </rdf:Description>
> 17:
> 18:   <wdr:DR rdf:ID="DR_1">
> 19:
> 20:     <wdr:hasScope rdf:parseType="Collection">
> 21:
> 22:       <wdr:URIset rdf:ID="URIset_1">
> 23:         <wdr:includeHosts>example.org</wdr:includeHosts>
> 24:         <wdr:includePathStartsWith>/foo</wdr:includePathStartsWith>
> 25:         <wdr:hasDescriptors rdf:parseType="Resource">
> 26:           <ex:colour>red</ex:colour>
> 27:         </wdr:hasDescriptors>
> 28:       </wdr:URIset>
> 29:
> 30:       <wdr:URIset rdf:about="#URIset_2" />
> 31:
> 32:     </wdr:hasScope>
> 33:
> 34:   </wdr:DR>
> 35:
> 36:   <wdr:URIset rdf:ID="URIset_2">
> 37:     <wdr:includeHosts>example.org</wdr:includeHosts>
> 38:     <wdr:hasDescriptors rdf:parseType="Resource">
> 39:       <ex:colour>blue</ex:colour>
> 40:     </wdr:hasDescriptors>
> 41:   </wdr:URIset>
> 
> The attribution and validity information (lines 11 - 16) remains 
> unchanged. Now, though, our DR contains two URIsets and two attendant 
> descriptions. All resources identified by URIs that have host 
> components ending with example.org are blue, except those where the path 
> starts with /foo, which are red.
> 
> If the trust and validity information is to your satisfaction, then you 
> can transform this operational data into two DR-S instances and merge 
> the RDF. N.B. The transformation must draw on data from both URI sets, thus:
> 
> <wdr:URISet rdf:ID="URISet_1">
>   <owl:intersectionOf rdf:parseType="Collection">
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="&wdr;includeHosts" />
>       <owl:hasValue>example.org</owl:hasValue>
>     </owl:Restriction>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="&wdr;includePathStartsWith" />
>       <owl:hasValue>/foo</owl:hasValue>
>     </owl:Restriction>
>   </owl:intersectionOf>
> </wdr:URISet>
> 
> And
> 
> <wdr:URISet rdf:ID="URISet_2">
>   <owl:intersectionOf rdf:parseType="Collection">
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="&wdr;includeHosts" />
>       <owl:hasValue>example.org</owl:hasValue>
>     </owl:Restriction>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="&wdr;excludePathStartsWith" />
>       <owl:hasValue>/foo</owl:hasValue>
>     </owl:Restriction>
>   </owl:intersectionOf>
> </wdr:URISet>
> 
> The excludePathStartsWith property in URISet 2 is generated by:
> 
> 1. Noting the defining features of URISet 1 (example.org and /foo)
> 2. Noting the defining features of URISet 2 (example.org)
> 3. Adding to URISet 2 the negation of each feature present in 1 but 
> absent from 2, giving (example.org and NOT /foo)
> 
> This is going to get complicated when there are n URISets in the 
> sequence but hey, let's be optimistic...
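Here is a minimal Python sketch of those three steps generalised to n exception sets. URI sets are modelled as frozensets of (property, value) pairs, and I'm assuming every includeX property has an excludeX counterpart, as includePathStartsWith/excludePathStartsWith do here. The function names are hypothetical. It refuses the cases a simple intersection of restrictions can't express.

```python
def negate(prop):
    """Exclude-form of an include property (assumed naming convention)."""
    # Assumption: every includeX property has an excludeX counterpart, as
    # includePathStartsWith/excludePathStartsWith do in this example.
    if not prop.startswith("include"):
        raise ValueError("no exclude form for %r" % prop)
    return "exclude" + prop[len("include"):]


def default_set_restrictions(default, exceptions):
    """Restrictions for the 'default' URI set, given n exception sets.

    Each URI set is a frozenset of (property, value) pairs. For each
    exception we take the features it has but the default lacks (steps
    1-3 above) and add their negation to the default set.
    """
    result = set(default)
    for exc in exceptions:
        extra = exc - default
        if len(extra) != 1:
            # Zero extra features means the exception swallows the default;
            # more than one means the negation is a disjunction
            # (NOT (A AND B) is (NOT A) OR (NOT B)), which a plain
            # intersection of restrictions cannot express.
            raise ValueError("exclusion needs exactly one extra feature, got %d"
                             % len(extra))
        prop, value = next(iter(extra))
        result.add((negate(prop), value))
    return result


default = frozenset({("includeHosts", "example.org")})
foo = default | {("includePathStartsWith", "/foo")}
bar = default | {("includePathStartsWith", "/bar")}

for restriction in sorted(default_set_restrictions(default, [foo, bar])):
    print(restriction)
```

So the n-set case stays tractable as long as each exception differs from the general case by a single feature; anything richer would need owl:unionOf or owl:complementOf rather than a flat list of restrictions.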
> 
> We can add further DRs into the document iff the validity information is 
> the same. I won't copy it out here but such an example is available:
> 
> Example 3
> =========
> RDF/XML: http://www.fosi.org/projects/powder/dr-o3.rdf
> Graph: http://www.fosi.org/projects/powder/servlet_63202.png
> 
> This says that everything on example.org is blue except things with a 
> URI path starting with /foo, which are red. Separately, everything on 
> example.org is circular except things with a URI path starting with /bar, 
> which are square. All assertions are subject to the same validity/trust 
> conditions. You process the Collections within each of the DRs to 
> get these results or, if you want to merge the RDF into your triple 
> store, you perform the transformation to get the full OWL-based version.
> 
> Two people providing different descriptions with different validity 
> dates will need to publish separate RDF/XML instances.
> 
> N.B.
> 
> I have written these examples with the exceptional case written within 
> the DR and the 'default' description (there's no other word for it) as a 
> separate block. There is no difference in structure but it does reflect 
> what I expect to be the workflow reality - we're describing everything 
> in a given URI set as being like _this_ *except* things that have 
> _these_ things in their URIs which are like _this_ instead. Less 
> abstract: there is no sex, drugs or rock and roll on fosi.org _except_ 
> where the URI path component begins with /associates where there might 
> be. Operationally, you write the general case first and then worry about 
> the exceptions in a separate thought process.
> 
> All of the RDF/XML instances given so far have a single URI so that a 
> Web site can include an identical link element pointing to such a file 
> irrespective of whether the resource is red, blue, square or circular - 
> the POWDER client will sort it out.
> 
> But, where the Web site has no (discernible or usable) URI structure, 
> this isn't good enough. We need to include a new conditional statement, 
> linkFrom, that says that if you trust "...#me" and today's date is 
> within range *and* the resource includes a link to a specific DR, _then_ 
> it is valid.
> 
> The next example shows this.
> 
> Example 4
> =========
> RDF/XML: http://www.fosi.org/projects/powder/dr-o4.rdf
> Graph: http://www.fosi.org/projects/powder/servlet_63230.png
> 
> 11:   <rdf:Description rdf:about="">
> 12:     <foaf:maker 
> rdf:resource="http://authority.example.org/foaf.rdf#me" />
> 13:     <dcterms:issued>2007-12-14</dcterms:issued>
> 14:     <wdr:validFrom>2008-01-01</wdr:validFrom>
> 15:     <wdr:validUntil>2008-12-31</wdr:validUntil>
> 16:     <wdr:linkFrom>true</wdr:linkFrom>
> 17:   </rdf:Description>
> 18:
> 19:   <wdr:DR rdf:ID="DR_1">
> 20:     <wdr:hasScope rdf:parseType="Resource">
> 21:       <wdr:includeHosts>example.org</wdr:includeHosts>
> 22:       <wdr:hasDescriptors rdf:parseType="Resource">
> 23:         <ex:texture>smooth</ex:texture>
> 24:       </wdr:hasDescriptors>
> 25:     </wdr:hasScope>
> 26:   </wdr:DR>
> 27:
> 28:
> 29:   <wdr:DR rdf:ID="DR_2">
> 30:     <wdr:hasScope rdf:parseType="Resource">
> 31:       <wdr:includeHosts>example.org</wdr:includeHosts>
> 32:       <wdr:includePathStartsWith>/sawn</wdr:includePathStartsWith>
> 33:       <wdr:hasDescriptors rdf:parseType="Resource">
> 34:         <ex:texture>rough</ex:texture>
> 35:       </wdr:hasDescriptors>
> 36:     </wdr:hasScope>
> 37:   </wdr:DR>
> 
> Notice line 16 which introduces the linkFrom element.
> 
> The URI sets for the two DRs are then identical (everything on 
> example.org), so you need to refer to the linkFrom element to decide which 
> is applicable (actually, you'd probably define the URI set once in a 
> separate block and refer to it from both DRs). I know this is a pain - 
> deliberately publishing two sets of triples that say different things 
> about the same subjects - but, as they say in Vladivostok, c'est la guerre.
> 
> What about storing lots of DRs in a single RDF/XML instance? Well, it's 
> clear that we can't. If you're a content provider and you need to have 
> several DRs covering different domains of interest then you simply 
> create multiple RDF/XML files and link as you need to. If you're a 
> labelling authority you store your DRs in a database with a front end 
> like, oh I dunno, http://repository.icra.org/label?id=1 :-) which calls, 
> I mean, would call a script that would return a single RDF/XML instance.
> 
> This might mean we re-visit the issue of whether we want to put a hint 
> in the link element as to what vocabularies are used in a given DR.
> 
> If this is along the right lines then it seems to me we need to revisit 
> the question of whether a DR-O is written in RDF/XML, just XML or 
> something in between. All the examples above are valid RDF/XML but we 
> have not got rid of the problem of generic RDF tools sucking in the 
> triples and trying to make sense of them out of context. Personally I'm 
> tending towards DR-Os being written in XML with only DR-Ss in RDF. It 
> seems that GRDDL cannot transform RDF/XML and, I think I'm right in 
> saying, XSLT will have difficulty too.
> 
> I'll do some more playing around with that now...
> 
> Phil.
> 
> 
> 
> Phil Archer wrote:
>>
>> Good, this feels as if we're making progress (or rather, you're making 
>> progress in a promising direction :-)).
>>
>> I'll do some more playing around on Monday morning and see if I come 
>> up against anything we're missing.
>>
>> Have a good weekend and thank you.
>>
>> Phil.
>>
>> Jeremy Carroll wrote:
>>>
>>> Phil Archer wrote:
>>>>
>>>>
>>>> Jeremy Carroll wrote:
>>>> [snip]
>>>>>
>>>>> If we choose to make the GRDDL transform make the DR-S include the 
>>>>> subClassOf relationship as above, then we have the issue that in a 
>>>>> package (or any collection of DRs) some of the DRs may be valid and 
>>>>> some may be invalid, and all the subClassOf relationships are in 
>>>>> the same file, and it is unclear how to distinguish the ones we 
>>>>> want to claim (the valid ones), from the ones we don't (the invalid 
>>>>> ones).
>>>>
>>>> I take this point. It may be that we can do something about it 
>>>> though. We have so far taken the view that a DR should be 
>>>> self-contained and that a package is therefore a group of 
>>>> self-contained units. Doing this means that the validity information 
>>>> (and attribution) is NOT inherited by DRs in the package. However... 
>>>> we then had to introduce the idea of using dcterms:isPartOf to force 
>>>> the processing of these "discrete DRs" in a particular order [1]. In 
>>>> such a scenario, yes, each DR would have its own validity and 
>>>> attribution.
>>>>
>>>> But it doesn't have to be this way...
>>>>
>>>> It would be possible I think to work with the package carrying the 
>>>> validity information that was then inherited by the DRs within that 
>>>> package - which I think from what you say would make life easier?
>>>>
>>>>
>>>
>>>
>>> Yes - I was thinking along these lines.
>>>
>>> I was discussing this with Stuart - a possible view is then:
>>>
>>> The unit of a POWDER description is a document, which may contain a 
>>> single wdr:DR or a single wdr:Package.
>>>
>>> Either way the document has information pertinent to the relevance of 
>>> the document:
>>> e.g. validity and who vouches for it.
>>>
>>> Operationally the process of trust is as follows:
>>>
>>> for each possible document that you might be considering, you read 
>>> that document, then understanding what that document says about 
>>> itself, if you are satisfied that you want to act on that document 
>>> (e.g. it is valid, and is vouched for by an appropriate authority), 
>>> then you load it into your knowledge base (formally corresponding to 
>>> an RDF merge using the POWDER-S GRDDL result)
>>>
>>> The resulting RDF graph consists of only valid POWDER DRs, which have 
>>> been vouched for by appropriate authorities.
>>>
>>> From the formal side, the motivations for doing it this way are:
>>> - it is known that temporal logics (i.e. dealing with time in a 
>>> logical way) are a hard problem
>>> - it is known that dealing with trust in logic is a hard problem
>>> - it is clear that POWDER deals with both time and trust, but in 
>>> simple ways
>>> - hence it feels inappropriate to do the time and trust parts in the 
>>> formal logical layer; rather, we should deal with them in a pragmatic 
>>> layer prior to, but informed by, the logical treatment
>>>
>>> It is a limitation of the current RDF technology that it is hard to 
>>> talk about part of an RDF graph, and its validity, or who vouches for 
>>> that part - hence the desire to talk about documents containing 
>>> RDF/XML that expresses those parts of the graph.
>>> I think it is possible to design documents of the 'right' size so that:
>>>
>>> - validity and vouching are pertinent on a document by document level 
>>> (and not on a finer grain)
>>> - documents are large enough that the small scale powder user need 
>>> only write one document for their site, or maybe two.
>>>
>>> - the expectations on large publishers who may need to make 
>>> declarations that fit into complex workflows are intelligible and not 
>>> too burdensome.
>>>
>>> Jeremy
>>>
> 
> 
> 
> 

-- 
Phil Archer
Chief Technical Officer,
Family Online Safety Institute
w. http://www.fosi.org/people/philarcher/
Received on Monday, 21 January 2008 17:23:12 UTC