(old) Notes on RDF-CL expressivity

(bcc:'ing the XG's Member list; followups to the public list please
('cos I want to find this in Google searches in the future...))

I wrote this late last year, as part of ERCIM and W3C Europe's
contribution to the Quatro EU project. It wasn't widely 
circulated at the time. Posting it here un-edited, as a contribution
to discussion of RDF-CL scope and expressivity analysis. I am 
picking up some of these themes again via involvement in the 
www.medieq.org EU project. Note that when I wrote the draft below,
the RIF WG hadn't formed (nor had this XG).

cheers,

Dan


>
>Quatro label format - architectural review
>
>This document provides a brief overview of the design issues
>and architectural tradeoffs made in the Quatro content labelling 
>data format.
>
>Quatro content labels are positioned on a migration path from 
>W3C's original content-labelling system, PICS, through simple 
>RDF descriptions, and from there, to the more sophisticated 
>capabilities of logical rule languages.
>
>The essence of the Quatro labelling approach is a re-expression
>of the PICS labeling system in RDF. PICS technology is dated and 
>suffered from poor deployment levels. 
>
>The motivation for re-expression of PICS-style content labels is, 
>broadly, to leverage the ongoing work in the RDF world:
>
> - use of generic RDF-based standards 
>  - schema and ontology languages (RDFS/OWL)
>  - query language (SPARQL)
>  - rule languages (RIF - newly under development at W3C)
> - ability to mix labeling data with other RDF data
>  - label instances can also contain Dublin Core, RSS, Creative Commons,
>    FOAF, PRISM, RSS-Media, ...
>  - labels can be merged into RDF databases that contain 
>    other relevant info about mentioned pages.
>
>
>Since RDF was itself designed as "PICS next generation", we might ask:
>why is there a need for any particular PICS-like RDF structures such 
>as those used in Quatro labels? Shouldn't each PICS scheme (Quatro,
>MedPICS/HIDDEL, etc.) simply be remodelled as an RDF vocabulary
>(i.e. a schema/ontology) using RDFS or OWL?
>
>The answer here is key to situating the current Quatro label format on
>the migration path from original PICS, through basic RDF, into full
>rule-based Semantic Web formalisms (e.g. OWL and RIF). The core issue
>here is the ability to express *generalisations* that apply to
>multiple pages.
>
>The original PICS standard included a capability that, until recently,
>has not been well addressed in the RDF technology stack. In PICS, it
>was possible to express generalisations such as:
>
>"Each page whose URI begins 'http://playboy.com/images/' has an
> sv:rudeness property whose value is sv:VERYRUDE".
>
>The abstract RDF graph model has no such capability.
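
To make the gap concrete, here is a minimal Python sketch. It is not part of
PICS or RDF: the function name apply_prefix_rule and the sample page URIs are
made up for illustration, and triples are modelled as plain tuples. The point
is that the prefix rule has to live outside the graph, as code; only its
consequences for URIs we already know about can be written down as triples.

```python
# Hypothetical helper: the "rule" is ordinary code, not something the
# RDF graph itself can state.
SV_RUDENESS = "sv:rudeness"
SV_VERYRUDE = "sv:VERYRUDE"


def apply_prefix_rule(uris, prefix, prop, value):
    """Return one (subject, property, value) triple per URI starting with prefix."""
    return [(uri, prop, value) for uri in uris if uri.startswith(prefix)]


# Illustrative set of pages we happen to know about.
known_pages = [
    "http://playboy.com/images/a.jpg",
    "http://playboy.com/about.html",
    "http://example.org/images/b.jpg",
]

triples = apply_prefix_rule(
    known_pages, "http://playboy.com/images/", SV_RUDENESS, SV_VERYRUDE
)
```

A page that turns up later under the same prefix gets no triple until the
rule is run again, which is exactly the generalisation the graph cannot
carry by itself.
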
>
>The original RDF *syntax* (Feb 1999 Recommendation of W3C) did 
>have a syntactic expression of this notion, the rdf:aboutEachPrefix 
>construction. Unfortunately, this aspect of the original RDF design
>was widely considered to be flawed and not usefully implementable
>(since it was defined solely over syntactic constructs). The RDF Core 
>Working Group removed rdf:aboutEachPrefix from the language; the 2004
>edition of the RDF Recommendations no longer contains this construct.
>
>Meanwhile, W3C did continue to improve RDF's ability to express other
>forms of generalisation. The Web Ontology Language, OWL (also completed
>in 2004), provides a sophisticated framework for expressing
>generalisations about classes of thing. It can, for example, express:
>
>"All things that are a Person and have a workplaceHomepage of 
>http://www.w3.org/ are a W3CStaffPerson. All W3CStaffPersons are 
>beautiful."
>
>It cannot, however, express complex rules that involve "variables"
>or intermediate entities. So it can't say "All things that have a 
>URI which begins with the string 'http://www.playboy.com/images/' are 
>RudeDocuments". So, unfortunately, even the expressive power of 
>OWL (in its various dialects) does not quite capture the capabilities
>of PICS.
>
>The Quatro labelling approach was designed just after OWL was finished,
>and as work was beginning on SPARQL, the new W3C language for querying
>RDF. It was designed in anticipation of future work on an RDF-friendly
>Rules language, work which (at time of writing, late 2005) is just
>beginning.
>
>Quatro labels encode a data structure which carries the basic
>information that could be modelled in old-style PICS: simple
>categories, attached either to a document URI, or to a representation
>of a regular expression against such URIs. This allows such a label to
>be used to exchange information broadly equivalent to original PICS
>labels.
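
A rough Python sketch of the kind of data structure described above. The
class and field names (Label, uri, uri_pattern, applies_to), the ex: category
names and the example.org URIs are illustrative assumptions, not the actual
Quatro vocabulary or its RDF/XML encoding.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Label:
    """Simple categories attached to one URI or to a regex over URIs."""
    categories: Dict[str, str]
    uri: Optional[str] = None          # exact document URI, or...
    uri_pattern: Optional[str] = None  # ...a regular expression against URIs

    def applies_to(self, candidate: str) -> bool:
        # An exact URI takes precedence if both fields are set.
        if self.uri is not None:
            return candidate == self.uri
        if self.uri_pattern is not None:
            return re.match(self.uri_pattern, candidate) is not None
        return False


# One label for a single page, one for a regex-indicated class of pages.
single = Label({"ex:category": "ex:Medical"}, uri="http://example.org/page.html")
group = Label({"ex:category": "ex:Medical"},
              uri_pattern=r"^http://example\.org/health/")
```

Calling group.applies_to("http://example.org/health/flu.html") returns True
here, while the same call on single returns False.
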
>
>As outlined above, this is a progression from PICS, because such data
>structures can be encoded in RDF/XML (rather than in PICS syntax), 
>and mixed closely with other RDF data. The mixing can occur either
>within a document (i.e. a block of Quatro labelling data right
>alongside some Dublin Core), or in a database system, perhaps exposed
>as a Web Service using the SPARQL query language and protocol.
>
>
>There are some important limitations associated with the current design
>that must be understood, to ensure accurate use of the format.
>
>The URI-based generalisation capabilities of Quatro labels are properly
>usable only with what we might call the PICS-like idiom. We can 
>use Quatro to construct pieces of RDF that say (when exchanged between
>Quatro-aware systems) that some category/value applies to a particular 
>document or URI-regex-indicated class of documents. In its current
>design, it is less successful when trying to use arbitrary
>non-Quatro-oriented RDF vocabularies with the URI-regex construction.
>In other words, it doesn't quite work for saying things like:
>
>	'all documents that have a URI which begins with the string 
>	http://danbri.org/ are things that have a dc:creator whose 
>	foaf:name is "Dan Brickley"'.
>
>If used carefully (eg. for exchange between systems that share 
>additional assumptions about the representation used), Quatro labels
>can carry this information, but strictly speaking the approach 
>doesn't fit with the formal semantics of RDF.
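
The templated claim in the danbri.org example can be sketched outside RDF as
a small Python function. Everything here is an assumption for illustration:
the function name expand_creator_template, the _:creatorN blank-node naming,
the sample URIs, and in particular the choice to mint a separate intermediate
node per document rather than share one.

```python
from itertools import count

DC_CREATOR = "dc:creator"
FOAF_NAME = "foaf:name"


def expand_creator_template(uris, prefix, name):
    """For each URI starting with prefix, emit two triples joined by a fresh node."""
    ids = count(1)
    triples = []
    for uri in uris:
        if uri.startswith(prefix):
            # The intermediate entity: a blank node standing for the creator.
            person = f"_:creator{next(ids)}"
            triples.append((uri, DC_CREATOR, person))
            triples.append((person, FOAF_NAME, name))
    return triples


result = expand_creator_template(
    ["http://danbri.org/words/", "http://example.org/"],
    "http://danbri.org/",
    "Dan Brickley",
)
```

The template (prefix plus a two-triple pattern with a variable in the middle)
is the part that has no natural home inside a top-level RDF graph; the
expanded triples do.
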
>
>The design discussions which led to the current Quatro approach
>[ref Vodafone meeting with phil, kal, daniel, danbri et al] considered
>several possible alternative designs. Unfortunately each such design
>(namely: use of RDF reification vocabulary; use of an alternate RDF 
>reification vocabulary; quoting of RDF/XML using XML escaping; use of 
>multiple files) carried a major syntactic overhead for content creators
>and users. The decision was therefore made to limit expressivity in 
>version 1 of the format, to hasten adoption. 
>
>For expressivity in this area to be improved, without major impact
>on deployability of the syntax, a radically new approach will be needed.
>The problem is that we are trying to express complex "templated" claims 
>using RDF vocabulary, and to embed those hypothetical 
>templates within "top level" RDF graphs. Fortunately W3C has now 
>initiated new work in this area (Rule Interchange Format - RIF). It is 
>anticipated that any elaboration of Quatro labelling to improve its 
>expressivity in this area will likely be conducted in the context
>of RIF. It is also likely that the RIF WG's deliverables will provide a 
>format capable of expressing (to general purpose Semantic Web tools) the
>semantics currently encoded in plain RDF/XML within Quatro labels. This,
>if verified, will prove a useful evaluation mechanism and deployment
>environment for the Quatro work.
>   
>
>
>Todo:
> - section headings, examples, diagram
> - test cases
> - refs to specs
>
>

Received on Thursday, 1 June 2006 14:01:14 UTC