- From: Dan Brickley <danbri@danbri.org>
- Date: Thu, 1 Jun 2006 10:01:03 -0400
- To: public-xg-wcl@w3.org
- Cc: dd@w3.org
(bcc:'ing the XG's Member list; followups to the public list please ('cos I want to find this in Google searches in the future...))

I wrote this late last year, as part of ERCIM and W3C Europe's contribution to the Quatro EU project. It wasn't widely circulated at the time. Posting it here unedited, as a contribution to discussion of RDF-CL scope and expressivity analysis. I am picking up some of these themes again via involvement in the www.medieq.org EU project.

Note that when I wrote the draft below, the RIF WG hadn't formed (nor had this XG).

cheers,

Dan

> Quatro label format - architectural review
>
> This document provides a brief overview of the design issues and architectural tradeoffs made in the Quatro content labelling data format.
>
> Quatro content labels are positioned on a migration path from W3C's original content-labelling system, PICS, through simple RDF descriptions, and from there to the more sophisticated capabilities of logical rule languages.
>
> The essence of the Quatro labelling approach is a re-expression of the PICS labelling system in RDF. PICS technology is dated and suffered from poor deployment levels.
>
> The motivation for re-expressing PICS-style content labels is, broadly, to leverage the ongoing work in the RDF world:
>
>  - use of generic RDF-based standards
>    - schema and ontology languages (RDFS/OWL)
>    - query language (SPARQL)
>    - rule languages (RIF, newly under development at W3C)
>  - ability to mix labelling data with other RDF data
>    - label instances can also contain Dublin Core, RSS, Creative Commons, FOAF, PRISM, RSS-Media, ...
>    - labels can be merged into RDF databases that contain other relevant information about the pages they mention.
>
> Since RDF was itself designed as "PICS next generation", we might ask: why is there a need for any particular PICS-like RDF structures, such as those used in Quatro labels? Shouldn't each PICS scheme (Quatro, MedPICS/HIDDEL, etc.) simply be remodelled as an RDF vocabulary (i.e. schema/ontology) using RDFS or OWL?
>
> The answer is key to situating the current Quatro label format on the migration path from original PICS, through basic RDF, into full rule-based Semantic Web formalisms (e.g. OWL and RIF). The core issue is the ability to express *generalisations* that apply to multiple pages.
>
> The original PICS standard included a capability that, until recently, had not been well addressed in the RDF technology stack. In PICS, it was possible to express generalisations such as:
>
>   "Each page whose URI begins 'http://playboy.com/images/' has an sv:rudeness property whose value is sv:VERYRUDE".
>
> The abstract RDF graph model has no such capability.
>
> The original RDF *syntax* (the February 1999 W3C Recommendation) did have a syntactic expression of this notion, the rdf:aboutEachPrefix construction. Unfortunately, this aspect of the original RDF design was widely considered to be flawed and not usefully implementable, since it was defined solely over syntactic constructs. The RDF Core Working Group removed rdf:aboutEachPrefix from the language; the 2004 edition of the RDF Recommendations no longer contains this construct.
>
> Meanwhile, W3C did continue to improve RDF's ability to express other forms of generalisation. The Web Ontology Language, OWL (also completed in 2004), provides a sophisticated framework for expressing generalisations about classes of things. It can, for example, express:
>
>   "All things that are a Person and have a workplaceHomepage of http://www.w3.org/ are a W3CStaffPerson. All W3CStaffPersons are beautiful."
>
> It cannot, however, express complex rules that involve "variables" or intermediate entities. So it can't say "All things that have a URI which begins with the string 'http://www.playboy.com/images/' are RudeDocuments". So, unfortunately, even the expressive power of OWL (in its various dialects) does not quite capture the capabilities of PICS.
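To make the gap concrete, here is a minimal Python sketch, assuming a toy representation of triples as string tuples rather than any RDF library. The function `expand_prefix_label` and the page URIs are hypothetical; `sv:rudeness` and `sv:VERYRUDE` come from the PICS example quoted above. It treats a prefix generalisation as a rule evaluated over candidate URIs: a plain RDF graph can hold the rule's output for pages already known, but not the open-ended statement itself.

```python
from typing import Iterable, List, Tuple

# A triple is (subject, predicate, object); prefixed names such as
# "sv:rudeness" are kept as plain strings for brevity (an assumption).
Triple = Tuple[str, str, str]


def expand_prefix_label(prefix: str, prop: str, value: str,
                        pages: Iterable[str]) -> List[Triple]:
    """Apply a PICS-style generalisation ("each page whose URI begins with
    `prefix` has `prop` = `value`") to a finite list of known pages.

    The rule itself is not a triple: only its instances, for pages we
    already know about, can be written down as plain RDF statements.
    """
    return [(uri, prop, value) for uri in pages if uri.startswith(prefix)]


# Hypothetical page URIs, for illustration only.
pages = [
    "http://playboy.com/images/a.jpg",
    "http://playboy.com/images/sub/b.jpg",
    "http://playboy.com/about.html",
]

triples = expand_prefix_label("http://playboy.com/images/",
                              "sv:rudeness", "sv:VERYRUDE", pages)
for t in triples:
    print(t)
```

The literal `startswith` test mirrors the prefix reading of rdf:aboutEachPrefix; a consumer of Quatro's regex-based form would presumably substitute a regular-expression match at that point.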
>
> The Quatro labelling approach was designed just after OWL was finished, and as work was beginning on SPARQL, the new W3C language for querying RDF. It was designed in anticipation of future work on an RDF-friendly rules language, work which (at the time of writing, late 2005) is just beginning.
>
> Quatro labels encode a data structure which carries the basic information that could be modelled in old-style PICS: simple categories, attached either to a document URI or to a representation of a regular expression matched against such URIs. This allows such a label to be used to exchange information broadly equivalent to original PICS labels.
>
> As outlined above, this is a progression from PICS, because such data structures can be encoded in RDF/XML (rather than in PICS syntax) and mixed closely with other RDF data. The mixing can occur within a document, e.g. a block of Quatro labelling data right alongside some Dublin Core. Or it can occur in a database system, perhaps exposed as a Web Service using the SPARQL query language and protocol.
>
> There are some important limitations associated with the current design that must be understood to ensure accurate use of the format.
>
> The URI-based generalisation capabilities of Quatro labels are properly usable only with what we might call the PICS-like idiom. We can use Quatro to construct pieces of RDF that say (when exchanged between Quatro-aware systems) that some category/value applies to a particular document or to a URI-regex-indicated class of documents. The current design is less successful when trying to use arbitrary, non-Quatro-oriented RDF vocabularies with the URI-regex construction. In other words, it doesn't quite work for saying things like:
>
>   'all documents that have a URI which begins with the string http://danbri.org/ are things that have a dc:creator whose foaf:name is "Dan Brickley"'.
>
> If used carefully (e.g. for exchange between systems that share additional assumptions about the representation used), Quatro labels can carry this information, but strictly speaking the approach doesn't fit with the formal semantics of RDF.
>
> The design discussions which led to the current Quatro approach [ref Vodafone meeting with phil, kal, daniel, danbri et al] considered several possible alternative designs. Unfortunately, each such design (namely: use of the RDF reification vocabulary; use of an alternate RDF reification vocabulary; quoting of RDF/XML using XML escaping; use of multiple files) carried a major syntactic overhead for content creators and users. The decision was therefore made to limit expressivity in version 1 of the format, to hasten adoption.
>
> For expressivity in this area to be improved without major impact on the deployability of the syntax, a radically new approach will be needed. The problem is that we are trying to express complex "templated" claims using RDF vocabulary, and to embed those hypothetical templates within "top level" RDF graphs. Fortunately, W3C has now initiated new work in this area (Rule Interchange Format, RIF). It is anticipated that any elaboration of Quatro labelling to improve its expressivity in this area will be conducted in the context of RIF. It is also likely that the RIF WG's deliverables will provide a format capable of expressing (to general-purpose Semantic Web tools) the semantics currently encoded in plain RDF/XML within Quatro labels. This, if verified, will prove a useful evaluation mechanism and deployment environment for the Quatro work.
>
> Todo:
>  - section headings, examples, diagram
>  - test cases
>  - refs to specs
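The "templated" danbri.org dc:creator claim in the draft can be sketched the same way. This is a minimal Python illustration under stated assumptions, not Quatro's or RIF's actual mechanism: the template is a list of triple patterns in which `?doc` is a variable and `_:c` an intermediate, unnamed creator entity; `TEMPLATE`, `instantiate`, `apply_rule` and the page URIs are all hypothetical. Each matching document gets its own fresh blank node. Asserting the template triples directly in a top-level graph would instead make claims about one particular node, which is the mismatch the draft describes.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical template for: documents whose URI begins with
# http://danbri.org/ have a dc:creator whose foaf:name is "Dan Brickley".
# "?doc" is a variable; "_:c" is an intermediate (blank) creator node.
TEMPLATE: List[Triple] = [
    ("?doc", "dc:creator", "_:c"),
    ("_:c", "foaf:name", '"Dan Brickley"'),
]


def instantiate(template: List[Triple], doc_uri: str, n: int) -> List[Triple]:
    """Bind ?doc to doc_uri and rename each blank node with suffix n, so
    every instantiation gets its own fresh intermediate entity."""
    def sub(term: str) -> str:
        if term == "?doc":
            return doc_uri
        if term.startswith("_:"):
            return f"{term}{n}"  # e.g. "_:c" -> "_:c0"
        return term
    return [(sub(s), sub(p), sub(o)) for s, p, o in template]


def apply_rule(prefix: str, template: List[Triple],
               pages: Iterable[str]) -> List[Triple]:
    """Expand the template once for each page whose URI starts with prefix."""
    matches = [uri for uri in pages if uri.startswith(prefix)]
    graph: List[Triple] = []
    for i, uri in enumerate(matches):
        graph.extend(instantiate(template, uri, i))
    return graph


# Hypothetical page URIs, for illustration only.
pages = ["http://danbri.org/a.html", "http://danbri.org/b.html",
         "http://example.org/c.html"]
graph = apply_rule("http://danbri.org/", TEMPLATE, pages)
for triple in graph:
    print(triple)
```

Minting a distinct creator node per document is itself a modelling choice made here for illustration; a rule language could equally let all matches share one creator, and that is precisely the kind of decision plain RDF gives no place to record.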
Received on Thursday, 1 June 2006 14:01:14 UTC