Update 6/12/04 from Phil Archer on 2004-12-06 (public-quatro@w3.org from December 2004)

From: Phil Archer <phil.archer@icra.org>
Date: Mon, 6 Dec 2004 15:58:34 -0000
To: <public-quatro@w3.org>
Message-ID: <005801c4dbac$7536f3d0$53276551@PHILXP>

Dear all,

Following Kal's latest work and my visit to Japan, here's the current 
situation as I see it.

The key area for debate remains the application rules. Kal has produced a 
very robust set of these including working code to back it up[1]. From what 
I can see it does exactly what we have been working towards so far. Give it 
an RDF instance to look at and a URI, and back comes a suitable chunk of 
RDF/XML with the relevant properties.

Eric Prud'hommeaux and I discussed using SPARQL to extract data. This seemed 
promising and there are ways it could be used but having seen Kal's demo app 
I'm not sure there's any distinct advantage in this, except that we'd be 
using a standard method (not a small exception I'll grant).

Eric and I also discussed the problem of overrides and this, I think, 
remains problematic.

Imagine our client has an RDF instance in cache. A request is made by the 
user and the client looks in its RDF instances, essentially to see if it can 
find a reason NOT to fetch the requested resource. So, to use one of Kal's 
examples, we request http://www.example.com/page.html and, checking in Kal's 
example2.rdf, we get back:

<rdf:RDF
  xmlns:icra="http://www.icra.org/ratingsv03/rdfs/#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.com/page.html">
    <icra:hasLabel rdf:nodeID="A0" />
    <dc:copyright>(c) 2004 example.com enterprises</dc:copyright>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A0">
    <icra:cz>1</icra:cz>
    <icra:nz>1</icra:nz>
    <icra:lz>1</icra:lz>
    <rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
  </rdf:Description>
</rdf:RDF>

So we know the ICRA label and the DC copyright before we even go there. This 
is good as it's a real efficiency saving for filters. Now we fetch the page 
and find that there's a link in it to 
"http://www.example.com/anotherLabel.rdf#alcohol". Now we'll have a label 
something like:

<rdf:RDF
  xmlns:icra="http://www.icra.org/ratingsv03/rdfs/#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:ID="alcohol">
    <icra:cz>1</icra:cz>
    <icra:nz>1</icra:nz>
    <icra:lz>1</icra:lz>
    <icra:oa>1</icra:oa>
    <rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
  </rdf:Description>
</rdf:RDF>

We infer this to be "about" http://www.example.com/page.html. So now we have 
2 descriptions of the same thing. But there are differences. Are the 
differences sufficient for us to write some normative statement that says 
"use the rdf:ID contentLabel rather than the rdf:nodeID one", or do we need 
something else?
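
To make the conflict concrete, here is a minimal Python sketch that reads the 
two descriptions above and reports where they differ. It treats the RDF/XML 
as plain XML using the standard library rather than a real RDF parser, and 
the function names (cached_label, linked_label, differences) are invented 
for illustration; Kal's code may well work quite differently.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
ICRA = "{http://www.icra.org/ratingsv03/rdfs/#}"

NAMESPACES = ('xmlns:icra="http://www.icra.org/ratingsv03/rdfs/#" '
              'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"')

# The rule-generated description held in the client's cache.
CACHED = f"""<rdf:RDF {NAMESPACES}>
  <rdf:Description rdf:about="http://www.example.com/page.html">
    <icra:hasLabel rdf:nodeID="A0" />
  </rdf:Description>
  <rdf:Description rdf:nodeID="A0">
    <icra:cz>1</icra:cz><icra:nz>1</icra:nz><icra:lz>1</icra:lz>
  </rdf:Description>
</rdf:RDF>"""

# The label the fetched page links to (anotherLabel.rdf#alcohol).
LINKED = f"""<rdf:RDF {NAMESPACES}>
  <rdf:Description rdf:ID="alcohol">
    <icra:cz>1</icra:cz><icra:nz>1</icra:nz><icra:lz>1</icra:lz>
    <icra:oa>1</icra:oa>
  </rdf:Description>
</rdf:RDF>"""


def descriptors(node):
    """Collect the ICRA literal properties of one rdf:Description."""
    return {child.tag[len(ICRA):]: child.text.strip()
            for child in node
            if child.tag.startswith(ICRA) and child.text and child.text.strip()}


def cached_label(rdf_xml, url):
    """Follow icra:hasLabel from the description of `url` to its label node."""
    root = ET.fromstring(rdf_xml)
    by_node_id = {d.get(RDF + "nodeID"): d for d in root if d.get(RDF + "nodeID")}
    for description in root:
        if description.get(RDF + "about") == url:
            ref = description.find(ICRA + "hasLabel")
            if ref is not None and ref.get(RDF + "nodeID") in by_node_id:
                return descriptors(by_node_id[ref.get(RDF + "nodeID")])
    return None


def linked_label(rdf_xml, fragment):
    """Find the description whose rdf:ID equals the URI fragment."""
    for description in ET.fromstring(rdf_xml):
        if description.get(RDF + "ID") == fragment:
            return descriptors(description)
    return None


def differences(a, b):
    """Map each property whose value differs to its (a, b) pair."""
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b)) if a.get(key) != b.get(key)}


from_cache = cached_label(CACHED, "http://www.example.com/page.html")
from_link = linked_label(LINKED, "alcohol")
print(differences(from_cache, from_link))
```

The sketch only detects that the two labels disagree (here, on icra:oa); it 
says nothing about which one should win, which is the open question.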

Perhaps the rule-generated label can add in a property of precedence thus:
[...]
  <rdf:Description rdf:nodeID="A0">
    <icra:cz>1</icra:cz>
    <icra:nz>1</icra:nz>
    <icra:lz>1</icra:lz>
    <rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
    <label:precedence value="1000" />
  </rdf:Description>
[...]

The directly accessed label has no such precedence value, i.e. it's "null" 
and therefore less than 1000 and can be given priority. It would be up to a 
content provider (or the tools we create) to assign unique precedence values 
across a domain, so that a client holding any number of labels for the same 
thing could always tell them apart. This is a bit woolly I know! 
But, if we can create a difference between different labels that could 
potentially be applied to the same thing, preferably a numerical difference 
so it's easy to process, then we're not using one chunk of RDF to overwrite 
an existing chunk based on things like order received.
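
As a rough sketch of how a client might process this (in Python, and 
assuming the reading above that a missing precedence counts as lower than any 
number and that the lowest value wins), something like the following would 
do. The dictionary layout and the names pick_label and effective_precedence 
are hypothetical.

```python
def effective_precedence(value):
    """Treat a missing ("null") precedence as lower than any number."""
    return float("-inf") if value is None else value


def pick_label(labels):
    """Return the label with the lowest precedence value.

    If two labels share a value, min() keeps the first one it saw, which is
    exactly the order-received behaviour that unique values are meant to avoid.
    """
    return min(labels, key=lambda label: effective_precedence(label.get("precedence")))


rule_generated = {"source": "rule", "precedence": 1000,
                  "descriptors": {"cz": 1, "nz": 1, "lz": 1}}
directly_linked = {"source": "direct",  # no precedence property at all
                   "descriptors": {"cz": 1, "nz": 1, "lz": 1, "oa": 1}}

chosen = pick_label([rule_generated, directly_linked])
print(chosen["source"])
```

The choice no longer depends on which label arrived first, only on the 
numbers attached to them.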

I spent a lot of time talking to various people concerned with the Mobile 
Filtering Project in Japan. There's a great deal of work going on in terms 
of research and testing of different architectural designs for where 
filtering should occur (on the handset vs. on the gateway etc.). One 
interesting point is that the focus there is very much on 3rd party labels 
rather than my focus which is on self-labels. Both are crucial.

It's clear we need to begin to think about the equivalent of PICSRules, and 
soon: how to interpret a label that says something like "na 1 nb 1 nc 1 sa 1 
sb 1" (in ICRA-speak) and arrive at "if you're Spanish you need to be 18 
before you should see this". This applies as much to creating the labels as 
reading them.

Shimuzu Noboru demonstrated the PICSWizard[2]. This is a tool he's developed 
that generates RDF/XML and N-Triples based on various imported schemas. He's 
used the ones produced by Kal so far, among others. I've sought 
clarification on a couple of issues but it seems to map the idea of rating 
values of 0 - 5 to various schemas. This is something we need to be able to 
do but it's an implementation issue. This is where the difference between a 
label and a rating becomes clear. For example, a Korean might "rate" a site 
as being an adult site because it depicted implied sexual acts. In Britain 
that's probably going to be rated 12 or 15 at most. Whether the label is 
produced by a Korean describing it as an adult site or a Briton rating it as 
a teenager's site, it still has an ICRA label that says "implied sexual 
acts" so the output is the same.
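
The distinction can be sketched in a few lines of Python. The label is a 
fixed set of descriptors; a rating is what a given culture's rules make of 
it. The scheme table below is purely illustrative: the locale codes, the 
descriptor string and the choice of "15" for Britain are assumptions made for 
the example, not anything defined by ICRA or the PICSWizard.

```python
# One label, describing content rather than judging it.
LABEL = {"implied sexual acts"}

# Hypothetical per-locale schemes that turn descriptors into ratings.
RATING_SCHEMES = {
    "KR": {"implied sexual acts": "adult"},
    "GB": {"implied sexual acts": "15"},
}


def rate(label, locale):
    """Apply one locale's scheme to a label and return the resulting ratings."""
    scheme = RATING_SCHEMES[locale]
    return sorted(scheme[descriptor] for descriptor in label if descriptor in scheme)


for locale in ("KR", "GB"):
    print(locale, rate(LABEL, locale))
```

The label never changes; only the scheme applied to it does.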

The vision is that labels should be available from multiple resources so 
maybe our PICSRules-like language needs to specify how to combine multiple 
labels for the same thing - or more likely we need to define how to define 
how to combine labels! Would this then obviate the need for overrides, 
precedence values etc?

A couple of specific points:

The PICSWizard assumes that a URL can include a wildcard. So, *.jp means 
"anything on the .jp TLD". Wildcards can occur anywhere so you could have 
*.example.* to mean anything on either example.com or example.org (or 
actually anything on example.foo.org etc as well). Unless I am mistaken, the 
Mobile Filtering Project has not defined this specifically but the meaning 
is clear, as is the expectation that software manufacturers would implement 
it.
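
Since the semantics have not been written down, here is one literal reading 
as a Python sketch: "*" matches any run of characters and everything else 
must match exactly. Matching against the host name only (rather than the 
whole URL) is an assumption, and the function names are made up.

```python
import re


def wildcard_to_regex(pattern):
    """Escape the pattern, then turn each escaped '*' into 'match anything'."""
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def host_matches(pattern, host):
    """True if the whole host name fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(host) is not None


hosts = ["www.example.com", "www.example.org", "www.example.foo.org", "example.com"]
for host in hosts:
    print(host, host_matches("*.example.*", host))

print("www.example.jp", host_matches("*.jp", "www.example.jp"))
```

Under this reading the bare host example.com does not match *.example.*, 
because the pattern requires a dot before "example". Whether it should match 
is the kind of detail a definition would need to settle.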

The suggestion from Japan is therefore that we replace the beginsWith, 
endsWith and contains constructs simply with hasURL, and that the value for 
this property may contain wildcards.

Comments please? It's easier to write but is it as well defined and easy to 
process?

Actually the suggestion was that matches be dispensed with as well, but I 
argued that a regular expression has more power than a simple wildcard.
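
A short Python sketch of both halves of that argument, under the assumption 
that "*" is the only wildcard character: beginsWith, endsWith and contains 
translate mechanically into wildcard values, whereas a regular expression can 
express constraints a wildcard cannot. The helper names and the example URLs 
are hypothetical.

```python
import re


def as_wildcard(construct, value):
    """Rewrite one of the three existing constructs as a wildcard value."""
    templates = {"beginsWith": "{}*", "endsWith": "*{}", "contains": "*{}*"}
    return templates[construct].format(value)


def wildcard_match(pattern, text):
    """'*' matches any run of characters; everything else is literal."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, text) is not None


# beginsWith expressed as a wildcard behaves like a prefix test.
prefix = as_wildcard("beginsWith", "http://www.example.com/")
print(prefix, wildcard_match(prefix, "http://www.example.com/page.html"))

# A regular expression can insist on digits after "www"; a wildcard cannot.
numbered = re.compile(r"http://www\d+\.example\.com/.*")
loose = "http://www*.example.com/*"
for url in ("http://www1.example.com/a", "http://wwwx.example.com/a"):
    print(url, numbered.fullmatch(url) is not None, wildcard_match(loose, url))
```

The nearest wildcard, http://www*.example.com/*, accepts both URLs, while 
the regular expression accepts only the first.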

Another request was that hasURL (or matches, or beginsWith etc) should be 
defined as properties not classes. I am unable to comment on this. Kal?

Finally, I saw a presentation on using RDF/XML labelling in RSS. I need to 
get a copy of the slides but I believe the basic point was that it's easy to 
add contentLabels to RSS 1.0 and Atom but not RSS 2.0.

Phil.

[1] See http://www.techquila.com/download/rdf-rules.zip
See also: 
http://www.icra.org/projects/quatro/techdiscussion/rdf-rulesets.pdf
[2] See http://www.semanticweb.jp/repository/picswizard/PICSWizard.ZIP
Received on Monday, 6 December 2004 15:59:11 UTC