- From: Phil Archer <phil.archer@icra.org>
- Date: Mon, 6 Dec 2004 15:58:34 -0000
- To: <public-quatro@w3.org>
Dear all,
Following Kal's latest work and my visit to Japan, here's the current
situation as I see it.
The key area for debate remains the application rules. Kal has produced a
very robust set of these including working code to back it up[1]. From what
I can see it does exactly what we have been working towards so far. Give it
an RDF instance to look at and a URI, back comes a suitable chunk of RDF/XML
with the relevant properties.
Eric Prud'hommeaux and I discussed using SPARQL to extract data. This seemed
promising and there are ways it could be used but having seen Kal's demo app
I'm not sure there's any distinct advantage in this, except that we'd be
using a standard method (not a small exception I'll grant).
Eric and I also discussed the problem of overrides and this, I think,
remains problematic.
Imagine our client has an RDF instance in cache. A request is made by the
user and the client looks in its RDF instances, essentially to see if it can
find a reason NOT to fetch the requested resource. So, to use one of Kal's
examples, we request http://www.example.com/page.html and, checking in Kal's
example2.rdf, we get back:
<rdf:RDF
xmlns:icra="http://www.icra.org/ratingsv03/rdfs/#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.example.com/page.html">
<icra:hasLabel rdf:nodeID="A0" />
<dc:copyright>(c) 2004 example.com enterprises</dc:copyright>
</rdf:Description>
<rdf:Description rdf:nodeID="A0">
<icra:cz>1</icra:cz>
<icra:nz>1</icra:nz>
<icra:lz>1</icra:lz>
<rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
</rdf:Description>
</rdf:RDF>
So we know the ICRA label and the DC copyright before we even go there. This
is good as it's a real efficiency saving for filters. Now we fetch the page
and find that there's a link in it to
"http://www.example.com/anotherLabel.rdf#alcohol". Now we'll have a label
something like:
<rdf:RDF
xmlns:icra="http://www.icra.org/ratingsv03/rdfs/#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:ID="alcohol">
<icra:cz>1</icra:cz>
<icra:nz>1</icra:nz>
<icra:lz>1</icra:lz>
<icra:oa>1</icra:oa>
<rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
</rdf:Description>
</rdf:RDF>
We infer this to be "about" http://www.example.com/page.html. So now we have
2 descriptions of the same thing. But, there are differences. Are the
differences sufficient for us to write some normative statement that says "use
the rdf:ID contentLabel rather than the rdf:nodeID one", or do we need
something else?
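To make the difference between the two descriptions concrete, here is a minimal Python sketch (not Kal's code) that reads both with the standard xml.etree parser and lists the ICRA properties found in the linked label but not in the cached one. The names CACHED, LINKED and icra_properties are illustrative assumptions, and the cached snippet is trimmed to the label node only.

```python
import xml.etree.ElementTree as ET

ICRA = "http://www.icra.org/ratingsv03/rdfs/#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Label node produced by the rules (trimmed from the first example).
CACHED = f"""<rdf:RDF xmlns:icra="{ICRA}" xmlns:rdf="{RDF}">
  <rdf:Description rdf:nodeID="A0">
    <icra:cz>1</icra:cz>
    <icra:nz>1</icra:nz>
    <icra:lz>1</icra:lz>
    <rdf:type rdf:resource="{ICRA}label" />
  </rdf:Description>
</rdf:RDF>"""

# Label reached through the link in the page (second example).
LINKED = f"""<rdf:RDF xmlns:icra="{ICRA}" xmlns:rdf="{RDF}">
  <rdf:Description rdf:ID="alcohol">
    <icra:cz>1</icra:cz>
    <icra:nz>1</icra:nz>
    <icra:lz>1</icra:lz>
    <icra:oa>1</icra:oa>
    <rdf:type rdf:resource="{ICRA}label" />
  </rdf:Description>
</rdf:RDF>"""


def icra_properties(rdf_xml: str) -> dict:
    """Collect the ICRA-namespace properties of every rdf:Description."""
    prefix = f"{{{ICRA}}}"  # ElementTree writes tags as "{namespace}local"
    props = {}
    for desc in ET.fromstring(rdf_xml).iter(f"{{{RDF}}}Description"):
        for child in desc:
            if child.tag.startswith(prefix):
                props[child.tag[len(prefix):]] = (child.text or "").strip()
    return props


cached = icra_properties(CACHED)
linked = icra_properties(LINKED)

# Properties whose name or value appears only in the linked label.
extra = {k: v for k, v in linked.items() if cached.get(k) != v}
print(extra)
```

In this case the only difference is the extra icra:oa property on the linked label.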
Perhaps the rule-generated label can add in a property of precedence thus:
[...]
<rdf:Description rdf:nodeID="A0">
<icra:cz>1</icra:cz>
<icra:nz>1</icra:nz>
<icra:lz>1</icra:lz>
<rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
<label:precedence value="1000" />
</rdf:Description>
[...]
The directly accessed label has no such precedence value, i.e. it's "null"
and therefore less than 1000 and can be given priority. Perhaps it would be
up to a content provider (or the tools we create) to assign unique precedence
values across a domain, so that a client holding any number of labels for the
same thing could always rank them. This is a bit woolly I know!
But, if we can create a difference between different labels that could
potentially be applied to the same thing, preferably a numerical difference
so it's easy to process, then we're not using one chunk of RDF to overwrite
an existing chunk based on things like order received.
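A rough Python sketch of how a client might apply such a precedence value follows. It assumes, as suggested above, that the lowest value wins and that a label with no precedence counts as lower than any assigned value (modelled here as 0). The dictionary layout and the names precedence_of and choose_label are hypothetical, not part of any existing tool.

```python
def precedence_of(label: dict) -> int:
    """Return the label's precedence, treating a missing value as 0 (assumption)."""
    value = label.get("precedence")
    return 0 if value is None else int(value)


def choose_label(labels: list) -> dict:
    """Pick the label with the lowest precedence value."""
    return min(labels, key=precedence_of)


# Label generated from the rules, carrying the precedence property.
rule_label = {"source": "rules", "cz": 1, "nz": 1, "lz": 1, "precedence": 1000}

# Label fetched directly via the link in the page; it has no precedence.
direct_label = {"source": "direct", "cz": 1, "nz": 1, "lz": 1, "oa": 1}

chosen = choose_label([rule_label, direct_label])
print(chosen["source"])
```

Note that min() returns the first candidate when two values are equal, so ties would still be settled by order received, which is why the values would need to be unique.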
I spent a lot of time talking to various people concerned with the Mobile
Filtering Project in Japan. There's a great deal of work going on in terms
of research and testing of different architectural designs for where
filtering should occur (on the handset vs. on the gateway etc.). One
interesting point is that the focus there is very much on 3rd party labels
rather than my focus which is on self-labels. Both are crucial.
It's clear we need to begin to think about the equivalent of PICSRules and
soon. How to interpret a label that says something like "na 1 nb 1 nc 1 sa 1
sb 1" (in ICRA-speak) and says "if you're Spanish you need to be 18 before
you should see this". This applies as much to creating the labels as reading
them.
Shimuzu Noboru demonstrated the PICSWizard[2]. This is a tool he's developed
that generates RDF/XML and N-Triples based on various imported schemas. He's
used the ones produced by Kal so far, among others. I've sought
clarification on a couple of issues but it seems to map the idea of rating
values of 0 - 5 to various schemas. This is something we need to be able to
do but it's an implementation issue. This is where the difference between a
label and a rating becomes clear. For example, a Korean might "rate" a site
as being an adult site because it depicted implied sexual acts. In Britain
that's probably going to be rating 12 or 15 at most. Whether the label is
produced by a Korean describing it as an adult site or a Briton rating it as
a teenager's site, it still has an ICRA label that says "implied sexual
acts" so the output is the same.
The vision is that labels should be available from multiple sources so
maybe our PICSRules-like language needs to specify how to combine multiple
labels for the same thing - or more likely we need to define how to define
how to combine labels! Would this then obviate the need for overrides,
precedence values etc?
A couple of specific points:
The PICSWizard assumes that a URL can include a wildcard. So, *.jp means
"anything on the .jp TLD". Wildcards can occur anywhere so you could have
*.example.* to mean anything on either example.com or example.org (or
actually anything on example.foo.org etc as well). Unless I am mistaken, the
Mobile Filtering Project has not defined this specifically but the meaning
is clear, as is the expectation that software manufacturers would implement
it.
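Since the wildcard behaviour has not been defined formally, here is one possible reading as a Python sketch: "*" stands for any run of characters (dots included) and everything else is literal. This is an assumption for illustration, not the Mobile Filtering Project's definition, and the function names are made up. It is applied to host names only.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression."""
    # Escape every character, then turn each escaped "*" back into ".*".
    escaped = re.escape(pattern)
    return re.compile(escaped.replace(r"\*", ".*"))


def host_matches(pattern: str, host: str) -> bool:
    """True if the whole host name fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(host) is not None


for pattern, host in [
    ("*.jp", "www.example.jp"),
    ("*.jp", "www.example.com"),
    ("*.example.*", "www.example.com"),
    ("*.example.*", "www.example.foo.org"),
    ("*.example.*", "example.com"),
]:
    print(pattern, host, host_matches(pattern, host))
```

Under this reading the bare host example.com does not match *.example.* because nothing precedes ".example.", which is the sort of detail a definition would have to settle.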
The suggestion from Japan is therefore that we drop the beginsWith,
endsWith and contains constructs and replace them simply with hasURL,
where the value of this property may contain wildcards.
Comments please? It's easier to write but is it as well defined and easy to
process?
Actually the suggestion was that matches also be dispensed with, but I
argued that a regular expression has more power than a simple wildcard.
Another request was that hasURL (or matches, or beginsWith etc) should be
defined as properties not classes. I am unable to comment on this. Kal?
Finally, I saw a presentation on using RDF/XML labelling in RSS. I need to
get a copy of the slides but I believe the basic point was that it's easy to
add contentLabels to RSS 1.0 and Atom but not RSS 2.0.
Phil.
[1] See http://www.techquila.com/download/rdf-rules.zip
See also:
http://www.icra.org/projects/quatro/techdiscussion/rdf-rulesets.pdf
[2] See http://www.semanticweb.jp/repository/picswizard/PICSWizard.ZIP
Received on Monday, 6 December 2004 15:59:11 UTC