- From: Phil Archer <phil.archer@icra.org>
- Date: Mon, 6 Dec 2004 15:58:34 -0000
- To: <public-quatro@w3.org>
Dear all,

Following Kal's latest work and my visit to Japan, here's the current situation as I see it.

The key area for debate remains the application rules. Kal has produced a very robust set of these, including working code to back it up [1]. From what I can see, it does exactly what we have been working towards so far: give it an RDF instance to look at and a URI, and back comes a suitable chunk of RDF/XML with the relevant properties.

Eric Prud'hommeaux and I discussed using SPARQL to extract data. This seemed promising, and there are ways it could be used, but having seen Kal's demo app I'm not sure there's any distinct advantage in it, except that we'd be using a standard method (not a small exception, I'll grant).

Eric and I also discussed the problem of overrides, and this, I think, remains problematic. Imagine our client has an RDF instance in cache. A request is made by the user and the client looks in its RDF instances, essentially to see if it can find a reason NOT to fetch the requested resource. So, to use one of Kal's examples, we request http://www.example.com/page.html and, checking in Kal's example2.rdf, we get back:

  <rdf:RDF xmlns:icra="http://www.icra.org/ratingsv03/rdfs/#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://www.example.com/page.html">
      <icra:hasLabel rdf:nodeID="A0" />
      <dc:copyright>(c) 2004 example.com enterprises</dc:copyright>
    </rdf:Description>
    <rdf:Description rdf:nodeID="A0">
      <icra:cz>1</icra:cz>
      <icra:nz>1</icra:nz>
      <icra:lz>1</icra:lz>
      <rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
    </rdf:Description>
  </rdf:RDF>

So we know the ICRA label and the DC copyright before we even go there. This is good, as it's a real efficiency saving for filters.

Now we fetch the page and find that there's a link in it to "http://www.example.com/anotherLabel.rdf#alcohol".
Now we'll have a label something like:

  <rdf:RDF xmlns:icra="http://www.icra.org/ratingsv03/rdfs/#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:ID="alcohol">
      <icra:cz>1</icra:cz>
      <icra:nz>1</icra:nz>
      <icra:lz>1</icra:lz>
      <icra:oa>1</icra:oa>
      <rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
    </rdf:Description>
  </rdf:RDF>

We infer this to be "about" http://www.example.com/page.html. So now we have two descriptions of the same thing, but there are differences. Are the differences sufficient for us to write a normative statement that says "use the rdf:ID contentLabel rather than the rdf:nodeID one", or do we need something else? Perhaps the rule-generated label can add a precedence property, thus:

  [...]
  <rdf:Description rdf:nodeID="A0">
    <icra:cz>1</icra:cz>
    <icra:nz>1</icra:nz>
    <icra:lz>1</icra:lz>
    <rdf:type rdf:resource="http://www.icra.org/ratingsv03/rdfs/#label" />
    <label:precedence value="1000" />
  </rdf:Description>
  [...]

The directly accessed label has no such precedence value, i.e. it's "null" and therefore less than 1000, and can be given priority. Presumably it would be up to a content provider (or the tools we create) to assign unique precedence values across a domain, so that however many labels a client holds, each one carries a distinct precedence value? This is a bit woolly, I know! But if we can create a difference between labels that could potentially be applied to the same thing, preferably a numerical difference so it's easy to process, then we're not using one chunk of RDF to overwrite an existing chunk based on things like order received.

I spent a lot of time talking to various people concerned with the Mobile Filtering Project in Japan. There's a great deal of work going on in terms of research and testing of different architectural designs for where filtering should occur (on the handset vs. on the gateway etc.).
One interesting point is that the focus there is very much on 3rd party labels, rather than my focus, which is on self-labels. Both are crucial.

It's clear we need to begin to think about the equivalent of PICSRules, and soon. That means working out how to interpret a label that says something like "na 1 nb 1 nc 1 sa 1 sb 1" (in ICRA-speak) as "if you're Spanish you need to be 18 before you should see this". This applies as much to creating the labels as to reading them.

Shimuzu Noboru demonstrated the PICSWizard [2]. This is a tool he's developed that generates RDF/XML and N-Triples based on various imported schemas. He's used the ones produced by Kal so far, among others. I've sought clarification on a couple of issues, but it seems to map the idea of rating values of 0 - 5 to various schemas. This is something we need to be able to do, but it's an implementation issue.

This is where the difference between a label and a rating becomes clear. For example, a Korean might "rate" a site as being an adult site because it depicted implied sexual acts. In Britain that's probably going to be rated 12 or 15 at most. Whether the label is produced by a Korean describing it as an adult site or a Briton rating it as a teenager's site, it still has an ICRA label that says "implied sexual acts", so the output is the same.

The vision is that labels should be available from multiple resources, so maybe our PICSRules-like language needs to specify how to combine multiple labels for the same thing, or more likely we need to define how to define how to combine labels! Would this then obviate the need for overrides, precedence values etc?

A couple of specific points:

The PICSWizard assumes that a URL can include a wildcard. So, *.jp means "anything on the .jp TLD". Wildcards can occur anywhere, so you could have *.example.* to mean anything on either example.com or example.org (or actually anything on example.foo.org etc. as well).
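
To make the wildcard reading above concrete, here is a rough Python sketch. It is not the PICSWizard's behaviour, which I have not seen specified. The function names are made up for illustration, and it assumes that "*" matches any run of characters (including dots and nothing at all) and that the pattern is tested against the host name only, ignoring case.

```python
import re
from urllib.parse import urlsplit


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a wildcard pattern into a regular expression.

    Assumption: '*' matches any run of characters, including dots and
    the empty string; every other character matches itself literally.
    """
    # Escape the literal pieces, then join them with '.*' where each '*' was.
    pieces = (re.escape(piece) for piece in pattern.split("*"))
    return re.compile(".*".join(pieces), re.IGNORECASE)


def host_matches(url: str, pattern: str) -> bool:
    """Test the host name of `url` against a wildcard pattern (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    # fullmatch anchors the pattern at both ends of the host name.
    return wildcard_to_regex(pattern).fullmatch(host) is not None
```

Under these assumptions, host_matches("http://www.example.foo.org/", "*.example.*") is true, but the bare host in "http://example.com/" does not match "*.example.*" because there is no dot in front of "example". That is the sort of detail a definition would have to settle.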
Unless I am mistaken, the Mobile Filtering Project has not defined this specifically, but the meaning is clear, as is the expectation that software manufacturers would implement it. The suggestion from Japan is therefore that we replace the beginsWith, endsWith and contains constructs with a single hasURL property whose value may contain wildcards. Comments please? It's easier to write, but is it as well defined and as easy to process? Actually, the suggestion was that matches be dispensed with as well, but I argued that a regular expression has more power than a simple wildcard.

Another request was that hasURL (or matches, or beginsWith etc.) should be defined as properties, not classes. I am unable to comment on this. Kal?

Finally, I saw a presentation on using RDF/XML labelling in RSS. I need to get a copy of the slides, but I believe the basic point was that it's easy to add contentLabels to RSS 1.0 and Atom but not to RSS 2.0.

Phil.

[1] See http://www.techquila.com/download/rdf-rules.zip
    See also: http://www.icra.org/projects/quatro/techdiscussion/rdf-rulesets.pdf
[2] See http://www.semanticweb.jp/repository/picswizard/PICSWizard.ZIP
Received on Monday, 6 December 2004 15:59:11 UTC