Re: Update 6/12/04

Phil Archer wrote:
> Dear all,
>
> Following Kal's latest work and my visit to Japan, here's the current
> situation as I see it.
>
> The key area for debate remains the application rules. Kal has produced
> a very robust set of these including working code to back it up[1].
> From what I can see it does exactly what we have been working towards
> so far. Give it an RDF instance to look at and a URI, back comes a
> suitable chunk of RDF/XML with the relevant properties.
>
> Eric Prud'hommeaux and I discussed using SPARQL to extract data. This
> seemed promising and there are ways it could be used but having seen
> Kal's demo app I'm not sure there's any distinct advantage in this,
> except that we'd be using a standard method (not a small exception I'll
> grant).
>

Is SPARQL now at a stable stage? My concern would be twofold:

1) If it's not stable, we are trying to base our processing on a moving target
2) Lack of implementation support

(2) is quite a big issue in terms of getting work on the labelling 
mechanism kick-started. My Java hacking took about 1 day, including 
time spent downloading and refamiliarising myself with the Jena APIs. 
If I had to implement SPARQL too, I would probably still be coding.

> Eric and I also discussed the problem of overrides and this, I think,
> remains problematic.
>
<snip/>
I'm not sure that it is problematic if you think of it as being a 
collection of separate statements about a resource with different 
provenances. E.g. when you are processing http://example.com/page.html 
you may find that the central RDF store on server http://example.com/ 
has a set of statements X, but the page itself references a different 
set of statements Y. I don't think you combine X and Y in your RDF 
graph; instead, you decide which of X or Y you trust the most. In fact, 
this could be something that is user-configurable, depending on the 
implementation.

The same goes for the use of labelling authorities - you may choose to 
trust the records provided by a labelling authority *more* than those 
provided by the content provider.

I think that a "choose one" model of processing is:
a) easier for users to understand
b) easier for developers to implement
c) free of the conflicts that could be caused by having multiple sources
d) more robust against spoofing of inappropriate labels
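As a rough illustration of the "choose one" model (the source names and trust ordering here are hypothetical, and in practice user-configurable): sources are ranked by trust, and only the most trusted source that actually supplies a label is used; the sets X and Y are never merged.

```python
# Hypothetical sketch of "choose one" processing: label sources are
# ranked by trust, and the first (most trusted) source that actually
# supplies a label wins; the others are ignored rather than combined.

TRUST_ORDER = ["labelling-authority", "site-rdf-store", "page-self-label"]

def choose_label(available):
    """available maps a source name to its label statements."""
    for source in TRUST_ORDER:
        if source in available:
            return source, available[source]
    return None, None

source, label = choose_label({
    "site-rdf-store": {"sa": 1},    # statements X from the central store
    "page-self-label": {"sa": 0},   # statements Y from the page itself
})
print(source)  # site-rdf-store: X wins, Y is ignored rather than merged
```
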

I would also imagine that the trust work of Quatro would provide strong 
input to the selection process.

This leaves one issue: if a labelling authority gives a resource a label 
that would prevent the resource from being displayed, should the client 
still retrieve the resource to see if there is an overriding label that 
would allow it to be displayed? Again, I think this is something that 
should be left open. In a low-bandwidth environment such as a mobile 
phone, the developer may choose not to fetch the resource. In other 
situations it may be user-controlled, directed again by the question of 
how far the resource is trusted to describe itself properly.
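The decision above could be left to a small client-side policy, something like the following sketch (the function and parameter names are invented for illustration):

```python
# Hypothetical sketch: whether to fetch a blocked resource to look for
# an overriding self-label is a client policy, not a fixed rule.

def should_fetch_for_override(low_bandwidth, trust_self_labels):
    """Decide whether to retrieve a resource already blocked by an
    authority label, to check for an overriding self-label."""
    if low_bandwidth:
        return False  # e.g. a mobile phone: don't spend the bandwidth
    return trust_self_labels  # otherwise a user-configurable trust setting

print(should_fetch_for_override(low_bandwidth=True, trust_self_labels=True))   # False
print(should_fetch_for_override(low_bandwidth=False, trust_self_labels=True))  # True
```
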

> I spent a lot of time talking to various people concerned with the
> Mobile Filtering Project in Japan. There's a great deal of work going
> on in terms of research and testing of different architectural designs
> for where filtering should occur (on the handset vs. on the gateway
> etc.). One interesting point is that the focus there is very much on
> 3rd party labels rather than my focus which is on self-labels. Both are
> crucial.
>
> It's clear we need to begin to think about the equivalent of PICSRules
> and soon. How to interpret a label that says something like "na 1 nb 1
> nc 1 sa 1 sb 1" (in ICRA-speak) and says "if you're Spanish you need to
> be 18 before you should see this". This applies as much to creating the
> labels as reading them.
>
> Shimuzu Noboru demonstrated the PICSWizard[2]. This is a tool he's
> developed that generates RDF/XML and N-Triples based on various
> imported schemas. He's used the ones produced by Kal so far, among
> others. I've sought clarification on a couple of issues but it seems to
> map the idea of rating values of 0 - 5 to various schemas. This is
> something we need to be able to do but it's an implementation issue.
> This is where the difference between a label and a rating becomes
> clear. For example, a Korean might "rate" a site as being an adult site
> because it depicted implied sexual acts. In Britain that's probably
> going to be rating 12 or 15 at most. Whether the label is produced by a
> Korean describing it as an adult site or a Briton rating it as a
> teenager's site, it still has an ICRA label that says "implied sexual
> acts" so the output is the same.
>
> The vision is that labels should be available from multiple resources
> so maybe our PICSRules-like language needs to specify how to combine
> multiple labels for the same thing - or more likely we need to define
> how to define how to combine labels! Would this then obviate the need
> for overrides, precedence values etc?
>

Again, is it a question of combination or a question of trust?
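To make Phil's point concrete: the same descriptive label can yield different decisions under different local rules. A sketch, with the descriptor codes taken from the "na 1 nb 1 nc 1 sa 1 sb 1" example above and the rule thresholds entirely invented:

```python
# Hypothetical sketch of PICSRules-style interpretation: one ICRA-style
# label, different jurisdiction rules. Descriptor codes come from the
# example label in the mail; the age thresholds here are invented.

ICRA_LABEL = {"na": 1, "nb": 1, "nc": 1, "sa": 1, "sb": 1}

# Each jurisdiction maps descriptors to a minimum viewer age.
RULES = {
    "es": {"sa": 18},   # e.g. "if you're Spanish you need to be 18"
    "uk": {"sa": 15},   # same descriptor, a different local threshold
}

def minimum_age(label, jurisdiction):
    """Return the minimum age implied by a label under local rules."""
    ages = [age for code, age in RULES[jurisdiction].items()
            if label.get(code, 0) > 0]
    return max(ages, default=0)

print(minimum_age(ICRA_LABEL, "es"))  # 18
print(minimum_age(ICRA_LABEL, "uk"))  # 15
```

The label itself only describes the content; the rating (adult site vs. teenager's site) falls out of the rules applied to it.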

> A couple of specific points:
>
> The PICSWizard assumes that a URL can include a wildcard. So, *.jp
> means "anything on the .jp TLD". Wildcards can occur anywhere so you
> could have *.example.* to mean anything on either example.com or
> example.org (or actually anything on example.foo.org etc as well).
> Unless I am mistaken, the Mobile Filtering Project has not defined this
> specifically but the meaning is clear, as is the expectation that
> software manufacturers would implement it.
>
> The suggestion from Japan is therefore that we replace the beginsWith,
> endsWith and contains constructs and replace them simply with hasURL
> and that the value for this property may contain wildcards.
>
> Comments please? It's easier to write but is it as well defined and
> easy to process?
>
> Actually the suggestion was that matches was also dispensed with but I
> argued that a regular expression has more power than a simple wildcard.
>

Actually I would propose dropping beginsWith, endsWith and hasURL and 
keeping only matches. Regular expressions are far more useful than 
simple wildcarding. For example, I could define a match 
[a-m].adserver.com which would match a.adserver.com but not 
x.adserver.com.
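A quick sketch of the difference, using Python's stdlib for both styles (note that as a real regular expression the dots would be escaped and the pattern anchored, which the shorthand [a-m].adserver.com leaves implicit):

```python
import re
from fnmatch import fnmatch

# Wildcard matching, "*.jp" style, as in the PICSWizard proposal:
assert fnmatch("www.example.jp", "*.jp")
assert fnmatch("foo.example.org", "*.example.*")

# Regex matching: character ranges like [a-m] are beyond what simple
# wildcards can express.
pattern = re.compile(r"^[a-m]\.adserver\.com$")
assert pattern.match("a.adserver.com")
assert not pattern.match("x.adserver.com")
```
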

> Another request was that hasURL (or matches, or beginsWith etc) should
> be defined as properties not classes. I am unable to comment on this.
> Kal?
>
I'll take a look at this in more detail. My original thinking was that 
defining these values as classes gave more flexibility for extension, 
but really in RDF that is not true - a property can be just as 
extensible as a class.
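For illustration only, the two modellings might look like this in Turtle (the label: namespace and term names are hypothetical, not the actual vocabulary):

```turtle
@prefix label: <http://example.org/labelling#> .

# As a class: the condition is a resource typed label:BeginsWith.
[] a label:BeginsWith ;
   label:value "http://example.com/" .

# As a property: the condition is a single statement.
[] label:beginsWith "http://example.com/" .
```
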

> Finally, I saw a presentation on using RDF/XML labelling in RSS. I need
> to get a copy of the slides but I believe the basic point was that it's
> easy to add contentLabels to RSS 1.0 and Atom but not RSS 2.0.
>

Whether the RSS/Atom syntax is used or not, the RSS/Atom mechanisms are 
instructive. I have an RSS newsreader that, when I log on, downloads and 
aggregates the latest headlines from a set of feeds. Perhaps a really 
clever client tool could do the same for content labels, as a sort of 
pre-fetch and a way to work out if locally cached labels need to be 
refreshed?
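Such a client might keep a local label cache keyed by source URI and refresh an entry only when the feed advertises a newer timestamp; a minimal sketch (the class and method names are invented, and the integer timestamps stand in for real publication dates):

```python
# Hypothetical sketch of feed-style label pre-fetching: a local cache of
# labels keyed by source URI, refreshed only when the published
# timestamp is newer than the cached one.

class LabelCache:
    def __init__(self):
        self._cache = {}  # source URI -> (timestamp, label data)

    def needs_refresh(self, uri, published):
        cached = self._cache.get(uri)
        return cached is None or published > cached[0]

    def store(self, uri, published, label):
        self._cache[uri] = (published, label)

cache = LabelCache()
cache.store("http://example.com/labels.rdf", 100, {"sa": 1})
print(cache.needs_refresh("http://example.com/labels.rdf", 100))  # False
print(cache.needs_refresh("http://example.com/labels.rdf", 200))  # True
```
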

Cheers,

Kal

Received on Tuesday, 7 December 2004 08:58:58 UTC