Re: Update 6/12/04

I'm keen to see other comments on this too.

Shimuzu San - can you say a few words about why you feel it is better to 
have the URL matching constructs as properties rather than classes? Also, 
what's your view on _just_ having regexp matches rather than string matches, 
with or without wildcards?

Phil.
----- Original Message ----- 
From: "Kal Ahmed" <kal@techquila.com>
To: <public-quatro@w3.org>
Sent: Tuesday, December 07, 2004 8:59 AM
Subject: Re: Update 6/12/04


>
> Phil Archer wrote:
>> Dear all,
>>
>> Following Kal's latest work and my visit to Japan, here's the current
>> situation as I see it.
>>
>> The key area for debate remains the application rules. Kal has produced a
>> very robust set of these, including working code to back it up[1]. From
>> what I can see it does exactly what we have been working towards so far.
>> Give it an RDF instance to look at and a URI, and back comes a suitable
>> chunk of RDF/XML with the relevant properties.
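A deliberately tiny sketch of the flow Phil describes (Kal's actual demo uses Jena and real RDF/XML; the store contents and property names below are invented for illustration): given a label store and a URI, return the properties relevant to that URI.

```python
# Toy stand-in for an RDF instance: URI -> labelled properties.
# All URIs and property names here are illustrative, not from the spec.
label_store = {
    "http://example.com/page.html": {"icra:na": "1", "icra:sa": "1"},
    "http://example.com/other.html": {"icra:na": "0"},
}

def properties_for(uri):
    """Look up the labelled properties that apply to a URI."""
    return label_store.get(uri, {})
```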
>>
>> Eric Prud'hommeaux and I discussed using SPARQL to extract data. This
>> seemed promising and there are ways it could be used, but having seen
>> Kal's demo app I'm not sure there's any distinct advantage in this,
>> except that we'd be using a standard method (not a small exception, I'll
>> grant).
>>
>
> Is SPARQL now at a stable stage? My concern would be two-fold:
>
> 1) If it's not stable, we are trying to base our processing on a moving
> target
> 2) Lack of implementation support
>
> (2) is quite a big issue in terms of getting work on the labelling 
> mechanism kick-started. My Java hacking took about 1 day, including time 
> spent downloading and refamiliarising myself with the Jena APIs. If I had 
> to implement SPARQL too, I would probably still be coding.
>
>> Eric and I also discussed the problem of overrides and this, I think,
>> remains problematic.
>>
> <snip/>
> I'm not sure that it is problematic if you think of it as being a
> collection of separate statements about a resource with different
> provenances. For example, when you are processing
> http://example.com/page.html you may find that the central RDF store on
> server http://example.com/ has a set of statements X but the page itself
> references a different set of statements Y. I don't think you combine X
> and Y in your RDF graph; instead you decide which of X or Y you trust the
> most. In fact, this could be something that is user configurable,
> depending on the implementation.
>
> The same goes for the use of labelling authorities - you may choose to 
> trust the records provided by a labelling authority *more* than those 
> provided by the content provider.
>
> I think that a "choose one" model of processing is
> a) easier to understand for users
> b) easier to implement for developers
> c) free of possible conflicts that could be caused by having multiple 
> sources
> d) more robust against spoofing of inappropriate labels
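Kal's "choose one" model can be sketched in a few lines (a hypothetical illustration; the provenance names and trust ordering are invented, not part of any QUATRO mechanism): rather than merging statement sets X and Y, pick the whole set from the single most-trusted source.

```python
# Most-trusted first. This ordering is an assumption for the example;
# in practice it could be user configurable.
TRUST_ORDER = ["labelling-authority", "site-wide-store", "embedded-in-page"]

def choose_labels(candidates):
    """candidates: dict mapping provenance name -> set of statements.
    Returns the statements from the single most-trusted source present,
    never a merge of several sources."""
    for source in TRUST_ORDER:
        if source in candidates:
            return candidates[source]
    return set()
```

Because only one source's statements survive, a spoofed label embedded in the page cannot override a more-trusted site-wide store or labelling authority.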
>
> I would also imagine that the trust work of Quatro would provide strong
> input to the selection process.
>
> This leaves one issue, which is: if a labelling authority gives a resource
> a label that would prevent the resource from being displayed, should the
> client still retrieve the resource to see if there is an overriding label
> that would allow it to be displayed? Again, I think this is something
> that should be left open. In a low-bandwidth environment such as a
> mobile phone, the developer may choose not to fetch the resource. In other
> situations it may perhaps be user controlled (again directed by the
> question of trust in the resource to describe itself properly).
>
>> I spent a lot of time talking to various people concerned with the Mobile
>> Filtering Project in Japan. There's a great deal of work going on in
>> terms of research and testing of different architectural designs for
>> where filtering should occur (on the handset vs. on the gateway etc.).
>> One interesting point is that the focus there is very much on third-party
>> labels, rather than my focus, which is on self-labels. Both are crucial.
>>
>> It's clear we need to begin to think about the equivalent of PICSRules,
>> and soon: how to interpret a label that says something like "na 1 nb 1
>> nc 1 sa 1 sb 1" (in ICRA-speak) together with a rule that says "if
>> you're Spanish you need to be 18 before you should see this". This
>> applies as much to creating the labels as to reading them.
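One way to picture the kind of rule Phil means (the descriptor names are ICRA-style, but the age threshold and jurisdiction code are invented for illustration, not a real policy):

```python
# An ICRA-style label: descriptor -> value. Values here are illustrative.
label = {"na": 1, "nb": 1, "nc": 1, "sa": 1, "sb": 1}

def minimum_age(label, jurisdiction):
    """Apply a jurisdiction-specific rule to a descriptor label.
    Hypothetical rule: in 'ES', nudity or sex descriptors imply 18+."""
    if jurisdiction == "ES" and (label.get("na") or label.get("sa")):
        return 18  # e.g. "if you're Spanish you need to be 18"
    return 0
```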
>>
>> Shimuzu Noboru demonstrated the PICSWizard[2]. This is a tool he's
>> developed that generates RDF/XML and N-Triples based on various imported
>> schemas. He's used the ones produced by Kal so far, among others. I've
>> sought clarification on a couple of issues, but it seems to map the idea
>> of rating values of 0 - 5 to various schemas. This is something we need
>> to be able to do, but it's an implementation issue. This is where the
>> difference between a label and a rating becomes clear. For example, a
>> Korean might "rate" a site as being an adult site because it depicted
>> implied sexual acts. In Britain that's probably going to be rated 12 or
>> 15 at most. Whether the label is produced by a Korean describing it as an
>> adult site or a Briton rating it as a teenager's site, it still has an
>> ICRA label that says "implied sexual acts", so the output is the same.
>>
>> The vision is that labels should be available from multiple sources, so
>> maybe our PICSRules-like language needs to specify how to combine
>> multiple labels for the same thing - or, more likely, we need to define
>> how to define how to combine labels! Would this then obviate the need
>> for overrides, precedence values etc.?
>>
>
> Again, is it a question of combination or a question of trust?
>
>> A couple of specific points:
>>
>> The PICSWizard assumes that a URL can include a wildcard. So, *.jp means
>> "anything on the .jp TLD". Wildcards can occur anywhere, so you could
>> have *.example.* to mean anything on either example.com or example.org
>> (or actually anything on example.foo.org etc. as well). Unless I am
>> mistaken, the Mobile Filtering Project has not defined this specifically,
>> but the meaning is clear, as is the expectation that software
>> manufacturers would implement it.
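One possible reading of these wildcard semantics (assumed, since Phil notes the Mobile Filtering Project has not defined them formally) is to treat `*` as "any run of characters" and translate the pattern into a regular expression:

```python
import fnmatch
import re

def url_matches(pattern, url):
    """True if the wildcard pattern matches the whole URL string.
    fnmatch.translate turns '*' into '.*' and anchors the result,
    so '*.jp' matches anything ending in '.jp'."""
    return re.match(fnmatch.translate(pattern), url) is not None
```

Under this reading, `url_matches("*.jp", "www.example.jp")` holds, as does `url_matches("*.example.*", "www.example.foo.org")`, while `www.example.com` fails `*.jp`.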
>>
>> The suggestion from Japan is therefore that we replace the beginsWith,
>> endsWith and contains constructs simply with hasURL, and that the value
>> for this property may contain wildcards.
>>
>> Comments please? It's easier to write, but is it as well defined and as
>> easy to process?
>>
>> Actually the suggestion was that matches was also dispensed with but I
>> argued that a regular expression has more power than a simple wildcard.
>>
>
> Actually I would propose dropping beginsWith, endsWith and hasURL and
> keeping only matches. Regular expressions are far more useful than simple
> wildcarding. For example, I could define a match [a-m].adserver.com, which
> would match a.adserver.com but not x.adserver.com.
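Kal's example can be tried directly with a regex engine (Python here purely for illustration; the thread doesn't imply a language). One caveat worth noting: as written, the unescaped dots would also match any single character, so a stricter form escapes them:

```python
import re

# Kal's character-class example, with the literal dots escaped so that
# '.' does not accidentally match arbitrary characters.
pattern = r"[a-m]\.adserver\.com"

def host_matches(host):
    """True only for single-letter a-m subdomains of adserver.com."""
    return re.fullmatch(pattern, host) is not None
```

This matches `a.adserver.com` but rejects `x.adserver.com`, which simple `*` wildcarding cannot express.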
>
>> Another request was that hasURL (or matches, or beginsWith etc) should be
>> defined as properties not classes. I am unable to comment on this. Kal?
>>
> I'll take a look at this in more detail. My original thinking was that 
> defining these values as classes gave more flexibility for extension, but 
> really in RDF that is not true - a property can be just as extensible as a 
> class.
>
>> Finally, I saw a presentation on using RDF/XML labelling in RSS. I need 
>> to
>> get a copy of the slides but I believe the basic point was that it's easy 
>> to
>> add contentLabels to RSS 1.0 and Atom but not RSS 2.0.
>>
>
> Whether the RSS / Atom syntax is used or not, the RSS/Atom mechanisms are
> instructive. I have an RSS newsreader that, when I log on, downloads and
> aggregates the latest headlines from a set of feeds. Perhaps a really
> clever client tool could do the same for content labels, as a sort of
> pre-fetch and a way to work out if locally cached labels need to be
> refreshed?
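Kal's pre-fetch idea might look something like this (a hypothetical sketch; no QUATRO mechanism is implied, and the time-to-live approach is one assumption among several, HTTP conditional GET being another):

```python
import time

class LabelCache:
    """Local cache of fetched labels, with a time-to-live to decide
    when an entry is stale and should be re-fetched."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # uri -> (labels, fetched_at)

    def store(self, uri, labels, now=None):
        now = time.time() if now is None else now
        self.entries[uri] = (labels, now)

    def needs_refresh(self, uri, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(uri)
        return entry is None or now - entry[1] > self.ttl
```

A newsreader-style client could walk its list of labelled sites at login, calling `needs_refresh` for each and re-fetching only the stale ones.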
>
> Cheers,
>
> Kal
>

Received on Tuesday, 7 December 2004 09:55:50 UTC