- From: Phil Archer <phil.archer@icra.org>
- Date: Tue, 7 Dec 2004 09:54:55 -0000
- To: <public-quatro@w3.org>
I'm keen to see other comments on this too. Shimuzu San - can you say a few words about why you feel it is better to have the URL matching constructs as properties rather than classes? Also, what's your view on _just_ having regexp matches rather than string matches, with or without wildcards?

Phil.

----- Original Message -----
From: "Kal Ahmed" <kal@techquila.com>
To: <public-quatro@w3.org>
Sent: Tuesday, December 07, 2004 8:59 AM
Subject: Re: Update 6/12/04

>
> Phil Archer wrote:
>> Dear all,
>>
>> Following Kal's latest work and my visit to Japan, here's the current situation as I see it.
>>
>> The key area for debate remains the application rules. Kal has produced a very robust set of these, including working code to back it up[1]. From what I can see it does exactly what we have been working towards so far. Give it an RDF instance to look at and a URI, and back comes a suitable chunk of RDF/XML with the relevant properties.
>>
>> Eric Prud'hommeaux and I discussed using SPARQL to extract data. This seemed promising and there are ways it could be used, but having seen Kal's demo app I'm not sure there's any distinct advantage in it, except that we'd be using a standard method (not a small exception, I'll grant).
>>
> Is SPARQL now at a stable stage? My concern would be two-fold:
>
> 1) If it's not stable we are trying to base our processing on a moving target
> 2) Lack of implementation support
>
> (2) is quite a big issue in terms of getting work on the labelling mechanism kick-started. My Java hacking took about 1 day, including time spent downloading and refamiliarising myself with the Jena APIs. If I had to implement SPARQL too, I would probably still be coding.
>
>> Eric and I also discussed the problem of overrides and this, I think, remains problematic.
>>
> <snip/>
> I'm not sure that it is problematic if you think of it as being a collection of separate statements about a resource with different provenances. E.g. when you are processing http://example.com/page.html you may find that the central RDF store on server http://example.com/ has a set of statements X but the page itself references a different set of statements Y. I don't think you combine X and Y in your RDF graph; instead you decide which of X or Y you trust the most. In fact, this could be something that is user configurable, depending on the implementation.
>
> The same goes for the use of labelling authorities - you may choose to trust the records provided by a labelling authority *more* than those provided by the content provider.
>
> I think that a "choose one" model of processing is
> a) easier to understand for users
> b) easier to implement for developers
> c) free of possible conflicts that could be caused by having multiple sources
> d) more robust against spoofing of inappropriate labels
>
> I would also imagine that the trust work of Quatro would make a strong input into the selection process.
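
To make the "choose one" model described above concrete, here is a minimal sketch of a selection loop that tries label sources in descending order of trust and stops at the first one that yields any statements. It is purely illustrative: the source ordering, the use of the Jena 2 API, and the class and method names are assumptions for illustration, not part of the demo code referenced above.

    import java.util.List;

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // Illustrative sketch: pick the label graph from the most trusted source
    // that actually provides one, rather than merging statements from every source.
    public class LabelSelector {

        /**
         * @param sourceUrls candidate label documents, ordered from most trusted
         *                   (e.g. a labelling authority) to least trusted
         *                   (e.g. a label referenced by the page itself)
         * @return the first label graph that could be read and is non-empty,
         *         or null if no source yielded anything
         */
        public Model selectLabels(List sourceUrls) {
            for (int i = 0; i < sourceUrls.size(); i++) {
                String url = (String) sourceUrls.get(i);
                try {
                    Model m = ModelFactory.createDefaultModel();
                    m.read(url);              // fetch and parse RDF/XML from the URL
                    if (!m.isEmpty()) {
                        return m;             // "choose one": stop at the most trusted hit
                    }
                } catch (Exception e) {
                    // Source unavailable or unparsable: fall through to the next source.
                }
            }
            return null;
        }
    }

A user-configurable trust ordering, as Kal suggests, would simply mean letting the user (or the Quatro trust work) determine the order of sourceUrls before a loop like this runs.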
> This leaves one issue, which is: if a labelling authority gives a resource a label that would prevent the resource from being displayed, should the client still retrieve the resource to see if there is an overriding label that would allow the resource to be displayed? Again, I think this is something that should be open. In a low-bandwidth environment such as a mobile phone, the developer may choose not to fetch the resource. In other situations it may be user controlled (again directed by the question of trust in the resource to describe itself properly).
>
>> I spent a lot of time talking to various people concerned with the Mobile Filtering Project in Japan. There's a great deal of work going on in terms of research and testing of different architectural designs for where filtering should occur (on the handset vs. on the gateway etc.). One interesting point is that the focus there is very much on 3rd party labels rather than my focus, which is on self-labels. Both are crucial.
>>
>> It's clear we need to begin to think about the equivalent of PICSRules, and soon. How to take a label that says something like "na 1 nb 1 nc 1 sa 1 sb 1" (in ICRA-speak) and turn it into "if you're Spanish you need to be 18 before you should see this". This applies as much to creating the labels as to reading them.
>>
>> Shimuzu Noboru demonstrated the PICSWizard[2]. This is a tool he's developed that generates RDF/XML and N-Triples based on various imported schemas. He's used the ones produced by Kal so far, among others. I've sought clarification on a couple of issues, but it seems to map rating values of 0 - 5 to various schemas. This is something we need to be able to do, but it's an implementation issue. This is where the difference between a label and a rating becomes clear. For example, a Korean might "rate" a site as being an adult site because it depicted implied sexual acts. In Britain that's probably going to be rated 12 or 15 at most. Whether the label is produced by a Korean describing it as an adult site or a Briton rating it as a teenager's site, it still has an ICRA label that says "implied sexual acts", so the output is the same.
>>
>> The vision is that labels should be available from multiple sources, so maybe our PICSRules-like language needs to specify how to combine multiple labels for the same thing - or, more likely, we need to define how to define how to combine labels! Would this then obviate the need for overrides, precedence values etc.?
>>
> Again, is it a question of combination or a question of trust?
>
>> A couple of specific points:
>>
>> The PICSWizard assumes that a URL can include a wildcard. So, *.jp means "anything on the .jp TLD". Wildcards can occur anywhere, so you could have *.example.* to mean anything on either example.com or example.org (or actually anything on example.foo.org etc. as well). Unless I am mistaken, the Mobile Filtering Project has not defined this specifically, but the meaning is clear, as is the expectation that software manufacturers would implement it.
>>
>> The suggestion from Japan is therefore that we replace the beginsWith, endsWith and contains constructs with a single hasURL construct whose value may contain wildcards.
>>
>> Comments please? It's easier to write, but is it as well defined and as easy to process?
>>
>> Actually the suggestion was that matches was also dispensed with, but I argued that a regular expression has more power than a simple wildcard.
>>
> Actually I would propose dropping beginsWith, endsWith and hasURL and keeping only matches. Regular expressions are far more useful than simple wildcarding. For example, I could define a match [a-m].adserver.com which would match a.adserver.com but not x.adserver.com.
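
As a small illustration of the difference in expressive power, the following sketch translates a wildcard pattern such as *.example.* into a java.util.regex pattern and contrasts it with a character-range expression like the [a-m].adserver.com example above. The pattern strings and the translation rules are made-up assumptions, not part of any proposed specification.

    import java.util.regex.Pattern;

    // Illustrative sketch: wildcard matching translated to a regex, next to a
    // native regular expression that a simple wildcard could not express.
    public class UrlMatchDemo {

        /** Translate a wildcard pattern (* = any run of characters) into a regex. */
        static Pattern wildcardToRegex(String wildcard) {
            StringBuffer regex = new StringBuffer();
            for (int i = 0; i < wildcard.length(); i++) {
                char c = wildcard.charAt(i);
                if (c == '*') {
                    regex.append(".*");                  // wildcard matches anything
                } else if ("\\.[]{}()+-^$|?".indexOf(c) >= 0) {
                    regex.append('\\').append(c);        // escape regex metacharacters
                } else {
                    regex.append(c);
                }
            }
            return Pattern.compile(regex.toString());
        }

        public static void main(String[] args) {
            Pattern wildcard = wildcardToRegex("*.example.*");
            System.out.println(wildcard.matcher("www.example.com/page").matches());   // true
            System.out.println(wildcard.matcher("www.elsewhere.org/page").matches()); // false

            // A regular expression can express things a wildcard cannot, e.g. a range:
            Pattern range = Pattern.compile("[a-m]\\.adserver\\.com");
            System.out.println(range.matcher("a.adserver.com").matches());            // true
            System.out.println(range.matcher("x.adserver.com").matches());            // false
        }
    }

Either construct could be processed in a client along these lines; the point is simply that a regexp subsumes anything the wildcard form can say.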
>> Another request was that hasURL (or matches, or beginsWith etc.) should be defined as properties, not classes. I am unable to comment on this. Kal?
>>
> I'll take a look at this in more detail. My original thinking was that defining these values as classes gave more flexibility for extension, but really in RDF that is not true - a property can be just as extensible as a class.
>
>> Finally, I saw a presentation on using RDF/XML labelling in RSS. I need to get a copy of the slides, but I believe the basic point was that it's easy to add contentLabels to RSS 1.0 and Atom but not RSS 2.0.
>>
> Whether the RSS / Atom syntax is used or not, the RSS/Atom mechanisms are instructive. I have an RSS newsreader that, when I log on, downloads and aggregates the latest headlines from a set of feeds. Perhaps a really clever client tool could do the same for content labels, as a sort of pre-fetch and a way to work out if locally cached labels need to be refreshed?
>
> Cheers,
>
> Kal
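
On the properties-versus-classes question, the following sketch shows the two modelling styles side by side in Jena, using a made-up namespace and made-up term names (BeginsWith, value and hasURL here are placeholders, not the vocabulary actually being defined).

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.vocabulary.RDF;

    // Illustrative sketch: the same "URL begins with ..." rule expressed
    // first as a typed resource (class style) and then as a plain property.
    public class RuleStyles {

        static final String NS = "http://example.org/labelrules#";   // placeholder namespace

        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();

            // Class style: the rule is an instance of a class such as BeginsWith,
            // with the actual string hung off a value property.
            model.createResource()
                 .addProperty(RDF.type, model.createResource(NS + "BeginsWith"))
                 .addProperty(model.createProperty(NS, "value"), "http://example.com/");

            // Property style: a single property such as hasURL (or matches)
            // carries the pattern directly as its value.
            model.createResource()
                 .addProperty(model.createProperty(NS, "hasURL"), "http://example.com/*");

            model.write(System.out, "N-TRIPLE");   // dump both forms for comparison
        }
    }

As Kal notes, both forms can be extended (subclasses in the first case, subproperties in the second), so the choice is largely one of syntax and processing convenience.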
Received on Tuesday, 7 December 2004 09:55:50 UTC