Musings on resource grouping

The method by which we can group resources is a key part of what the 
POWDER WG is trying to define.

The following few lines of RDF/XML indicate the beginnings of one 
possible approach but also throw up a lot of questions, so I wanted to 
put this in the public domain. Comments are very welcome - 
absolutely none of it is set in stone!


1  <wdr:Scope>
2    <wdr:hasScheme>^http$</wdr:hasScheme>
3    <wdr:hasHost>example.org$</wdr:hasHost>
4    <wdr:hasIP>213.249.189.194</wdr:hasIP>

5    <wdr:hasPath>foo</wdr:hasPath>
6    <wdr:hasPath>bar</wdr:hasPath>

7    <wdr:hasProperty>
8      <wdr:Property>
9        <ex:colour>red</ex:colour>
10     </wdr:Property>
11   </wdr:hasProperty>

12   <wdr:propLookUp rdf:resource="http://sparql.example.com" />

13   <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
14 </wdr:Scope>


The basic idea of an RDF Class containing the definition of the Scope 
seems straightforward enough?*

For a given URI, we wish to find out whether the resource to which it 
resolves is in scope or not. So first split it up into its component 
parts and then do some pattern matching using (Perl 5) regular expressions.
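To make the "split it up" step concrete, here is a minimal sketch in Python. The function name split_uri and the choice of Python's urllib.parse are my assumptions for illustration only; nothing here is part of any POWDER proposal.

```python
from urllib.parse import urlsplit

def split_uri(uri):
    """Break a URI into the components that the Scope elements test.

    Hypothetical helper: the key names simply mirror hasScheme,
    hasHost and hasPath in the RDF/XML sketch.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,          # e.g. "http"
        "host": parts.hostname or "",    # e.g. "www.example.org"
        "path": parts.path,              # e.g. "/foo/bar.html"
    }

components = split_uri("http://www.example.org/foo/bar.html")
```

Each component can then be tested separately against the corresponding regular expression.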

Line 2 uses a regular expression to indicate that the scope applies to 
resources fetched using HTTP. The caret and dollar sign require an exact 
match so that, for example, HTTPS is not in scope (^https?$ would cover 
exactly either HTTP or HTTPS).

Line 3 uses a similar approach to define the scope as being resources on 
the example.org domain or any subdomain thereof (if you want to restrict 
it specifically to example.org, put a caret in front of it).

Importantly, the dollar sign at the end avoids example.org.phishing.com 
being in scope.
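A quick way to check what the host pattern does and does not catch is sketched below. It uses Python's re module rather than Perl 5 itself (for a pattern this simple the two should behave the same), and host_in_scope is a hypothetical name. Note one thing the sketch exposes: with no anchor on the left, the pattern as written also matches a host such as notexample.org, and the unescaped dot matches any character. Something like (^|\.)example\.org$ would be stricter, if that is what we want.

```python
import re

# Pattern exactly as given in the hasHost element of the sketch.
HOST_PATTERN = re.compile(r"example.org$")

# A stricter alternative (my assumption, not part of the proposal):
# the match must start at a label boundary and the dot is literal.
STRICT_HOST_PATTERN = re.compile(r"(^|\.)example\.org$")

def host_in_scope(host, pattern=HOST_PATTERN):
    """Return True if the pattern is found anywhere in the host name."""
    return pattern.search(host) is not None

print(host_in_scope("www.example.org"))           # True
print(host_in_scope("example.org.phishing.com"))  # False
print(host_in_scope("notexample.org"))            # True (probably unintended)
print(host_in_scope("notexample.org", STRICT_HOST_PATTERN))  # False
```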

Line 4 restricts the scope to resources delivered from the given IP 
address. This could be given as an IP range. Could this be useful for a 
large-scale CMS that generates numeric URIs with no easy pattern to match?
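If the IP test were extended to ranges, one option would be CIDR notation. That notation is purely my assumption here, since the syntax for a range is not decided. The sketch uses Python's standard ipaddress module, ip_in_scope is a hypothetical name, and it leaves aside how the IP address of a resource is actually discovered.

```python
import ipaddress

def ip_in_scope(address, allowed):
    """Return True if `address` falls within `allowed`.

    `allowed` may be a single address ("213.249.189.194") or a
    CIDR range ("213.249.189.192/28"); the CIDR form is an
    assumption, not something the hasIP element defines.
    """
    return ipaddress.ip_address(address) in ipaddress.ip_network(allowed)

print(ip_in_scope("213.249.189.194", "213.249.189.194"))     # True
print(ip_in_scope("213.249.189.194", "213.249.189.192/28"))  # True
print(ip_in_scope("213.249.189.210", "213.249.189.192/28"))  # False
```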

Lines 5 and 6 define two elements that must be in the path if a resource 
is to be in scope. The intention is that, as with all elements here, 
these should be combined using logical AND. If logical OR is required, 
the alternatives can readily be presented in a single RegExp (foo|bar).
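The difference between the two hasPath elements combined with AND and a single alternation can be sketched as follows. This is Python rather than Perl 5, and path_matches_all is a hypothetical helper.

```python
import re

def path_matches_all(path, patterns):
    """Logical AND: every pattern must be found somewhere in the path."""
    return all(re.search(p, path) for p in patterns)

# Two separate hasPath elements: both "foo" and "bar" are required.
print(path_matches_all("/foo/bar.png", ["foo", "bar"]))   # True
print(path_matches_all("/foo/baz.png", ["foo", "bar"]))   # False

# A single element carrying an alternation: either one is enough.
print(path_matches_all("/foo/baz.png", ["(foo|bar)"]))    # True
```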

Lines 7 - 11 are an attempt to handle scoping by property. POWDER would 
provide a framework for properties to be used in this way but mustn't 
step over the line to define what kind of properties should be used.

In line 8 a Property Class is defined.

Line 9 provides an example to say that a resource must have the property 
of having the colour red.

Line 12 is intended to indicate that you can find out whether the 
resource is red by sending a SPARQL query to http://sparql.example.com. 
Such provision would be optional since it must cover several use cases:

  - where the content provider is making DRs available and is able to 
provide a look up data table for its resources to facilitate grouping.

  - where the content provider is unable to provide such data and wishes 
to state that the Description only applies to resources that are red - 
and you have to fetch the resources to find this out.

  - where a third party is providing DRs and is making an assertion that 
is only true of red resources.

For example, they may wish to say that "all documents written in red ink 
are really hard to read on Mars." In such cases, the assertion remains 
consistent with or without the look up table/service.
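For what it's worth, the kind of query a processor might send to the look up service could be a SPARQL ASK along the lines sketched below. Everything specific here is an assumption on my part: the namespace bound to ex:, the function name build_colour_query, and the idea that colour is held as a plain literal. The sketch only builds the query text; it does not contact any service and does no escaping of its inputs.

```python
def build_colour_query(resource_uri, colour="red",
                       ns="http://example.org/vocab#"):
    """Build a SPARQL ASK query testing whether a resource has a colour.

    `ns` is a hypothetical namespace for the ex: prefix; the original
    sketch does not say what ex: is bound to.
    """
    return (
        f"PREFIX ex: <{ns}>\n"
        f'ASK {{ <{resource_uri}> ex:colour "{colour}" }}'
    )

print(build_colour_query("http://www.example.org/foo/bar.html"))
```

An ASK query returns a simple yes/no, which is all the scope test needs.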

Is this approach workable? Should we demand SPARQL or make it more 
generic? In which case we may need something more complex like:

<wdr:hasPropLookUp>
   <wdr:PropLookUp>
     <wdr:propLookUpURI rdf:resource="http://sparql.example.com" />
     <wdr:propLookUpType
         rdf:resource="http://www.w3.org/TR/rdf-sparql-query/" />
   </wdr:PropLookUp>
</wdr:hasPropLookUp>

This is more flexible and extensible but it means that a "generic POWDER 
processor" couldn't be built since it would have to deal with an 
unbounded number of mechanisms for retrieving property data.

Specifying SPARQL may limit usefulness for some? Where SPARQL is used, 
should we actually embed the SPARQL query?

Back to the original example, line 13 simply states that 
http://www.example.org/foo/bar.png is not in scope, despite it meeting 
the other criteria. This serves to exemplify the idea of simply listing 
URIs as being in/out of scope, and of including negation for all elements.
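Putting the URI-based pieces together, the exclusion can be pictured as a check that overrides everything else. The sketch below is Python, with the scope held in a plain dictionary whose layout is my own invention. It treats hasNotURI as an exact string comparison (an assumption) and leaves out the IP and property tests.

```python
import re
from urllib.parse import urlsplit

# Hypothetical in-memory form of the Scope from the RDF/XML sketch.
SCOPE = {
    "scheme": [r"^http$"],
    "host": [r"example.org$"],
    "path": [r"foo", r"bar"],
    "not_uri": {"http://www.example.org/foo/bar.png"},
}

def in_scope(uri, scope=SCOPE):
    """Return True if the URI meets every criterion and is not excluded."""
    # Negation wins: a listed URI is out of scope whatever else matches.
    if uri in scope["not_uri"]:
        return False
    parts = urlsplit(uri)
    values = {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "path": parts.path,
    }
    # Logical AND across every pattern of every component.
    return all(re.search(p, values[k]) for k in values for p in scope[k])

print(in_scope("http://www.example.org/foo/bar.html"))   # True
print(in_scope("http://www.example.org/foo/bar.png"))    # False (excluded)
print(in_scope("https://www.example.org/foo/bar.html"))  # False (scheme)
```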

As I said at the top - comments welcome.

Phil.


* As discussed in the WCL-XG, it would, of course, be perfectly possible 
to encode Scope using another format, such as XML. This is still being 
considered by the WG - we could point to an XML literal from the RDF 
graph, for example.


-- 
Phil Archer
Chief Technical Officer,
Family Online Safety Institute
w. http://www.fosi.org/people/philarcher/

Received on Monday, 26 March 2007 11:02:18 UTC