RE: Musings on resource grouping

From: Smith, Kevin, VF-Group <Kevin.Smith@vodafone.com>
Date: Tue, 27 Mar 2007 19:25:31 +0200
Message-ID: <7753CA22B9752F4496FFDAFFF6627A1460576A@EITO-MBX03.internal.vodafone.com>
To: <public-powderwg@w3.org>

Another way could be to distinguish the first and second level domains:

     <match tld="mobi" sld="example" />

...and so on for every appropriate level of domain. Takes out the
guesswork, but removes the whizzo regex flexibility...
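
A minimal sketch of how a processor might evaluate such an element, assuming the tld and sld attributes are compared label by label against the end of the host name. The function name matches_levels is hypothetical and not part of any POWDER draft.

```python
def matches_levels(host, tld, sld):
    """Return True if the host's last two labels are exactly sld.tld.

    Assumption: comparison is case-insensitive and a trailing root dot
    is ignored. Subdomains of sld.tld also match.
    """
    labels = host.lower().rstrip(".").split(".")
    return (
        len(labels) >= 2
        and labels[-1] == tld.lower()
        and labels[-2] == sld.lower()
    )


print(matches_levels("www.example.mobi", "mobi", "example"))
print(matches_levels("example.mobi.phishing.com", "mobi", "example"))
```

Because whole labels are compared, neither xxxexample.mobi nor example.mobi.phishing.com would match, with no anchors to forget.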


-----Original Message-----
From: public-powderwg-request@w3.org
[mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
Sent: 27 March 2007 17:10
To: public-powderwg@w3.org
Subject: Re: Musings on resource grouping

You're right Jo, personal taste comes into this. Personally, I like
regular expressions because you can do whizzo things with them, and
simple ones, like example.com$, are not hard to master.

I should also say that I don't foresee many people writing DRs by hand -
we need tools for this.

But there's another point here. You say "I think most people understand 
that the natural order for a domain name is a rightmost match" - and of 
course you're right. But it's that clause "most people understand" that 
is critical. Computers are, of course, stupid and need to be told the 
simplest things. So, I could write this bit of XML:

<Scope xmlns="http://blah">
     <match name="example.mobi"/>
</Scope>

And publish supplementary information that says "unless told otherwise 
you should match the host right-wise." Or you can express in your DTD 
that the range of host is a Perl 5 Regular Expression and leave the data

     <match name="example.mobi$" />

That said (you know I'd have to say that) I do, of course, take the
point about ease of use and reducing the opportunities for errors as far
as possible.
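
To make the difference between the two forms concrete, here is a small sketch using Python's re module as a stand-in for Perl 5 regular expressions (the two behave the same for these simple patterns). The helper host_matches and the stricter final pattern are illustrative assumptions, not anything proposed in the thread.

```python
import re


def host_matches(pattern, host):
    """Unanchored search, as a regex-valued host field would be applied."""
    return re.search(pattern, host) is not None


# Without the dollar sign the pattern matches anywhere in the host.
print(host_matches(r"example.mobi", "example.mobi.phishing.com"))

# With the dollar sign the phishing host is excluded ...
print(host_matches(r"example.mobi$", "example.mobi.phishing.com"))

# ... but a host that merely ends in the same string still matches.
print(host_matches(r"example.mobi$", "xxxexample.mobi"))

# Assumed stricter form: whole labels only, with the dots escaped.
STRICT = r"(^|\.)example\.mobi$"
print(host_matches(STRICT, "www.example.mobi"))
print(host_matches(STRICT, "xxxexample.mobi"))
```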


Jo Rabin wrote:
> Use of regex is one of those questions of taste, I suppose. And a debate
> about its merits or otherwise has the risk of becoming akin to a debate
> as to whether it is "better" to drive on the right hand side or the left
> hand side of the road (left of course).
> That said ... I think that the syntax that is used should be measured
> against some requirements (which would preferably be stated),
> like minimising the possibility of error, simplicity etc.
> In his original post (copied below), Phil points out that
> example.org$ prevents a match by example.org.phishing.com. However, it
> seems to me that this is quite an error-prone mechanism and that it is
> likely that many scoping statements would omit the $ and hence
> potentially be open to abuse.
> I think most people understand that the natural order for a domain
> name is a rightmost match, and the potential for error is reduced if
> the match "example.com" means an exact and sub-domain match, and does
> not mean xxxexample.com, example.com.phishing.com etc.
> As a design rule, it seems to me that the simplest expression should
> be used for the commonest use case, rather than demanding that the
> commonest use case employs special signifiers.
> The use case for matching a sub-domain on its own, or matching random
> domains ending in a particular string, is actually quite unlikely, it
> seems to me. From this perspective use of regular expressions would
> not appear to fit the requirements of simplicity, safety and fitness
> for purpose ...
> Jo 
> ===
> The method by which we can group resources is a key part of what the 
> POWDER WG is trying to define.
> The following few lines of RDF/XML indicate the beginnings of one 
> possible approach but also throw up a lot of questions so I wanted to 
> put this in the public domain. Comment is very welcome on this - 
> absolutely none of it is set in stone!
> 1  <wdr:Scope>
> 2    <wdr:hasScheme>^http$</wdr:hasScheme>
> 3    <wdr:hasHost>example.org$</wdr:hasHost>
> 4    <wdr:hasIP></wdr:hasIP>
> 5    <wdr:hasPath>foo</wdr:hasPath>
> 6    <wdr:hasPath>bar</wdr:hasPath>
> 7    <wdr:hasProperty>
> 8      <wdr:Property>
> 9        <ex:colour>red</ex:colour>
> 10     </wdr:Property>
> 11   </wdr:hasProperty>
> 12   <wdr:propLookUp rdf:resource="http://sparql.example.com" />
> 13   <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
> 14 </wdr:Scope>
> The basic idea of an RDF Class containing the definition of the Scope 
> seems straightforward enough?*
> For a given URI, we wish to find out whether the resource to which it
> resolves is in scope or not. So first split it up into its component
> parts and then do some pattern matching using (Perl 5) regular
> expressions.
> Line 2 uses a regular expression to indicate that the scope applies to
> resources fetched using HTTP. The caret and dollar sign require an
> exact match so that, for example, HTTPS is not in scope (^https?$ would
> match exactly either HTTP or HTTPS).
> Line 3 uses a similar approach to define the scope as being resources
> on the example.org domain or any subdomain thereof (if you want to
> restrict it specifically to example.org, put a caret in front of it).
> Importantly, the dollar sign at the end avoids hosts such as
> example.org.phishing.com being in scope.
> Line 4 restricts the scope to resources delivered from the given IP
> address. This could be given as an IP range. Useful for large scale
> that generates numeric URIs with no easy pattern matching ability??
> Lines 5 and 6 define two elements that must be in the path if a
> resource is to be in scope. The intention is that, as with all
> elements here, these should be combined using logical AND. If logical
> OR is required, they can be presented readily in a single RegExp
> (foo|bar).
> Lines 7 - 11 are an attempt to handle scoping by property. POWDER must
> provide a framework for properties to be used in this way but mustn't
> step over the line to define what kind of properties should be used.
> In line 8 a Property Class is defined.
> Line 9 provides an example to say that a resource must have the
> property of having the colour red.
> Line 12 is intended to indicate that you can find out whether the
> resource is red by sending a SPARQL query to
> http://sparql.example.com.
> Such provision would be optional since it must cover several use
> cases:
>   - where the content provider is making DRs available and is able to
> provide a look up data table for its resources to facilitate grouping.
>   - where the content provider is unable to provide such data and
> wishes to state that the Description only applies to resources that
> are red - and you have to fetch the resources to find this out.
>   - where a third party is providing DRs and is making an assertion
> that is only true of red resources.
> For example, they may wish to say that "all documents written in red
> are really hard to read on Mars." In such cases, the assertion remains
> consistent with or without the look up table/service.
> Is this approach workable? Should we demand SPARQL or make it more 
> generic? In which case we may need something more complex like:
> <wdr:hasPropLookUp>
>    <wdr:PropLookUp>
>      <wdr:propLookUpURI rdf:resource="http://sparql.example.com" />
>      <wdr:propLookUpType 
> rdf:resource="http://www.w3.org/TR/rdf-sparql-query/" />
>    </wdr:PropLookUp>
> </wdr:hasPropLookUp>
> This is more flexible and extensible but it means that a "generic
> processor" couldn't be built since it would have to deal with an 
> unbounded number of mechanisms for retrieving property data.
> Specifying SPARQL may limit usefulness for some? Where SPARQL is used,
> should we actually embed the SPARQL query?
> Back to the original example, line 13 simply states that
> http://www.example.org/foo/bar.png is not in scope, despite it meeting
> the other criteria. This serves to exemplify the idea of simply
> listing URIs as being in/out of scope, and of including negation for
> all elements.
> As I said at the top - comments welcome.
> Phil.
> * As discussed in the WCL-XG, it would, of course, be perfectly
> possible to encode Scope using another format, such as XML. This is
> still being considered by the WG - we could point to an XML literal
> from the RDF graph, for example.
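
The procedure described in the original post (split the URI into its parts, match each part, combine with logical AND, then apply the explicit exclusion) could be sketched roughly as follows. This is only an illustration under stated assumptions: the SCOPE dictionary is a hypothetical in-memory form of the RDF/XML example, Python's re module stands in for Perl 5 regular expressions, and the IP and property look-up elements (lines 4 and 7 - 12) are left out.

```python
import re
from urllib.parse import urlsplit

# Hypothetical in-memory form of lines 2, 3, 5, 6 and 13 of the example.
SCOPE = {
    "scheme": r"^http$",
    "host": r"example.org$",
    "paths": [r"foo", r"bar"],  # every entry must match (logical AND)
    "not_uris": {"http://www.example.org/foo/bar.png"},
}


def in_scope(uri, scope=SCOPE):
    """Return True if the URI satisfies every element of the scope."""
    if uri in scope["not_uris"]:
        return False  # explicitly excluded, whatever else matches
    parts = urlsplit(uri)
    if not re.search(scope["scheme"], parts.scheme):
        return False
    if not re.search(scope["host"], parts.hostname or ""):
        return False
    return all(re.search(p, parts.path) for p in scope["paths"])


print(in_scope("http://www.example.org/foo/bar.html"))
print(in_scope("http://www.example.org/foo/bar.png"))
```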
Received on Tuesday, 27 March 2007 17:25:43 GMT