- From: Smith, Kevin, VF-Group <Kevin.Smith@vodafone.com>
- Date: Tue, 27 Mar 2007 19:25:31 +0200
- To: <public-powderwg@w3.org>
Another way could be to distinguish the first and second level domains:

<Scope>
  <host>
    <match tld="mobi" sld="example" />
  </host>
</Scope>

...and so on for every appropriate level of domain. Takes out the
guesswork, but removes the whizzo regex flexibility...

Kevin

-----Original Message-----
From: public-powderwg-request@w3.org [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
Sent: 27 March 2007 17:10
To: public-powderwg@w3.org
Subject: Re: Musings on resource grouping

You're right Jo, personal taste comes into this. Personally, I like
regular expressions because you can do whizzo things with them, and
simple ones, like example.com$, are not hard to master. I should also say
that I don't foresee many people writing DRs by hand - we need tools for
this.

But there's another point here. You say "I think most people understand
that the natural order for a domain name is a rightmost match" - and of
course you're right. But it's that clause "most people understand" that
is critical. Computers are, of course, stupid and need to be told the
simplest things. So, I could write this bit of XML:

<Scope xmlns="http://blah">
  <host>
    <match name="example.mobi"/>
  </host>
</Scope>

And publish supplementary information that says "unless told otherwise
you should match the host right-wise."

Or you can express in your DTD that the range of host is a Perl 5 Regular
Expression and leave the data as

<Scope>
  <host>
    <match name="example.mobi$" />
  </host>
</Scope>

That said (you know I'd have to say that) I do, of course, take the point
about ease of use and reducing the opportunities for errors as far as
possible.

Phil.

Jo Rabin wrote:
> Use of regex is one of those questions of taste, I suppose. And a discussion
> about its merits or otherwise has the risk of becoming akin to a discussion
> as to whether it is "better" to drive on the right hand side or the left
> hand side of the road (left of course).
>
> That said ...
> I think that the syntax that is used should be measured
> against some requirements (which would preferably be stated requirements)
> like minimising the possibility of error, simplicity etc.
>
> In his original post (copied below), Phil points out that the expression
> example.org$ prevents a match by example.org.phishing.com. However, it seems
> to me that this is quite an error prone mechanism and that it is likely that
> many scoping statements would omit the $ and hence potentially be open to
> abuse.
>
> I think most people understand that the natural order for a domain name is a
> rightmost match, and the potential for error is reduced if the match pattern
> "example.com" means an exact and sub-domain match, and does not mean
> xxxexample.com, example.com.phishing.com etc.
>
> As a design rule, it seems to me that the simplest expression should be used
> for the commonest use case, rather than demanding that the commonest use
> case employs special signifiers.
>
> The use case for matching a sub-domain on its own, or matching random
> domains ending in a particular string is actually quite unlikely it seems to
> me. From this perspective use of regular expressions would not appear to fit
> requirements of simplicity, safety and fitness for purpose ...
>
> Jo
>
> ===
>
>
> The method by which we can group resources is a key part of what the
> POWDER WG is trying to define.
>
> The following few lines of RDF/XML indicate the beginnings of one
> possible approach but also throw up a lot of questions so I wanted to
> put this in the public domain. Comment is very welcome on this -
> absolutely none of it is set in stone!
>
>
>  1 <wdr:Scope>
>  2   <wdr:hasScheme>^http$</wdr:hasScheme>
>  3   <wdr:hasHost>example.org$</wdr:hasHost>
>  4   <wdr:hasIP>213.249.189.194</wdr:hasIP>
>
>  5   <wdr:hasPath>foo</wdr:hasPath>
>  6   <wdr:hasPath>bar</wdr:hasPath>
>
>  7   <wdr:hasProperty>
>  8     <wdr:Property>
>  9       <ex:colour>red</ex:colour>
> 10     </wdr:Property>
> 11   </wdr:hasProperty>
>
> 12   <wdr:propLookUp rdf:resource="http://sparql.example.com" />
>
> 13   <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
> 14 </wdr:Scope>
>
>
> The basic idea of an RDF Class containing the definition of the Scope
> seems straightforward enough?*
>
> For a given URI, we wish to find out whether the resource to which it
> resolves is in scope or not. So first split it up into its component
> parts and then do some pattern matching using (Perl 5) regular expressions.
>
> Line 2 uses a regular expression to indicate that the scope applies to
> resources fetched using HTTP. The caret and dollar sign require an exact
> match so that, for example, HTTPS is not in scope (^https?$ would cover
> exactly either HTTP or HTTPS).
>
> Line 3 uses a similar approach to define the scope as being resources on
> the example.org domain or any subdomain thereof (if you want to restrict
> it specifically to example.org, put a caret in front of it).
>
> Importantly, the dollar sign at the end avoids example.org.phishing.com
> being in scope.
>
> Line 4 restricts the scope to resources delivered from the given IP
> address. This could be given as an IP range. Useful for large scale CMS
> that generates numeric URIs with no easy pattern matching ability??
>
> Lines 5 and 6 define two elements that must be in the path if a resource
> is to be in scope. The intention is that, as with all elements here,
> these should be combined using logical AND. If logical OR is required,
> they can be presented readily in a single RegExp (foo|bar).
>
> Lines 7 - 11 are an attempt to handle scoping by property.
> POWDER would provide a framework for properties to be used in this way
> but mustn't step over the line to define what kind of properties should
> be used.
>
> In line 8 a Property Class is defined.
>
> Line 9 provides an example to say that a resource must have the property
> of having the colour red.
>
> Line 12 is intended to indicate that you can find out whether the
> resource is red by sending a SPARQL query to http://sparql.example.com.
> Such provision would be optional since it must cover several use cases:
>
> - where the content provider is making DRs available and is able to
> provide a look up data table for its resources to facilitate grouping.
>
> - where the content provider is unable to provide such data and wishes
> to state that the Description only applies to resources that are red -
> and you have to fetch the resources to find this out.
>
> - where a third party is providing DRs and is making an assertion that
> is only true of red resources.
>
> For example, they may wish to say that "all documents written in red ink
> are really hard to read on Mars." In such cases, the assertion remains
> consistent with or without the look up table/service.
>
> Is this approach workable? Should we demand SPARQL or make it more
> generic? In which case we may need something more complex like:
>
> <wdr:hasPropLookUp>
>   <wdr:PropLookUp>
>     <wdr:propLookUpURI rdf:resource="http://sparql.example.com" />
>     <wdr:propLookUpType
>       rdf:resource="http://www.w3.org/TR/rdf-sparql-query/" />
>   </wdr:PropLookUp>
> </wdr:hasPropLookUp>
>
> This is more flexible and extensible but it means that a "generic POWDER
> processor" couldn't be built since it would have to deal with an
> unbounded number of mechanisms for retrieving property data.
>
> Specifying SPARQL may limit usefulness for some? Where SPARQL is used,
> should we actually embed the SPARQL query?
>
> Back to the original example, line 13 simply states that
> http://www.example.org/foo/bar.png is not in scope, despite it meeting
> the other criteria. This serves to exemplify the idea of simply listing
> URIs as being in/out of scope, and of including negation for all elements.
>
> As I said at the top - comments welcome.
>
> Phil.
>
>
> * As discussed in the WCL-XG, it would, of course, be perfectly possible
> to encode Scope using another format, such as XML. This is still being
> considered by the WG - we could point to an XML literal from the RDF
> graph, for example.
>
>
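
The trade-off debated in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names regex_host_match, label_host_match, in_scope and STRICT_HOST are hypothetical (POWDER defines no such API), Python's re module stands in for Perl 5 regular expressions, and the hasIP and property checks from the example Scope are left out because they would need network access. It compares an unanchored regex search on the host with a rightmost match on whole domain labels, then combines scheme, host and path tests with logical AND as the original post describes.

```python
import re
from urllib.parse import urlsplit


def regex_host_match(pattern: str, host: str) -> bool:
    """Unanchored regex search, in the style of <wdr:hasHost>example.org$</wdr:hasHost>."""
    return re.search(pattern, host) is not None


def label_host_match(domain: str, host: str) -> bool:
    """Rightmost match on whole labels: the domain itself or any sub-domain of it."""
    want = domain.lower().rstrip(".").split(".")
    have = host.lower().rstrip(".").split(".")
    return len(have) >= len(want) and have[-len(want):] == want


# Assumption: one way to write the "domain or sub-domain" intent as a regex,
# with the dot escaped and the left edge pinned to a label boundary.
STRICT_HOST = r"(^|\.)example\.org$"


def in_scope(uri: str) -> bool:
    """Hypothetical evaluation of lines 2, 3, 5, 6 and 13 of the example Scope."""
    if uri == "http://www.example.org/foo/bar.png":  # line 13: hasNotURI
        return False
    parts = urlsplit(uri)
    return (
        re.search(r"^http$", parts.scheme) is not None            # line 2
        and regex_host_match(STRICT_HOST, parts.hostname or "")   # line 3 (stricter form)
        and all(re.search(p, parts.path) for p in ("foo", "bar")) # lines 5 and 6, ANDed
    )


if __name__ == "__main__":
    for host in ("www.example.org", "example.org.phishing.com", "notexample.org"):
        print(host,
              regex_host_match("example.org$", host),
              regex_host_match("example.org", host),
              label_host_match("example.org", host))
```

Under these assumptions the pattern example.org$ rejects example.org.phishing.com, and dropping the $ lets it through, which is the omission Jo warns about. The same pattern still accepts notexample.org, because the search is unanchored on the left and the dot is a wildcard. The label-based match and the stricter regex both reject that host.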
Received on Tuesday, 27 March 2007 17:25:43 UTC