RE: Musings on resource grouping

I like Kevin's idea.  Clean and simple.
Considering some of the domains that marketing people come up with, far
from the intended use of domains, this seems reasonable.
Except that we would need to come up with a good way of labeling
arbitrarily many levels of subdomains, as I could not find a definitive
naming scheme.  Every RFC I looked at refers to "nth level" or something
to that effect, but doesn't spell it out.

tld, sld, 3ld, 4ld, 5ld, etc. ?
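
To make the question concrete, here is a rough sketch (in Python, my
choice, not anything taken from an RFC) of how labels in that style could
be generated for any depth.  The names level_name and label_levels are
made up for illustration only.

```python
def level_name(n: int) -> str:
    """Name the n-th domain level, counting from the right (1 = TLD).

    The tld/sld/3ld/4ld naming is the hypothetical scheme floated
    above, not an established standard.
    """
    if n < 1:
        raise ValueError("levels are counted from 1")
    return {1: "tld", 2: "sld"}.get(n, f"{n}ld")


def label_levels(host: str) -> dict:
    """Map each level name to the matching label of a host name."""
    # Drop a trailing root dot, normalise case, split into labels.
    labels = host.rstrip(".").lower().split(".")
    # Walk the labels right to left so that level 1 is the TLD.
    return {
        level_name(i): label
        for i, label in enumerate(reversed(labels), start=1)
    }


print(label_levels("www.example.mobi"))
```

One catch: XML names cannot begin with a digit, so 3ld, 4ld, etc. would
not work as attribute names in Kevin's <match> element as they stand;
something like ld3, ld4 might be needed instead.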


-- Kai


> -----Original Message-----
> From: public-powderwg-request@w3.org 
> [mailto:public-powderwg-request@w3.org] On Behalf Of Smith, 
> Kevin, VF-Group
> Sent: Tuesday, March 27, 2007 7:26 PM
> To: public-powderwg@w3.org
> Subject: RE: Musings on resource grouping
> 
> 
>  
> Another way could be to distinguish the first and second 
> level domains:
> 
> <Scope>
>    <host>
>      <match tld="mobi" sld="example" />
>    </host>
> </Scope>
> 
> ...and so on for every appropriate level of domain. Takes out 
> the guesswork, but removes the whizzo regex flexibility...
> 
> Kevin 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: public-powderwg-request@w3.org
> [mailto:public-powderwg-request@w3.org] On Behalf Of Phil Archer
> Sent: 27 March 2007 17:10
> To: public-powderwg@w3.org
> Subject: Re: Musings on resource grouping
> 
> 
> You're right Jo, personal taste comes into this. Personally, 
> I like regular expressions because you can do whizzo things 
> with them and simple ones, like example.com$, are not hard to master.
> 
> I should also say that I don't foresee many people writing DRs by
> hand - we need tools for this.
> 
> But there's another point here. You say "I think most people 
> understand that the natural order for a domain name is a 
> rightmost match" - and of course you're right. But it's that 
> clause "most people understand" that is critical. Computers 
> are, of course, stupid and need to be told the simplest 
> things. So, I could write this bit of XML:
> 
> <Scope xmlns="http://blah">
>    <host>
>      <match name="example.mobi"/>
>    </host>
> </Scope>
> 
> And publish supplementary information that says "unless told 
> otherwise you should match the host right-wise." Or you can 
> express in your DTD that the range of host is a Perl 5 
> Regular Expression and leave the data as
> 
> <Scope>
>    <host>
>      <match name="example.mobi$" />
>    </host>
> </Scope>
> 
> That said (you know I'd have to say that) I do, of course, take the
> point about ease of use and reducing the opportunities for errors as
> far as possible.
> 
> Phil.
> 
> Jo Rabin wrote:
> > Use of regex is one of those questions of taste, I suppose. And a
> > discussion about its merits or otherwise has the risk of becoming akin
> > to a discussion as to whether it is "better" to drive on the right hand
> > side or the left hand side of the road (left of course).
> > 
> > That said ... I think that the syntax that is used should be measured
> > against some requirements (which would preferably be stated
> > requirements) like minimising the possibility of error, simplicity etc.
> > 
> > In his original post (copied below), Phil points out that the
> > expression example.org$ prevents a match by example.org.phishing.com.
> > However, it seems to me that this is quite an error prone mechanism and
> > that it is likely that many scoping statements would omit the $ and
> > hence potentially be open to abuse.
> > 
> > I think most people understand that the natural order for a domain
> > name is a rightmost match, and the potential for error is reduced if
> > the match pattern "example.com" means an exact and sub-domain match,
> > and does not mean xxxexample.com, example.com.phishing.com etc.
> > 
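
For what it's worth, the "exact and sub-domain match" reading Jo
describes could be sketched roughly as below.  This is only an
illustration under my own assumptions: Python as the language, and a
made-up function name host_in_scope.  It compares whole labels from the
right instead of comparing characters.

```python
def host_in_scope(host: str, pattern: str) -> bool:
    """Return True if host equals pattern or is a sub-domain of it.

    Both names are split into labels and compared from the right, so
    a plain pattern such as "example.com" needs no special signifiers.
    """
    host_labels = host.rstrip(".").lower().split(".")
    pattern_labels = pattern.rstrip(".").lower().split(".")
    # The rightmost labels of the host must equal the pattern's labels.
    # If the host is shorter than the pattern, the slice is the whole
    # host and the comparison is simply False.
    return host_labels[-len(pattern_labels):] == pattern_labels


for candidate in ("example.com", "www.example.com",
                  "xxxexample.com", "example.com.phishing.com"):
    print(candidate, host_in_scope(candidate, "example.com"))
```
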
> > As a design rule, it seems to me that the simplest expression should
> > be used for the commonest use case, rather than demanding that the
> > commonest use case employs special signifiers.
> > 
> > The use case for matching a sub-domain on its own, or matching random
> > domains ending in a particular string is actually quite unlikely it
> > seems to me. From this perspective use of regular expressions would not
> > appear to fit requirements of simplicity, safety and fitness for
> > purpose ...
> > 
> > Jo
> > 
> > ===
> > 
> > 
> > The method by which we can group resources is a key part of what the
> > POWDER WG is trying to define.
> > 
> > The following few lines of RDF/XML indicate the beginnings of one
> > possible approach but also throw up a lot of questions so I wanted to
> > put this in the public domain. Comment is very welcome on this -
> > absolutely none of it is set in stone!
> > 
> > 
> > 1  <wdr:Scope>
> > 2    <wdr:hasScheme>^http$</wdr:hasScheme>
> > 3    <wdr:hasHost>example.org$</wdr:hasHost>
> > 4    <wdr:hasIP>213.249.189.194</wdr:hasIP>
> > 
> > 5    <wdr:hasPath>foo</wdr:hasPath>
> > 6    <wdr:hasPath>bar</wdr:hasPath>
> > 
> > 7    <wdr:hasProperty>
> > 8      <wdr:Property>
> > 9        <ex:colour>red</ex:colour>
> > 10     </wdr:Property>
> > 11   </wdr:hasProperty>
> > 
> > 12   <wdr:propLookUp rdf:resource="http://sparql.example.com" />
> > 
> > 13   <wdr:hasNotURI>http://www.example.org/foo/bar.png</wdr:hasNotURI>
> > 14 </wdr:Scope>
> > 
> > 
> > The basic idea of an RDF Class containing the definition of the Scope
> > seems straightforward enough?*
> > 
> > For a given URI, we wish to find out whether the resource to which it
> > resolves is in scope or not. So first split it up into its component
> > parts and then do some pattern matching using (Perl 5) regular
> > expressions.
> > 
> > Line 2 uses a regular expression to indicate that the scope applies to
> > resources fetched using HTTP. The caret and dollar sign require an
> > exact match so that, for example, HTTPS is not in scope (^https?$ would
> > cover exactly either HTTP or HTTPS).
> > 
> > Line 3 uses a similar approach to define the scope as being resources
> > on the example.org domain or any subdomain thereof (if you want to
> > restrict it specifically to example.org, put a caret in front of it).
> > 
> > Importantly, the dollar sign at the end avoids example.org.phishing.com
> > being in scope.
> > 
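
A quick check of what the trailing $ does and does not buy you.  This is
a sketch using Python's re module in place of Perl 5 (the two agree for
patterns this simple); the variable names and the stricter pattern are
my own, not part of Phil's proposal.

```python
import re

# The pattern from line 3 of the example, used as a search.
LOOSE = re.compile(r"example.org$")

# A hypothetical stricter variant: escape the dots and require either
# the start of the string or a dot in front of "example".
STRICT = re.compile(r"(^|\.)example\.org$")

HOSTS = [
    "example.org",
    "www.example.org",
    "example.org.phishing.com",
    "xxxexample.org",
    "exampleXorg",
]

for host in HOSTS:
    print(
        f"{host:26}",
        "loose:", bool(LOOSE.search(host)),
        "strict:", bool(STRICT.search(host)),
    )
```

The $ does keep example.org.phishing.com out, but on its own the
pattern still lets xxxexample.org through, and the unescaped dot
matches any character.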
> > Line 4 restricts the scope to resources delivered from the given IP
> > address. This could be given as an IP range. Useful for large scale CMS
> > that generates numeric URIs with no easy pattern matching ability??
> > 
> > Lines 5 and 6 define two elements that must be in the path if a
> > resource is to be in scope. The intention is that, as with all elements
> > here, these should be combined using logical AND. If logical OR is
> > required, they can be presented readily in a single RegExp (foo|bar).
> > 
> > Lines 7 - 11 are an attempt to handle scoping by property. POWDER
> > would provide a framework for properties to be used in this way but
> > mustn't step over the line to define what kind of properties should be
> > used.
> > 
> > In line 8 a Property Class is defined.
> > 
> > Line 9 provides an example to say that a resource must have the
> > property of having the colour red.
> > 
> > Line 12 is intended to indicate that you can find out whether the
> > resource is red by sending a SPARQL query to http://sparql.example.com.
> > Such provision would be optional since it must cover several use cases:
> > 
> >   - where the content provider is making DRs available and is able to
> >     provide a look up data table for its resources to facilitate
> >     grouping.
> > 
> >   - where the content provider is unable to provide such data and
> >     wishes to state that the Description only applies to resources that
> >     are red - and you have to fetch the resources to find this out.
> > 
> >   - where a third party is providing DRs and is making an assertion that
> >     is only true of red resources.
> > 
> > For example, they may wish to say that "all documents written in red
> > ink are really hard to read on Mars." In such cases, the assertion
> > remains consistent with or without the look up table/service.
> > 
> > Is this approach workable? Should we demand SPARQL or make it more 
> > generic? In which case we may need something more complex like:
> > 
> > <wdr:hasPropLookUp>
> >    <wdr:PropLookUp>
> >      <wdr:propLookUpURI rdf:resource="http://sparql.example.com" />
> >      <wdr:propLookUpType
> > rdf:resource="http://www.w3.org/TR/rdf-sparql-query/" />
> >    </wdr:PropLookUp>
> > </wdr:hasPropLookUp>
> > 
> > This is more flexible and extensible but it means that a "generic
> > POWDER processor" couldn't be built since it would have to deal with an
> > unbounded number of mechanisms for retrieving property data.
> > 
> > Specifying SPARQL may limit usefulness for some? Where SPARQL is used,
> > should we actually embed the SPARQL query?
> > 
> > Back to the original example, line 13 simply states that
> > http://www.example.org/foo/bar.png is not in scope, despite it meeting
> > the other criteria. This serves to exemplify the idea of simply listing
> > URIs as being in/out of scope, and of including negation for all
> > elements.
> > 
> > As I said at the top - comments welcome.
> > 
> > Phil.
> > 
> > 
> > * As discussed in the WCL-XG, it would, of course, be perfectly
> > possible to encode Scope using another format, such as XML. This is
> > still being considered by the WG - we could point to an XML literal
> > from the RDF graph, for example.
> > 
> > 
> 
> 
> 
> 
> 

Received on Wednesday, 28 March 2007 07:10:26 UTC