- From: Phil Archer <phila@w3.org>
- Date: Mon, 19 Nov 2012 12:06:13 +0000
- To: Henry Story <henry.story@bblfish.net>
- CC: Read-Write-Web <public-rww@w3.org>, nathan <nathan@webr3.org>, Ruben Verborgh <ruben.verborgh@ugent.be>, Alexandre Bertails <bertails@w3.org>
Henry, everyone,

Let me see what I can offer here (for the many for whom my name means nothing, I led the work on POWDER and am indelibly associated with it).

The problem we faced is, I think, much the same as you have here. You want something that is easy to understand, such as "everyone with a URI that begins with http://example.org/people/trusted/" but at the same time have a processable means of handling this.

So, we created a set of XML elements that were meant to be easy to use, such as:

includehosts
includepathstartswith
includequerycontains

For every 'include' there's a matching 'exclude' - and we covered scheme, host, path contains, path starts with, path ends with, ports, query strings and regexes and a full URI. That's what we called POWDER Grouping and it has its own separate Recommendation [1].

But this is a simplification layer. Within that doc we also defined how to turn any of those 'user-friendly elements' into regular expressions, for which we provided templates that you can bet we tested and re-tested. They're not simple but they are meant to be robust (the one that lets you include query string name/value pairs in any order was a lot of fun - not). The doc also covers IRI canonicalization which is important in this space.

You can programmatically replace any of the user-friendly terms with matchesregex (which we called POWDER-BASE) and it is *that* property (and notmatchesregex) that is the subject of POWDER's Semantic Extension [2].

The semantics of POWDER are fully defined. Any POWDER document (XML) can be transformed into POWDER-BASE (also XML, identical except that the only IRI set defining properties allowed are (not)matchesregex) and that can then be transformed into OWL *with the semantic extension* that allows you to run a regex against a URI - think of it as SPARQL's regex(str(URI)).

Semantically, all 3 flavours of a POWDER document are defined as identical. Only the syntax changes.
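
To make the "user-friendly element to regex" step above concrete, here is a minimal Python sketch. It is not the tested templates from the Grouping Recommendation [1]; the patterns are deliberately simplified, and the function names (hosts_to_regex, path_starts_with_to_regex, in_iri_set) are made up for illustration. It assumes the URI has already been canonicalized (lower-case scheme and host) and that a host constraint also covers subdomains.

```python
import re

# Simplified scheme prefix; assumes the URI is already canonicalized.
SCHEME = r"^[a-z][a-z0-9+.-]*://"


def hosts_to_regex(hosts):
    """Rough analogue of an 'includehosts' constraint.

    Matches URIs whose host is one of `hosts` or a subdomain of one
    (the subdomain behaviour is an assumption of this sketch).
    """
    alts = "|".join(re.escape(h) for h in hosts)
    return (
        SCHEME
        + r"(?:[^/?#@]*@)?"      # optional userinfo
        + r"(?:[^/?#:@]+\.)?"    # optional subdomain labels
        + rf"(?:{alts})"         # one of the listed hosts
        + r"(?::[0-9]+)?"        # optional port
        + r"(?:[/?#]|$)"         # host must end here
    )


def path_starts_with_to_regex(prefixes):
    """Rough analogue of an 'includepathstartswith' constraint."""
    alts = "|".join(re.escape(p) for p in prefixes)
    return SCHEME + rf"[^/?#]*(?:{alts})"


def in_iri_set(uri, matches=(), not_matches=()):
    """A URI is in the set if it matches every 'matches' regex
    and none of the 'not_matches' regexes."""
    return all(re.search(p, uri) for p in matches) and not any(
        re.search(p, uri) for p in not_matches
    )


# Example: "everyone with a URI on example.org under /people/trusted/".
include = [
    hosts_to_regex(["example.org"]),
    path_starts_with_to_regex(["/people/trusted/"]),
]
print(in_iri_set("http://example.org/people/trusted/alice", include))
```

The two lists passed to in_iri_set play the role of the matchesregex and notmatchesregex properties: once every friendly element has been rewritten as a regex, set membership reduces to plain regex tests on the URI string.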

POWDER can define any set of URIs, no matter how complex [3].

The domain of wdrs:matchesregex is rdfs:Resource, its range xsd:string [4] - i.e. there's no weird inferencing there.

Although I seem to recall looking it up, I see that we didn't actually define the regex syntax we used. I can only leave it to others to answer the Java Regex issue.

There are some POWDER tools at [5] including a grouping tester [6]. That lets you put in values for the user-friendly URI components and then test a given URI to see if it is or is not covered.

Hope this helps? Shout if you need more.

Phil.

[1] http://www.w3.org/TR/powder-grouping/
[2] http://www.w3.org/TR/powder-formal/#regexSemantics
[3] http://www.w3.org/TR/powder-grouping/#conj-disj
[4] http://www.w3.org/2007/05/powder-s#matchesregex
[5] http://philarcher.org/powder/
[6] http://philarcher.org/cgi-bin/powder-group.cgi

On 19/11/2012 11:01, Henry Story wrote:
> CCing Phil Archer.
> ( Phil the thread for this starts here:
> http://lists.w3.org/Archives/Public/public-rww/2012Nov/0119.html )
>
> On 19 Nov 2012, at 02:31, Alexandre Bertails <bertails@w3.org> wrote:
>
>> On 11/18/2012 04:06 PM, Nathan wrote:
>>> Henry Story wrote:
>>>> [] wac:accessToClass [ wac:regex "http://joe.example/blog/.*" ];
>>
>> For file matching patterns, I'd suggest not to reinvent the wheel and
>> use something that has existed for a long time: ant patterns [1]. It's
>> already defined, and the regex can be easily parsed and then compiled
>> down to any language specific regex.
>
> I just came across the following discussion on IRC, which seems relevant to this.
>
> <blockquote>
> 21:49 presbrey: bblfish, if you want to have regex we should support simple globbing too
> 21:50 presbrey: most users do not write /admin/.*, they write /admin/*
> 21:51 presbrey: also do we really want to incorporate blank nodes? this is the first proposal to do so
> 21:54 presbrey: such a pattern also seems to duplicate eg.
> 21:54 presbrey: acl:defaultForNew </admin/>
> 21:57 presbrey: also in this particular scenario, it costs more to compile the regex pattern than to evaluate it
> 21:58 presbrey: in more complex examples, the server now needs a resident regex cache
> 21:59 melvster: perhaps arbitrary regex could be an attack surface too depending on who has access
> 22:17 betehess would prefer to have ant style
> 22:23 presbrey: betehess, do you know how I can parse ant style in python or php?
> 22:24 presbrey: and javascript? :)
> 22:24 betehess: shouldn't be difficult
> 22:24 betehess: we'll need to define the regex grammar anyway
> 22:25 betehess: at the end, any language should be able to compile them down to their own native regex style
> 22:26 presbrey: at the end?
> 22:26 betehess: http://trac.mach-ii.com/machii/wiki/ANTPatternMatcher
> 22:26 betehess: just three wildcards
> 22:26 betehess: having both ** and * is pretty cool
> </blockquote>
>
> Yes, I can see that something less powerful than full regexes could be helpful in reducing
> regex based denial of service attacks for remotely published regex rules. Also
> it is easier for people to specify correctly.
>
> That is why POWDER already has worked on simplified groupings, by proposing an
> XML format for simple definitions. See for example here:
>
> http://www.w3.org/TR/powder-grouping/#wild
>
> I think it would be nice to semanticise those higher level relations so that
> one can also use them directly in Turtle. Perhaps this is something we can ask
> the POWDER group to do, if they are still around?
>
> Henry
>
>
>>
>> Alexandre.
>>
>> [1] http://ant.apache.org/manual/dirtasks.html#patterns
>>
>>>
>>> What would [ wac:regex "http://joe.example/blog/.*" ] mean?
>>>
>>> Using OWL 2 we can create a datatype definition, using a datatype
>>> restriction, on strings and the like - but that doesn't (anywhere near)
>>> cover what's required here.
>>>
>>> I'm unsure how we'd actually create a Class of things based on the
>>> lexical form of a URI though, or even, whether it's a good idea to do so
>>> - we are basically saying that if a URI has a lexical form which matches
>>> the regular expression x, then that URI denotes something which is of
>>> the class y. This feels wrong.
>>>
>>> Cheers,
>>>
>>> Nathan
>>>
>>>
>>
>
> Social Web Architect
> http://bblfish.net/
>

--
Phil Archer
W3C eGovernment
http://www.w3.org/egov/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Monday, 19 November 2012 12:06:42 UTC