Re: [WAC] regexps in WebAccessControl

From: Henry Story <henry.story@bblfish.net>
Date: Mon, 19 Nov 2012 12:01:53 +0100
Cc: nathan <nathan@webr3.org>, Ruben Verborgh <ruben.verborgh@ugent.be>, Alexandre Bertails <bertails@w3.org>
Message-Id: <F85A8390-57B1-40AF-8417-275BB1276B9F@bblfish.net>
To: Read-Write-Web <public-rww@w3.org>, phila@w3.org

CCing Phil Archer.
(Phil, the thread for this starts here:
   http://lists.w3.org/Archives/Public/public-rww/2012Nov/0119.html )

On 19 Nov 2012, at 02:31, Alexandre Bertails <bertails@w3.org> wrote:

> On 11/18/2012 04:06 PM, Nathan wrote:
>> Henry Story wrote:
>>>  []  wac:accessToClass [ wac:regex "http://joe.example/blog/.*" ];
> 
> For file matching patterns, I'd suggest not to reinvent the wheel and
> use something that has existed for a long time: ant patterns [1]. It's
> already defined, and the regex can be easily parsed and then compiled
> down to any language specific regex.

I just came across the following discussion on IRC, which seems relevant to this.

<blockquote>
21:49 presbrey: bblfish, if you want to have regex we should support simple globbing too
21:50 presbrey: most users do not write /admin/.*, they write /admin/*
21:51 presbrey: also do we really want to incorporate blank nodes? this is the first proposal to do so
21:54 presbrey: such a pattern also seems to duplicate eg.
21:54 presbrey: acl:defaultForNew </admin/>
21:57 presbrey: also in this particular scenario, it costs more to compile the regex pattern than to evaluate it
21:58 presbrey: in more complex examples, the server now needs a resident regex cache
21:59 melvster: perhaps arbitrary regex could be an attack surface too depending on who has access
22:17 betehess would prefer to have ant style
22:23 presbrey: betehess, do you know how I can parse ant style in python or php?
22:24 presbrey: and javascript? :)
22:24 betehess: shouldn't be difficult
22:24 betehess: we'll need to define the regex grammar anyway
22:25 betehess: at the end, any language should be able to compile them down to their own native regex style
22:26 presbrey: at the end?
22:26 betehess: http://trac.mach-ii.com/machii/wiki/ANTPatternMatcher
22:26 betehess: just three wildcards
22:26 betehess: having both ** and * is pretty cool
</blockquote>
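
To make betehess's point about compiling the three Ant wildcards down to a
native regex concrete, here is a rough Python sketch (Python being one of the
languages presbrey asks about). It is only an illustration under stated
assumptions: the names ant_to_regex and matches are invented for this example,
and it covers only ?, * and **. Ant conveniences such as treating a trailing
"/" as shorthand for "/**" are deliberately left out.

```python
import re


def ant_to_regex(pattern: str) -> re.Pattern:
    """Translate an Ant-style path pattern into a compiled regex.

    Hypothetical helper, not part of any WAC or Ant library.
    Wildcards handled:
      ?    exactly one character, never a "/"
      *    zero or more characters within one path segment
      **   zero or more characters, "/" included
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            # Zero or more complete "segment/" groups.
            parts.append("(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            # Anything at all, across segment boundaries.
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            # Every other character is literal, so escape it.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the Ant-style pattern."""
    return ant_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    for pat, path in [
        ("/admin/*", "/admin/users"),
        ("/admin/*", "/admin/users/1"),
        ("/admin/**", "/admin/users/1"),
        ("/blog/**/*.html", "/blog/2012/11/post.html"),
    ]:
        print(pat, path, matches(pat, path))
```

In this sketch "/admin/*" stays inside a single path segment while "/admin/**"
crosses segment boundaries, which is the distinction betehess highlights.
Literal characters are escaped, so a published pattern cannot smuggle in
arbitrary regex syntax.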

Yes, I can see that something less powerful than full regexes could help reduce
regex-based denial-of-service attacks from remotely published regex rules. It
is also easier for people to specify correctly.

That is why POWDER has already worked on simplified groupings, proposing an
XML format for simple definitions. See for example here:

  http://www.w3.org/TR/powder-grouping/#wild

I think it would be nice to semanticise those higher-level relations so that
one can also use them directly in Turtle. Perhaps this is something we can ask
the POWDER group to do, if they are still around?

Henry


> 
> Alexandre.
> 
> [1] http://ant.apache.org/manual/dirtasks.html#patterns
> 
>> 
>> What would [ wac:regex "http://joe.example/blog/.*" ] mean?
>> 
>> Using OWL 2 we can create a datatype definition, using a datatype
>> restriction, on strings and the like - but that doesn't (anywhere near)
>> cover what's required here.
>> 
>> I'm unsure how we'd actually create a Class of things based on the
>> lexical form of a URI though, or even, whether it's a good idea to do so
>> - we are basically saying that if a URI has a lexical form which matches
>> the regular expression x, then that URI denotes something which is of
>> the class y. This feels wrong.
>> 
>> Cheers,
>> 
>> Nathan
>> 
>> 
> 

Social Web Architect
http://bblfish.net/



Received on Monday, 19 November 2012 11:02:28 GMT
