Re: [WAC] regexps in WebAccessControl from Phil Archer on 2012-11-19 (public-rww@w3.org from November 2012)

From: Phil Archer <phila@w3.org>
Date: Mon, 19 Nov 2012 12:06:13 +0000
To: Henry Story <henry.story@bblfish.net>
CC: Read-Write-Web <public-rww@w3.org>, nathan <nathan@webr3.org>, Ruben Verborgh <ruben.verborgh@ugent.be>, Alexandre Bertails <bertails@w3.org>
Message-ID: <50AA20B5.2010304@w3.org>
Henry, everyone, let me see what I can offer here (for the many for whom 
my name means nothing, I lead the work on POWDER and am indelibly 
associated with it).

The problem we faced is, I think, much the same as you have here. You 
want something that is easy to understand, such as "everyone with a URI 
that begins with http://example.org/people/trusted/" but at the same 
time have a processable means of handling this.

So, we created a set of XML elements that were meant to be easy to use, 
such as:

includehosts
includepathstartswith
includequerycontains

For every 'include' there's a matching 'exclude' - and we covered 
scheme, host, path contains, path starts with, path ends with, ports, 
query strings, regexes and a full URI.

That's what we called POWDER Grouping and it has its own separate 
Recommendation [1]. But this is a simplification layer. Within that doc 
we also defined how to turn any of those 'user-friendly elements' into 
regular expressions, for which we provided templates that you can bet we 
tested and re-tested. They're not simple but they are meant to be robust 
(the one that lets you include query string name/value pairs in any 
order was a lot of fun - not). The doc also covers IRI canonicalization 
which is important in this space.
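
As a rough illustration of the idea (not the templates from the 
Recommendation), here is a minimal Python sketch that tests a URI against 
a few include/exclude constraints. The function name in_group and its 
parameters are hypothetical, it checks URI components directly instead of 
generating regexes, it does no IRI canonicalization, and the rule that a 
listed host also covers its subdomains is an assumption made for the 
example.

```python
from urllib.parse import urlsplit


def in_group(uri, include_hosts=(), include_path_starts_with=(),
             exclude_path_starts_with=()):
    """Hypothetical helper: is `uri` covered by these simple constraints?"""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    path = parts.path

    # Assumption for this sketch: a listed host also covers its subdomains.
    if include_hosts and not any(
        host == h or host.endswith("." + h) for h in include_hosts
    ):
        return False

    # At least one of the listed path prefixes must match, if any are given.
    if include_path_starts_with and not any(
        path.startswith(p) for p in include_path_starts_with
    ):
        return False

    # Any matching 'exclude' prefix removes the URI from the group.
    if any(path.startswith(p) for p in exclude_path_starts_with):
        return False

    return True
```

For the "trusted people" case above, one would call 
in_group(uri, include_hosts=["example.org"], 
include_path_starts_with=["/people/trusted/"]).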

You can programmatically replace any of the user-friendly terms with 
matchesregex (which we called POWDER-BASE) and it is *that* property 
(and notmatchesregex) that is the subject of POWDER's Semantic Extension 
[2]. The semantics of POWDER are fully defined.

Any POWDER document (XML) can be transformed into POWDER-BASE (also XML, 
identical except that the only IRI set defining properties allowed are 
(not)matchesregex) and that can then be transformed into OWL *with the 
semantic extension* that allows you to run a regex against a URI - think 
of it as SPARQL's regex(str(URI)).
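
To make the (not)matchesregex idea concrete, here is a small Python 
sketch, offered only as an approximation. The helper names matches_regex 
and in_iri_set are hypothetical, Python's re syntax is not necessarily the 
syntax POWDER intends, and combining the patterns as "all matches, no 
notmatches" is an assumption made for the example.

```python
import re


def matches_regex(uri, pattern):
    """Unanchored search over the URI's string form, in the spirit of
    SPARQL's regex(str(URI), pattern)."""
    return re.search(pattern, uri) is not None


def in_iri_set(uri, matches=(), not_matches=()):
    """Hypothetical helper: the URI must satisfy every 'matches' pattern
    and none of the 'not_matches' patterns."""
    return (all(matches_regex(uri, p) for p in matches)
            and not any(matches_regex(uri, p) for p in not_matches))
```

With the blog example from earlier in the thread, 
in_iri_set(uri, matches=[r"^http://joe\.example/blog/"]) would cover 
every URI under that blog path.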

Semantically, all 3 flavours of a POWDER document are defined as 
identical. Only the syntax changes.

POWDER can define any set of URIs, no matter how complex [3].

The domain of wdrs:matchesregex is rdfs:Resource, its range xsd:string 
[4] - i.e. there's no weird inferencing there.

Although I seem to recall looking it up, I see that we didn't actually 
define the regex syntax we used. I can only leave it to others to answer 
the Java Regex issue.

There are some POWDER tools at [5] including a grouping tester [6]. That 
lets you put in values for the user-friendly URI components and then 
test a given URI to see if it is or is not covered.

Hope this helps?

Shout if you need more

Phil.


[1] http://www.w3.org/TR/powder-grouping/
[2] http://www.w3.org/TR/powder-formal/#regexSemantics
[3] http://www.w3.org/TR/powder-grouping/#conj-disj
[4] http://www.w3.org/2007/05/powder-s#matchesregex
[5] http://philarcher.org/powder/
[6] http://philarcher.org/cgi-bin/powder-group.cgi



On 19/11/2012 11:01, Henry Story wrote:
> CCing Phil Archer.
> ( Phil the thread for this starts here:
>     http://lists.w3.org/Archives/Public/public-rww/2012Nov/0119.html )
>
> On 19 Nov 2012, at 02:31, Alexandre Bertails <bertails@w3.org> wrote:
>
>> On 11/18/2012 04:06 PM, Nathan wrote:
>>> Henry Story wrote:
>>>>   []  wac:accessToClass [ wac:regex "http://joe.example/blog/.*" ];
>>
>> For file matching patterns, I'd suggest not to reinvent the wheel and
>> use something that has existed for a long time: ant patterns [1]. It's
>> already defined, and the regex can be easily parsed and then compiled
>> down to any language specific regex.
>
> I just came across the following discussion on IRC, which seems relevant to this.
>
> <blockquote>
> 21:49 presbrey: bblfish, if you want to have regex we should support simple globbing too
> 21:50 presbrey: most users do not write /admin/.*, they write /admin/*
> 21:51 presbrey: also do we really want to incorporate blank nodes? this is the first proposal to do so
> 21:54 presbrey: such a pattern also seems to duplicate eg.
> 21:54 presbrey: acl:defaultForNew </admin/>
> 21:57 presbrey: also in this particular scenario, it costs more to compile the regex pattern than to evaluate it
> 21:58 presbrey: in more complex examples, the server now needs a resident regex cache
> 21:59 melvster: perhaps arbitrary regex could be an attack surface too depending on who has accesss
> 22:17 betehess would prefer to have ant style
> 22:23 presbrey: betehess, do you know how I can parse ant style in python or php?
> 22:24 presbrey: and javascript? :)
> 22:24 betehess: shouldn't be difficult
> 22:24 betehess: we'll need to define the regex grammar anyway
> 22:25 betehess: at the end, any language should be able to compile them down to their own native regex style
> 22:26 presbrey: at the end?
> 22:26 betehess: http://trac.mach-ii.com/machii/wiki/ANTPatternMatcher
> 22:26 betehess: just three wildcards
> 22:26 betehess: having both ** and * is pretty cool
> </blockquote>
>
> Yes, I can see that less powerful than full regexs could be helpful in reducing
> regex based denial of service attacks for remotely published regex rules. Also
> it is easier to specify for people correctly.
>
> That is why POWDER already has worked on simplified groupings, by proposing an
> XML format for simple definitions. See for example here:
>
>    http://www.w3.org/TR/powder-grouping/#wild
>
> I think it would be nice to semanticise those higher level relations so that
> one can also use them directly in Turtle. Perhaps this is something we can ask
> the POWDER group to do, if they are still around?
>
> Henry
>
>
>>
>> Alexandre.
>>
>> [1] http://ant.apache.org/manual/dirtasks.html#patterns
>>
>>>
>>> What would [ wac:regex "http://joe.example/blog/.*" ] mean?
>>>
>>> Using OWL 2 we can create a datatype definition, using a datatype
>>> restriction, on strings and the like - but that doesn't (anywhere near)
>>> cover what's required here.
>>>
>>> I'm unsure how we'd actually create a Class of things based on the
>>> lexical form of a URI though, or even, whether it's a good idea to do so
>>> - we are basically saying that if a URI has a lexical form which matches
>>> the regular expression x, then that URI denotes something which is of
>>> the class y. This feels wrong.
>>>
>>> Cheers,
>>>
>>> Nathan
>>>
>>>
>>
>
> Social Web Architect
> http://bblfish.net/
>

-- 


Phil Archer
W3C eGovernment
http://www.w3.org/egov/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Monday, 19 November 2012 12:06:42 UTC