
Re: [WAC] regexps in WebAccessControl

From: Henry Story <henry.story@bblfish.net>
Date: Wed, 21 Nov 2012 12:09:37 +0100
Cc: Read-Write-Web <public-rww@w3.org>, nathan <nathan@webr3.org>, Ruben Verborgh <ruben.verborgh@ugent.be>, Alexandre Bertails <bertails@w3.org>
Message-Id: <85E13C7C-FD8D-46EC-B622-5DEDC1539B38@bblfish.net>
To: Phil Archer <phila@w3.org>
Hi Phil,

   Thanks for the very helpful overview of POWDER. From the comments earlier in this thread,
I gather that people worry about full regexes being

  1. too complicated to parse and write
  2. memory intensive (a server would need to keep a cache of regexps)
  3. dangerous if fetched off the web, as would currently be possible with WebACLs

So for all of the above your answer is that you have an XML syntax that is easy to write.

<iriset>
  <includehosts>example.org</includehosts>
  <includepathstartswith>/foo</includepathstartswith>
</iriset>

whose semantics are defined as being equivalent to some RDF. The example above
is equivalent to

@prefix powder: <http://www.w3.org/2007/05/powder-s#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:joesNS owl:equivalentClass [ owl:intersectionOf ( 
        [ a owl:Restriction;
          owl:onProperty powder:matchesregex;
          owl:hasValue "(([^\/\?\#]*)\@)?([^\:\/\?\#\@]+\.)?(example\.org)(:([0-9]+))?\/"],
        [ a owl:Restriction;
          owl:onProperty powder:matchesregex;
          owl:hasValue "(([^\/\?\#]*)\@)?([^\:\/\?\#\@]*)(\:([0-9]+))?(\/foo)" ]
       )
     ] .  # :-)
 
It is clear that the XML IRI set notation could be implemented very efficiently with ordinary
programming tools. Most programming languages already provide a URL class, so one could use
those parsers directly. I imagine that if one sticks to the XML notation one cannot end up with
a denial-of-service regexp (are there such things?)
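
To illustrate the point about URL classes, here is a minimal Python sketch that checks an IRI
against the two constraints of the example using only the standard URL parser, with no regex
at all. The function name in_iriset and its parameters are made up for this sketch. It skips
IRI canonicalization, and it assumes a host matches when it equals the listed host or is a
subdomain of it, which is what the regex above appears to allow.

```python
from urllib.parse import urlsplit


def in_iriset(iri, include_hosts, path_starts_with):
    """Return True if iri satisfies both constraints (hypothetical helper)."""
    parts = urlsplit(iri)
    # hostname drops any userinfo and port and is lower-cased by the parser
    host = parts.hostname or ""
    host_ok = any(host == h or host.endswith("." + h) for h in include_hosts)
    path_ok = any(parts.path.startswith(p) for p in path_starts_with)
    return host_ok and path_ok


print(in_iriset("http://www.example.org/foo/bar", ["example.org"], ["/foo"]))
print(in_iriset("http://example.com/foo", ["example.org"], ["/foo"]))
```

The first call should report a match and the second should not, since only the host differs.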

Since we want to be able to work with the results of the LDP group [8], we need
a syntax to express your XML in Turtle. Something like this:

:joesNS a p:IriSet;
   p:includeHost "example.org";
   p:includePathStartsWith "/foo" .
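
The translation from the XML to such a Turtle form would be mechanical. Here is a rough Python
sketch of it. The p: terms are only the ones proposed in this mail, not published vocabulary,
and the TERMS table and the function name iriset_to_turtle are invented for the sketch. It
assumes the element content is a whitespace-separated list of values and does no string
escaping.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from POWDER element names to the proposed p: terms.
TERMS = {
    "includehosts": "p:includeHost",
    "includepathstartswith": "p:includePathStartsWith",
}

EXAMPLE = """<iriset>
  <includehosts>example.org</includehosts>
  <includepathstartswith>/foo</includepathstartswith>
</iriset>"""


def iriset_to_turtle(xml_text, subject):
    """Serialise one <iriset> element as Turtle for the given subject."""
    root = ET.fromstring(xml_text)
    lines = [f"{subject} a p:IriSet"]
    for child in root:
        predicate = TERMS[child.tag]
        # Assumption: element content is a whitespace-separated list of values.
        for value in (child.text or "").split():
            lines.append(f'   {predicate} "{value}"')
    return ";\n".join(lines) + " ."


print(iriset_to_turtle(EXAMPLE, ":joesNS"))
```

For the example input this should print the three Turtle lines shown above.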

I was wondering if this simple semantics is something the POWDER WG could feasibly publish.
We have a couple of use cases:

  A. determining groups of resources ( that can be accessed )
  B. determining groups of users ( that can access a resource )

A. groups of resources
---------------------

I think :joesNS is a class, so one should be able to just write

@prefix wac: <http://www.w3.org/ns/auth/acl#> .

 [ wac:accessToClass :joesNS; 
   wac:mode wac:Read, wac:Write;  
   wac:agent <card#i> ] .


B. groups of people
-------------------

 Again I think here we could have

# I could not get anything simpler than this
:coEmployees a p:IriSet;
   p:includeRegex "^https://company.com/ppl/[^/]+#me$" .


:coRes a p:IriSet;
   p:includeRegex "^https://company.com/ppl/.*" .


which should allow us to build the following rule

 [] wac:accessToClass :coRes; 
   wac:mode wac:Read;  
   wac:agentClass :coEmployees .
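
To make the intended reading of that rule concrete, here is a small Python sketch of how a
server might evaluate it: read access is granted when the agent's WebID falls in :coEmployees
and the requested resource falls in :coRes. The function name may_read is invented for the
sketch, and Python's re syntax is only an assumption, since no regex dialect has been fixed.
Note that the unescaped "." in the patterns matches any character, so a real rule would
probably want "company\.com".

```python
import re

# Patterns copied from the two IriSets above.
CO_EMPLOYEES = re.compile(r"^https://company.com/ppl/[^/]+#me$")
CO_RES = re.compile(r"^https://company.com/ppl/.*")


def may_read(agent_iri, resource_iri):
    """True if the rule above grants wac:Read to agent_iri on resource_iri."""
    agent_ok = CO_EMPLOYEES.search(agent_iri) is not None
    resource_ok = CO_RES.search(resource_iri) is not None
    return agent_ok and resource_ok


print(may_read("https://company.com/ppl/joe#me", "https://company.com/ppl/jane/card"))
print(may_read("https://elsewhere.example/ppl/joe#me", "https://company.com/ppl/jane/card"))
```

The first call should be granted and the second refused, because the second WebID is not under
the company's /ppl/ space.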


So if one could have those terms defined then we would be able to use those
and put them up on the WebAccessControl wiki page, in preparation for writing out
a spec.

	Henry


[8] http://www.w3.org/2012/ldp/hg/ldp.html


On 19 Nov 2012, at 13:06, Phil Archer <phila@w3.org> wrote:

> Henry, everyone, let me see what I can offer here (for the many for whom my name means nothing, I lead the work on POWDER and am indelibly associated with it).
> 
> The problem we faced is, I think, much the same as you have here. You want something that is easy to understand, such as "everyone with a URI that begins with http://example.org/people/trusted/" but at the same time have a processable means of handling this.
> 
> So, we created a set of XML elements that were meant to be easy to use, such as:
> 
> includehosts
> includepathstartswith
> includequerycontains
> 
> For every 'include' there's a matching 'exclude' - and we covered scheme, host, path contains, path starts with, path ends with, ports, query strings and regexes and a full URI.
> 
> That's what we called POWDER Grouping and it has its own separate Recommendation [1]. But this is a simplification layer. Within that doc we also defined how to turn any of those 'user-friendly elements' into regular expressions, for which we provided templates that you can bet we tested and re-tested. They're not simple but they are meant to be robust (the one that lets you include query string name/value pairs in any order was a lot of fun - not). The doc also covers IRI canonicalization which is important in this space.
> 
> You can programmatically replace any of the user-friendly terms with matchesregex (which we called POWDER-BASE) and it is *that* property (and notmatchesregex) that is the subject of POWDER's Semantic Extension [2]. The semantics of POWDER are fully defined.
> 
> Any POWDER document (XML) can be transformed into POWDER-BASE (also XML, identical except that the only IRI set defining properties allowed are (not)matchesregex) and that can then be transformed into OWL *with the semantic extension* that allows you to run a regex against a URI - think of it as SPARQL's regex(str(URI)).
> 
> Semantically, all 3 flavours of a POWDER document are defined as identical. Only the syntax changes.
> 
> POWDER can define any set of URIs, no matter how complex [3]
> 
> The domain of wdrs:matchesregex is rdfs:Resource, its range xsd:string [4] - i.e. there's no weird inferencing there.
> 
> Although I seem to recall looking it up, I see that we didn't actually define the regex syntax we used. I can only leave it to others to answer the Java Regex issue.
> 
> There are some POWDER tools at [5] including a grouping tester [6]. That lets you put in values for the user-friendly URI components and then test a given URI to see if it is or is not covered.
> 
> Hope this helps?
> 
> Shout if you need more
> 
> Phil.
> 
> 
> [1] http://www.w3.org/TR/powder-grouping/
> [2] http://www.w3.org/TR/powder-formal/#regexSemantics
> [3] http://www.w3.org/TR/powder-grouping/#conj-disj
> [4] http://www.w3.org/2007/05/powder-s#matchesregex
> [5] http://philarcher.org/powder/
> [6] http://philarcher.org/cgi-bin/powder-group.cgi
> 
> 
> 
> On 19/11/2012 11:01, Henry Story wrote:
>> CCing Phil Archer.
>> ( Phil the thread for this starts here:
>>    http://lists.w3.org/Archives/Public/public-rww/2012Nov/0119.html )
>> 
>> On 19 Nov 2012, at 02:31, Alexandre Bertails <bertails@w3.org> wrote:
>> 
>>> On 11/18/2012 04:06 PM, Nathan wrote:
>>>> Henry Story wrote:
>>>>>  []  wac:accessToClass [ wac:regex "http://joe.example/blog/.*" ];
>>> 
>>> For file matching patterns, I'd suggest not to reinvent the wheel and
>>> use something that has existed for a long time: ant patterns [1]. It's
>>> already defined, and the regex can be easily parsed and then compiled
>>> down to any language specific regex.
>> 
>> I just came across the following discussion on IRC, which seems relevant to this.
>> 
>> <blockquote>
>> 21:49 presbrey: bblfish, if you want to have regex we should support simple globbing too
>> 21:50 presbrey: most users do not write /admin/.*, they write /admin/*
>> 21:51 presbrey: also do we really want to incorporate blank nodes? this is the first proposal to do so
>> 21:54 presbrey: such a pattern also seems to duplicate eg.
>> 21:54 presbrey: acl:defaultForNew </admin/>
>> 21:57 presbrey: also in this particular scenario, it costs more to compile the regex pattern than to evaluate it
>> 21:58 presbrey: in more complex examples, the server now needs a resident regex cache
>> 21:59 melvster: perhaps arbitrary regex could be an attack surface too depending on who has access
>> 22:17 betehess would prefer to have ant style
>> 22:23 presbrey: betehess, do you know how I can parse ant style in python or php?
>> 22:24 presbrey: and javascript? :)
>> 22:24 betehess: shouldn't be difficult
>> 22:24 betehess: we'll need to define the regex grammar anyway
>> 22:25 betehess: at the end, any language should be able to compile them down to their own native regex style
>> 22:26 presbrey: at the end?
>> 22:26 betehess: http://trac.mach-ii.com/machii/wiki/ANTPatternMatcher
>> 22:26 betehess: just three wildcards
>> 22:26 betehess: having both ** and * is pretty cool
>> </blockquote>
>> 
>> Yes, I can see that something less powerful than full regexes could help reduce
>> regex-based denial of service attacks for remotely published regex rules. It is
>> also easier for people to specify correctly.
>> 
>> That is why POWDER already has worked on simplified groupings, by proposing an
>> XML format for simple definitions. See for example here:
>> 
>>   http://www.w3.org/TR/powder-grouping/#wild
>> 
>> I think it would be nice to semanticise those higher level relations so that
>> one can also use them directly in Turtle. Perhaps this is something we can ask
>> the POWDER group to do, if they are still around?
>> 
>> Henry
>> 
>> 
>>> 
>>> Alexandre.
>>> 
>>> [1] http://ant.apache.org/manual/dirtasks.html#patterns
>>> 
>>>> 
>>>> What would [ wac:regex "http://joe.example/blog/.*" ] mean?
>>>> 
>>>> Using OWL 2 we can create a datatype definition, using a datatype
>>>> restriction, on strings and the like - but that doesn't (anywhere near)
>>>> cover what's required here.
>>>> 
>>>> I'm unsure how we'd actually create a Class of things based on the
>>>> lexical form of a URI though, or even, whether it's a good idea to do so
>>>> - we are basically saying that if a URI has a lexical form which matches
>>>> the regular expression x, then that URI denotes something which is of
>>>> the class y. This feels wrong.
>>>> 
>>>> Cheers,
>>>> 
>>>> Nathan
>>>> 
>>>> 
>>> 
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
> 
> -- 
> 
> 
> Phil Archer
> W3C eGovernment
> http://www.w3.org/egov/
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1

Social Web Architect
http://bblfish.net/



Received on Wednesday, 21 November 2012 11:10:12 GMT
