- From: Phil Archer <parcher@icra.org>
- Date: Wed, 26 Mar 2008 11:10:17 +0000
- To: Public POWDER <public-powderwg@w3.org>
Stasinos Konstantopoulos wrote:
> On Tue Mar 25 11:55:42 2008 Phil Archer said:
>
>> But let's make this progressively more complex and see whether we can
>> convert _all_ possible POWDER IRI sets into POWDER-S versions with a
>> single reg ex.
>
> Why a single regex? This is an unnecessary complication.

Ah, right. We have said that except for in/exclude query contains and in/exclude path contains, each element can only appear once. This helps to minimise mistakes and makes the validation easier.

> Each string pattern need only map to a single reg ex pattern, and a URI
> has to pass all reg ex tests to match the iriset.

Yes, in that case, it makes sense. We'd have to change things a little to say that the 'once per IRI set' rule applies to POWDER but not to includeregex in POWDER-S.

> Conjunction can be very naturally represented in OWL/RDF, so that's
> not an issue.
>
>> [easy disjunction snipped]
>>
>> OK, let's cut to the chase. POWDER allows very sophisticated IRI set
>> definitions like this:
>>
>> <iriset>
>>   <includeschemes>http https</includeschemes>
>>   <includehosts>example.org example.com</includehosts>
>>   <includepathcontains>foo bar</includepathcontains>
>>   <includepathcontains>red blue</includepathcontains>
>> </iriset>
>>
>> Here we have either http or https. OK, in reg ex that's https?; add in
>> the host and we get
>>
>> ^https?://(.*\.)?(example.com|example.org)
>
> I would strongly discourage the XSLT author from trying to get smart and
> suggest they keep it simple instead:
>
> <includeregex>(^http) | (^https)</includeregex>
> <includeregex>(^[^/]+//example.org)| (^[^/]+//example.com)</includeregex>
> <includeregex>(^[^/]+//[^/]/.*foo) | (^[^/]+//[^/]/.*bar) </includeregex>
> <includeregex>(^[^/]+//[^/]/.*red) | (^[^/]+//[^/]/.*blue) </includeregex>
>
> Easy, and straight to the point.

Yes, that's OK.
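The conjunctive reading Stasinos describes (one pattern per POWDER element, and an IRI must pass every test) can be tried out in a few lines. The Python below is a hypothetical sketch, not part of POWDER or POWDER-S; `INCLUDE_REGEXES` and `matches_iri_set` are illustrative names. It departs from the quoted patterns in a few assumed ways: the spaces around `|` are dropped (a regex engine would treat them as literal spaces), `[^/]` is widened to `[^/]+` so it spans a whole host name rather than one character, dots are escaped, and the path tests stop at `?` or `#` so they only inspect the path.

```python
import re

# Hypothetical conjunctive translation of the IRI set
#   includeschemes      http https
#   includehosts        example.org example.com
#   includepathcontains foo bar
#   includepathcontains red blue
# Each POWDER element becomes one pattern; the alternation inside a
# pattern expresses the disjunction within that element.
INCLUDE_REGEXES = [
    r"^https?://",                                             # scheme
    r"^[^/]+//(?:example\.org|example\.com)(?::\d+)?(?:/|$)",  # exact host, optional port
    r"^[^/]+//[^/]+/[^?#]*(?:foo|bar)",                        # path contains foo or bar
    r"^[^/]+//[^/]+/[^?#]*(?:red|blue)",                       # path contains red or blue
]


def matches_iri_set(iri: str) -> bool:
    """Return True only if every include pattern matches (conjunction)."""
    return all(re.search(pattern, iri) for pattern in INCLUDE_REGEXES)


if __name__ == "__main__":
    for iri in [
        "http://example.org/foo/red",
        "https://example.com/bar/blue.html",
        "ftp://example.org/foo/red",           # wrong scheme
        "http://example.net/foo/red",          # wrong host
        "http://example.org/index?q=foo+red",  # words appear only in the query
    ]:
        print(iri, matches_iri_set(iri))
```

The alternation inside each pattern handles "either/or" within one element, and `all()` handles "and" across elements. This mirrors the split between disjunction in the reg ex and conjunction in OWL/RDF.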
> There are lots of automatic finite-state
> combination and optimization tools out there, if the implementor needs
> to worry about efficient application of the patterns. Although,
> implementors who care about efficiency are better off directly
> implementing the extensions in the first place.

OK, that answers my next point, which was that people often complain about the processing overhead of using a single reg ex, never mind a load of them.

>
>> And would anyone like to hazard a bit of code that rendered this as a
>> reg ex:
>>
>> <iriset>
>>   <includeschemes>http https</includeschemes>
>>   <includehosts>example.org example.com</includehosts>
>>   <includepathcontains>foo bar</includepathcontains>
>>   <includepathcontains>red blue</includepathcontains>
>>   <excludeexactqueries>name1=value1&name2=value2
>>   </excludeexactqueries>
>> </iriset>
>
> Gladly. It is:
>
> <includeregex>(^http) | (^https)</includeregex>
> <includeregex>(^[^/]+//example.org)| (^[^/]+//example.com)</includeregex>
> <includeregex>(^[^/]+//[^/]/.*foo) | (^[^/]+//[^/]/.*bar) </includeregex>
> <includeregex>(^[^/]+//[^/]/.*red) | (^[^/]+//[^/]/.*blue) </includeregex>
> <includeregex>(^[^?]+?.*name1=value1)</includeregex>
> <includeregex>(^[^?]+?.*name2=value2)</includeregex>
>
> It doesn't look all that bad to me.

True, but you've not negated the query strings. Would you keep excluderegex?

Could I ask you please to create a POWDER-S OWL class that captured this?

Phil.
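One possible answer to the excluderegex question, sketched in Python under stated assumptions: keep a separate list of exclusion patterns and treat an IRI as a member only if every include pattern matches and no exclude pattern does. The names `INCLUDE_REGEXES`, `EXCLUDE_REGEXES` and `matches_iri_set_with_excludes` are hypothetical. The include patterns assume exact host matching, `[^/]+` for the host segment and path tests that stop at `?` or `#`. The `excludeexactqueries` value is assumed to mean that the whole query component equals `name1=value1&name2=value2` in that order. The `?` that introduces the query is escaped here; unescaped, as in `[^?]+?`, it acts as a lazy quantifier rather than a literal question mark.

```python
import re

# Hypothetical include patterns for the scheme, host and path elements.
INCLUDE_REGEXES = [
    r"^https?://",                                             # http or https
    r"^[^/]+//(?:example\.org|example\.com)(?::\d+)?(?:/|$)",  # exact host, optional port
    r"^[^/]+//[^/]+/[^?#]*(?:foo|bar)",                        # path contains foo or bar
    r"^[^/]+//[^/]+/[^?#]*(?:red|blue)",                       # path contains red or blue
]

# Hypothetical exclude pattern for
#   excludeexactqueries name1=value1&name2=value2
# Assumption: the whole query (from the first '?' up to '#' or the end
# of the IRI) must equal this string exactly, parameters in this order.
EXCLUDE_REGEXES = [
    r"^[^?#]*\?name1=value1&name2=value2(?:#|$)",
]


def matches_iri_set_with_excludes(iri: str) -> bool:
    """All include patterns must match and no exclude pattern may match."""
    included = all(re.search(p, iri) for p in INCLUDE_REGEXES)
    excluded = any(re.search(p, iri) for p in EXCLUDE_REGEXES)
    return included and not excluded


if __name__ == "__main__":
    for iri in [
        "http://example.org/foo/red",
        "http://example.org/foo/red?name1=value1",
        "http://example.org/foo/red?name1=value1&name2=value2",
    ]:
        print(iri, matches_iri_set_with_excludes(iri))
```

Turning the query condition into two `includeregex` "contains" tests would neither negate it nor keep its "exact" meaning. A separate exclusion list keeps both visible, at the cost of retaining excluderegex alongside includeregex.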
Received on Wednesday, 26 March 2008 11:10:59 UTC