- From: Phil Archer <parcher@icra.org>
- Date: Tue, 25 Mar 2008 11:55:42 +0000
- To: Public POWDER <public-powderwg@w3.org>
N.B. This discussion refers to the Grouping Doc dated 20 March and
available at [1], currently only with member access. This is expected to
be published at the same URI within the next 24 hours or so.
Over on the member list it has been suggested that POWDER-S should
_only_ support IRI constraint by regular expression [2], although POWDER
would retain things like includehosts for ease of use.
The argument is initially attractive since we expect to see IRI sets
like this most commonly:
<iriset>
<includehosts>example.org</includehosts>
</iriset>
i.e. a single domain name given as the IRI set, so we're describing
'everything on example.org'. This can be transformed into POWDER-S thus:
<wdr:iriset>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Restriction>
<owl:onProperty rdf:resource="&wdr;includeregex" />
<owl:hasValue>example.org</owl:hasValue>
</owl:Restriction>
</owl:intersectionOf>
</wdr:iriset>
i.e. the reg ex is the same in both cases. Easy. Since we expect POWDER
to be the main transport mechanism and for POWDER-S to (almost) always
be derived programmatically, it doesn't matter how complex a POWDER-S
doc is.
But let's make this progressively more complex and see whether we can
convert _all_ possible POWDER IRI sets into POWDER-S versions with a
single reg ex.
Let's try multiple hosts.
<includehosts>example.org example.com</includehosts>
becomes
example.org|example.com
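The multiple-hosts step is mechanical enough to sketch in a few lines of Python. This is only an illustration: the function name hosts_to_regex is hypothetical, and escaping the dots (so that "." matches only a literal dot) is an addition that the shorthand above leaves out.

```python
import re

def hosts_to_regex(includehosts):
    """Turn a whitespace-separated includehosts value into one alternation."""
    hosts = includehosts.split()
    # re.escape keeps each dot literal, so "example.org" cannot match "exampleXorg".
    return "(?:" + "|".join(re.escape(h) for h in hosts) + ")"

pattern = hosts_to_regex("example.org example.com")
print(pattern)
```

The result is the same alternation as above, wrapped in a non-capturing group so it can be dropped into a longer expression.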
OK, let's cut to the chase. POWDER allows very sophisticated IRI set
definitions like this:
<iriset>
<includeschemes>http https</includeschemes>
<includehosts>example.org example.com</includehosts>
<includepathcontains>foo bar</includepathcontains>
<includepathcontains>red blue</includepathcontains>
</iriset>
Here we have either http or https. OK, in reg ex that's "https?"; add in
the host and we get
^https?://(.*\.)?(example.com|example.org)
But those multiple path constraints are going to kill us. They say that
the path must contain either foo or bar AND either red or blue _in any
order_.
So the following all match:
http://example.com/red/bar
http://example.com/foo/blue
https://example.org/bluefoo/bar.html
And this doesn't:
http://example.org/foo/bar/
Now, I _could_ work out a Reg Ex that did all this, but I'm not sure I
could write some code that turned _any valid_ POWDER IRI set definition
into a Reg Ex.
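For this one example, a minimal Python sketch of such a reg ex is possible using one lookahead assertion per includepathcontains element. It assumes a regex dialect that supports lookahead (Python's re does, not every dialect will), the names alternation and iriset_to_regex are hypothetical, and it covers only schemes, hosts and path-contains constraints. It says nothing about whether _any valid_ IRI set can be handled this way.

```python
import re

def alternation(tokens):
    """Build a non-capturing alternation from whitespace-separated tokens."""
    return "(?:" + "|".join(re.escape(t) for t in tokens.split()) + ")"

def iriset_to_regex(schemes, hosts, path_contains):
    """Hypothetical converter: schemes, hosts and path-contains only."""
    pattern = "^" + alternation(schemes) + "://"
    # Optional subdomain prefix, one of the listed hosts, optional port.
    pattern += r"(?:[^/?#]*\.)?" + alternation(hosts) + r"(?::[0-9]+)?"
    # The host must end here: next is the path, query, fragment, or nothing.
    pattern += r"(?=[/?#]|$)"
    # One lookahead per includepathcontains element; each scans only the
    # path (it stops at "?" or "#"), so the groups can match in any order.
    for tokens in path_contains:
        pattern += r"(?=[^?#]*" + alternation(tokens) + ")"
    return re.compile(pattern)

regex = iriset_to_regex(
    "http https",
    "example.org example.com",
    ["foo bar", "red blue"],
)
```

With this regex, regex.match() succeeds on the three matching IRIs listed above and fails on http://example.org/foo/bar/, because nothing in that path satisfies the second lookahead.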
And would anyone like to hazard a bit of code that rendered this as a
reg ex:
<iriset>
<includeschemes>http https</includeschemes>
<includehosts>example.org example.com</includehosts>
<includepathcontains>foo bar</includepathcontains>
<includepathcontains>red blue</includepathcontains>
<excludeexactqueries>name1=value1&amp;name2=value2</excludeexactqueries>
</iriset>
Bearing in mind that this means any IRI whose query string contains the
pairs name1=value1 and name2=value2, in any order, is to be excluded?
Yikes!
So, I think it would be a lot easier to retain string-based matching in
POWDER-S.
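By way of comparison, here is a rough Python sketch of string-based matching for the last IRI set. The function name in_iriset is hypothetical, the values are hard-coded from the example, and the query test follows the reading given in this message (both pairs present, in any order) rather than the specification text. If "exact" should mean that the query holds those pairs and nothing else, the "<=" would become "==".

```python
from urllib.parse import urlsplit, parse_qsl

def in_iriset(iri):
    """Test an IRI against the example IRI set, one component at a time."""
    parts = urlsplit(iri)
    host = parts.hostname or ""

    # includeschemes: http https
    if parts.scheme not in ("http", "https"):
        return False

    # includehosts: the host itself or any subdomain of it
    allowed_hosts = ("example.org", "example.com")
    if not any(host == h or host.endswith("." + h) for h in allowed_hosts):
        return False

    # includepathcontains (twice): each group needs at least one hit
    for group in (("foo", "bar"), ("red", "blue")):
        if not any(token in parts.path for token in group):
            return False

    # excludeexactqueries: compare name/value pairs as a set, so the
    # order in which they appear in the query string does not matter
    excluded = {("name1", "value1"), ("name2", "value2")}
    pairs = set(parse_qsl(parts.query, keep_blank_values=True))
    if excluded <= pairs:
        return False

    return True
```

Each constraint is a separate, simple test on one component of the IRI, which is what makes this approach easier to generate than a single reg ex.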
Phil.
[1] http://www.w3.org/2007/powder/Group/powder-grouping/20080320.html
[2] http://lists.w3.org/Archives/Member/member-powderwg/2008Mar/0119.html
--
Phil Archer
Chief Technical Officer,
Family Online Safety Institute
w. http://www.fosi.org/people/philarcher/
Received on Tuesday, 25 March 2008 11:56:25 UTC