Re: XQuery/Reg Exes

On Thu, Sep 06, 2007 at 10:43:13AM +0100, Phil Archer wrote:
> 1. Is it fair to say: "The XQuery 1.0/XPath 2.0 Regular Expression 
> avoids some of the more advanced RE features that can require excessive 
> processing which is not appropriate or needed when establishing whether 
> a candidate resource is or it not an element of a Resource Set."

Not really.  If it is true it's certainly by coincidence, as
the term "resource set" is not part of XQuery, nor of XPath 2.0.

Regular expression matching, for input of unbounded length,
can probably have arbitrary algorithmic complexity.  Certainly
it is exponential, even with only the Kleene closure operator,
and you can't get a simpler regular expression language than that.
The canonical example is
    aa*aa*b
given a string of 4,000 "a" characters followed by either a "b" or a
"c".
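
As a rough illustration only (a toy sketch in Python, not how any XQuery
or XPath implementation actually works), here is a tiny backtracking
matcher that counts how many match attempts it makes.  The names parse,
match_here and count_steps are made up for this example, and it handles
only literal characters and a literal followed by "*".  In this toy
matcher the failing "c" input costs roughly the square of the input
length, while the matching "b" input costs a handful of steps.

```python
def parse(pattern):
    """Split a pattern made of literals and `x*` into (char, starred) pairs."""
    items = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items


def match_here(items, text, pos, counter):
    """Greedy backtracking match of items starting at text[pos].

    counter[0] is incremented once per call, as a crude measure of work.
    The match is anchored at pos but need not consume the whole text.
    """
    counter[0] += 1
    if not items:
        return True
    (ch, starred), rest = items[0], items[1:]
    if not starred:
        return (pos < len(text) and text[pos] == ch
                and match_here(rest, text, pos + 1, counter))
    # Starred item: find the longest run of ch, then back off one
    # character at a time until the rest of the pattern matches.
    end = pos
    while end < len(text) and text[end] == ch:
        end += 1
    for stop in range(end, pos - 1, -1):
        if match_here(rest, text, stop, counter):
            return True
    return False


def count_steps(pattern, text):
    """Return (matched, number_of_match_attempts) for pattern against text."""
    counter = [0]
    matched = match_here(parse(pattern), text, 0, counter)
    return matched, counter[0]


if __name__ == "__main__":
    # Small sizes so this runs quickly; the cost climbs steeply as n grows.
    for n in (100, 200, 400):
        for last in ("b", "c"):
            matched, steps = count_steps("aa*aa*b", "a" * n + last)
            print(n, last, matched, steps)
```

Doubling the number of "a" characters roughly quadruples the work for
the "c" case here; patterns with nested repetition can do far worse in
a backtracking matcher.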

Where we cut features out it was mostly to try and help implementors
in terms of code complexity.  In practice, for matching against an
IRI, people aren't likely to want to write regular expressions that
do not evaluate efficiently, but regular expressions can be a possible
denial of service attack for network applications.

See http://www.cs.rice.edu/~scrosby/hash/ for some brief notes.

> (i.e. all we're ever doing is matching an RE against a URI to find out 
> if there is or is not a match so we'd never need the Perl list function, 
> for example.)

Perl doesn't have a function called "list".  The XPath 2.0 F&O does
have the notion of a sequence, though.

> 2. What is the identifier for an XPath 2.0 RE? We're defining an RDF 
> property that has an RE as its range but if we just use XML Schema that 
> won't include the modifications in XPath. So perhaps it should be 
> http://www.w3.org/2005/xpath-functions#regex-syntax ?

It's not really a meaningful question.

The URI you give isn't defined -- that is, there is no such
anchor in the document provided.  There's no guarantee that the
XSL and XQuery Working Groups will not define such an anchor
in the future, but if they do, there's *certainly* no guarantee
that it will be what you want.  Please don't make up anchors in
other groups' documents :-)

http://www.w3.org/TR/2007/REC-xpath-functions-20070123/#regex-syntax
might be closer to what you want, as long as you realise that this
is also a pointer to the documentation for the regular expression
syntax, and someone might want to use RDF to say things about it,
as well (of course) as wanting to dereference it.

I'm not sure I've helped very much.  When I get back from the
current trip (I'm in Tokyo) we could talk on the 'phone if it
would help more.  Or email is fine.

Liam


-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/

Received on Thursday, 6 September 2007 12:43:24 UTC