- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Wed, 12 May 2004 12:09:11 +0100
- To: Phil Dawes <pdawes@users.sourceforge.net>
- Cc: www-rdf-interest@w3.org
-------- Original Message --------
> From: Phil Dawes <mailto:pdawes@users.sourceforge.net>
> Date: 11 May 2004 17:50
>
> Hi Andy, Hi All,
>
> An area where I've found RDQL a little underspecified is in the syntax
> for regex searches. RAP encloses the regex in quotes (e.g. "/regex/"),
> I'm not sure what sesame does, and Jena regexes must be unquoted
> (e.g. /regex/).
>
> Unfortunately the grammar in the spec doesn't specify this so I'm not
> sure which is 'correct'.

There isn't a conformance spec for RDQL; 'correct' relies on the
implementers agreeing.

In Jena it is the latter: "" makes it a string, not a regular
expression. Regular expressions are not strings. The syntax follows
Perl, with the small addition that the "m" is optional for more
delimiter characters (any non-alphanumeric, but not " or ' because those
would be confused with strings). This is because tests might be on URIs
that contain /, so you can write ?p =~ !^http://host/namespace#!
without escaping the slashes.

>
>
> While I'm on the subject, the other unfortunate thing about RDQL
> pattern searches is that most relational databases don't support
> regexes and so for rdb backed stores the filtering has to be done
> in-memory.

PostgreSQL (operator ~) and MySQL (operator REGEXP) do provide regular
expressions.

> Unfortunately the most common usage of this feature for me
> is to do a global label search, which involves e.g.
>
> SELECT ?subj, ?label
> WHERE (?subj,<rdfs:label>,?label)
> AND ?label =~ '/phi/i'

We plan to compile that to SQL in Jena. By looking at the regular
expression, the common cases of case-insensitive substring searching and
string prefixes can quite simply be turned into appropriate SQL. This
works through Jena's query handler abstraction, which includes letting
the store take over all or part of the query evaluation. A standard
utility to do the regular expression analysis will probably be provided,
so that a query will normally have certain string operations marked as
being the simpler cases.
These can then be turned into SQL LIKE or SQL regular expressions; other
storage systems might be able to do a good job of string prefix testing.

Andy

>
> This is obviously a bit of a problem to do in-memory for large
> stores. Is there potential for a more restricted form of pattern in
> RDQL that could be done in-database?
> e.g. something like SeRQL's 'LIKE' clause
>
> (for those unfamiliar with seRQL/sesame, this does a case-insensitive
> match with a single wildcard character '*' which matches zero or more
> characters - this nicely maps to LIKE and % in SQL).
>
> Cheers,
>
> Phil
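The regular expression analysis described above can be sketched in Python. This is a hypothetical illustration, not Jena's code: `parse_regex_literal`, `regex_to_like` and `to_sql_condition` are invented names. The literal parser follows only the rules stated in this message (optional "m", any non-alphanumeric delimiter except a quote, trailing flags) and does not handle bracket-pair delimiters such as m{...}. Only the simple cases (plain substring, ^-anchored prefix, $-anchored suffix, exact match, optional i flag) are pushed down to SQL LIKE; anything else returns None and would remain an in-memory filter.

```python
from typing import Optional, Tuple

# Characters with special meaning in a Perl-style regular expression.
_REGEX_META = set(".^$*+?{}[]\\|()")


def parse_regex_literal(text: str) -> Tuple[str, str]:
    """Split a literal such as /phi/i or !^http://host/ns#! into (pattern, flags).

    Hypothetical helper: the leading "m" is optional, and the delimiter may be
    any non-alphanumeric, non-space character except a quote.
    """
    if len(text) >= 2 and text[0] == "m" and not text[1].isalnum():
        text = text[1:]                      # drop the optional "m"
    delim = text[:1]
    if not delim or delim.isalnum() or delim.isspace() or delim in "\"'":
        raise ValueError(f"not a regex literal: {text!r}")
    i = 1
    while i < len(text):
        if text[i] == "\\":
            i += 2                           # skip an escaped character
        elif text[i] == delim:
            break                            # found the closing delimiter
        else:
            i += 1
    else:
        raise ValueError(f"unterminated regex literal: {text!r}")
    pattern, flags = text[1:i], text[i + 1:]
    if flags and not flags.isalpha():
        raise ValueError(f"bad flags: {flags!r}")
    return pattern, flags


def regex_to_like(pattern: str, flags: str) -> Optional[Tuple[str, bool]]:
    """Return (like_pattern, case_insensitive) for the simple cases, else None.

    LIKE's own wildcards (% and _) and the escape character '!' are escaped,
    for use with ESCAPE '!'.
    """
    if set(flags) - {"i"}:
        return None                          # other flags: keep in-memory filter
    anchored_start = pattern.startswith("^")
    anchored_end = pattern.endswith("$")
    core = pattern[int(anchored_start):len(pattern) - int(anchored_end)]
    if any(ch in _REGEX_META for ch in core):
        return None                          # genuine regex features
    literal = core.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    like = ("" if anchored_start else "%") + literal + ("" if anchored_end else "%")
    return like, "i" in flags


def to_sql_condition(column: str, regex_literal: str) -> Optional[Tuple[str, str]]:
    """SQL condition plus bound parameter for `column =~ regex_literal`,
    or None when the test has to stay an in-memory filter."""
    simple = regex_to_like(*parse_regex_literal(regex_literal))
    if simple is None:
        return None
    like, case_insensitive = simple
    if case_insensitive:
        # Lower-case both sides to get a case-insensitive match.
        return f"LOWER({column}) LIKE ? ESCAPE '!'", like.lower()
    return f"{column} LIKE ? ESCAPE '!'", like
```

For Phil's label search, `to_sql_condition("label", "/phi/i")` yields a `LOWER(label) LIKE ?` condition with the parameter `%phi%`. The `?` placeholder assumes a driver using the qmark parameter style (as Python's sqlite3 does). PostgreSQL's ILIKE, or MySQL's case-insensitive collations, are alternatives to LOWER(). '!' is used as the escape character because backslash handling in SQL string literals differs between databases.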
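The SeRQL-style LIKE mapping Phil mentions can be sketched the same way. The function names below are hypothetical, and the sketch assumes, as Phil describes it, that '*' is the only wildcard, that the match is case-insensitive and covers the whole string, and that there is no way to escape a literal '*' (not checked against the SeRQL documentation). `serql_like_match` gives the intended semantics in memory; `serql_like_condition` produces the equivalent SQL LIKE test.

```python
import re
from typing import Tuple


def serql_like_match(pattern: str, value: str) -> bool:
    """In-memory reference semantics: '*' matches zero or more characters,
    the whole value must match, and case is ignored."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, re.IGNORECASE | re.DOTALL) is not None


def serql_like_to_sql(pattern: str) -> str:
    """Translate '*' into SQL's '%', escaping SQL's own wildcards (% and _)
    and the escape character '!' itself with '!'."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch in "!%_":
            out.append("!" + ch)
        else:
            out.append(ch)
    return "".join(out)


def serql_like_condition(column: str, pattern: str) -> Tuple[str, str]:
    """SQL condition plus bound parameter for a case-insensitive
    SeRQL-style LIKE test on `column`."""
    return f"LOWER({column}) LIKE ? ESCAPE '!'", serql_like_to_sql(pattern).lower()
```

Because SQL LIKE also matches the whole string, the translation is a character-by-character substitution; the only extra work is escaping characters that are wildcards in SQL but literals in the SeRQL-style pattern.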
Received on Wednesday, 12 May 2004 07:09:55 UTC