RE: [F&O] regular expressions: non-capturing groups

From: Michael Kay <mhk@mhk.me.uk>
Date: Tue, 23 Nov 2004 13:33:25 -0000
To: "'Tobias Reif'" <tobiasreif@pinkjuice.com>, <public-qt-comments@w3.org>
Message-ID: <E1CWanw-0006Xe-Oy@frink.w3.org>

I haven't had a chance to study the detailed use case, but just a procedural
observation. There comes a time when, if you want to finish a project, you
have to say "no new functionality". We think we have passed that point.
We're in the phase now where we are trying to eliminate errors,
inconsistencies, and bugs, and requests for new functionality are getting an
automatic response of "not in 2.0".

Michael Kay

> -----Original Message-----
> From: public-qt-comments-request@w3.org 
> [mailto:public-qt-comments-request@w3.org] On Behalf Of Tobias Reif
> Sent: 22 November 2004 18:09
> To: public-qt-comments@w3.org
> Subject: [F&O] regular expressions: non-capturing groups
> 
> 
> Hi
> 
> A while ago I wrote some XSLT2 to add syntax markup to code listings
> (while transforming DocBook to XHTML) [1].
> 
> This basically worked well (enabling syntax highlighting), until I hit
> a roadblock: I noticed that when two keywords appear in direct
> succession (separated by just one shared delimiter), the second one
> won't be matched (and thus won't be marked up) [2]. My regex matches
> keywords by specifying all possible delimiters, often including
> whitespace. If there's just one space after the word, it will be
> consumed by the first match. Thus the second keyword can't be matched;
> the delimiter before it has been matched/consumed already. After
> realizing that the syntax markup, and hence the syntax highlighting,
> doesn't really work (yet), I had to disable most of it, which is
> unfortunate.
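
The consumed-delimiter problem described above can be reproduced outside XSLT. The following is a minimal sketch in Python, not the original stylesheet code: the keyword list, the pattern, and the `mark_up` helper are assumptions made up for illustration. It only shows why a delimiter that belongs to one match is no longer available to the next.

```python
import re

# Hypothetical keyword list and delimiter-based pattern, for illustration
# only; the original stylesheets are not reproduced here.
KEYWORDS = ["def", "true"]

# A keyword must be preceded by start-of-string or whitespace and followed
# by whitespace or end-of-string. Both delimiters are part of the match.
PATTERN = re.compile(r"(^|\s)(" + "|".join(KEYWORDS) + r")(\s|$)")


def mark_up(line: str) -> str:
    """Wrap each matched keyword in <code class="keyword">, keeping delimiters."""
    return PATTERN.sub(r'\1<code class="keyword">\2</code>\3', line)


# One shared space: it is consumed by the match for "def", so "true" has no
# leading delimiter left and stays unmarked.
print(mark_up("def true"))

# Two spaces: one for each match, so both keywords are marked up.
print(mark_up("def  true"))
```

With a single space only the first keyword is wrapped; with two spaces both are, which is the same pattern as the result shown in [2].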
> 
> Is there currently a way to specify regex groups which must be matched
> but whose match isn't captured (consumed)?
> 
> If the current draft does not make this possible, I'd ask you to
> consider the addition of the feature, e.g. to
> 
>   http://www.w3.org/TR/xpath-functions/#regex-syntax
> 
> It's called non-capturing groups in existing regex implementations I
> think. The syntax could look like this:
> 
>   (?:)
>   (?:notcaptured)
> 
> The group's content is prefixed with "?:".
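
For comparison, here is a hedged sketch of how `(?:...)` behaves in one existing implementation, Python's `re` module. It says nothing about the F&O regex syntax, and the two patterns are made-up examples. In this implementation a non-capturing group only skips storing the group; its text is still part of the match and is still consumed. Testing a delimiter without consuming it is done there with lookaround assertions such as `(?<!...)` and `(?!...)`.

```python
import re

text = "def true"

# Non-capturing groups: the delimiters are not stored as groups, but they
# are still part of the match and therefore still consumed.
non_capturing = re.compile(r"(?:^|\s)(def|true)(?:\s|$)")

# Lookaround assertions: "not preceded by a non-space character" and
# "not followed by a non-space character". Nothing is consumed.
lookaround = re.compile(r"(?<!\S)(def|true)(?!\S)")

print(non_capturing.findall(text))  # only the first keyword is found
print(lookaround.findall(text))     # both keywords are found
```

In this sketch the single shared space still blocks the second keyword when non-capturing groups are used, while the lookaround version finds both.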
> 
> Also see
> 
>   http://www.google.com/search?q=%22non-capturing%22+perl
>   http://www.google.com/search?q=%22non-capturing%22+java
> 
> etc., e.g.
> 
>   http://piglet.uccs.edu/~cs301/perl/re.htm
>   http://javaalmanac.com/egs/java.util.regex/NoGroup.html
> 
> I created a simple use case example
> 
>   http://www.pinkjuice.com/xslt2/non_capturing/
> 
> It might be that I'm missing a straightforward way to achieve what I want.
> 
> Perhaps off-topic: I don't understand why in some places a space gets
> added by the example transformation (it doesn't happen with the actual
> syntax markup XSLTs). For example, in the input, there are two
> spaces between the two keywords in the first line, and one space
> in between in the second line. In the output there are three and two
> spaces, respectively. I'd appreciate any feedback on this, on- or
> off-list.
> 
> Thanks in advance for considering my potential request,
> Tobi
> 
> [1]
> 
> http://www.pinkjuice.com/howto/vimxml/about.xml#colophon
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/markup_syntax.xslt
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/syntax_markup_shared.xslt
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/markup_shell.xslt
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/markup_ruby.xslt
> etc
> 
> [2] With the original XSLTs:
> 
> Test snippet added to ch04.xml:
> 
> def true
> def  true
> 
> Result in moresetup.xml:
> 
> <code class="keyword">def</code> true
> <code class="keyword">def</code>  <code class="keyword">true</code>
> 
> -- 
> to
>   bi
>     as
>   re
> if
> 
> 
Received on Tuesday, 23 November 2004 13:34:00 UTC
