Re: [F&O] regular expressions: non-capturing groups from Svgdeveloper@aol.com on 2004-11-22 (public-qt-comments@w3.org from November 2004)

From: <Svgdeveloper@aol.com>
Date: Mon, 22 Nov 2004 15:01:22 EST
To: tobiasreif@pinkjuice.com
CC: public-qt-comments@w3.org
Message-ID: <1d9.30675860.2ed39f92@aol.com>

Tobi,

Do you really need to capture the whitespace character(s) and do an
xsl:copy-of of them? Would a simple replace, with a single space character
before and after, work?

If so, after a brief look at your use case, the following seems to be a
possible solution:

regex="\s(((while|true|if|else|end)\s*)+)\s"

If that doesn't work, perhaps you could explain in greater detail what it is
you want to match.
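
For illustration, here is a minimal sketch of the difference, written with Python's re module rather than XSLT 2.0, so that the behaviour carries over to the XPath regex dialect is an assumption. The sample text and the function names are made up for the example. The first function uses one keyword per match with a delimiter on each side, which loses every second keyword when only one space separates them. The second uses the regex above, which takes a whole run of keywords in one match.

```python
import re

KEYWORDS = r"(while|true|if|else|end)"

# One keyword per match, delimited by whitespace on both sides.
PER_KEYWORD = re.compile(r"\s" + KEYWORDS + r"\s")

# The regex suggested above: a whole run of keywords in a single match.
KEYWORD_RUN = re.compile(r"\s((" + KEYWORDS + r"\s*)+)\s")


def match_one_by_one(text):
    """Return the keywords found when each match consumes both delimiters."""
    return [m.group(1) for m in PER_KEYWORD.finditer(text)]


def match_runs(text):
    """Return the keywords found by matching runs, then splitting each run."""
    found = []
    for m in KEYWORD_RUN.finditer(text):
        # group(1) is the run without the outer delimiters, e.g. "if true end"
        found.extend(m.group(1).split())
    return found


sample = " if true end "
print(match_one_by_one(sample))  # "true" is skipped: its leading space was consumed
print(match_runs(sample))
```

Note that the \s* inside the run also allows keywords with no whitespace between them, so a string such as "iftrue" would be treated as a run. Whether that matters depends on the language being marked up.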

Andrew Watt

In a message dated 11/22/2004 6:34:53 PM GMT Standard Time, 
tobiasreif@pinkjuice.com writes:

> Hi
> 
> A while ago I wrote some XSLT2 to add syntax markup to code listings
> (while transforming DocBook to XHTML) [1].
> 
> This basically worked well (enabling syntax highlighting), until I hit
> a road block: I noticed that when two keywords appear in direct
> succession (just one, shared delimiter), the second one won't be
> matched (and thus won't be marked up) [2]. My regex matches keywords
> by specifying any possible delimiters, often including whitespace. If
> there's just one space after the word, it will be consumed with the
> first match. Thus the second keyword can't be matched; the delimiter
> before it has been matched/consumed already. After realizing that the
> syntax markup, and with it the syntax highlighting, doesn't really
> work (yet), I had to disable most of it, which is unfortunate.
> 
> Is there currently a way to specify regex groups which must be matched
> but whose match isn't captured (consumed)?
> 
> If the current draft does not make this possible, I'd ask you to
> consider adding the feature, e.g. to
> 
>  http://www.w3.org/TR/xpath-functions/#regex-syntax
> 
> I think this is called non-capturing groups in existing regex
> implementations. The syntax could look like this:
> 
>  (?:)
>  (?:notcaptured)
> 
> The group's content is prefixed with "?:".
> 
> Also see
> 
>  http://www.google.com/search?q=%22non-capturing%22+perl
>  http://www.google.com/search?q=%22non-capturing%22+java
> 
> etc., e.g.
> 
>  http://piglet.uccs.edu/~cs301/perl/re.htm
>  http://javaalmanac.com/egs/java.util.regex/NoGroup.html
> 
> I created a simple use case example
> 
>  http://www.pinkjuice.com/xslt2/non_capturing/
> 
> I may be missing a straightforward way to achieve what I want.
> 
> Perhaps off-topic: I don't understand why in some places a space gets
> added by the example transformation (it doesn't happen with the actual
> syntax markup XSLTs). For example, in the input, there are two
> spaces between the two keywords in the first line, and one space
> between them in the second line. In the output there are three
> and two spaces, respectively. I'd appreciate any feedback on this, on- or
> off-list.
> 
> Thanks in advance for considering my potential request,
> Tobi
> 
> [1]
> 
> http://www.pinkjuice.com/howto/vimxml/about.xml#colophon
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/markup_syntax.xslt
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/syntax_markup_shared.xslt
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/markup_shell.xslt
> http://www.pinkjuice.com/howto/vimxml/xslt/tinydbk2xhtml/markup_ruby.xslt
> etc
> 
> [2] With the original XSLTs:
> 
> Test snippet added to ch04.xml:
> 
> def true
> def  true
> 
> Result in moresetup.xml:
> 
> <code class="keyword">def</code> true
> <code class="keyword">def</code>  <code class="keyword">true</code>
> 
>
Received on Monday, 22 November 2004 20:02:09 UTC