Re: Is <pattern value="(.)+\.(gif|jpg|jpeg|bmp)"/> allowed?

From: Stanley Guan <Stanley.Guan@oracle.com>
Date: Wed, 17 Oct 2001 15:11:06 -0700
Message-ID: <3BCE01FA.EB72429D@oracle.com>
To: Ross Thompson <rthompson@contivo.com>
CC: xmlschema-dev@w3.org

Ross,

So, you're saying that
  given a <pattern value=".+\.(gif|jpg|jpeg|bmp)"/>
  and a string to be validated such as "foo.bmp",
the matcher should do something like this:
      Use the first pattern piece (i.e., ".+") in the matching.
      It matches the whole string "foo.bmp", but other pattern
      pieces remain unused, so the first matching attempt is
      not successful.  The matcher then backs up one code point
      (i.e., gives "p" back for further matching).

      "p" doesn't match the next pattern piece ("\.").
      So, the matcher backs up one more code point (i.e., gives
      "m" back as well), and so on.

      Not until the matcher has given ".bmp" back for further
      matching will it find a good match using
      the WHOLE pattern. Namely,
          "foo"  matched by ".+",
          "." matched by "\.", and
          "bmp" matched by "(gif|jpg|jpeg|bmp)".

Is that what a matcher is supposed to do?
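
The procedure described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not how any
particular schema processor is implemented: the names
match_with_backtracking and EXTENSIONS are hypothetical, and the sketch
is hard-wired to the single pattern ".+\.(gif|jpg|jpeg|bmp)" rather than
being a general regular expression engine.

```python
# Hypothetical sketch: hand-rolled backtracking for the one pattern
#   .+\.(gif|jpg|jpeg|bmp)
# matched against the WHOLE string, as a schema pattern facet requires.

EXTENSIONS = ("gif", "jpg", "jpeg", "bmp")


def match_with_backtracking(s):
    """Return (prefix, extension, trace).

    prefix and extension are None when the string does not match.
    trace records the text that ".+" has given back at each attempt.
    """
    trace = []
    # ".+" starts by taking the whole string, then gives back one code
    # point at a time. It must keep at least one, so `end` stops at 1.
    for end in range(len(s), 0, -1):
        rest = s[end:]  # what is left for "\.(gif|jpg|jpeg|bmp)"
        trace.append(rest)
        # "\." needs a literal dot; the alternation must then consume
        # everything up to the end of the string.
        if rest.startswith(".") and rest[1:] in EXTENSIONS:
            return s[:end], rest[1:], trace
    return None, None, trace


if __name__ == "__main__":
    print(match_with_backtracking("foo.bmp"))
    print(match_with_backtracking("foo.txt"))
```

For "foo.bmp" this returns ("foo", "bmp", ["", "p", "mp", "bmp", ".bmp"]):
the attempts fail until ".bmp" has been handed back, which is the
sequence of steps listed above.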

thx,

-Stanley


Ross Thompson wrote:

> Stanley Guan writes:
>  > Ross,
>  >
>  > Actually, it can be more succinctly represented as:
>  >   ".+"
>  > because "." will match the rest of the pattern string.
>
> Sorry, no.  ".+" does not match the same set of
> strings as the original expression.  It matches "foo.txt", for
> example, which ".+\.(gif|jpg|jpeg|bmp)" does not.
>
>  > My point is:
>  >   Particles are subject to ambiguity constraints (
>  >   Unique Attribution (§3.8.6)).  For example, if an instance
>  >    element could match both an explicit particle and a wildcard,
>  >    that model is in error.
>  >
>  > Do we have something similar to Unique Attribution for patterns?
>  > In my original posting, there was a typo.  The better specification
>  > is
>  >    <pattern value="[^\.]+\.(gif|jpg|jpeg|bmp)"/>
>  >
>  > which is not ambiguous.  And I don't think most schema
>  > processors will try to roll back and find a better match as described
>  > in Kongyi's response!
>
> I missed the early part of this discussion, so I'm not sure what
> you're talking about.  However, you can't write a regular expression
> matcher that behaves correctly without considering the case where the
> first two characters in ".+\.gif" match less than the entire string.
>
> - Ross
>
> ---
> I have the heart of a little child.  I keep it in a jar on my desk.
>                                         -- unknown
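
The difference between the three expressions quoted above can be checked
with a short Python sketch. It assumes that Python's re.fullmatch is an
acceptable stand-in for whole-string matching of a schema pattern facet,
which holds for these simple expressions but not for XML Schema regular
expressions in general. The names PATTERNS and accepts are hypothetical.

```python
import re

# The three expressions from the thread, keyed by hypothetical labels.
PATTERNS = {
    "dot_plus": r".+",
    "original": r".+\.(gif|jpg|jpeg|bmp)",
    "no_dot_prefix": r"[^\.]+\.(gif|jpg|jpeg|bmp)",
}


def accepts(pattern, s):
    """True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None


if __name__ == "__main__":
    for s in ("foo.bmp", "foo.txt", "a.b.gif"):
        results = {name: accepts(p, s) for name, p in PATTERNS.items()}
        print(s, results)
```

".+" accepts "foo.txt" while the original expression rejects it, so the
two are not equivalent. The "[^\.]+" variant is also a different
language: it rejects "a.b.gif", which the original accepts.
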
Received on Wednesday, 17 October 2001 18:11:43 GMT
