Re: regular expression in XML Schema from Hans Teijgeler on 2003-09-25 (xmlschema-dev@w3.org from September 2003)

From: Hans Teijgeler <hans.teijgeler@quicknet.nl>
Date: Thu, 25 Sep 2003 23:06:21 +0200
To: fe.sola@infomed.sld.cu
Cc: "xml-schema, mailing list" <xmlschema-dev@w3.org>, "weitz, edi" <edi@agharta.de>, "paap, onno" <onno.paap@ezzysurf.com>
Message-id: <3F7358CD.BB66511B@quicknet.nl>
Dear Lizet,

Thank you again for your help!

I'll repeat your replies and respond to each:

   * Are you trying to match this middle character in your text as well?

          All I want is to allow any program to easily detect where the identifier
          stops and the suffix starts. For example, you might see an identifier like
          FLUOR__HAA__P3712-05__ME00__40293u0d·ME14
          that stops at 40293u0d. Then we get the suffix ME14, and in between some kind
          of weird character that almost certainly is never used in an identifier
          (alternatives are welcome!)

   * I should read the first email of the thread but what text are you trying to match?

          The question was whether the requirement of a middle dot in the expression
          ([a-zA-Z][a-zA-Z0-9]*__)*[a-zA-Z0-9\.\-]+(&#x00B7;[a-zA-Z0-9\.\-]+)?
          was properly expressed

   * escape the dot, i.e. <Pena\.Lizet> will match Pena.Lizet

          That's not the point. The difficulty is that it is not a simple dot (period),
          but a Unicode #x00B7 middle dot (or any other allowable Unicode character,
          for that matter)

   * It works with plain text.

          The point is: if Unicode characters are allowable, how then do you enter them
          in a fill-in-the-blanks XML document? (See my reply to Jeni Tennison)
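
To make the points above concrete, here is a minimal sketch in Python. Python's
re module stands in for an XML Schema processor; the two regex dialects are not
identical, so treat this as an approximation only. It assumes that the XML parser
has already turned &#x00B7; into the single character U+00B7 before the pattern
is applied, and that the pattern must match the whole value (as a pattern facet
does). The names MIDDLE_DOT, POSTED, WITH_HYPHEN and split_name are made up for
this example. One thing the sketch suggests: the pattern as posted appears to
reject the sample identifier, because the segment P3712-05 contains a hyphen,
while the variant that allows a hyphen inside the prefix segments accepts it.

```python
import re
import xml.etree.ElementTree as ET

MIDDLE_DOT = "\u00B7"  # the character that &#x00B7; refers to

# Pattern as posted, with the character reference already resolved.
POSTED = re.compile(
    r"([a-zA-Z][a-zA-Z0-9]*__)*[a-zA-Z0-9\.\-]+("
    + MIDDLE_DOT
    + r"[a-zA-Z0-9\.\-]+)?"
)

# Variant that also allows a hyphen inside the prefix segments.
WITH_HYPHEN = re.compile(
    r"([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9.-]+("
    + MIDDLE_DOT
    + r"[a-zA-Z0-9.-]+)?"
)


def split_name(value, pattern=WITH_HYPHEN):
    """Validate value, then return (identifier, suffix or None)."""
    # fullmatch mimics the implicit anchoring of an XML Schema pattern.
    if pattern.fullmatch(value) is None:
        raise ValueError("not a valid Name: %r" % value)
    identifier, dot, suffix = value.partition(MIDDLE_DOT)
    return identifier, (suffix if dot else None)


# A character reference in the document and a literal middle dot
# give the same text once the document has been parsed.
element = ET.fromstring(
    "<Name>FLUOR__HAA__P3712-05__ME00__40293u0d&#x00B7;ME14</Name>"
)
by_reference = element.text
literal = "FLUOR__HAA__P3712-05__ME00__40293u0d" + MIDDLE_DOT + "ME14"

print(by_reference == literal)
print(split_name(by_reference))
print(POSTED.fullmatch(literal) is not None)
```

Going the other way, ET.tostring(element) with its default US-ASCII encoding
should write the middle dot back out as the character reference &#183;, which is
one way to get the character into a document when an editor will not take it
directly.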

Regards,
Hans

====================================

fe.sola@infomed.sld.cu wrote:

> Here are two links with tutorials:
> http://publish.ez.no/article/articleprint/11/
> http://www.melonfire.com/community/columns/trog/article.php?id=2
>
> I remember that the dot character matches "any" character, so you should escape it
> to match a literal dot. For asserting the beginning and end of a word you could use
> the pair <> and for character sequences the pair []
> >   3. I want to separate the first part of the identifier
> >      ([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9.-]+  from the second (optional) part
> >      ([a-zA-Z0-9.-]+)? by means of a character that normally isn't used in
> >      system identifiers. So I chose the "middle dot" (#x00B7). I have three
> >      questions:
> Are you trying to match this middle character in your text as well?
>
> >        1. Is the way it has now been introduced in the above RegEx correct?
> I should read the first email of the thread but what text are you trying to match?
>
> >        2. If I make an XML document based on an XML Schema (e.g. in Spy), how
> >           can I fill in such a middle dot as part of a Name?
>        escape the dot, i.e. <Pena\.Lizet> will match Pena.Lizet
> >           I have tried everything I could think of, but with no success
> >        3. To what extent does the font play a role? I found a middle dot in the
> >           Windows Character Map under Trebuchet MS (called U+00B7 Middle Dot),
> >           but Spy didn't accept that
> It works with plain text. For implementing the regexps in C# see:
> http://windows.oreilly.com/pub/a/oreilly/windows/news/csharp_0101.html
>
> hth,
> Lizet
>
> Quoting Hans Teijgeler <hans.teijgeler@quicknet.nl>:
>
> > Dear Experts,
> >
> > This is the continuing saga of the Regular Expressions. Last time I thought I
> > had the answer, but alas!
> >
> > Thanks to the good help of Edi Weitz I got a bit further, and we arrived at the
> > following RegEx for an identifier of the type Name:
> >
> >
> > ([a-zA-Z][a-zA-Z0-9]*__)*[a-zA-Z0-9\.\-]+(&#x00B7;[a-zA-Z0-9\.\-]+)?
> >
> > Everything works, the suffix at the end is now optional. BUT I still have some
> > problems/questions:
> >
> >   1. I still need some document in which the whole subject of Regular
> >      Expressions in XML Schema is explained. I read through the concept book of
> >      Eric van der Vlist
> >      (http://books.xmlschemata.org/relaxng/RngBookWxsRegExp.html ) but that book
> >      assumes that I know much more than I do. I need something that starts at
> >      zero, for dummies, with MANY examples. Any suggestions?
> >   2. What is a "combiningchar" and what is an "extender"? Both are mentioned
> >      in XML as allowable parts of Namechar, but nowhere can I find what
> >      each really IS and what it is used for. You guys/gals must have read
> >      something that I haven't, so apparently you know it (if not, why didn't you
> >      ask or complain?)
> >   3. I want to separate the first part of the identifier
> >      ([a-zA-Z][a-zA-Z0-9-]*__)*[a-zA-Z0-9.-]+  from the second (optional) part
> >      ([a-zA-Z0-9.-]+)? by means of a character that normally isn't used in
> >      system identifiers. So I chose the "middle dot" (#x00B7). I have three
> >      questions:
> >        1. Is the way it has now been introduced in the above RegEx correct?
> >        2. If I make an XML document based on an XML Schema (e.g. in Spy), how
> >           can I fill in such a middle dot as part of a Name? I have tried
> >           everything I could think of, but with no success
> >        3. To what extent does the font play a role? I found a middle dot in the
> >           Windows Character Map under Trebuchet MS (called U+00B7 Middle Dot),
> >           but Spy didn't accept that
> >
> > Please enlighten me!
> >
> > Regards,
> > Hans
> >
>
Received on Thursday, 25 September 2003 17:02:53 UTC