
Re: regular expression in XML Schema

From: <fe.sola@infomed.sld.cu>
Date: Thu, 25 Sep 2003 23:04:03 -0400
Message-ID: <1064545443.3f73aca3b1506@webmail.sld.cu>
To: Hans Teijgeler <hans.teijgeler@quicknet.nl>
Cc: "xml-schema, mailing list" <xmlschema-dev@w3.org>, "weitz, edi" <edi@agharta.de>, "paap, onno" <onno.paap@ezzysurf.com>

Hello Hans,
>           All I want is to allow any program to easily detect where the identifier
>           stops and the suffix starts.
>           For example, you might see an identifier like
>           FLUOR__HAA__P3712-05__ME00__40293u0d·ME14
>           that stops at 40293u0d. Then we get the suffix ME14, and in between some kind
>           of weird character that almost certainly is never used in an identifier
>           (alternatives are welcome!)
> 
OK, I get the idea; I guess I oversimplified your requirements.
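If the middle dot really never occurs inside an identifier, a program does not need the schema regex at all to find the boundary: splitting at the first U+00B7 is enough. A minimal Python sketch, assuming exactly that (the function name `split_suffix` is made up for illustration):

```python
from typing import Optional, Tuple

# U+00B7 MIDDLE DOT, the separator discussed in this thread.
MIDDLE_DOT = "\u00b7"


def split_suffix(value: str) -> Tuple[str, Optional[str]]:
    """Split 'identifier<middle dot>suffix' into its two parts.

    Assumes the middle dot never occurs inside the identifier, so the
    first one found marks the boundary. Returns (value, None) when the
    value has no suffix.
    """
    identifier, separator, suffix = value.partition(MIDDLE_DOT)
    return identifier, (suffix if separator else None)


print(split_suffix("FLUOR__HAA__P3712-05__ME00__40293u0d" + MIDDLE_DOT + "ME14"))
# ('FLUOR__HAA__P3712-05__ME00__40293u0d', 'ME14')
```

`str.partition` never raises when the separator is missing, which is why the "no suffix" case needs only the check on `separator`.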

>           The question was whether the requirement of a middle dot in the expression
>           ([a-zA-Z][a-zA-Z0-9]*__)*[a-zA-Z0-9\.\-]+(&#x00B7;[a-zA-Z0-9\.\-]+)?
>           was properly expressed
> 

In The Regex Coach, this expression matches the suffix ME14, so if you want your program
to find all suffixes, it could work.
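One thing to watch when testing in a general-purpose tool: XML Schema patterns are implicitly anchored, so the whole value has to match, while a tester that searches the text reports any matching substring. Also, `&#x00B7;` in the .xsd file is an XML character reference, so the schema processor's regex engine receives the literal U+00B7 character. A small sketch comparing the two kinds of matching, with Python's `re` standing in for the schema processor (the helper names are made up):

```python
import re

# The pattern from this thread, with the character reference &#x00B7;
# written as the regex escape \u00B7 (what the regex engine actually sees).
SCHEMA_PATTERN = re.compile(
    r"([a-zA-Z][a-zA-Z0-9]*__)*[a-zA-Z0-9.\-]+(\u00B7[a-zA-Z0-9.\-]+)?"
)


def schema_style_match(value: str) -> bool:
    # XML Schema semantics: the pattern must match the entire value.
    return SCHEMA_PATTERN.fullmatch(value) is not None


def substring_match(value: str) -> bool:
    # Search semantics: any matching substring counts.
    return SCHEMA_PATTERN.search(value) is not None


short_id = "P3712-05\u00b7ME14"
full_id = "FLUOR__HAA__P3712-05__ME00__40293u0d\u00b7ME14"
print(schema_style_match(short_id), substring_match(short_id))  # True True
print(schema_style_match(full_id), substring_match(full_id))    # False True
```

With this pattern the full example identifier fails the anchored check: the `__`-terminated segments allow only letters and digits, so `P3712-05__` cannot be consumed there, and the final part may not contain `_`. A substring search still reports a match at the start of the string, so a tester can look happier than the schema validator will be.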

>           That's not the point, the difficulty is in the fact that it is not a simple
>           dot (period), but a Unicode #x00B7 middle dot (or any other allowable Unicode
>           character, for that matter)
> 
>    * It works with plain text.
> 
>           The point is: If Unicode characters are allowable, how then do you enter them
>           in a fill-in-the-blanks XML document? (See my reply to Jeni Tennison)
> 
I'm going to check that post again, and maybe this is a silly idea, but if you are going
to fill in the XML file by hand, then use an Alt+<numeric code> key combination (on
Windows, Alt+0183 typed on the numeric keypad should give the middle dot). If your
program will generate the character, then use a function like ChrW(&HB7) in VBScript
and concatenate it to the selected string.
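Both routes sketched in Python rather than VBScript, using only the standard library's `xml.etree.ElementTree`. For hand-written files there is also a third option that needs no special keyboard input at all: the numeric character reference `&#xB7;`, the same notation the schema pattern already uses, which any XML parser replaces with the character itself:

```python
import xml.etree.ElementTree as ET

MIDDLE_DOT = chr(0xB7)  # U+00B7 MIDDLE DOT

# Typed by hand: a numeric character reference works whatever the file's
# encoding, and the parser replaces it with the character itself.
hand_written = "<id>40293u0d&#xB7;ME14</id>"
parsed_text = ET.fromstring(hand_written).text

# Generated by a program: concatenate the character into the string and let
# the XML library encode it (in UTF-8, U+00B7 becomes the bytes C2 B7).
element = ET.Element("id")
element.text = "40293u0d" + MIDDLE_DOT + "ME14"
generated = ET.tostring(element, encoding="utf-8")

print(parsed_text == element.text)  # True
print(generated)                    # b'<id>40293u0d\xc2\xb7ME14</id>'
```

Letting the library serialize the text, rather than gluing raw bytes together, keeps the bytes in the file consistent with the encoding the document declares.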
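Are you sure the file is really saved in the encoding it declares? The middle dot can be
represented in UTF-8 as well as in UTF-16, but if the file was saved as, say, Latin-1
while being read as UTF-8, maybe that's why XML Spy's processor can't recognize it. This
would be a fatal error, because the XML processor encounters bytes that are not legal in
the declared encoding (or an entity with an encoding that it is unable to process).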
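To illustrate the encoding point, here is a small Python sketch (standard library only; the helper `parses` is made up) showing that U+00B7 exists in every encoding involved, only as different bytes, and that the failure comes from a mismatch between the saved bytes and the declared encoding:

```python
import xml.etree.ElementTree as ET

MIDDLE_DOT = "\u00b7"

# Every Unicode encoding can represent U+00B7; only the bytes differ.
print(MIDDLE_DOT.encode("utf-8"))      # b'\xc2\xb7'
print(MIDDLE_DOT.encode("utf-16-be"))  # b'\x00\xb7'
print(MIDDLE_DOT.encode("latin-1"))    # b'\xb7'


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the document bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# A file saved as Latin-1 holds the single byte 0xB7 for the middle dot...
latin1_body = ("<id>40293u0d" + MIDDLE_DOT + "ME14</id>").encode("latin-1")

# ...which is not a legal UTF-8 sequence, so declaring UTF-8 is fatal,
print(parses(b'<?xml version="1.0" encoding="UTF-8"?>' + latin1_body))       # False
# while declaring the encoding the file was actually saved in works.
print(parses(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body))  # True
```

So the fix is usually to make the editor's "save as" encoding and the `encoding` declaration agree, not to avoid the character.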
Anyway, hth
Lizet


Received on Thursday, 25 September 2003 23:15:44 GMT
