Re: forbiddenCharacters data category - related to [ACTION-189]

Hi Yves,

2012/8/27 Yves Savourel <ysavourel@enlaso.com>

> Hi Felix,
>
> > - the escaping mechanism with \uHHHH would need to be
> > converted to numeric character references &#xHHHH;
>
> Using &#xHHHH; for all would be fine, as it would work with all engines.
> I was using \uHHHH to work around the ever-problematic issue of
> XML-invalid characters.
>
>
> > - <> need to be converted to &lt;&gt;
>
> Do you mean that in the XML source I need to have
> forbiddenCharacters="&lt;&gt;" or forbiddenCharacters="&amp;lt;&amp;gt;" ?
>
> I didn't see anything special about < and > in the XML regex (besides the
> fact that a literal < must be &lt; when in an XML file, but that
>
>
> > - Both \u0000 and \u001F are forbidden characters in XML.
>
> U+0000 and U+001F are, but not when written in \uHHHH notation. That's why
> I wasn't using &#xHHHH;.
> But I see your point.
>
> My question then is: how do you work with such characters and XML regex?
> If you can't then that's one more reason to avoid using XML regex.
>
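A minimal Python sketch of the conversion discussed above, assuming the input uses Java-style \uHHHH escapes (exactly four hex digits; a backslash that is itself escaped, as in \\u0041, is not handled). It rewrites each escape as &#xHHHH;, escapes &, <, > and " in the literal parts, and rejects any code point outside the XML 1.0 Char production. That covers U+0000 and U+001F: they cannot appear in an XML 1.0 document even as numeric character references (XML 1.1 allows &#x1F; as a reference, but U+0000 is excluded in both versions). The function names are illustrative only.

```python
import re

# A \uHHHH escape with exactly four hex digits (Java/JavaScript style).
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def _escape_literal(text: str) -> str:
    """Escape markup characters in a run of literal (unescaped) text."""
    for ch in text:
        if not is_xml10_char(ord(ch)):
            raise ValueError(f"U+{ord(ch):04X} is not allowed in XML 1.0")
    # Replace & first so the entities added below are not escaped twice.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


def to_xml_attribute(value: str) -> str:
    """Turn \\uHHHH escapes into &#xHHHH; and escape the remaining text.

    Raises ValueError for code points that XML 1.0 forbids outright,
    since those cannot be written even as numeric character references.
    """
    parts = []
    pos = 0
    for m in _U_ESCAPE.finditer(value):
        parts.append(_escape_literal(value[pos:m.start()]))
        cp = int(m.group(1), 16)
        if not is_xml10_char(cp):
            raise ValueError(f"U+{cp:04X} is not allowed in XML 1.0")
        parts.append(f"&#x{cp:04X};")
        pos = m.end()
    parts.append(_escape_literal(value[pos:]))
    return "".join(parts)
```

For example, to_xml_attribute(r"\u00A0<>") gives &#x00A0;&lt;&gt;, while r"\u0000" and r"\u001F" raise ValueError. Emitting references rather than literal characters also matters for U+0009, U+000A and U+000D, which attribute-value normalization would otherwise turn into spaces.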

I would propose avoiding the regex completely, then, since it seems that
the proposal from Jirka at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0280.html
wouldn't be a solution either.

We had concerns about the regex before, and Michael said this data category
would fulfil his needs without the regex, so let's go forward with that.
Otherwise we will create regular expressions that don't work with the content
we want them to apply to.
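Without a regex, checking content against the data category reduces to a set-membership test. The sketch below is hypothetical: it assumes the forbiddenCharacters value is a plain list of literal characters, uses an unnamespaced forbiddenCharacters attribute rather than real ITS markup, and compares code points rather than grapheme clusters. Because the XML parser resolves &lt; and &gt; before the value is used, under this model the source only needs forbiddenCharacters="&lt;&gt;", not a doubly escaped form.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_forbidden(content: str, forbidden: str) -> List[Tuple[int, str]]:
    """Return (index, character) for each character of content that is
    listed in forbidden. Indices count code points, not grapheme clusters."""
    forbidden_set = set(forbidden)
    return [(i, ch) for i, ch in enumerate(content) if ch in forbidden_set]


def check_element(xml_text: str) -> List[Tuple[int, str]]:
    """Parse one element and check its text against its own (hypothetical,
    unnamespaced) forbiddenCharacters attribute."""
    elem = ET.fromstring(xml_text)
    # The parser has already resolved &lt; &gt; and &#x...; in both places.
    forbidden = elem.get("forbiddenCharacters", "")
    content = "".join(elem.itertext())
    return find_forbidden(content, forbidden)


if __name__ == "__main__":
    doc = '<p forbiddenCharacters="&lt;&gt;">a&lt;b&gt;c</p>'
    print(check_element(doc))  # [(1, '<'), (3, '>')]
```

Keeping the value as a literal character list avoids both open questions above: no regex engine is involved, and characters that XML cannot carry simply cannot be listed.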

Best,

Felix



>
>
> > We should either drop the regex altogether, use XML Schema
> > regex (I saw your counter-arguments, so this is probably not an option)
> > or define a clear specification of what to do when one
> > uses XML Schema regex, e.g. have a pointer to characters that are
> > disallowed in XML and XML Schema regex anyway.
>
> It seems to me that the third option would be the way. I'll try to post
> something asap.
>
> Thanks for the feedback.
> -yves
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Monday, 27 August 2012 11:56:48 UTC